# Training Iterations & Configurations

This document summarizes the training configurations found in the `configs/` directory. The primary models experimented with are DinoV2 and SigLIP.
## DinoV2 Experiments

**Base Model:** `facebook/dinov2-with-registers-base-imagenet1k-1-layer`
| Config File | Training Dataset | Unfrozen Layers | Augmentations | Notes |
|---|---|---|---|---|
| `dino_linear.yaml` | `_dataset_train.json` (3k) | `[]` (none; linear probe) | Basic (hflip) | Initial baseline with frozen backbone. |
| `dino_linear_aug.yaml` | `_dataset_train.json` (3k) | `[]` (none; linear probe) | Full list\* | Tests the impact of heavy augmentation. |
| `dino_hft.yaml` | `_dataset_train.json` (3k) | `[11]` | Basic (hflip) | Fine-tunes the last transformer block. |
| `dino_hft_aug.yaml` | `_dataset_train.json` (3k) | `[11]` | Full list\* | Fine-tuning with heavy augmentation. |
| `dino_hft_6k.yaml` | `_dataset_train_5k.json` | `[11]` | Basic (hflip) | Scales up the dataset size. |
| `dino_hft_6k_aug.yaml` | `_dataset_train_5k.json` | `[11]` | Full list\* | Scaled-up dataset with heavy augmentation. |
| `dino_hft_6k_crop.yaml` | `_dataset_train_5k.json` | `[11]` | Basic (hflip) | Uses `return_cropped: true` in the dataloader. |
| `dino_hft_12k.yaml` | `_dataset_train_12k.json` | `[11]` | Basic (hflip) | Further scales up the dataset size. |
\* The full augmentation list typically includes `resize_crop`, `rotation`, `hflip`, `jitter`, `blur`, and `bw`.
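
For orientation, the sketch below shows roughly what a DinoV2 fine-tuning config such as `dino_hft_6k_crop.yaml` might contain. Apart from `return_cropped: true`, which is quoted from the table above, every key name is a hypothetical illustration of the fields the table summarizes, not the repo's actual schema:

```yaml
# Hypothetical sketch only: aside from return_cropped, the key names here
# are illustrative and may not match the repo's real config schema.
base_model: facebook/dinov2-with-registers-base-imagenet1k-1-layer
train_dataset: _dataset_train_5k.json
unfrozen_layers: [11]      # fine-tune only the last transformer block
augmentations: [hflip]     # "Basic"; the heavy variant adds the full list above
dataloader:
  return_cropped: true     # quoted from dino_hft_6k_crop.yaml; the nesting is a guess
```

A linear-probe variant would simply set the unfrozen-layers list to `[]`.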
## SigLIP Experiments

**Base Model:** `google/siglip2-base-patch16-224`
| Config File | Training Dataset | Unfrozen Layers | Augmentations | Notes |
|---|---|---|---|---|
| `siglip_linear.yaml` | `_dataset_train.json` (3k) | `[]` (none; linear probe) | Basic (hflip) | SigLIP baseline with frozen backbone. |
| `siglip_head.yaml` | `_dataset_train.json` (3k) | `['head']` | Basic (hflip) | Fine-tunes only the final classification head. |
| `siglip_head_aug.yaml` | `_dataset_train.json` (3k) | `['head']` | Full list\* | Head fine-tuning with heavy augmentation. |
| `siglip_head_6k.yaml` | `_dataset_train_5k.json` | `['head']` | Basic (hflip) | Scales up the dataset size for SigLIP. |
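
The SigLIP runs differ from the DinoV2 ones mainly in the base model and in unfreezing the classification head rather than a transformer block. A comparable hypothetical sketch, with the same caveat that the key names are illustrative rather than the repo's schema:

```yaml
# Hypothetical sketch of a SigLIP head-tuning config (illustrative key names).
base_model: google/siglip2-base-patch16-224
train_dataset: _dataset_train.json   # the 3k split
unfrozen_layers: ['head']            # unfreeze only the final classification head
augmentations: [hflip]               # "Basic"
```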
## Other Models / Evaluation Configs

- `aes_anatomy.yaml`: Uses the pre-trained `incantor/aes-pixai-1.2-anatomy-large-xgb` model, likely for zero-shot evaluation rather than for training within this repo.
- `dino_hft_infer.yaml`: A configuration designed for running inference (`c_infer.py`) on new, unlabeled datasets to perform auto-labeling.
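
For the inference path, `dino_hft_infer.yaml` presumably points `c_infer.py` at a trained checkpoint and an unlabeled image set. A guessed outline, where every key and path below is hypothetical:

```yaml
# Hypothetical outline only: every key and path here is a guess meant to
# convey intent; consult the actual dino_hft_infer.yaml for the real fields.
checkpoint: runs/dino_hft_12k/best.pt   # hypothetical trained-checkpoint path
input_dataset: data/unlabeled/          # hypothetical unlabeled image set
output_labels: data/autolabels.json     # hypothetical auto-label output
```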