Comparative study of deep generative models for fine-grained bird image synthesis using the CUB-200-2011 dataset (11,788 images, 200 species).
| # | Model | Type | Resolution |
|---|---|---|---|
| 1 | DCGAN | Unconditional GAN | 64×64 |
| 2 | cDCGAN | Class-conditional GAN | 64×64 |
| 3 | Stable Diffusion v1.5 + LoRA | Text-guided diffusion | 256×256 |
Dataset: CUB-200-2011 (Caltech-UCSD Birds-200-2011), 11,788 images across 200 fine-grained bird species.
A standard Deep Convolutional GAN without class conditioning. Trained for 150 epochs with soft label smoothing; FID and IS are evaluated every 10 epochs.
Architecture:
- Generator: 5-layer transposed convolution stack, input noise dim = 100, output 64×64×3
- Discriminator: 5-layer convolution stack with LeakyReLU
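A minimal PyTorch sketch of this generator. Only the noise dimension (100), the five-layer depth, and the 64×64×3 output come from the description above; the channel widths and the 0.9 smoothed-label value are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """5-layer transposed-convolution stack: 100-dim noise -> 64x64x3 image."""
    def __init__(self, z_dim=100, base=64):
        super().__init__()
        self.net = nn.Sequential(
            # z is treated as a z_dim x 1 x 1 map; each layer grows the spatial size
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),   # -> 4x4
            nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False),  # -> 8x8
            nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False),  # -> 16x16
            nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),      # -> 32x32
            nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, 3, 4, 2, 1, bias=False),             # -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

# Soft label smoothing: real targets below 1.0 (0.9 here is one common choice)
real_labels = torch.full((16,), 0.9)
```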
Class-conditional extension of DCGAN. The generator concatenates a 100-dim noise vector with a learned 200-dim class embedding. The discriminator projects the class label into an extra spatial channel (1×64×64) that is concatenated to the image input.
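The two conditioning paths can be sketched as follows (batch size and variable names are illustrative; the embedding sizes match the description above):

```python
import torch
import torch.nn as nn

n_classes, z_dim, embed_dim = 200, 100, 200

# Generator side: concatenate noise with a learned class embedding
g_embed = nn.Embedding(n_classes, embed_dim)
z = torch.randn(16, z_dim)
labels = torch.randint(0, n_classes, (16,))
g_input = torch.cat([z, g_embed(labels)], dim=1)   # (16, 300)

# Discriminator side: project the label to one extra 64x64 spatial channel
d_embed = nn.Embedding(n_classes, 64 * 64)
images = torch.randn(16, 3, 64, 64)
label_map = d_embed(labels).view(-1, 1, 64, 64)
d_input = torch.cat([images, label_map], dim=1)    # (16, 4, 64, 64)
```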
Fine-tunes Stable Diffusion v1.5 on CUB-200-2011 using LoRA adapters (rank 4) applied to UNet cross-attention layers (to_q, to_k, to_v, to_out). Training uses HuggingFace PEFT + Accelerate with FP16 mixed precision over 20 epochs.
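The actual run injects adapters via HuggingFace PEFT; to show what a rank-4 adapter does to a projection such as to_q, here is a minimal self-contained LoRA layer in plain PyTorch (the scaling convention and initialization are common defaults, not taken from the training code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base projection W plus a trainable low-rank update B @ A (rank r)."""
    def __init__(self, base: nn.Linear, r=4, alpha=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter weights are trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Wrapping a hypothetical cross-attention query projection:
to_q = LoRALinear(nn.Linear(320, 320), r=4)
out = to_q(torch.randn(2, 77, 320))  # same shape as the base projection's output
```

Because B is zero-initialized, the wrapped layer initially behaves exactly like the frozen base layer, and training only updates the small A and B matrices.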
Metrics are computed using torchmetrics:
| Metric | Description |
|---|---|
| FID | Fréchet Inception Distance - measures distributional similarity between real and generated images (lower is better) |
| IS | Inception Score - measures quality and diversity of generated images (higher is better) |
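Under the hood, FID fits a Gaussian to the Inception features of each image set and compares the two in closed form; a sketch of that formula on precomputed features (torchmetrics additionally handles the Inception-v3 feature extraction):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid_from_features(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two (N, D) feature arrays."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 16))
print(fid_from_features(a, a))  # identical sets -> distance near 0
```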
For statistical analysis, each model is evaluated over 20 independent runs (500–1000 images per run). Results are saved to CSV for downstream hypothesis testing.
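A sketch of such downstream hypothesis testing on two models' 20-run FID scores, using synthetic stand-in numbers (the specific test shown is an illustrative choice, not taken from the thesis):

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for 20-run FID scores; in practice these would be
# loaded from the per-model CSV files listed in the table below
rng = np.random.default_rng(0)
fid_dcgan = rng.normal(55.0, 2.0, size=20)
fid_sd = rng.normal(48.0, 2.0, size=20)

print(f"DCGAN:   {fid_dcgan.mean():.2f} +/- {fid_dcgan.std(ddof=1):.2f}")
print(f"SD+LoRA: {fid_sd.mean():.2f} +/- {fid_sd.std(ddof=1):.2f}")

# Runs are independent, so an unpaired test applies; Mann-Whitney U
# avoids assuming the scores are normally distributed
stat, p = stats.mannwhitneyu(fid_dcgan, fid_sd)
print(f"Mann-Whitney U p-value: {p:.3g}")
```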
```shell
pip install torch torchvision torchmetrics
pip install diffusers transformers peft accelerate
pip install pytorch-fid torch-fidelity torchviz
```

Open `main.ipynb` in Google Colab (GPU recommended). The notebook is organized into the following sections:
- Setup - Mount Google Drive, extract dataset, install dependencies
- DCGAN - Unconditional model training
- cDCGAN - Conditional model training
- Stable Diffusion + LoRA - Fine-tuning, inference, and epoch-by-epoch evaluation
- Statistical Evaluation - 20-run FID & IS measurement for all models
Trained model checkpoints and metric logs are saved during training. CSV output files:
| File | Contents |
|---|---|
| `metrics_log.csv` | Per-epoch G/D loss, FID, IS, duration (GAN models) |
| `lora-sd15-cub200/training_metrics_SD.csv` | Per-epoch loss, FID, IS (Stable Diffusion) |
| `fid_and_is_scores_dcgan.csv` | 20-run FID & IS for DCGAN |
| `fid_and_is_scores_cdcgan.csv` | 20-run FID & IS for cDCGAN |
| `fid_and_is_scores_stablediffusion.csv` | 20-run FID & IS for SD + LoRA |
Feel free to contact me with questions. The full thesis text is written in Polish.