This assignment focuses on implementing and training Variational Autoencoders (VAEs) for image generation and reconstruction. The project implements both standard Convolutional VAEs (CVAE) and Conditional Convolutional VAEs (CCVAE) on the AFHQ (Animal Faces-High Quality) dataset, with experiments exploring the effect of KL divergence weighting on model performance.
- Implement Convolutional Variational Autoencoder (CVAE) from scratch
- Implement Conditional Convolutional Variational Autoencoder (CCVAE) with class conditioning
- Understand the reparameterization trick and ELBO (Evidence Lower Bound) optimization
- Experiment with different KL divergence weights (λ_KLD) to balance reconstruction and regularization
- Visualize latent space representations and generate novel images
- Analyze the trade-off between reconstruction quality and latent space regularization
AFHQ (Animal Faces-High Quality) - High-quality animal face dataset
- Training samples: ~15,000 images
- Test samples: ~1,500 images
- Image size: 64×64×3 (RGB)
- Classes: 3 categories (cat, dog, wildlife)
- Download: automatically downloaded via the `download.sh` script
The dataset is organized in ImageFolder format with train/test splits.
A standard Variational Autoencoder with convolutional encoder-decoder architecture:
Encoder:
- Input: (3, 64, 64)
- Conv layers: 3→16→32→64→128 channels
- Output: 2048-dimensional flattened features
- Fully connected layers: 2048 → μ, log(σ²) (latent_dim dimensions)
Latent Space:
- Reparameterization trick: z = μ + ε·σ, where ε ~ N(0,1)
- Default latent dimension: 64 (configurable)
Decoder:
- Input projection: latent_dim → 2048
- Reshape: 2048 → (128, 4, 4)
- Deconv layers: 128→64→32→16→3 channels
- Output: (3, 64, 64) with Sigmoid activation
Key Features:
- Batch normalization after each conv/deconv layer
- LeakyReLU(0.2) activations
- Reparameterization trick for differentiable sampling
- MSE reconstruction loss + KL divergence regularization
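The latent bottleneck and reparameterization step described above can be sketched as follows. This is a minimal illustration, not the project's actual `cvae.py`; layer sizes follow the architecture description:

```python
import torch
import torch.nn as nn

class VAEHeadSketch(nn.Module):
    """Sketch of the latent bottleneck: 2048-dim features -> mu, log(sigma^2) -> z."""
    def __init__(self, feat_dim=2048, latent_dim=64):
        super().__init__()
        self.fc_mu = nn.Linear(feat_dim, latent_dim)      # 2048 -> mu
        self.fc_logvar = nn.Linear(feat_dim, latent_dim)  # 2048 -> log(sigma^2)

    def reparameterize(self, mu, logvar):
        # z = mu + eps * sigma, eps ~ N(0, 1); differentiable w.r.t. mu and logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, features):
        mu, logvar = self.fc_mu(features), self.fc_logvar(features)
        z = self.reparameterize(mu, logvar)
        return z, mu, logvar
```

Because the randomness enters only through ε, gradients flow through μ and log(σ²) during backpropagation.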
An extension of CVAE that conditions both encoder and decoder on class labels:
Architecture:
- Same encoder-decoder structure as CVAE
- Class conditioning: One-hot encoded class labels concatenated with:
- Encoder output (before μ, log(σ²) computation)
- Latent vector (before decoder input projection)
- Supports controlled generation by specifying class labels
Key Features:
- Class-conditional encoding and decoding
- Same architecture as CVAE with additional class embeddings
- Enables class-specific image generation
- Useful for controlled generation tasks
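The concatenation scheme can be illustrated as follows (a sketch under the assumptions above; the function name is hypothetical, not taken from the project's `ccvae.py`):

```python
import torch
import torch.nn.functional as F

def condition_on_class(features, latent, labels, num_classes=3):
    """Concatenate one-hot class labels to both the encoder features and the latent vector."""
    one_hot = F.one_hot(labels, num_classes).float()
    enc_in = torch.cat([features, one_hot], dim=1)  # fed to the mu / log(sigma^2) heads
    dec_in = torch.cat([latent, one_hot], dim=1)    # fed to the decoder input projection
    return enc_in, dec_in
```

Conditioning both paths lets the decoder rely on the label for class identity, freeing the latent code to capture within-class variation.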
The project includes multiple experiments exploring different KL divergence weights:
| Experiment | Model | λ_KLD | Latent Dim | Description |
|---|---|---|---|---|
| CVAE1 | CVAE | 0.001 | 64 | Baseline with moderate KL weight |
| CVAE2 | CVAE | 0.0 | 64 | No KL regularization (pure autoencoder) |
| CVAE3 | CVAE | 0.01 | 64 | Higher KL weight (stronger regularization) |
| CVAE4 | CVAE | 0.0001 | 64 | Lower KL weight (weaker regularization) |
| CVAE_new1 | CVAE | 0.0001 | 64 | Variant with different architecture |
| CCVAE1 | CCVAE | 0.0001 | 64 | Conditional VAE with class labels |
Default training parameters:
- Optimizer: AdamW
- Learning rate: 0.001 (configurable)
- Batch size: 64
- Epochs: 50
- Weight decay: 1e-4
- Scheduler: ReduceLROnPlateau (patience=7, factor=0.5)
- Loss function: MSE reconstruction + λ_KLD × KL divergence
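The optimizer and scheduler setup listed above would look roughly like this (a sketch with a stand-in model; the project's `trainer.py` may wire this differently):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for the VAE
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=7
)
# once per epoch: scheduler.step(val_loss) halves the LR
# after 7 consecutive epochs without improvement
```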
The VAE loss combines reconstruction and regularization terms:
Loss = MSE(reconstruction, target) + λ_KLD × KL(q(z|x) || p(z))

Where:
- Reconstruction loss: Mean Squared Error between input and reconstructed images
- KL divergence: Regularization term encouraging latent distribution to match prior N(0,I)
- λ_KLD: Weight controlling the trade-off between reconstruction quality and latent space regularization
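A combined loss along these lines might look as follows (a sketch; the project's actual `vae_loss_function()` in `utils.py` may differ in reduction details):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, target, mu, logvar, lambda_kld=0.001):
    """MSE reconstruction + weighted KL(q(z|x) || N(0, I))."""
    recon_loss = F.mse_loss(recon, target)
    # closed-form KL for a diagonal Gaussian against the standard normal prior
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + lambda_kld * kld, recon_loss, kld
```

Note that with λ_KLD = 0 this reduces to a plain autoencoder objective, matching the CVAE2 experiment.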
- TensorBoard logging: Training/validation loss, reconstruction loss, KL divergence, learning rate
- Image visualization: Automatic saving of reconstruction comparisons every N epochs
- Model checkpointing: Saves model states with training statistics
- Progress tracking: Real-time training progress with tqdm progress bars
- Config management: YAML-based configuration files for experiment reproducibility
- Reconstruction comparison: Side-by-side original vs reconstructed images
- Latent space visualization: PCA projection of latent representations colored by class
- Image generation: Sample from latent space to generate novel images
- Latent space traversal: Visualize how changes in latent dimensions affect generated images
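A PCA-based latent visualization like the one above can be sketched as follows (a minimal stand-in for the project's `vis_latent()`; the function name and arguments here are assumptions):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_latents(latents, labels, out_path="latent_pca.png"):
    """Project latent vectors to 2D with PCA and color points by class."""
    coords = PCA(n_components=2).fit_transform(latents)
    plt.figure(figsize=(6, 5))
    for c in np.unique(labels):
        mask = labels == c
        plt.scatter(coords[mask, 0], coords[mask, 1], s=5, label=f"class {c}")
    plt.legend()
    plt.savefig(out_path)
    plt.close()
    return coords
```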
Utility functions:
- `denormalize_images()`: convert images from the [-1, 1] range to [0, 1]
- `vae_loss_function()`: combined reconstruction and KL divergence loss
- `train_model()`: complete training loop with validation
- `eval_model()`: model evaluation with image saving
- `vis_latent()`: visualize the latent space using PCA
- `inference()`: generate images from random latent vectors
- `save_model()` / `load_model()`: model checkpoint management
```
Assignment4/
├── Assignment4.ipynb       # Main assignment notebook
├── Session4.ipynb          # Lab session materials
├── cvae.py                 # CVAE model implementation
├── ccvae.py                # CCVAE model implementation
├── trainer.py              # Training script
├── utils.py                # Utility functions (training, evaluation, visualization)
├── download.sh             # Dataset download script
├── configs/                # Experiment configurations
│   ├── CVAE1_KLD_0.001/
│   ├── CVAE2_KLD_0.0/
│   ├── CVAE3_KLD_0.01/
│   ├── CVAE4_KLD_0.0001/
│   ├── CVAE_new1_KLD_0.0001/
│   └── CCVAE1_KLD_0.0001/
├── data/
│   └── AFHQ/               # AFHQ dataset (downloaded)
│       ├── train/
│       └── test/
├── models/                 # Saved model checkpoints
├── imgs/                   # Generated images and visualizations
│   ├── CVAE1/              # Experiment outputs
│   ├── CVAE2/
│   ├── inference/          # Generated samples
│   └── ...
├── tboard_logs/            # TensorBoard log files
│   ├── CVAE1_KLD_0.001/
│   └── ...
└── htmls/                  # HTML exports of notebooks
```
The λ_KLD parameter controls the trade-off between:
- Reconstruction quality: Lower λ_KLD → better reconstruction, but less regularized latent space
- Latent space structure: Higher λ_KLD → more structured latent space, but potentially worse reconstruction
Key Findings:
- λ_KLD = 0.0: Acts as a pure autoencoder, excellent reconstruction but unstructured latent space
- λ_KLD = 0.0001: Weak regularization, good reconstruction with some latent structure
- λ_KLD = 0.001: Balanced trade-off (default)
- λ_KLD = 0.01: Strong regularization, well-structured latent space but may sacrifice reconstruction quality
- Disentanglement: Higher KL weights encourage more disentangled representations
- Interpolation: Well-regularized latent spaces enable smooth interpolation between samples
- Generation quality: Conditional VAEs enable class-specific generation with better control
- Install dependencies:

  ```bash
  pip install torch torchvision numpy matplotlib tqdm pyyaml tensorboard scikit-learn
  ```
- Download the dataset:

  ```bash
  chmod +x download.sh
  ./download.sh
  ```

  This downloads and extracts the AFHQ dataset to `./data/AFHQ/`.
```python
from trainer import main
from cvae import CVAE

configs = {
    "model_name": "CVAE",
    "exp": "1",
    "latent_dim": 64,
    "batch_size": 64,
    "num_epochs": 50,
    "lr": 0.001,
    "scheduler": "ReduceLROnPlateau",
    "use_scheduler": True,
    "lambda_kld": 0.001,
}

main(configs)
```

Alternatively, open `Assignment4.ipynb` in Jupyter and run the cells sequentially to:
- Load and inspect the dataset
- Define and initialize models
- Train experiments with different configurations
- Evaluate models and visualize results
- Generate and analyze samples
```bash
tensorboard --logdir=tboard_logs
```

Then open http://localhost:6006 in your browser to view:
- Training/validation loss curves
- Reconstruction vs KL divergence components
- Learning rate schedule
- Image reconstructions
```python
import torch
from cvae import CVAE
from utils import load_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize model
model = CVAE(latent_dim=64).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

# Load checkpoint
model, optimizer, epoch, stats = load_model(
    model, optimizer,
    'models/CVAE1/checkpoint_KLD_0.001_epoch_49.pth'
)

# Generate samples
model.eval()
with torch.no_grad():
    z = torch.randn(16, 64).to(device)
    z = model.decoder_input(z)
    z = z.view(-1, 128, 4, 4)
    samples = model.decoder(z)
```

For class-conditional generation with the CCVAE:

```python
from ccvae import CCVAE

model = CCVAE(latent_dim=64, num_classes=3)

# Generate samples for a specific class (0=cat, 1=dog, 2=wildlife)
class_label = torch.tensor([0] * 16)  # generate 16 cat faces
samples = model.sample(num_samples=16, c=class_label)
```

Each experiment has a YAML configuration file in `configs/`:
```yaml
batch_size: 64
exp: '1'
lambda_kld: 0.001
latent_dim: 64
lr: 0.001
model_name: CVAE
num_epochs: 50
scheduler: ReduceLROnPlateau
use_scheduler: true
```

A VAE is a generative model that learns to encode data into a latent distribution and decode samples from that distribution. Unlike standard autoencoders, VAEs learn a probabilistic latent representation.
Enables backpropagation through random sampling:
z = μ + ε · σ, where ε ~ N(0,1)
This makes the sampling process differentiable.
The VAE objective function:
ELBO = E[log p(x|z)] - KL(q(z|x) || p(z))
Maximizing ELBO is equivalent to maximizing the data likelihood while regularizing the latent distribution.
Measures how different the learned latent distribution q(z|x) is from the prior p(z) = N(0,I). Encourages the encoder to produce latent codes that match the standard normal distribution.
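For a diagonal Gaussian posterior q(z|x) = N(μ, σ²I) and the prior p(z) = N(0, I), this KL term has the standard closed form (as derived in Kingma & Welling, 2014):

KL(q(z|x) || p(z)) = -½ Σⱼ (1 + log σⱼ² - μⱼ² - σⱼ²)

where the sum runs over the latent dimensions; the term is zero exactly when μ = 0 and σ = 1, i.e. when the posterior matches the prior.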
- Auto-Encoding Variational Bayes (Kingma & Welling, 2014)
- AFHQ Dataset
- PyTorch VAE Tutorial
- TensorBoard Documentation
If you found this project helpful, you can support my work by buying me a coffee or via PayPal!
This assignment demonstrates variational inference, generative modeling, and the trade-offs between reconstruction quality and latent space regularization in deep learning.
