This assignment focuses on building and training basic neural networks from scratch on the CIFAR-10 dataset. The project implements and compares Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs), with experiments on regularization techniques and custom learning rate scheduling.
- Implement and train MLP and CNN classifiers on CIFAR-10
- Compare model performance with and without dropout regularization
- Implement custom learning rate warmup scheduler
- Analyze learning curves, confusion matrices, and model predictions
- Investigate failure cases and model behavior
CIFAR-10 - 10-class image classification dataset
- Training samples: 50,000
- Test samples: 10,000
- Image size: 32×32×3 (RGB)
- Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
The dataset is automatically downloaded using PyTorch's torchvision.datasets.CIFAR10.
A fully connected neural network with the following architecture:
- Input: Flattened 32×32×3 = 3,072 features
- Hidden layers:
- Layer 1: 3,072 → 1,024 (ReLU)
- Layer 2: 1,024 → 512 (ReLU)
- Layer 3: 512 → 256 (ReLU)
- Layer 4: 256 → 128 (ReLU)
- Output: 128 → 10 (logits)
- Total parameters: ~3.8M
Features:
- Optional dropout layers (p=0.5) after each hidden layer
- Supports custom learning rate scheduling
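The MLP described above can be sketched in PyTorch as follows (a minimal sketch; the class name and exact dropout placement are assumptions consistent with the feature list, not the notebook's exact code):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, num_classes=10, use_dropout=False, p=0.5):
        super().__init__()
        dims = [3 * 32 * 32, 1024, 512, 256, 128]  # 3,072 -> 1,024 -> 512 -> 256 -> 128
        layers = []
        for in_dim, out_dim in zip(dims[:-1], dims[1:]):
            layers.append(nn.Linear(in_dim, out_dim))
            layers.append(nn.ReLU())
            if use_dropout:
                layers.append(nn.Dropout(p))  # optional regularization after each hidden layer
        layers.append(nn.Linear(dims[-1], num_classes))  # 128 -> 10 logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x.flatten(1))  # flatten 32x32x3 images to 3,072 features

model = MLP(use_dropout=True)
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

Summing the layer sizes gives 3,837,066 trainable parameters, matching the ~3.8M figure above.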
A convolutional neural network with the following architecture:
- Conv Block 1: 3 → 32 → 64 channels (3×3 conv, ReLU, MaxPool)
- Conv Block 2: 64 → 128 → 128 channels (3×3 conv, ReLU, MaxPool)
- Conv Block 3: 128 → 256 → 256 channels (3×3 conv, ReLU, MaxPool)
- Fully Connected:
- 256×4×4 → 1,024 (ReLU)
- 1,024 → 512 (ReLU)
- 512 → 10 (logits)
Features:
- Optional dropout layers (p=0.5) after pooling and FC layers
- Padding to preserve spatial dimensions
- Progressive channel expansion
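The CNN can be sketched similarly (again a minimal sketch; the block helper and dropout placement are assumptions consistent with the feature list):

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, num_classes=10, use_dropout=False, p=0.5):
        super().__init__()

        def block(c_in, c_mid, c_out):
            layers = [
                nn.Conv2d(c_in, c_mid, 3, padding=1), nn.ReLU(),   # padding=1 preserves HxW
                nn.Conv2d(c_mid, c_out, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                                   # halves spatial dims
            ]
            if use_dropout:
                layers.append(nn.Dropout(p))                       # optional dropout after pooling
            return layers

        self.features = nn.Sequential(
            *block(3, 32, 64),     # 32x32 -> 16x16
            *block(64, 128, 128),  # 16x16 -> 8x8
            *block(128, 256, 256), # 8x8  -> 4x4
        )
        fc = [nn.Linear(256 * 4 * 4, 1024), nn.ReLU()]
        if use_dropout:
            fc.append(nn.Dropout(p))
        fc += [nn.Linear(1024, 512), nn.ReLU()]
        if use_dropout:
            fc.append(nn.Dropout(p))
        fc.append(nn.Linear(512, num_classes))
        self.classifier = nn.Sequential(*fc)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = CNN()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```

Three 2×2 max-pools reduce 32×32 inputs to 4×4, which is why the first fully connected layer takes 256×4×4 features.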
The project includes 6 experiments comparing different configurations:
| Experiment | Model | Dropout | LR Scheduler | Description |
|---|---|---|---|---|
| Exp1 | MLP | ❌ | ❌ | Baseline MLP without regularization |
| Exp2 | MLP | ✅ | ❌ | MLP with dropout (p=0.5) |
| Exp3 | CNN | ❌ | ❌ | Baseline CNN without regularization |
| Exp4 | CNN | ✅ | ❌ | CNN with dropout (p=0.5) |
| Exp5 | MLP | ✅ | ✅ | MLP with dropout + custom LR scheduler |
| Exp6 | CNN | ✅ | ✅ | CNN with dropout + custom LR scheduler |
All experiments use:
- Optimizer: Adam
- Learning rate: 0.0001
- Batch size: 1024
- Epochs: 100
- Loss function: CrossEntropyLoss
- Validation: Every 10 epochs
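The shared setup above translates into a training step along these lines (a hedged sketch: the `nn.Linear` model and dummy batch are placeholders standing in for the real model and DataLoader):

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)                           # stand-in for the MLP/CNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # Adam, lr = 0.0001
criterion = nn.CrossEntropyLoss()

# one training step on a dummy batch of 1024 samples
x = torch.randn(1024, 3 * 32 * 32)
y = torch.randint(0, 10, (1024,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```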
A custom linear warmup scheduler is implemented from scratch (no built-in PyTorch schedulers are used):

```python
def warmup_lr(optimizer, current_epoch, warmup_epochs, target_lr, init_lr=1e-6):
    """
    Linear warmup schedule: gradually increases LR from init_lr to target_lr
    over warmup_epochs, then maintains target_lr.
    """
    if current_epoch < warmup_epochs:
        # linear interpolation from init_lr to target_lr
        lr = init_lr + (target_lr - init_lr) * current_epoch / warmup_epochs
    else:
        lr = target_lr
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
```

Parameters used in the experiments:
- `warmup_epochs`: 25
- `init_lr`: 1e-6
- `target_lr`: 0.0001
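The endpoints of the schedule can be checked with a standalone restatement of the formula (`lr_at` is an illustrative helper, not part of the notebook code):

```python
# Illustrative restatement of the linear warmup schedule described above.
def lr_at(epoch, warmup_epochs=25, target_lr=1e-4, init_lr=1e-6):
    if epoch >= warmup_epochs:
        return target_lr  # warmup finished: hold the target LR
    return init_lr + (target_lr - init_lr) * epoch / warmup_epochs

print(lr_at(0))    # start of warmup: 1e-06
print(lr_at(25))   # end of warmup: 0.0001
print(lr_at(100))  # stays at 0.0001 for the rest of training
```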
- TensorBoard logging: Training/validation loss and learning rate curves
- Model checkpointing: Saves best models with training configurations
- Progress tracking: Real-time training progress with tqdm
- Evaluation metrics: Accuracy, confusion matrices, per-class performance
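The checkpointing pattern can be sketched as follows (the `'model_state_dict'` key matches the loading snippet later in this README; the other keys, the file name, and the dummy model are assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                                    # dummy stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# save the model together with its training configuration
checkpoint = {
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': 100,
}
torch.save(checkpoint, 'checkpoint_demo.pth')

# restore it later
restored = torch.load('checkpoint_demo.pth')
model.load_state_dict(restored['model_state_dict'])
```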
```
Assignment1/
├── Assignment1.ipynb            # Main assignment notebook
├── Session1.ipynb               # Lab session materials
├── data/
│   ├── cifar-10-batches-py/     # CIFAR-10 dataset
│   └── MNIST/                   # MNIST dataset (if used)
├── models/
│   ├── Exp1/                    # Experiment 1 checkpoints
│   ├── Exp2/                    # Experiment 2 checkpoints
│   ├── Exp3/                    # Experiment 3 checkpoints
│   ├── Exp4/                    # Experiment 4 checkpoints
│   ├── Exp5/                    # Experiment 5 checkpoints
│   └── Exp6/                    # Experiment 6 checkpoints
├── log_dir/
│   ├── Exp1/                    # TensorBoard logs for Exp1
│   ├── Exp2/                    # TensorBoard logs for Exp2
│   └── ...                      # Logs for other experiments
└── imgs/                        # Visualization images
    ├── MLP.png
    ├── CNN.png
    ├── softmax.png
    └── ...
```
The notebook includes comprehensive analysis:
- Learning curves: Training vs validation loss over epochs
- Confusion matrices: Per-class classification performance
- Accuracy metrics: Overall and per-class accuracy
- Failure case analysis: Visualization of misclassified images
- Overfitting analysis: Comparison of models with/without dropout
- Dropout Regularization: Reduces overfitting gap between training and validation loss
- Learning Rate Scheduling: Custom warmup helps stabilize training in early epochs
- CNN vs MLP: CNNs generally outperform MLPs on image classification tasks
- Failure Cases: Models struggle with similar classes (e.g., cat vs dog, bird vs airplane)
1. Install dependencies:

   ```bash
   pip install torch torchvision numpy matplotlib seaborn tqdm pyyaml tensorboard torchmetrics
   ```

2. Open the notebook:

   ```bash
   jupyter notebook Assignment1.ipynb
   ```

3. Run the experiments by executing the cells sequentially to:
   - Download and inspect the dataset
   - Define the models (MLP and CNN)
   - Train the experiments (Exp1-Exp6)
   - Evaluate the models and visualize the results
Launch TensorBoard to view the training curves:

```bash
tensorboard --logdir=log_dir
```

Then open http://localhost:6006 in your browser.
To restore a saved model:

```python
checkpoint = torch.load('models/Exp1/checkpoint_Exp1.pth')
model.load_state_dict(checkpoint['model_state_dict'])
```

If you found this project helpful, you can support my work by buying me a coffee or via PayPal!
This assignment demonstrates fundamental deep learning concepts including neural network architectures, regularization techniques, and training optimization strategies.