Authors: Avital Fine (ID: 208253823), Noa Lazar (ID: 322520339)
Course: Deep Learning, 2025-Semester B, RUNI
This project compares two deep learning architectures for binary classification of pneumonia from chest X-ray images:
- CNN (Convolutional Neural Network): Standard VGG-like model for local feature extraction.
- ViT (Vision Transformer): Transformer-based architecture capturing global context.
We analyze model performance, training behavior, and generalization on a relatively small dataset.
- Source: Chest X-ray Pneumonia Dataset
- Dataset sizes:
- Training set: ~5,200 images
- Validation set: 16 images
- Test set: ~600 images
- Preprocessing: Grayscale X-rays resized to 224×224 pixels and normalized to [0,1].
Note: The validation set is very small compared to the test set, which made early stopping and hyperparameter tuning challenging.
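The preprocessing step above can be sketched as follows. The project's actual image-loading library isn't stated, so a dependency-free nearest-neighbour resize stands in for whatever resizer (e.g. PIL or OpenCV) the real pipeline uses; treat this as illustrative only:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize a 2-D uint8 grayscale X-ray to size x size and scale to [0, 1].

    Nearest-neighbour resize via index sampling; a production pipeline
    would normally use PIL/cv2 interpolation instead.
    """
    h, w = image.shape
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    resized = image[rows[:, None], cols]
    return resized.astype(np.float32) / 255.0
```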
- Optimizer: Adam, learning rate = 1e-4
- Batch size: 32
- Early stopping: Monitored validation loss
- Epochs to converge: CNN ~10, ViT ~15
- Platform: local macOS machine (Apple M4)
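Both models were stopped early based on validation loss. The patience value isn't given in the source, so the helper below is a generic sketch of the mechanism with a hypothetical `patience` of 3 epochs:

```python
class EarlyStopping:
    """Signal a stop when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `stopper.step(val_loss)` is checked once per epoch and the loop breaks when it returns True.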
| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| CNN | 0.85 | 0.86 | 0.85 | 0.85 |
| ViT | 0.78 | 0.78 | 0.77 | 0.75 |
Notes:
- CNN: ~404k parameters, faster convergence, strong baseline for small datasets.
- ViT: ~14.4M parameters, longer training, captures global context, but underperforms CNN on this dataset.
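The metrics in the table can be computed from raw predictions. The averaging scheme (plain binary vs. class-weighted) isn't stated in the source, so this is a minimal positive-class (pneumonia = 1) version:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = pneumonia)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

In practice `sklearn.metrics.precision_recall_fscore_support` does the same job and also handles per-class averaging.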
Key challenges in this project included:
- Preventing overfitting given the relatively small dataset.
- Ensuring fair comparison by keeping training procedures consistent across models.
- Managing computational cost and resource constraints, as ViT required significantly more time and memory to train than the CNN.
- Working with a very small validation set (16 images) relative to the test set (~600 images), which made early stopping and hyperparameter tuning more difficult.
Clone the repository:

```bash
git clone https://github.com/Avital-Fine/deep-learning-final-project.git
cd deep-learning-final-project
```