This project implements a deep learning solution for pneumonia detection from chest X-ray images. The goal is to classify chest X-ray images into two categories:
- NORMAL: Healthy lungs
- PNEUMONIA: Pneumonia-affected lungs
The project explores multiple CNN architectures, compares their performance, and selects the best-performing model for pneumonia detection.
The dataset used is the Chest X-ray Images from Kaggle, downloaded using the kagglehub library.
```
chest_xray/
├── train/
│   ├── NORMAL/
│   └── PNEUMONIA/
└── test/
    ├── NORMAL/
    └── PNEUMONIA/
```
Training Set:
- NORMAL: 1,341 images
- PNEUMONIA: 3,875 images
- Total: 5,216 images
Test Set:
- NORMAL: 234 images
- PNEUMONIA: 390 images
- Total: 624 images
The dataset shows class imbalance with pneumonia cases being more prevalent than normal cases (~3:1 ratio in training set).
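One common mitigation for this kind of imbalance, sketched below as an option rather than something this project necessarily uses, is to weight the loss by inverse class frequency so the minority NORMAL class counts for more:

```python
import torch
import torch.nn as nn

# Inverse-frequency class weights from the training-set counts above
# (order: [NORMAL, PNEUMONIA]).
counts = torch.tensor([1341.0, 3875.0])
weights = counts.sum() / (2 * counts)  # minority class gets the larger weight

criterion = nn.CrossEntropyLoss(weight=weights)
```

With these counts the NORMAL weight comes out roughly 1.9 and the PNEUMONIA weight roughly 0.67, so misclassifying a NORMAL image costs more.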
```python
from torchvision import transforms

# Basic transform (validation/test)
basic_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

# Augmented transform (training)
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),
    transforms.RandomAutocontrast(p=0.3),
    transforms.RandomApply([transforms.GaussianBlur(3)], p=0.3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
```
Architecture:
- 2 Convolutional layers (32, 64 filters)
- ReLU activation
- MaxPooling layers
- Fully connected classifier
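The basic architecture above can be sketched in PyTorch roughly as follows; the class name and any layer sizes beyond the listed 32/64 filter counts are assumptions:

```python
import torch
import torch.nn as nn

class CNN2(nn.Module):
    """Sketch of the 2-conv-layer baseline described above (details assumed)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # grayscale input
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 224 -> 112
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 112 -> 56
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 56 * 56, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```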
Architecture:
- 2 Convolutional layers with Batch Normalization
- Dropout (0.2) for regularization
- Improved generalization over basic CNN
Improvements:
- Batch normalization for stable training
- Dropout to prevent overfitting
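As a sketch of where these pieces sit (the exact layout is an assumption), one convolutional block with batch normalization, plus the classifier-side dropout, might look like:

```python
import torch
import torch.nn as nn

# One conv block from the BN + Dropout variant: BatchNorm follows each
# convolution to normalize activations for stable training.
block = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
head_dropout = nn.Dropout(p=0.2)  # applied between fully connected layers
```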
Architecture:
- 3 Convolutional layers (32, 64, 128 filters)
- Batch normalization after each conv layer
- Dropout (0.3) in classifier
- Deeper feature extraction
Improvements:
- More complex feature learning
- Better representation capability
Architecture:
- ResNet18 backbone pretrained on ImageNet
- Modified first layer for grayscale input (1 channel)
- Fine-tuned for binary classification
- Transfer learning approach
- Batch Size: 32
- Optimizer: Adam
- Learning Rate: 1e-3 (custom CNNs), 1e-4 (pretrained)
- Loss Function: CrossEntropyLoss
- Early Stopping: Patience of 5 epochs
- Device: CUDA if available, else CPU
- Early stopping to prevent overfitting
- Model checkpointing (saves best model based on validation loss)
- Learning rate scheduling for pretrained model
- Comprehensive logging of training/validation metrics
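The early-stopping and checkpointing logic can be sketched as below; `train_fn` and `eval_fn` are assumed helpers that run one epoch and return the average validation loss:

```python
import math
import torch

def train_with_early_stopping(model, train_fn, eval_fn, max_epochs=50,
                              patience=5, ckpt_path="best_model.pt"):
    """Checkpoint on best validation loss; stop after `patience`
    consecutive epochs without improvement (a sketch, names assumed)."""
    best_val, epochs_without_improvement = math.inf, 0
    for epoch in range(max_epochs):
        train_fn(model)
        val_loss = eval_fn(model)
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
            torch.save(model.state_dict(), ckpt_path)  # keep the best weights
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_val
```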
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| CNN2 | 0.8654 | 0.8621 | 0.8654 | 0.8631 |
| CNN2_BN_D | 0.9038 | 0.9015 | 0.9038 | 0.9021 |
| CNN3_BN_D | 0.9231 | 0.9211 | 0.9231 | 0.9218 |
| Pretrained ResNet18 | 0.9712 | 0.9710 | 0.9712 | 0.9710 |
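In the table, accuracy equals recall for every model, which is what weighted averaging produces, so the metrics were plausibly computed along these lines (the averaging scheme is an assumption on my part):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Weighted-average metrics; labels: 0 = NORMAL, 1 = PNEUMONIA."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```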
- Regularization Impact: Batch normalization and dropout improved model performance
- Transfer Learning Success: Pretrained ResNet18 achieved the best results with 97.12% accuracy
- Generalization: The final model shows excellent performance across all metrics
The confusion matrices reveal:
- High true positive rate for pneumonia detection
- Low false negative rate, which is critical in medical applications
- Excellent overall classification performance
- Data Augmentation: Improved model robustness
- Transfer Learning: Leveraged pretrained features for medical imaging
- Regularization: Prevented overfitting while maintaining performance
- Comprehensive Evaluation: Multiple metrics ensure reliable assessment
In conclusion, the pretrained ResNet18 outperformed the other models, achieving the highest accuracy, precision, recall, and F1 score. ResNet18 generally performs better than classic architectures because its residual (skip) connections mitigate the vanishing gradient problem as networks grow deeper. Rather than building a model from scratch, fine-tuning a pretrained model is often the better choice.
- Dataset Expansion: Include more diverse X-ray images so the model generalizes better
- Multi-class Classification: Distinguish between bacterial and viral pneumonia
- Deployment: Integrate the model into a user-friendly application