This repository contains a deep learning course project focused on medical image classification for detecting and grading diabetic retinopathy from retinal fundus images.
The project explores how different ImageNet-pretrained convolutional neural networks (CNNs) perform on a limited medical dataset and evaluates techniques to improve model robustness and interpretability.
- Evaluate and fine-tune pretrained CNN architectures for medical image analysis
- Improve performance using data augmentation, attention mechanisms, and ensemble learning (an example augmentation pipeline is sketched after this list)
- Analyze and interpret model decisions using explainable AI (Grad-CAM)
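Because the labeled fundus data is limited, augmentation is one of the main levers for robustness. Below is a minimal sketch of a torchvision training-time augmentation pipeline; the specific transforms, image size, and parameters are illustrative assumptions, not the project's exact configuration.

```python
from torchvision import transforms

# Training-time augmentation for fundus images (assumed parameters).
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics,
                         std=[0.229, 0.224, 0.225]),   # matching the pretrained backbones
])
```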
- Models: ResNet18, VGG16, DenseNet121, DenseNet161, EfficientNet
- Transfer Learning: Fine-tuning ImageNet-pretrained models (see the fine-tuning sketch after this list)
- Datasets: DeepDRiD, APTOS-2019
- Attention Mechanisms: Channel attention (Squeeze-and-Excitation) and spatial attention (sketched after this list)
- Ensemble Methods: Bagging, stacking, boosting (a simple probability-averaging sketch follows below)
- Evaluation Metrics: Cohen’s Kappa (primary), accuracy, precision, recall (kappa computation sketched below)
- Explainability: Grad-CAM visualizations for model interpretability (minimal Grad-CAM sketch below)
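A minimal sketch of the transfer-learning setup: load an ImageNet-pretrained backbone from torchvision and replace its classification head for the five diabetic-retinopathy grades (0 = no DR … 4 = proliferative DR). The choice of ResNet18, optimizer, and learning rate here are assumptions for illustration, not the project's exact settings.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and swap the classifier head
# for the 5 DR grades.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 5)

# Fine-tune all layers with a small learning rate (assumed hyperparameters).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```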
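The channel and spatial attention mentioned above follow the Squeeze-and-Excitation and CBAM-style patterns. The sketch below shows one plausible implementation of each block; it is not the project's exact module code, and where such blocks are inserted into the backbone (e.g. after VGG16 feature stages) is an assumption.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global average pool
        self.fc = nn.Sequential(                   # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # reweight channels

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (sketch)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)          # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)         # channel-wise max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                            # reweight spatial locations
```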
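For the ensemble experiments, a common baseline is to average the softmax outputs of several fine-tuned models and take the argmax grade. The sketch below shows only that averaging variant and is an assumption about how predictions could be combined; the bagging, stacking, and boosting variants are not shown.

```python
import torch

@torch.no_grad()
def ensemble_predict(models, images):
    """Average softmax probabilities across models and return the argmax grade."""
    probs = [torch.softmax(m(images), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```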
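Cohen’s Kappa for DR grading is typically computed with quadratic weights, since the grades are ordinal and larger disagreements should cost more; the weighting choice in this sketch is an assumption, not necessarily the project's exact metric configuration.

```python
from sklearn.metrics import cohen_kappa_score, accuracy_score

y_true = [0, 2, 2, 4, 1]   # ground-truth DR grades (toy example)
y_pred = [0, 2, 3, 4, 1]   # model predictions

# Quadratically weighted kappa penalises large grade disagreements more heavily.
kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
acc = accuracy_score(y_true, y_pred)
print(f"kappa={kappa:.3f}, accuracy={acc:.3f}")
```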
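A minimal Grad-CAM sketch using forward/backward hooks is shown below. The target layer (the last convolution of a torchvision VGG16) and the hook-based implementation are assumptions; the project may equally use a dedicated library or a different layer of a fine-tuned model.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM: heatmap of class evidence in `target_layer`."""
    activations, gradients = [], []
    fh = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    bh = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    logits = model(image)                           # image: (1, 3, H, W)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()
    fh.remove(); bh.remove()

    acts, grads = activations[0], gradients[0]      # both (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()

# Example: Grad-CAM on the last conv layer of an ImageNet-pretrained VGG16
# (layer choice is an assumption; in practice the fine-tuned model is used).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
heatmap = grad_cam(model, torch.randn(1, 3, 224, 224), model.features[28])
```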
- Python, PyTorch, Torchvision
- NumPy, Pandas, Scikit-learn
- OpenCV, PIL
- Matplotlib
- Jupyter Notebooks
The best-performing single model was VGG16 with channel and spatial attention, achieving strong Cohen’s Kappa scores while maintaining high accuracy. Ensemble methods added robustness but did not outperform the best single attention-augmented model.
- Applied deep learning for real-world medical imaging tasks
- Model fine-tuning and hyperparameter optimization
- Handling small and imbalanced datasets
- Model evaluation beyond accuracy (clinically relevant metrics such as Cohen’s Kappa)
- Explainable AI for trust and transparency
Santeri Heikkinen, Joona Mustonen, Joonatan Salo
Course project for Deep Learning (2024), University of Oulu.