For this project, we applied a Vision Transformer model to a multi-label classification task on retinal fundus images. Class imbalance and the small size of the dataset were the main obstacles, so appropriate data augmentation was key to achieving better performance. We observed that the Vision Transformer outperformed our baseline model, ResNet V2.0. Feel free to read the full report for more information.
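To make the class-imbalance issue concrete, here is a minimal sketch of one common way to compensate for it in a multi-label setting: weighting each class's positive examples by the ratio of negatives to positives. This is an illustration only, not necessarily the approach used in this project; the function name `positive_class_weights` and the toy label matrix are hypothetical. Weights of this form are what one would typically pass as `pos_weight` to a loss such as PyTorch's `BCEWithLogitsLoss`.

```python
from typing import List


def positive_class_weights(labels: List[List[int]]) -> List[float]:
    """Return one weight per class, computed as (#negatives / #positives).

    `labels` is a multi-hot matrix: one row per image, one column per class.
    Rare classes get weights above 1, common classes get weights below 1.
    """
    n_samples = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        positives = sum(row[c] for row in labels)
        negatives = n_samples - positives
        # Fall back to a neutral weight if a class has no positive examples.
        weights.append(negatives / positives if positives else 1.0)
    return weights


# Hypothetical toy labels: 4 images, 3 conditions (not taken from the dataset).
toy_labels = [
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
weights = positive_class_weights(toy_labels)
print(weights)
```

In this toy example the first condition appears in three of four images and receives a weight below 1, while the two rare conditions receive larger weights, so errors on rare conditions count more during training.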
sf-nf-ai/Deep-Learning-CS230