For this project, we applied a Vision Transformer (ViT) model to a multi-label classification task on retinal fundus images. Class imbalance and the small size of the dataset were the biggest roadblocks for this problem, so appropriate data augmentation was key to achieving better performance. We observed that the Vision Transformer outperformed our baseline model, ResNet V2.0. Feel free to read the full report for more details.
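In a multi-label setting, a single fundus image can carry several labels at once (or none). The usual way to handle this is to give every label its own sigmoid output and threshold it independently, instead of using one softmax over all classes. The sketch below is a generic illustration of that decoding step, plus a quick per-label prevalence count that makes class imbalance visible. It is not the project's code: the function names, the 0.5 threshold, and the example logits and labels are all hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-label logits into a multi-hot vector.

    Each label is scored by its own sigmoid, so any number of labels
    (including zero) can be active for one image. The threshold value
    here is an assumption, not taken from the project.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def label_prevalence(targets: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of images carrying each label; small values flag rare classes."""
    n = len(targets)
    num_labels = len(targets[0])
    return [sum(row[j] for row in targets) / n for j in range(num_labels)]


if __name__ == "__main__":
    # Hypothetical logits for one image with 4 labels (illustrative only).
    print(decode_multilabel([2.1, -0.3, 0.0, -4.0]))  # [1, 0, 1, 0]

    # Hypothetical multi-hot targets for 3 images: label 0 appears in every
    # image while label 2 never appears, i.e. a strongly imbalanced label set.
    targets = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 1]]
    print(label_prevalence(targets))
```

A prevalence count like this is a cheap first check before choosing how aggressively to augment or rebalance rare labels.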