SpeeritX/dl-geolocation-vit


Image Geo-Location with Visual Transformers

This is the project repository for the course Deep Learning for Visual Recognition (fall 2023). The course was worth 10 ECTS and focused mainly on the project work. The project received the maximum grade, 12/12.

Abstract

This project report addresses the challenging problem of image geolocation: predicting where an image was taken solely from its visual content. Using Visual Transformers (ViT) instead of traditional Convolutional Neural Networks (CNNs), the study examines the constraints of training with limited resources. The dataset is sourced primarily from Google Street View and covers Denmark. Administrative boundaries define the geocells used for classification. Building on related work, including PIGEON and TransLocator, the study experiments with data augmentation, learning-rate adjustment, and a custom loss function to improve accuracy. The results show notable accuracy improvements. The study also identifies limitations and proposes directions for future work: refining geocell generation, exploring continuous location prediction, and investigating multi-task network architectures.
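The abstract frames geolocation as classification over geocells, trained with a custom loss. The project's actual loss is not reproduced here. As an illustration only, the following is a minimal sketch of a PIGEON-style haversine label smoothing. It makes three assumptions:

- Each geocell (e.g. an administrative region) is represented by a single centroid `(lat, lon)`.
- The soft target for each cell decays exponentially with the great-circle distance between its centroid and the true image location.
- The loss is cross-entropy against those soft targets.

The function names and the temperature `tau_km=75.0` are hypothetical choices, not the repository's code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def smoothed_targets(true_latlon, centroids, tau_km=75.0):
    """Soft target distribution over geocells (hypothetical helper).

    Cells whose centroid lies close to the true location receive more
    probability mass; tau_km controls how quickly the mass decays.
    """
    dists = [haversine_km(*true_latlon, *c) for c in centroids]
    d_min = min(dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / tau_km) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def geocell_loss(logits, true_latlon, centroids, tau_km=75.0):
    """Cross-entropy between the predicted geocell distribution and smoothed targets."""
    targets = smoothed_targets(true_latlon, centroids, tau_km)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    # Hypothetical geocell centroids: roughly Copenhagen, Aarhus and Odense.
    cells = [(55.6761, 12.5683), (56.1629, 10.2039), (55.4038, 10.4024)]
    image_location = (55.70, 12.55)  # a point near Copenhagen
    print(smoothed_targets(image_location, cells))
    print(geocell_loss([5.0, 0.0, 0.0], image_location, cells))
```

Compared with plain one-hot cross-entropy, this smoothing penalizes a confident guess of a nearby wrong cell less than a guess of a distant one. That matches the geographic nature of the error.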

For more details, see our project report.

Languages

  • Jupyter Notebook 99.7%
  • Python 0.3%