This repository combines classical and deep-learning approaches to visual data. It starts from
- processing visual data with classical techniques
and goes on to deep-learning concepts like
- MLPs and CNNs for object identification,
- Vision Transformers (ViTs) and CNN-LSTMs for image captioning.
The model classes are defined in src/models. The training scripts can be found in src/trainers. The implementations and demos can be found in src/notebooks.
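
As a rough illustration of the "classical techniques" starting point, here is a minimal sketch of Sobel edge detection. The README does not say which classical techniques the repository covers, so the choice of Sobel is an assumption. The function name `sobel_edges` and the toy image are hypothetical and are not taken from src/. The sketch uses only the Python standard library to stay self-contained; a real implementation would more likely use NumPy or OpenCV.

```python
import math

# 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(image):
    """Return the gradient magnitude of a 2-D grayscale image (list of rows)."""
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so border pixels reuse their nearest neighbor.
        r = min(max(r, 0), h - 1)
        c = min(max(c, 0), w - 1)
        return image[r][c]

    edges = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            # Slide the kernels over the 3x3 neighborhood (cross-correlation;
            # the magnitude is the same as for a flipped-kernel convolution).
            for i in range(3):
                for j in range(3):
                    value = pixel(r + i - 1, c + j - 1)
                    gx += SOBEL_X[i][j] * value
                    gy += SOBEL_Y[i][j] * value
            edges[r][c] = math.hypot(gx, gy)
    return edges


# Toy image (hypothetical): left half dark, right half bright.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
edges = sobel_edges(image)
```

On this toy image the response is nonzero only in the two columns on either side of the dark-to-bright boundary, which is the behavior an edge detector is expected to show.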