In this repository we have tested three VQA models on the ImageCLEF-2019 dataset. Two of these are built on top of Facebook AI Research's Multi-Modal Framework (MMF).
| Model Name | Accuracy | Number of Epochs |
|---|---|---|
| Hierarchical Question-Image Co-attention | 48.32% | 42 |
| MMF Transformer | 51.76% | 30 |
| MMBT | 86.78% | 30 |
Download the dataset from here and place it in a directory named `dataset/med-vqa-data/` inside the directory where this repository is cloned.
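For reference, the expected location relative to the repository root is sketched below. The split contents are placeholders, not taken from the competition page, since the archive's internal file names are not listed here.

```
<repo-root>/
└── dataset/
    └── med-vqa-data/
        ├── <training images and QA pairs>
        ├── <validation images and QA pairs>
        └── <test images and questions>
```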
To train the MMF Transformer model, run:

```bash
mmf_run config=projects/hateful_memes/configs/mmf_transformer/defaults.yaml model=mmf_transformer dataset=hateful_memes training.checkpoint_interval=100 training.max_updates=3000
```

To train the MMBT model, run:

```bash
mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml model=mmbt dataset=hateful_memes training.checkpoint_interval=100 training.max_updates=3000
```

To train the Hierarchical Question-Image Co-attention model, run:

```bash
cd hierarchical
python main.py
```

The dataset used for training the models was the VQA-Med dataset, taken from the "ImageCLEF 2019: Visual Question Answering in Medical Domain" competition. Below are a few plots of some statistics of the dataset.
- Distribution of the type of questions in the dataset.
- Plot of the frequency of words in the answers.
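A minimal sketch of how such statistics could be computed is shown below. The file name `All_QA_Pairs_train.txt` and the pipe-delimited `image_id|category|question|answer` format are assumptions for illustration, not confirmed details of the dataset or this repository.

```python
from collections import Counter
from pathlib import Path

import matplotlib.pyplot as plt

# Hypothetical file name and format: one QA pair per line,
# fields separated by "|" as image_id|category|question|answer.
QA_FILE = Path("dataset/med-vqa-data/All_QA_Pairs_train.txt")

categories = Counter()
answer_words = Counter()

with QA_FILE.open(encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip("\n").split("|")
        if len(parts) != 4:
            continue  # skip malformed lines
        _, category, _, answer = parts
        categories[category] += 1
        answer_words.update(answer.lower().split())

# Bar chart of the question-type (category) distribution.
plt.figure()
plt.bar(list(categories.keys()), list(categories.values()))
plt.title("Distribution of question types")
plt.xlabel("Category")
plt.ylabel("Number of questions")

# Bar chart of the 20 most frequent words in the answers.
top_words = answer_words.most_common(20)
plt.figure()
plt.bar([w for w, _ in top_words], [c for _, c in top_words])
plt.xticks(rotation=90)
plt.title("Most frequent words in answers")
plt.tight_layout()
plt.show()
```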