Visual attention models are increasingly able to predict scanpaths, i.e., sequences of fixations and the eye movements between them. In particular, ScanDDM introduced a drift-diffusion-model (DDM) approach for predicting goal-directed scanpaths in a zero-shot setting, while ART addressed the incremental prediction of attention during language-guided object referral. This work combines the two approaches by extending ScanDDM with GroundingDINO to tackle the incremental object referral task. We call the resulting model DDM-DINO.
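To give an intuition for DDM-based fixation selection, the sketch below shows a generic race of drift-diffusion accumulators. Each candidate location accumulates noisy evidence at a rate set by its relevance, and the first accumulator to reach a threshold becomes the next fixation. This is a textbook illustration, not ScanDDM's or DDM-DINO's actual code. The names `ddm_race` and `simulate_scanpath`, the scalar per-location `relevance` values (which could, for instance, be derived from detection scores), and the inhibition-of-return factor `ior` are all hypothetical. The attached PDF report describes the real model.

```python
import numpy as np


def ddm_race(drift_rates, threshold=1.0, noise_std=0.1, dt=0.01,
             max_steps=10_000, rng=None):
    """Race of independent drift-diffusion accumulators (hypothetical sketch).

    Each candidate i evolves as
        x_i(t + dt) = x_i(t) + v_i * dt + noise_std * sqrt(dt) * N(0, 1)
    and the first accumulator to reach `threshold` wins.
    Returns (winner_index, decision_time), or (None, elapsed_time) if no
    accumulator crosses within `max_steps`.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(drift_rates, dtype=float)
    x = np.zeros_like(v)
    for step in range(1, max_steps + 1):
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion noise.
        x += v * dt + noise_std * np.sqrt(dt) * rng.standard_normal(v.shape)
        crossed = np.flatnonzero(x >= threshold)
        if crossed.size:
            # If several cross in the same step, pick the one with most evidence.
            winner = crossed[np.argmax(x[crossed])]
            return int(winner), step * dt
    return None, max_steps * dt


def simulate_scanpath(relevance, n_fixations=3, ior=0.9, rng=None, **ddm_kwargs):
    """Hypothetical: relevance[i] is a non-negative score for candidate location i."""
    rng = np.random.default_rng() if rng is None else rng
    drift = np.asarray(relevance, dtype=float).copy()
    path = []
    for _ in range(n_fixations):
        winner, rt = ddm_race(drift, rng=rng, **ddm_kwargs)
        if winner is None:
            break
        path.append((winner, rt))
        # Inhibition of return: scale the winner's drift by (1 - ior).
        drift[winner] *= (1.0 - ior)
    return path
```

For example, `simulate_scanpath([0.0, 5.0, 0.1], rng=np.random.default_rng(0))` fixates location 1 first because it has by far the highest drift. The inhibition-of-return step then slows that location down so that other candidates can win later races.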
More details and usage examples can be found in the attached PDF report.
Install all the requirements with `pip install -r requirements.txt`.

To run DDM-DINO on an image:
- In `main.py`, define the `prompt` and the image path.
- Run `python main.py`.

To compute the metrics:
- Uncomment the commented libraries in `requirements.txt`.
- Install the new requirements with `pip install -r requirements.txt`.
- In `calculate_all_metrics.py`, define the parameters.
- Run `python calculate_all_metrics.py`.

Project for the Natural Interaction and Affective Computing courses, UNIMI @ PHuSe Lab, AY 2024/2025, by Hari Calzi and Salvatore Ferrara.
