This is the repository for our paper:
VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors
Atif Belal, Heitor R. Medeiros, Marco Pedersoli, Eric Granger
VLOD-TTA adapts vision-language object detectors (VL-ODs, e.g., YOLO-World, Grounding DINO) at inference time using an IoU-weighted entropy objective and image-conditioned prompt selection. It optimizes lightweight adapters while preserving the detector's zero-shot capability.
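Since the code is not yet released, the following is only a rough sketch of what an IoU-weighted entropy could look like, not the paper's formulation. It assumes boxes in `(x1, y1, x2, y2)` format and per-box class probabilities. The weighting rule is an assumption made for this example: each box is weighted by its summed IoU with all predicted boxes, so that overlapping clusters of proposals count more than isolated ones. The function names `box_iou`, `entropy`, and `iou_weighted_entropy` are hypothetical.

```python
import math


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def entropy(probs):
    """Shannon entropy (in nats) of one class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def iou_weighted_entropy(boxes, class_probs):
    """Weighted mean of per-box entropies.

    ASSUMPTION (not from the paper): the weight of box i is the sum of
    its IoU with every predicted box, including itself.
    """
    if not boxes:
        return 0.0
    weights = [sum(box_iou(bi, bj) for bj in boxes) for bi in boxes]
    entropies = [entropy(p) for p in class_probs]
    total = sum(weights)
    return sum(w * h for w, h in zip(weights, entropies)) / total


# Two overlapping confident boxes and one isolated uncertain box.
boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (50, 50, 60, 60)]
probs = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
loss = iou_weighted_entropy(boxes, probs)
```

In a test-time adaptation loop, a quantity like `loss` would be minimized with respect to the adapter parameters; here the isolated, uncertain box contributes less than it would under a plain mean of entropies.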
- The paper is under review; code will be released soon.
- arXiv - Paper
If you find our work helpful for your research, please consider citing it using the following BibTeX entry:
@misc{belal2025vlodtta,
title={VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors},
author={Atif Belal and Heitor R. Medeiros and Marco Pedersoli and Eric Granger},
year={2025},
eprint={2510.00458},
archivePrefix={arXiv},
}

