The official implementation of the CVPR 2025 paper "NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval".
TL;DR: NeighborRetr tackles the hubness problem in cross-modal retrieval by distinguishing between good hubs (relevant) and bad hubs (irrelevant) during training, offering a direct training-time solution rather than relying on post-processing methods that require prior knowledge of the data distribution.
If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:
```bibtex
@article{lin2025neighborretr,
  title={NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval},
  author={Lin, Zengrong and Wang, Zheng and Qian, Tianwen and Mu, Pan and Chan, Sixian and Bai, Cong},
  journal={arXiv preprint arXiv:2503.10526},
  year={2025}
}
```

The hubness problem in cross-modal retrieval refers to the phenomenon where certain items (hubs) frequently emerge as the nearest neighbors to many other samples, while the majority of samples rarely appear as neighbors. This leads to biased representations and degraded retrieval accuracy. Unlike previous approaches that apply post-hoc normalization techniques during inference, NeighborRetr introduces a novel approach that:
- Distinguishes between good hubs (semantically relevant) and bad hubs (semantically irrelevant)
- Applies adaptive neighborhood adjustment during training
- Employs uniform regularization to balance hub formation
Our method significantly improves the quality of nearest neighbors, reducing irrelevant hubs and promoting more meaningful semantic relationships.
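To make the notion of a hub concrete, here is a minimal sketch (our own illustration, not part of the released code) that measures hub centrality via k-occurrence counts, i.e. how often each gallery item appears in the top-k neighbor lists of the queries; the `k_occurrence` helper and the random similarity matrix are hypothetical.

```python
# Illustrative sketch: k-occurrence as a simple measure of hub centrality.
# Items retrieved in the top-k lists of many queries are hubs; whether a hub is
# "good" or "bad" depends on whether those matches are semantically relevant.
import torch

def k_occurrence(sim: torch.Tensor, k: int = 10) -> torch.Tensor:
    """sim: [num_queries, num_gallery] similarity matrix (e.g. text-to-video scores).

    Returns a vector whose j-th entry counts how many queries rank gallery item j
    among their top-k nearest neighbors.
    """
    topk_idx = sim.topk(k, dim=1).indices  # top-k gallery indices per query
    return torch.bincount(topk_idx.reshape(-1), minlength=sim.size(1))

# Toy usage: with random scores the counts are roughly uniform; with real embeddings
# a few items accumulate disproportionately many occurrences, which is the hubness problem.
sim = torch.randn(1000, 500)
counts = k_occurrence(sim, k=10)
print(counts.max().item(), counts.float().mean().item())
```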
- [2025/04/13]: Code released! 🎉
- [2025/03/14]: Initial version submitted to arXiv.
- [2025/02/27]: Our paper is accepted to CVPR 2025!
```bash
# Create and activate conda environment
conda create -n NeighborRetr python=3.8 -y
conda activate NeighborRetr

# Install dependencies
pip install -r requirements.txt
```

```bash
cd NeighborRetr/models
wget https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
# Optional: for ViT-B-16
# wget https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt
```

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch \
--master_port 4501 \
--nproc_per_node=4 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--epochs 5 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${ANNO_PATH} \
--video_path ${VIDEO_PATH} \
--datatype msrvtt \
--max_words 24 \
--max_frames 12 \
--output_dir ${OUTPUT_PATH} \
--mb_batch 15 \
--memory_size 512
```
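The command above launches MSR-VTT training on 4 GPUs; the command that follows targets ActivityNet on 8 GPUs with longer captions and more frames. Both expect `${ANNO_PATH}`, `${VIDEO_PATH}`, and `${OUTPUT_PATH}` to be set beforehand; the values below are hypothetical placeholders, not paths used by the authors.

```bash
# Hypothetical locations, adjust to your own dataset layout and output directory.
export ANNO_PATH=/path/to/annotations
export VIDEO_PATH=/path/to/videos
export OUTPUT_PATH=./outputs/neighborretr
```

Note that `torch.distributed.launch` is deprecated in recent PyTorch releases; if your installation no longer provides it, `torchrun` is the replacement launcher, though the training script may then need to read the local rank from the `LOCAL_RANK` environment variable rather than a `--local_rank` argument.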
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--master_port 4501 \
--nproc_per_node=8 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--epochs 10 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${ANNO_PATH} \
--video_path ${VIDEO_PATH} \
--datatype activity \
--max_words 64 \
--max_frames 64 \
--output_dir ${OUTPUT_PATH} \
--mb_batch 15 \
--memory_size 1024
```
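For context, results in this line of work are typically reported as Recall@K together with median rank (MdR) and mean rank (MnR). The sketch below shows how such metrics can be computed from a query-gallery similarity matrix; it assumes the ground-truth match for query i sits at gallery index i and is not the evaluation code shipped with this repository.

```python
# Illustrative sketch of standard retrieval metrics, assuming ground truth on the diagonal.
import torch

def retrieval_metrics(sim: torch.Tensor) -> dict:
    """sim: [num_queries, num_gallery] similarity scores, query i matching gallery item i."""
    order = sim.argsort(dim=1, descending=True)    # ranked gallery indices per query
    gt = torch.arange(sim.size(0)).unsqueeze(1)
    gt_rank = (order == gt).float().argmax(dim=1)  # 0-based rank of the true match
    return {
        "R@1": (gt_rank < 1).float().mean().item() * 100,
        "R@5": (gt_rank < 5).float().mean().item() * 100,
        "R@10": (gt_rank < 10).float().mean().item() * 100,
        "MdR": gt_rank.median().item() + 1,
        "MnR": gt_rank.float().mean().item() + 1,
    }

print(retrieval_metrics(torch.randn(100, 100)))
```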
This repository is released under the Apache License 2.0, a permissive license that allows users to freely use, modify, distribute, and sublicense the code while retaining the copyright and license notices.

Our work is primarily built upon HBI, CLIP, and CLIP4Clip. We extend our gratitude to the authors of these projects for generously open-sourcing their code and for their significant contributions to the community.

