✨SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object Detection
This repository is the official implementation of "SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object Detection", accepted at ACM MM 2025.
Qiuyu Liang, Yongqiang Zhang*
- Linux with CUDA == 11.1.
- Python == 3.8.20.
- PyTorch == 1.9.0 with a matching torchvision build. Install them together from pytorch.org so that the versions agree. Also check that the PyTorch version matches the one required by Detectron2.
- Detectron2: follow Detectron2 installation instructions.
conda create --name cadet python=3.8.20 -y
conda activate cadet
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple
# under your working directory
git clone https://github.com/llqy123/CADet.git
cd CADet
cd detectron2
pip install -e .
cd ..
pip install -r requirements.txt
Please prepare the datasets first.
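Before moving on, the versions pinned in the requirements list and the pip command can be sanity-checked. The snippet below is a minimal sketch and not part of the repository: `EXPECTED`, `installed_version`, and `check_versions` are hypothetical names, and the comparison deliberately ignores local build tags such as `+cu111`.

```python
import sys
from importlib import metadata

# Pins taken from the requirements list and the pip command in this README.
EXPECTED = {"torch": "1.9.0", "torchvision": "0.10.0", "torchaudio": "0.9.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Return a list of mismatch messages (empty if every pin matches)."""
    problems = []
    for package, wanted in expected.items():
        found = lookup(package)
        # Strip a local build tag such as "+cu111" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for line in check_versions() or ["all pinned packages match"]:
        print(line)
```

Run it inside the activated `cadet` environment; any line starting with a package name indicates a pin that does not match.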
Similar to the VLDet baseline, our CADet models are fine-tuned from the corresponding Box-Supervised models (indicated by MODEL.WEIGHTS in the config files). Please train or download the Box-Supervised model and place it under CADet_ROOT/models/ before training the CADet models.
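Since training depends on a checkpoint being present under `CADet_ROOT/models/`, a quick pre-flight check can catch a missing file early. This is a hedged sketch, not repository code: `find_checkpoints` and `require_checkpoint` are hypothetical helpers, the `.pth`/`.pkl` suffixes are an assumption about how the checkpoint is stored, and the actual file name must be read from MODEL.WEIGHTS in the config you use.

```python
from pathlib import Path


def find_checkpoints(cadet_root, suffixes=(".pth", ".pkl")):
    """List checkpoint files found directly under <cadet_root>/models/."""
    models_dir = Path(cadet_root) / "models"
    if not models_dir.is_dir():
        return []
    # Suffix filter is an assumption; adjust it to match your checkpoint.
    return sorted(p for p in models_dir.iterdir() if p.suffix in suffixes)


def require_checkpoint(cadet_root, filename):
    """Return the checkpoint path, or raise if it has not been placed yet."""
    path = Path(cadet_root) / "models" / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; train or download the Box-Supervised model first"
        )
    return path
```

Calling `require_checkpoint(".", name)` from the repository root, with `name` taken from the config's MODEL.WEIGHTS entry, fails fast instead of failing after the training job has started.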
To train a model on the OV-COCO dataset, run
python train_net.py --num-gpus 8 --config-file configs/CADet_OVCOCO_CLIP_R50_1x_caption.yaml
To evaluate a model with trained weights, run
python train_net.py --num-gpus 8 --config-file configs/CADet_OVCOCO_CLIP_R50_1x_caption.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth
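The training and evaluation commands differ only in the trailing `--eval-only MODEL.WEIGHTS ...` arguments, which the following sketch makes explicit. `build_command` and `CONFIG` are hypothetical names introduced for illustration; only the flags already shown above are used, and the script prints the command instead of running it.

```python
import shlex

# Config path copied from the commands in this README.
CONFIG = "configs/CADet_OVCOCO_CLIP_R50_1x_caption.yaml"


def build_command(config=CONFIG, num_gpus=8, weights=None):
    """Build the train_net.py argument list.

    With weights=None the command trains; with a weights path it evaluates.
    """
    cmd = ["python", "train_net.py", "--num-gpus", str(num_gpus), "--config-file", config]
    if weights is not None:
        # Evaluation: add --eval-only and override MODEL.WEIGHTS.
        cmd += ["--eval-only", "MODEL.WEIGHTS", str(weights)]
    return cmd


if __name__ == "__main__":
    # Print rather than execute, so the command can be inspected first.
    print(shlex.join(build_command()))
    print(shlex.join(build_command(weights="/path/to/weight.pth")))
```

Lowering `num_gpus` changes only the launcher argument; whether the reported numbers are reproduced with fewer GPUs is not covered here.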
Download the trained network weights.
| OV-COCO | Novel AP50 | Base AP50 | Overall AP50 | Weight |
|---|---|---|---|---|
| config_RN50 | 36.4 | 50.6 | 46.9 | paper |
| config_RN50 | 36.4 | 51.0 | 47.2 | weight |
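As a reading aid for the table, the Overall AP50 column is consistent with a class-count-weighted mean of the Novel and Base columns. The sketch below assumes the commonly used OV-COCO split of 17 novel and 48 base categories; these split sizes are not stated in this README, and `overall_ap50` is a hypothetical helper.

```python
# Assumed OV-COCO split sizes (not stated in this README).
NUM_NOVEL, NUM_BASE = 17, 48


def overall_ap50(novel, base, n_novel=NUM_NOVEL, n_base=NUM_BASE):
    """Weighted mean of Novel and Base AP50 by number of categories."""
    return (n_novel * novel + n_base * base) / (n_novel + n_base)


print(round(overall_ap50(36.4, 50.6), 1))  # paper row -> 46.9
print(round(overall_ap50(36.4, 51.0), 1))  # released weight row -> 47.2
```

Both rows reproduce the tabulated Overall AP50 after rounding to one decimal place.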
If you find this project useful for your research, please use the following BibTeX entry.
@inproceedings{liang2025sam,
title={SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object Detection},
author={Liang, Qiuyu and Zhang, Yongqiang},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
pages={2596--2605},
year={2025}
}
This repository was built on top of Detectron2, Detic, RegionCLIP, OVR-CNN, VLDet, and Segment Anything Model. We thank the authors for their hard work.
