Text-guided Visual Prompt DINO for Generic Segmentation (Prompt-DINO)

This repo is the official implementation of Text-guided Visual Prompt DINO for Generic Segmentation, by Yuchen Guan, Chong Sun, Canmiao Fu, Zhipeng Huang, Chun Yuan, Chen Li.

Prompt-DINO is a unified model for open-vocabulary detection and segmentation that simultaneously outputs detection bounding boxes and segmentation masks. It accepts both text prompts and visual prompts, detecting and segmenting the categories specified by the given prompt. Prompt-DINO was trained on more than 10 million images and hundreds of millions of target instance boxes, and it demonstrates strong performance in open-vocabulary detection and segmentation.

Highlights

  • We constructed annotations for hundreds of millions of instances, each accompanied by a detailed concept description. Among open-source open-set object detection/segmentation models, ours is trained with the largest number of concept-annotated instances.
  • The model achieves highly competitive detection and segmentation results on mainstream evaluation datasets such as COCO, LVIS, and ADE20K.
  • It supports both text prompts and visual prompts, and performs detection and segmentation simultaneously.

Usage Recommendations

  • The model accepts a (text, image) or (box, image) pair as input and simultaneously outputs detection boxes and segmentation masks.
  • Because the model uses early fusion, fewer prompt words generally give better results. We recommend keeping the prompt within 16 words.
  • Currently, only English words are accepted, and categories must be separated with ".". For example, an input prompt could be "apple.pear" (see the sketch after this list).
  • Some categories can be expressed with different synonyms, so replacing a word with a synonym can improve results. For example, try "people" instead of "person".
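
As a minimal sketch of this prompt format (plain Python string handling, not an API from this repo; the helper name is hypothetical), the following joins category names with "." and warns when the recommended 16-word budget is exceeded:

def build_text_prompt(categories, max_words=16):
    """Join category names into a '.'-separated text prompt."""
    prompt = ".".join(c.strip().lower() for c in categories)
    # Count words across all categories; early fusion favors short prompts.
    n_words = sum(len(c.split()) for c in categories)
    if n_words > max_words:
        print(f"warning: {n_words} prompt words; results may degrade beyond {max_words}")
    return prompt

print(build_text_prompt(["apple", "pear"]))  # -> apple.pear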

🛠️ Install

  • Compile MultiScaleDeformableAttention:
cd WeVisionOne/pixel_decoder/ops && sh make.sh
  • Compile Detectron2
cd /path_to_detectron2/detectron2/ && python setup.py install
  • Compile MMCV
cd /path_to_mmcv/mmcv/ && MMCV_WITH_EXT=1 MMCV_WITH_OPS=1 MAX_JOBS=8 python setup.py build_ext && MMCV_WITH_OPS=1 python setup.py develop
  • Install other necessary libraries
pip3 install timm -i https://mirrors.tencent.com/pypi/simple/ 
pip3 install pycocotools -i https://mirrors.tencent.com/pypi/simple/
pip3 install omegaconf==2.4.0.dev2 -i https://mirrors.tencent.com/pypi/simple/
pip3 install shapely -i https://mirrors.tencent.com/pypi/simple/
pip3 install transformers -i https://mirrors.tencent.com/pypi/simple/
pip3 install panopticapi -i https://mirrors.tencent.com/pypi/simple/
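
After the builds finish, a quick sanity check is to import each compiled dependency. This is a sketch: the extension module name MultiScaleDeformableAttention follows the usual convention of the Deformable-DETR ops that make.sh builds, and is an assumption here.

python3 - <<'EOF'
import torch
import detectron2, mmcv, timm, transformers
import MultiScaleDeformableAttention  # built by make.sh; module name is an assumption
print("torch", torch.__version__, "cuda available:", torch.cuda.is_available())
print("detectron2", detectron2.__version__, "mmcv", mmcv.__version__)
EOF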

Demo

We provide both a script and a Gradio demo:

  • Script Demo
cd Inference
IMG_PATH=resources/ships.jpg
python text_prompt.py --config-file configs/text_model_cfgs.yaml --img_path $IMG_PATH --text_prompts "ship"

Two output visualizations will be produced in the "./output" folder.
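
Multiple categories can be queried in a single run using the "."-separated prompt format described above (the category names here are illustrative; the flags are the same as in the command above):

python text_prompt.py --config-file configs/text_model_cfgs.yaml --img_path $IMG_PATH --text_prompts "ship.boat"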

  • Gradio Demo
cd Inference
python gradio_demo.py

Results

  • Quantitative Results

  • Qualitative Results

✒️ Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{guan2025text,
  title={Text-guided Visual Prompt DINO for Generic Segmentation},
  author={Guan, Yuchen and Sun, Chong and Fu, Canmiao and Huang, Zhipeng and Yuan, Chun and Li, Chen},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={21288--21298},
  year={2025}
}
