This repository contains the official implementation of our paper,
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models, which has been accepted to CVPR 2025.
- We introduce Image-Object Cross-Level Trusted Intervention (ICT), a lightweight, training-free method that computes intervention directions to shift the model's focus toward different levels of visual information, enhancing its attention to both high-level and fine-grained visual details.
- ICT is formulated as follows:

$$\boldsymbol{H}^{(l+1)} = \boldsymbol{H}^{(l)} + \sum_{n=1}^{N} \Big( \mathrm{Attn}_n^{(l)}\big(\boldsymbol{H}^{(l)}\big) + \mathbb{I}_{\text{img},n}^{(l)} \, \alpha \, \boldsymbol{S}_{n}^{(l)} + \mathbb{I}_{\text{obj},n}^{(l)} \, \beta \, \boldsymbol{S}_{\text{obj},n}^{(l)} \Big) \cdot W_o^{(l)}.$$

- The proposed ICT effectively reduces harmful over-reliance on language priors, a major cause of hallucinations in LVLMs, while preserving the benefits of the useful ones.
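For intuition, here is a minimal PyTorch-style sketch of applying such a head-level intervention at inference time. All tensor names, shapes, and the head-selection masks below are illustrative assumptions, not the exact code in this repository.

```python
import torch

def ict_intervention(head_outputs, s_img, s_obj, img_heads, obj_heads, alpha, beta):
    """Shift selected attention heads along trusted directions before the W_o projection.

    head_outputs: (batch, seq, num_heads, head_dim) per-head attention outputs.
    s_img, s_obj: (num_heads, head_dim) image-level and object-level directions.
    img_heads, obj_heads: (num_heads,) boolean masks of heads to intervene on
        (playing the role of the indicator terms in the equation above).
    alpha, beta: scalar intervention strengths.
    """
    img_mask = img_heads.float().view(1, 1, -1, 1)  # broadcast over batch/seq/dim
    obj_mask = obj_heads.float().view(1, 1, -1, 1)
    shifted = (head_outputs
               + alpha * img_mask * s_img.view(1, 1, *s_img.shape)
               + beta * obj_mask * s_obj.view(1, 1, *s_obj.shape))
    # The result is then projected by W_o and added to the residual stream,
    # exactly where the unmodified attention output would normally go.
    return shifted
```

In practice a function like this would be hooked into each decoder layer's attention module, so only the selected heads are shifted while the rest of the forward pass is unchanged.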
- Image datasets: the image datasets (e.g., COCO) required for the POPE benchmark.
- POPE question-answer pairs: the POPE question files and ground-truth answer files (an example of reading a question file is sketched below).
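For reference, a POPE question file is a JSONL file with one question per line; the snippet below shows how such a file might be read. The field names are assumptions based on the common LLaVA/POPE convention and may differ from the exact files used here.

```python
import json

# Hypothetical path; point this at your actual POPE question file.
question_file = "path/to/pope/question-file"

with open(question_file) as f:
    for line in f:
        item = json.loads(line)
        # A typical entry looks roughly like (field names assumed):
        # {"question_id": 1, "image": "COCO_val2014_000000310196.jpg",
        #  "text": "Is there a snowboard in the image?", "label": "yes"}
        print(item.get("question_id"), item.get("text"))
```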
- Set up the environment by running:

```bash
conda env create -f environment.yml
conda activate ict
```

Run the following scripts to generate the different types of intervention vectors using your model and dataset; a generic, illustrative sketch of how such directions are commonly derived follows these commands.
```bash
python get_base_vector.py --model-path path/to/llava-v1.5 \
    --question-file path/to/pope/question-file \
    --image-folder path/to/your/coco/images \
    --seed ${1:-55} --length 1500 \
    --output ./base
```
```bash
python get_hallucinated_vector.py --model-path path/to/llava-v1.5 \
    --question-file path/to/pope/question-file \
    --image-folder path/to/your/coco/images \
    --seed ${1:-55} --length 1500 \
    --output ./hallucinated
```
```bash
python get_object_vector.py --model-path path/to/llava-v1.5 \
    --question-file path/to/pope/question-file \
    --image-folder path/to/your/coco/images \
    --seed ${1:-55} --length 1500 \
    --output ./object
```
Run ICT on the POPE benchmark to generate answers:

```bash
python val_ict_pope.py --question_file path/to/pope/question-file \
    --num_heads 256 --alpha 8 --seed ${1:-55} \
    --length 1500 --target_dataset coco \
    --type both
```

Evaluate the generated answers against the ground-truth annotations:
```bash
python eval_pope.py --gt_files path/to/groundtruth/pope/answers \
    --gen_files answer.jsonl
```
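For reference, POPE reports accuracy, precision, recall, F1, and the yes-ratio over binary yes/no answers. The snippet below is a standalone illustration of those standard metrics, not a replacement for `eval_pope.py`; it assumes answers have already been mapped to "yes"/"no" strings.

```python
def pope_metrics(gt_labels, pred_labels):
    """Compute standard POPE metrics from lists of 'yes'/'no' strings."""
    tp = sum(g == "yes" and p == "yes" for g, p in zip(gt_labels, pred_labels))
    tn = sum(g == "no" and p == "no" for g, p in zip(gt_labels, pred_labels))
    fp = sum(g == "no" and p == "yes" for g, p in zip(gt_labels, pred_labels))
    fn = sum(g == "yes" and p == "no" for g, p in zip(gt_labels, pred_labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(gt_labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / len(gt_labels),
    }
```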
## MMMU Benchmark

- To evaluate ICT on the MMMU benchmark, first clone the MMMU repository:
```bash
git clone https://github.com/MMMU-Benchmark/MMMU.git
```

- Then, place the necessary files in the `/MMMU` directory and run:
```bash
python val_ict_MMMU.py
```

- MMMU evaluation for 13B models:
```bash
python val_ict_13b_MMMU.py
```

## PhD Benchmark Evaluation
- To run ICT on the PhD benchmark, execute:
```bash
python val_ict_phd.py
```

If you find our project useful, please star our repo and cite our paper as follows:
```bibtex
@article{chen2024ict,
  title={ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models},
  author={Chen, Junzhe and Zhang, Tianshu and Huang, Shiyu and Niu, Yuwei and Zhang, Linfeng and Wen, Lijie and Hu, Xuming},
  journal={arXiv preprint arXiv:2411.15268},
  year={2024}
}
```

