- [2025/10/20]🔥🔥🔥TRUST-VL-13B checkpoint and TRUST-Instruct dataset are now publicly available!
- [2025/09/06]🚀🚀🚀TRUST-VL is released. Check out the paper for more details.
Take your first steps with the TRUST-VL model.
- Clone this repository and install the package:
```bash
git clone https://github.com/YanZehong/TRUST-VL.git
cd TRUST-VL
conda create -n trustvl python=3.10 -y
conda activate trustvl
pip install --upgrade pip
pip install -e .
```
- (Optional) Install additional packages for training:
```bash
pip install -e ".[train]"
pip install flash-attn==2.6.3 --no-build-isolation  # --no-cache-dir
```

Please check out 🤗 Huggingface Models for public TRUST-VL checkpoints.
```bash
git lfs install
git clone https://huggingface.co/NUSryan/TRUST-VL-13b-task
```
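Alternatively, the checkpoint can be fetched with the `huggingface_hub` Python API. This is a minimal sketch; the target directory below is just an example, not a path required by the scripts:

```python
# Download the public TRUST-VL checkpoint without git-lfs.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="NUSryan/TRUST-VL-13b-task",
    local_dir="TRUST-VL-13b-task",  # example target directory (assumption)
)
```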
TRUST-VL training consists of three stages: In Stage 1, we begin by training the projection module for one epoch on 1.2 million image–text pairs (653K news samples from VisualNews and 558K samples from the LLaVA training corpus). This stage aligns the visual features with the language model. In Stage 2, we jointly train the LLM and the projection module for one epoch using 665K synthetic conversation samples from the LLaVA training corpus to improve the model’s ability to follow complex instructions. In Stage 3, we fine-tune the full model on 198K reasoning samples from TRUST-Instruct for three epochs to further enhance its misinformation-specific reasoning capabilities.
Similar to LLaVA, TRUST-VL is trained on 8 A100 GPUs with 80GB memory. To train on fewer GPUs, you can reduce `per_device_train_batch_size` and increase `gradient_accumulation_steps` accordingly. Always keep the global batch size the same: `per_device_train_batch_size` x `gradient_accumulation_steps` x `num_gpus`.
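For example, halving the number of GPUs can be compensated by doubling the accumulation steps. The sketch below just spells out this arithmetic; the batch sizes used are illustrative, not the defaults of the released scripts:

```python
# Keep the global batch size constant when changing the GPU count.
def global_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_gpus):
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# 16 x 1 on 8 GPUs and 16 x 2 on 4 GPUs both give a global batch size of 128.
assert global_batch_size(16, 1, 8) == global_batch_size(16, 2, 4) == 128
```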
Please download the 1211K pretraining subset we use in the paper here; it combines the 653K news samples from VisualNews with the 558K LAION-CC-SBU subset from the LLaVA training corpus.
Training script with DeepSpeed ZeRO-2: trust_vl_stage1.sh.
- `--mm_projector_type mlp2x_gelu`: the two-layer MLP vision-language connector.
- `--vision_tower openai/clip-vit-large-patch14-336`: CLIP ViT-L/14, 336px.
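For reference, `mlp2x_gelu` denotes a two-layer MLP with a GELU activation in between. A minimal sketch is shown below; the 1024-d CLIP ViT-L/14-336 feature size and the 5120-d hidden size of a 13B Vicuna-style LLM are assumptions based on the named backbones, not values read from this repository:

```python
# Sketch of the "mlp2x_gelu" vision-language connector (dimensions are assumptions).
import torch.nn as nn

clip_feature_dim = 1024  # CLIP ViT-L/14-336 patch feature size (assumed)
llm_hidden_dim = 5120    # hidden size of a 13B Vicuna-style LLM (assumed)

mm_projector = nn.Sequential(
    nn.Linear(clip_feature_dim, llm_hidden_dim),
    nn.GELU(),
    nn.Linear(llm_hidden_dim, llm_hidden_dim),
)
```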
Please download the annotation file of our final instruction-tuning data mixture, llava_v1_5_mix665k.json, and download the images from the constituent datasets:
- COCO: train2017
- GQA: images
- OCR-VQA: download script; we save all files as `.jpg` (see the conversion sketch after this list)
- TextVQA: train_val_images
- VisualGenome: part1, part2
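Because the OCR-VQA download script may save images in mixed formats while the annotations reference `.jpg` files, here is a minimal conversion sketch. The directory path follows the layout shown in the tree below and assumes the folder contains only image files:

```python
# Convert non-.jpg OCR-VQA images to .jpg so filenames match the annotations.
from pathlib import Path
from PIL import Image

ocr_vqa_dir = Path("./data/ocr_vqa/images")  # assumed location, matching the tree below
for img_path in ocr_vqa_dir.iterdir():
    if not img_path.is_file() or img_path.suffix.lower() == ".jpg":
        continue
    Image.open(img_path).convert("RGB").save(img_path.with_suffix(".jpg"), "JPEG")
```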
After downloading all of them, organize the data as follows in `./data`:
```
├── coco
│   └── train2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
```
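A quick sanity check that the layout above is in place before launching Stage 2 (folder names are taken from the tree above):

```python
# Verify that the Stage-2 image folders exist under ./data.
from pathlib import Path

data_root = Path("./data")
expected = [
    "coco/train2017",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
]
missing = [d for d in expected if not (data_root / d).is_dir()]
print("All image folders found." if not missing else f"Missing folders: {missing}")
```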
Training script with DeepSpeed ZeRO-3: trust_vl_stage2.sh.
Please download the annotation file of our final instruction-tuning data mixture, TRUST-Instruct_task198k.json, and download the images from the constituent datasets:
- VisualNews:
  - Request the VisualNews dataset here.
  - Place the files under the `./data` folder.
- NewsCLIPpings:
  - Git clone the `news_clippings` repository.
  - Run `./download.sh`. More details can be found here.
  - Download the already-collected evidence according to the instructions here.
- DGM4:
  - Download the DGM4 dataset through this link: DGM4.
- Factify2:
  - Download the Factify2 dataset according to the instructions here.
- MMFakeBench:
  - Strictly follow the data usage guidelines by filling in the Data Usage Protocol on Huggingface from MMFakeBench.
After downloading all of them, organize the data as follows in `./data`:
```
├── origin
│   ├── bbc
│   ├── guardian
│   ├── usa_today
│   ├── washington_post
│   └── data.json
├── DGM4
│   ├── manipulation
│   ├── metadata
│   └── origin
├── Factify2
│   ├── data
│   └── images-train
└── MMFakeBench
    ├── fake
    ├── real
    └── source
```
Training script with DeepSpeed ZeRO-3: trust_vl_stage3.sh.
In TRUST-VL, we evaluate models on a diverse set of 7 misinformation benchmarks.
```bash
# Single GPU inference.
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/mmfakebench.sh
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/ood.sh

# Multi-GPU inference.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/eval/newsclippings.sh
```

Note: Please ensure that the corresponding image data for each evaluation dataset has been properly downloaded before running the evaluation.
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :)
```bibtex
@inproceedings{yan-etal-2025-trust,
    title = "{TRUST}-{VL}: An Explainable News Assistant for General Multimodal Misinformation Detection",
    author = "Yan, Zehong and
      Qi, Peng and
      Hsu, Wynne and
      Lee, Mong-Li",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.284/",
    pages = "5588--5604",
    ISBN = "979-8-89176-332-6",
}
```

We would like to thank LLaVA and Vicuna for their amazing work. We also appreciate the benchmarks: MMFakeBench, Factify2, DGM4, NewsCLIPpings, MOCHEG, Fakeddit, VERITE and VisualNews.
Usage and License Notices: This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to the OpenAI Terms of Use for the dataset and the specific licenses for base language models. This project does not impose any additional constraints beyond those stipulated in the original licenses. Furthermore, users are reminded to ensure that their use of the dataset and checkpoints is in compliance with all applicable laws and regulations.
