We introduce DualMindVLM, a dual-mode thinking VLM that automatically switches between fast and slow thinking modes based on problem difficulty. DualMindVLM is optimized with a simple RL approach built only on question-answer pairs. The approach consists of two stages: the first uses the output length variation of the pretrained VLM to assign each sample a thinking-mode label; the second develops dual-mode thinking through GRPO-based reinforcement learning, where half of the sampled candidates are guided by the assigned label. Despite its simplicity, DualMindVLM significantly outperforms the base model and achieves performance on par with state-of-the-art visual reasoning models, while maintaining exceptionally high token efficiency.
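To make the stage-1 labeling step concrete, here is a minimal, hypothetical sketch. It assumes the thinking-mode label is derived from the spread of sampled response lengths; the sampling count, threshold, and the `sample_responses` helper are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical stage-1 labeling sketch: sample several responses per question
# from the pretrained VLM and use the variation in their lengths to decide
# whether the question calls for slow (long-form) thinking.
import statistics

def assign_thinking_mode(question, image, sample_responses, k=8, std_threshold=50.0):
    """Return 'fast' or 'slow' for one question-answer sample.

    `sample_responses` is an assumed helper that draws k completions from the
    pretrained VLM; the threshold value is illustrative only.
    """
    responses = sample_responses(question, image, num_samples=k)
    lengths = [len(r.split()) for r in responses]  # word count as a length proxy
    # Large length variation is treated as a signal that the problem sometimes
    # triggers extended reasoning, so the sample is labeled 'slow'.
    return "slow" if statistics.pstdev(lengths) > std_threshold else "fast"
```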
| Component | Status | Notes |
|---|---|---|
| 🧩 Model | ✔️ Released | Available on 🤗 HuggingFace |
| ⚙️ Inference + Evaluation Code | ✔️ Released | vLLM-based inference, string-matching evaluation (see the inference sketch below) |
| 🔥 Training Code | 🚧 Coming Soon | GRPO-based training framework |
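For reference, running the released checkpoint with vLLM might look roughly like the sketch below. The HuggingFace repository ID and the `<image>` prompt placeholder are assumptions for illustration; consult the model card for the exact repo name and chat template.

```python
# Hypothetical vLLM inference sketch; the model ID and prompt format are
# placeholders, not the official ones.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/DualMindVLM", trust_remote_code=True)  # placeholder repo ID
params = SamplingParams(temperature=0.0, max_tokens=2048)

image = Image.open("example.jpg")
prompt = "<image>\nHow many apples are on the table? Answer with a number."

outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": {"image": image}}],
    params,
)
print(outputs[0].outputs[0].text)  # the model decides whether to think fast or slow
```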
If you find this work useful, please cite our paper:
@article{lin2025dualmindvlm,
  title   = {Learning to Think Fast and Slow for Visual Language Models},
  author  = {Chenyu Lin and Cheng Chi and Jinlin Wu and Sharon Li and Kaiyang Zhou},
  journal = {arXiv preprint arXiv:2511.16670},
  year    = {2025}
}