maifoundations/DualMindVLM


Learning to Think Fast and Slow for Visual Language Models

💡 Overview

We introduce DualMindVLM, a dual-mode thinking VLM that automatically switches between fast and slow thinking modes based on the difficulty of the problem. DualMindVLM is optimized with a simple RL approach built only on question–answer pairs. The approach consists of two stages: the first uses the output-length variation of the pretrained VLM to assign each sample a thinking-mode label; the second develops dual-mode thinking through GRPO-based reinforcement learning, where half of the sampled candidates are guided by the assigned label. Despite its simplicity, DualMindVLM significantly outperforms the base model and matches state-of-the-art visual reasoning models while maintaining exceptionally high token efficiency.
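As one plausible illustration of the stage-one labeling rule, the sketch below scores a question by how much the pretrained VLM's response lengths vary across several samples: stable lengths suggest an easy, fast-mode problem, while high variation suggests a hard, slow-mode one. The coefficient-of-variation statistic, the `cv_threshold` cutoff, and the function name are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch of stage-one thinking-mode labeling (not the released code).
# Assumption: we sample several responses from the pretrained VLM per question
# and treat high variation in their token lengths as a sign of a hard problem.
import statistics

def assign_thinking_mode(response_lengths: list[int], cv_threshold: float = 0.5) -> str:
    """Label a sample 'fast' or 'slow' from the lengths of sampled responses.

    cv_threshold is a hypothetical cutoff on the coefficient of variation;
    the paper's actual labeling rule may differ.
    """
    mean_len = statistics.mean(response_lengths)
    spread = statistics.stdev(response_lengths) / mean_len  # relative length variation
    return "slow" if spread > cv_threshold else "fast"

# Stable short outputs -> fast mode; widely varying outputs -> slow mode.
print(assign_thinking_mode([48, 52, 50, 47]))    # fast
print(assign_thinking_mode([60, 410, 95, 780]))  # slow
```

In stage two, these labels would then steer half of each GRPO sampling group toward the assigned mode, with the remaining candidates sampled freely.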


🚀 Release Progress

| Component | Status | Notes |
| --- | --- | --- |
| 🧩 Model | ✔️ Released | Available on 🤗 HuggingFace |
| ⚙️ Inference + Evaluation Code | ✔️ Released | vLLM-based inference, string-matching evaluation |
| 🔥 Training Code | 🕒 Coming Soon | GRPO-based training framework |
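Since the released inference code is vLLM-based with string-matching evaluation, a minimal usage sketch might look like the following. The model ID, prompt template, and exact-match rule are assumptions for illustration; consult the released scripts for the actual interface.

```python
# Hedged sketch of vLLM-based inference plus string-matching evaluation.
# Assumptions: the HuggingFace model ID, the prompt template, and exact-match
# scoring below are illustrative; the released scripts define the real setup.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="maifoundations/DualMindVLM")  # assumed model ID on HuggingFace
params = SamplingParams(temperature=0.0, max_tokens=2048)

image = Image.open("example.jpg")
# Assumed template; many VLMs also require an image placeholder token here.
prompt = "How many of the objects are red? Answer with a number."

# vLLM accepts multimodal inputs as a dict of prompt plus image data.
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    params,
)
prediction = outputs[0].outputs[0].text.strip()

# String-matching evaluation: exact match against the reference answer.
ground_truth = "3"
print(prediction, prediction.lower() == ground_truth.lower())
```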

🔗 Citation

If you find this work useful, please cite our paper:

@article{lin2025dualmindvlm,
  title     = {Learning to Think Fast and Slow for Visual Language Models},
  author    = {Chenyu Lin and Cheng Chi and Jinlin Wu and Sharon Li and Kaiyang Zhou},
  journal   = {arXiv preprint arXiv:2511.16670},
  year      = {2025}
}
