Huanjin Yao2,3*, Qixiang Yin4*, Jingyi Zhang1, Min Yang2, Yibo Wang3, Wenhao Wu5,
Fei Su4, Li Shen1, Minghui Qiu2, Dacheng Tao1, Jiaxing Huang1✉️
1Nanyang Technological University, 2ByteDance, 3Tsinghua University, 4BUPT, 5USYD
*Equal Contribution, ✉️Corresponding Author
- Sep 19, 2025. R1-ShareVL has been accepted at NeurIPS 2025!
- Jul 15, 2025. We release our Share-GRPO training code, 52K training data, and the R1-ShareVL-7B model!
- May 23, 2025. We release our paper on arXiv.
We introduce Share-GRPO, a novel reinforcement learning framework for MLLMs that addresses the challenges of sparse rewards and advantage vanishing in reasoning tasks. For a given question, Share-GRPO first applies semantically consistent transformations to generate a set of varied but semantically equivalent questions, thereby expanding the question space. It then encourages the MLLM to explore diverse reasoning paths across this expanded space and facilitates the sharing of discovered reasoning trajectories and their rewards among these question variants during the reinforcement learning process. This approach enables more effective exploration, denser reward signals, and more robust training.
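The reward-sharing idea above can be sketched in a few lines. This is a minimal illustration under our own assumptions (hypothetical function name and data layout, not the repository's actual implementation): instead of normalizing each variant's rollouts in isolation, all rollouts across the semantically equivalent question variants share one reward pool when computing group-relative advantages.

```python
# Minimal sketch of reward sharing across question variants (hypothetical
# helper; not the actual Share-GRPO implementation in this repository).
from statistics import mean, pstdev

def shared_group_advantages(rewards_per_variant):
    """rewards_per_variant: one list of rollout rewards per question variant.

    Vanilla GRPO would normalize each variant's rewards separately, so a
    variant whose rollouts all receive the same reward gets zero advantage
    everywhere (advantage vanishing). Pooling rewards across semantically
    equivalent variants densifies the signal for such variants.
    """
    pooled = [r for rewards in rewards_per_variant for r in rewards]
    mu, sigma = mean(pooled), pstdev(pooled)
    if sigma == 0:  # every rollout in the shared pool got the same reward
        return [[0.0] * len(rewards) for rewards in rewards_per_variant]
    return [[(r - mu) / sigma for r in rewards]
            for rewards in rewards_per_variant]

# Example: the first variant's rollouts all fail, but sharing the pool with
# a sibling variant still yields non-zero (negative) advantages for them.
advs = shared_group_advantages([[0.0, 0.0], [1.0, 0.0]])
```

Note how, without sharing, the first variant's group would have zero variance and contribute no learning signal at all.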
```bash
git clone https://github.com/HJYao00/R1-ShareVL.git
cd R1-ShareVL
pip install -e .
```

Start Share-GRPO training:

```bash
bash examples/qwen2_5_vl_7b_sharegrpo.sh
```

Merge the trained checkpoint:

```bash
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
```

We evaluate R1-ShareVL using VLMEvalKit! Please make sure to include a thinking prompt after the question:
```python
R1_PROMPT = r"""You FIRST think about the reasoning process as an internal monologue and then provide the final answer.
The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."""

item = {'type': 'text', 'text': s['value'] + " " + R1_PROMPT}
```

If you find this repository useful, please star🌟 this repo and cite🖇️ our paper.
```bibtex
@misc{yao2025r1sharevl,
      title={R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO},
      author={Huanjin Yao and Qixiang Yin and Jingyi Zhang and Min Yang and Yibo Wang and Wenhao Wu and Fei Su and Li Shen and Minghui Qiu and Dacheng Tao and Jiaxing Huang},
      year={2025},
      eprint={2505.16673},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
}
```

Our work is primarily based on the following codebases. We are sincerely grateful for their work.
- EasyR1: We use EasyR1 to fine-tune R1-ShareVL Models.
- VLMEvalKit: We use VLMEvalKit for evaluation.
