
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

If you find this project useful, please give us a star🌟.

🎙️ News

  • R1-ShareVL has been accepted to NeurIPS 2025.

💡 About R1-ShareVL

We introduce Share-GRPO, a novel reinforcement learning framework for MLLMs that addresses the challenges of sparse rewards and advantage vanishing in reasoning tasks. For a given question, Share-GRPO first applies semantically consistent transformations to generate a set of varied but semantically equivalent questions, thereby expanding the question space. It then encourages the MLLM to explore diverse reasoning paths across this expanded space and facilitates the sharing of discovered reasoning trajectories and their rewards among these question variants during the reinforcement learning process. This approach enables more effective exploration, denser reward signals, and more robust training.

[Figure: overview of the Share-GRPO framework]
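
To make the reward-sharing idea concrete, below is a minimal sketch, not the repository's implementation: all names are illustrative, and it simplifies the paper's hierarchical advantage estimation down to pooled group normalization. It shows why sharing rewards across semantically equivalent question variants can rescue the advantage signal when all rollouts of a single variant receive identical rewards.

import statistics

def share_grpo_advantages(rewards_per_variant):
    """Sketch: pool rollout rewards across all variants of one question.

    rewards_per_variant: list of lists; rewards_per_variant[k][i] is the
    reward of the i-th rollout for the k-th semantically equivalent variant.
    Returns advantages with the same nested shape.
    """
    # Share rewards across variants: normalize each rollout's reward against
    # the pooled group rather than only its own variant's rollouts.
    pooled = [r for variant in rewards_per_variant for r in variant]
    mean = statistics.mean(pooled)
    std = statistics.pstdev(pooled) or 1.0  # guard against zero variance
    return [[(r - mean) / std for r in variant] for variant in rewards_per_variant]

# Two variants of the same question, three rollouts each. Vanilla GRPO on
# variant 2 alone would yield zero advantage (all rewards equal); sharing
# across variants recovers a non-trivial learning signal.
print(share_grpo_advantages([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]))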

🚀 Training

Installation

git clone https://github.com/HJYao00/R1-ShareVL.git
cd R1-ShareVL
pip install -e .

GRPO Training

bash examples/qwen2_5_vl_7b_sharegrpo.sh

Merge Checkpoint into Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
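
Once merged, the checkpoint can be loaded like any Hugging Face model. A minimal sketch, assuming a recent transformers release with Qwen2.5-VL support; the exact output directory is illustrative, so check where model_merger.py writes the merged weights:

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Illustrative path: the directory containing the merged weights.
model_path = "checkpoints/easy_r1/exp_name/global_step_1/actor"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)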

🚗 Evaluation

We evaluate R1-ShareVL using VLMEvalKit! Please make sure to append the following thinking prompt to each question:

R1_PROMPT = r"""You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."""

# Append the thinking prompt to the question text
# (s['value'] holds the original question in VLMEvalKit's message format).
item = {'type': 'text', 'text': s['value'] + " " + R1_PROMPT}
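
With this prompt, model outputs contain a <think> … </think> block followed by the answer in \boxed{}. A minimal sketch for pulling the final answer out of a response; the helper name is ours, not part of VLMEvalKit, and the regex does not handle nested braces inside \boxed{}:

import re

def extract_boxed_answer(response: str) -> str:
    """Return the content of the last \\boxed{...} in a model response."""
    # Drop the reasoning block first so we only search the final answer.
    answer_part = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    matches = re.findall(r"\\boxed\{([^{}]*)\}", answer_part)
    return matches[-1].strip() if matches else answer_part.strip()

print(extract_boxed_answer(r"<think>2+2=4</think> The answer is \boxed{4}."))  # -> "4"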

🔗 Citation

If you find this repository useful, please star🌟 this repo and cite🖇️ our paper.

@misc{yao2025r1sharevl,
      title={R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO}, 
      author={Huanjin Yao and Qixiang Yin and Jingyi Zhang and Min Yang and Yibo Wang and Wenhao Wu and Fei Su and Li Shen and Minghui Qiu and Dacheng Tao and Jiaxing Huang},
      year={2025},
      eprint={2505.16673},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
}

🙏 Acknowledgment

Our work is primarily based on the following codebases. We are sincerely grateful for their work.

  • EasyR1: We use EasyR1 to fine-tune R1-ShareVL Models.
  • VLMEvalKit: We use VLMEvalKit for evaluation.
