
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

If you find this project useful, please give us a star🌟.

🎙️ News

  • R1-ShareVL has been accepted to NeurIPS 2025.

💡 About R1-ShareVL

We introduce Share-GRPO, a novel reinforcement learning framework for MLLMs that addresses the challenges of sparse rewards and advantage vanishing in reasoning tasks. For a given question, Share-GRPO first applies semantically consistent transformations to generate a set of varied but semantically equivalent questions, thereby expanding the question space. It then encourages the MLLM to explore diverse reasoning paths across this expanded space and facilitates the sharing of discovered reasoning trajectories and their rewards among these question variants during the reinforcement learning process. This approach enables more effective exploration, denser reward signals, and more robust training.

[Figure: overview of the Share-GRPO framework]
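
To make the reward-sharing idea concrete, below is a minimal sketch, not the repository's implementation: all names are illustrative, and it simplifies the paper's hierarchical advantage estimation down to pooled group normalization. It shows why sharing rewards across semantically equivalent question variants can rescue the advantage signal when all rollouts of a single variant receive identical rewards.

import statistics

def share_grpo_advantages(rewards_per_variant):
    """Sketch: pool rollout rewards across all variants of one question.

    rewards_per_variant: list of lists; rewards_per_variant[k][i] is the
    reward of the i-th rollout for the k-th semantically equivalent variant.
    Returns advantages with the same nested shape.
    """
    # Share rewards across variants: normalize each rollout's reward against
    # the pooled group rather than only its own variant's rollouts.
    pooled = [r for variant in rewards_per_variant for r in variant]
    mean = statistics.mean(pooled)
    std = statistics.pstdev(pooled) or 1.0  # guard against zero variance
    return [[(r - mean) / std for r in variant] for variant in rewards_per_variant]

# Two variants of the same question, three rollouts each. Vanilla GRPO on
# variant 2 alone would yield zero advantage (all rewards equal); sharing
# across variants recovers a non-trivial learning signal.
print(share_grpo_advantages([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]))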

🚀 Training

Installation

git clone https://github.com/HJYao00/R1-ShareVL.git
cd R1-ShareVL
pip install -e .

GRPO Training

bash examples/qwen2_5_vl_7b_sharegrpo.sh

Merge Checkpoint into Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
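
Once merged, the checkpoint can be loaded like any Hugging Face model. A minimal sketch, assuming a recent transformers release with Qwen2.5-VL support; the exact output directory is illustrative, so check where model_merger.py writes the merged weights:

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Illustrative path: the directory containing the merged weights.
model_path = "checkpoints/easy_r1/exp_name/global_step_1/actor"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)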

🚗 Evaluation

We evaluate R1-ShareVL using VLMEvalKit! Please make sure to append the following thinking prompt to each question:

R1_PROMPT = r"""You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."""

# Append the thinking prompt to the question text
# (s['value'] holds the original question in VLMEvalKit's message format).
item = {'type': 'text', 'text': s['value'] + " " + R1_PROMPT}
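
With this prompt, model outputs contain a <think> … </think> block followed by the answer in \boxed{}. A minimal sketch for pulling the final answer out of a response; the helper name is ours, not part of VLMEvalKit, and the regex does not handle nested braces inside \boxed{}:

import re

def extract_boxed_answer(response: str) -> str:
    """Return the content of the last \\boxed{...} in a model response."""
    # Drop the reasoning block first so we only search the final answer.
    answer_part = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    matches = re.findall(r"\\boxed\{([^{}]*)\}", answer_part)
    return matches[-1].strip() if matches else answer_part.strip()

print(extract_boxed_answer(r"<think>2+2=4</think> The answer is \boxed{4}."))  # -> "4"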

🔗 Citation

If you find this repository useful, please star🌟 this repo and cite🖇️ our paper.

@misc{yao2025r1sharevl,
      title={R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO}, 
      author={Huanjin Yao and Qixiang Yin and Jingyi Zhang and Min Yang and Yibo Wang and Wenhao Wu and Fei Su and Li Shen and Minghui Qiu and Dacheng Tao and Jiaxing Huang},
      year={2025},
      eprint={2505.16673},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
}

🙏 Acknowledgment

Our work is primarily based on the following codebases. We are sincerely grateful for their work.

  • EasyR1: We use EasyR1 to fine-tune R1-ShareVL Models.
  • VLMEvalKit: We use VLMEvalKit for evaluation.
