support transformers multi-modal grpo #131

Merged

tastelikefeet merged 3 commits into modelscope:main · Apr 1, 2026
Conversation
Contributor
Code Review
This pull request introduces multimodal GRPO training capabilities, featuring a new training script for the Qwen3.5 VL model on the CLEVR dataset. Core framework enhancements include a new MultiModalAccuracyReward, a CLEVRProcessor, and a refactored vLLM sampler that better supports multimodal data and standard message formats. Review feedback identifies a logic error in the training loop where metrics are reset prematurely, as well as opportunities to optimize reward function instantiation and fix a potential typo in the reward computation.
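To make the review concrete, here is a minimal sketch of what an accuracy reward like the `MultiModalAccuracyReward` mentioned above might look like. The class name comes from the PR; the internals (the `<answer>...</answer>` tag convention and the call signature) are assumptions, not the PR's actual implementation.

```python
import re


class MultiModalAccuracyReward:
    """Reward 1.0 when the model's final answer matches the ground truth, else 0.0.

    A minimal sketch: assumes the completion wraps its answer in
    <answer>...</answer> tags, a common convention in GRPO-style setups.
    """

    ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

    def __call__(self, completions, ground_truths):
        rewards = []
        for completion, truth in zip(completions, ground_truths):
            match = self.ANSWER_RE.search(completion)
            if match is None:
                rewards.append(0.0)  # unparseable output gets no reward
                continue
            answer = match.group(1).strip().lower()
            rewards.append(1.0 if answer == str(truth).strip().lower() else 0.0)
        return rewards
```

For a CLEVR-style counting question, `MultiModalAccuracyReward()(["I count <answer>3</answer>"], ["3"])` would yield `[1.0]`.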
Resolved conflicts in template/base.py:
- _check_max_length: adopt upstream's cleaner _truncate_feature approach
- _process_mm_messages: merge our List[Dict] content support into upstream's refactored method structure

Made-with: Cursor
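The truncation policy referenced in `_check_max_length` can be sketched roughly as follows. The function name and key set here are assumptions for illustration; the point is that only token-sequence fields are clipped, while multimodal tensors pass through untouched.

```python
def truncate_sequence_fields(feature: dict, max_length: int) -> dict:
    """Sketch of length-capping that only truncates sequence fields.

    Token-aligned fields (input_ids, labels, attention_mask) are clipped to
    max_length; multimodal tensors such as pixel_values or image_grid_thw are
    left intact, since slicing them by token count would corrupt them.
    """
    SEQUENCE_KEYS = ("input_ids", "labels", "attention_mask")
    return {
        key: (value[:max_length] if key in SEQUENCE_KEYS else value)
        for key, value in feature.items()
    }
```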
tastelikefeet approved these changes · Apr 1, 2026
Summary

- Support `content: List[Dict]` in the template pipeline
- `_check_max_length` only truncates `input_ids`/`labels`, keeping multimodal tensors intact
- Strip `None` keys from content blocks to prevent Jinja template misparsing
- `multi_modal_data`/`mm_processor_kwargs` pass-through in the vLLM sampling pipeline
- Forward `encode()` kwargs in `LazyDataset` for proper `add_generation_prompt` support
- New: `CLEVRProcessor`, `MultiModalAccuracyReward`, and a multimodal GRPO demo

Changed files

- `template/base.py` — `_build_mm_messages` supports both `List[Dict]` and legacy str content; `_apply_chat_template` strips null keys; `_check_max_length` only truncates sequence fields
- `sampler/vllm_sampler/vllm_engine.py` — `sample()` accepts a `multi_modal_data` dict directly
- `sampler/vllm_sampler/vllm_sampler.py` — extract `multi_modal_data` from message content blocks for vLLM; add `mm_processor_kwargs` forwarding
- `dataset/lazy_dataset.py` — forward `encode()` kwargs to `batch_encode`
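The `vllm_sampler` change — extracting `multi_modal_data` from `List[Dict]` content blocks — could look roughly like this. The helper name and the exact block schema (OpenAI-style `type`/`image`/`text` keys) are assumptions for illustration, not the PR's actual code.

```python
def split_mm_messages(messages):
    """Hypothetical helper: pull images out of List[Dict] content blocks into a
    vLLM-style multi_modal_data dict, flattening each message back to plain
    string content for the chat template."""
    images = []
    text_messages = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):  # legacy plain-string content
            text_messages.append(message)
            continue
        text_parts = []
        for block in content:
            if block.get("type") == "image":
                images.append(block["image"])
            elif block.get("type") == "text":
                text_parts.append(block["text"])
        text_messages.append({**message, "content": "".join(text_parts)})
    multi_modal_data = {"image": images} if images else None
    return text_messages, multi_modal_data
```

The returned `multi_modal_data` dict would then be forwarded alongside the templated prompt, matching the `sample()` signature change in `vllm_engine.py`.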