fix short math grpo cookbook by Yunnglin · Pull Request #149 · modelscope/twinkle

Yunnglin · 2026-04-10T08:26:11Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Describe the details of this PR.

Experiment results

Paste your experiment results here (if needed).

gemini-code-assist

Code Review

This pull request transitions several math training examples to use the GSM8K dataset and GRPO, incorporating a new brevity reward and SwanLab logging. Key improvements include fixing a potential crash in the template base class when handling multi-modal token types as numpy arrays and correcting a logic error in how default padding values are applied. Additionally, feedback was provided to ensure consistency in the 'enable_thinking' configuration across different cookbook implementations.
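The crash fix mentioned above concerns token-type ids that arrive as numpy arrays rather than tensors. The following is a minimal sketch of tolerant concatenation, assuming only that array-like inputs expose `.tolist()`. The helper names `to_int_list` and `concat_token_type_ids` are hypothetical and do not reflect the actual code in `src/twinkle/template/base.py`.

```python
from array import array


def to_int_list(values):
    """Normalize a list, tuple, or array-like object to a plain list of ints.

    numpy arrays and torch tensors both expose .tolist(); the stdlib
    array type used in the demo below does as well.
    """
    if hasattr(values, "tolist"):
        values = values.tolist()
    return [int(v) for v in values]


def concat_token_type_ids(chunks):
    """Concatenate token-type id chunks of mixed container types (hypothetical helper)."""
    merged = []
    for chunk in chunks:
        merged.extend(to_int_list(chunk))
    return merged


# Mixed inputs: a list, an array-like object, and a tuple.
merged = concat_token_type_ids([[0, 0], array("i", [1, 1]), (0,)])
```

Normalizing each chunk before joining avoids branching on the concrete container type, at the cost of a copy per chunk.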

src/twinkle/template/base.py

cookbook/client/twinkle/self_host/short_math_grpo.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Copilot

Pull request overview

This PR updates the “short math GRPO” cookbooks and related serving/config helpers, aligning examples with the newer twinkle.* APIs and GSM8K-focused reward/preprocessing while making small robustness tweaks in the template layer.

Changes:

- Make `Template.concat_input_feature()` more tolerant of non-tensor `mm_token_type_ids`, and adjust `_apply_chat_template()` defaults via `processor_kwargs`.
- Revise GRPO cookbook examples to use GSM8K processing + a brevity reward, update imports to `twinkle` modules, and add SwanLab logging in the Twinkle self-host example.
- Update Megatron cookbook launch/config: run server via `python -m twinkle.server`, add save-dir/server-config flags, and extend sampler engine args.
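The review does not spell out what the original padding-default bug was. One common way to apply defaults through a kwargs dict is to let caller-supplied values override them, sketched below. The function name `apply_default_kwargs` and the example keys are assumptions for illustration, not Twinkle's API.

```python
def apply_default_kwargs(processor_kwargs, defaults):
    """Return defaults overlaid with any values the caller supplied.

    Hypothetical helper: a default is used only when the caller did
    not pass that key, so an explicit padding=True is never overwritten.
    """
    merged = dict(defaults)
    merged.update(processor_kwargs or {})
    return merged


# Illustrative keys only; the real defaults live in the template code.
example = apply_default_kwargs({"padding": True}, {"padding": False, "truncation": False})
```

Copying `defaults` first keeps the shared default dict from being mutated across calls.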

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `src/twinkle/template/base.py` | Robustifies mm token-type concatenation and adjusts chat-template padding defaults via `processor_kwargs`. |
| `src/twinkle/server/model/tinker_handlers.py` | Adds a TODO note around LoRA config flexibility. |
| `cookbook/client/twinkle/self_host/short_math_grpo.py` | Switches to GSM8K processor + brevity reward; adds SwanLab logging; updates imports to `twinkle.*`. |
| `cookbook/client/twinkle/self_host/self_congnition.py` | Updates imports from `twinkle_client.` to `twinkle.`. |
| `cookbook/client/twinkle/modelscope/self_congnition.py` | Updates imports from `twinkle_client.` to `twinkle.`. |
| `cookbook/client/tinker/self_host/short_math_grpo.py` | Uses `Qwen3_5Template` explicitly and tweaks training config defaults. |
| `cookbook/client/tinker/modelscope/short_math_grpo.py` | Reworks example from “Math” to GSM8K with brevity reward + GSM8K processor and template updates. |
| `cookbook/client/server/megatron/server.py` | Removes legacy launcher script (replaced by module CLI usage in `run.sh`). |
| `cookbook/client/server/megatron/server_config.yaml` | Adds `enable_tower_connector_lora` to sampler engine args. |
| `cookbook/client/server/megatron/server_config_4b.yaml` | Adjusts sampler `nproc_per_node` and adds `enable_tower_connector_lora`. |
| `cookbook/client/server/megatron/run.sh` | Adds save-dir/server-config flags, exports `TWINKLE_DEFAULT_SAVE_DIR`, switches to `python -m twinkle.server`, and expands cleanup logic. |
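The cookbook's reward code is not shown on this page. As a rough illustration of what a GSM8K brevity reward could look like, the sketch below gives zero reward for a wrong final number and a length-discounted reward for a correct one. The function names, the 0.5 penalty weight, and the `max_chars` cap are all assumptions, not the values used in the PR.

```python
import re


def extract_final_number(text):
    """Return the last number in text with thousands separators removed, or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def brevity_reward(completion, reference, max_chars=1000):
    """Hypothetical reward: correctness gated, then discounted by completion length.

    GSM8K references end with '#### <answer>', so the last number in the
    reference is taken as the target.
    """
    predicted = extract_final_number(completion)
    target = extract_final_number(reference)
    if predicted is None or target is None or float(predicted) != float(target):
        return 0.0
    length_ratio = min(len(completion), max_chars) / max_chars
    # A correct answer earns between 0.5 (at or over the cap) and 1.0 (very short).
    return 1.0 - 0.5 * length_ratio
```

Gating on correctness first means a short but wrong completion is never preferred over a long correct one.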

Comments suppressed due to low confidence (1)

cookbook/client/twinkle/self_host/short_math_grpo.py:125

swanlab.login(api_key=os.environ.get('SWANLAB_API_KEY', '')) will attempt to log in with an empty API key when the env var is missing, which is likely to fail with a confusing error and prevents the script from running with USE_SWANLAB=True. Consider either (a) requiring the env var (like other cookbook scripts) and raising a clear error, or (b) skipping SwanLab initialization when the key is absent.
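Both options suggested in this comment can be sketched with a small helper that resolves the key before any login call. The name `resolve_swanlab_api_key` and its `required` flag are hypothetical. Only `swanlab.login(api_key=...)`, `SWANLAB_API_KEY`, and `USE_SWANLAB` come from the comment itself.

```python
import os


def resolve_swanlab_api_key(environ=None, required=False):
    """Return the SwanLab API key from the environment, or None if absent.

    With required=True a missing key raises a clear error (option a);
    with required=False the caller can skip SwanLab setup (option b).
    """
    env = os.environ if environ is None else environ
    key = env.get("SWANLAB_API_KEY", "").strip()
    if key:
        return key
    if required:
        raise RuntimeError(
            "SWANLAB_API_KEY is not set; export it or disable USE_SWANLAB."
        )
    return None


# Intended use in the cookbook script (not executed here):
#   key = resolve_swanlab_api_key()
#   if key is not None:
#       swanlab.login(api_key=key)
```

Passing `environ` explicitly keeps the helper testable without touching the real process environment.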

src/twinkle/template/base.py

cookbook/client/server/megatron/run.sh

update short math grpo

d722128

gemini-code-assist bot reviewed Apr 10, 2026

src/twinkle/template/base.py (outdated)

src/twinkle/template/base.py

cookbook/client/twinkle/self_host/short_math_grpo.py

Yunnglin and others added 6 commits April 10, 2026 16:56

update run.sh

fe28585

update run.sh

641c179

update run.sh

0f4f46f

Merge branch 'main' into update_cookbook_0410

53fde24

update

66b8b02

Update src/twinkle/template/base.py

6645211

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Yunnglin marked this pull request as ready for review April 10, 2026 09:52

Copilot AI review requested due to automatic review settings April 10, 2026 09:52

Copilot started reviewing on behalf of Yunnglin April 10, 2026 09:52

tastelikefeet approved these changes Apr 10, 2026

Copilot AI reviewed Apr 10, 2026

src/twinkle/template/base.py

cookbook/client/server/megatron/run.sh

hjh0119 approved these changes Apr 10, 2026

Yunnglin merged commit c713631 into main Apr 10, 2026
6 of 8 checks passed

Yunnglin deleted the update_cookbook_0410 branch April 10, 2026 10:14

Yunnglin merged 7 commits into main from update_cookbook_0410

4 participants
