
fix short math grpo cookbook#149

Merged
Yunnglin merged 7 commits into main from update_cookbook_0410
Apr 10, 2026

Conversation

@Yunnglin
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information for this PR.

Experiment results

Paste your experiment results here (if needed).

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request transitions several math training examples to use the GSM8K dataset and GRPO, incorporating a new brevity reward and SwanLab logging. Key improvements include fixing a potential crash in the template base class when handling multi-modal token types as numpy arrays and correcting a logic error in how default padding values are applied. Additionally, feedback was provided to ensure consistency in the 'enable_thinking' configuration across different cookbook implementations.
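The numpy-array crash described above can be illustrated with a minimal sketch. This is not the code in `src/twinkle/template/base.py`: the helper name `concat_token_type_ids` is hypothetical, and the sketch uses NumPy only, on the assumption that the per-segment token-type ids may arrive as plain lists, numpy arrays, or `None`.

```python
import numpy as np


def concat_token_type_ids(parts):
    """Concatenate per-segment token-type ids into one 1-D int64 array.

    Hypothetical helper: each element of `parts` may be a list, a numpy
    array, or None (segment without multi-modal tokens).
    """
    # Normalise every non-empty segment to a flat int64 array first, so the
    # concatenation never depends on the container type of the input.
    arrays = [
        np.asarray(p, dtype=np.int64).reshape(-1)
        for p in parts
        if p is not None
    ]
    if not arrays:
        # np.concatenate raises on an empty sequence, so handle it explicitly.
        return np.empty(0, dtype=np.int64)
    return np.concatenate(arrays)


merged = concat_token_type_ids([[0, 0, 1], np.array([1, 1]), None])
```

Normalising inputs before concatenating keeps the type handling in one place; a tensor-based version would follow the same shape with the framework's own conversion call.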

Yunnglin and others added 6 commits April 10, 2026 16:56
@Yunnglin Yunnglin marked this pull request as ready for review April 10, 2026 09:52
Copilot AI review requested due to automatic review settings April 10, 2026 09:52

Copilot AI left a comment


Pull request overview

This PR updates the “short math GRPO” cookbooks and related serving/config helpers, aligning examples with the newer twinkle.* APIs and GSM8K-focused reward/preprocessing while making small robustness tweaks in the template layer.

Changes:

  • Make Template.concat_input_feature() more tolerant of non-tensor mm_token_type_ids, and adjust _apply_chat_template() defaults via processor_kwargs.
  • Revise GRPO cookbook examples to use GSM8K processing + a brevity reward, update imports to twinkle modules, and add SwanLab logging in the Twinkle self-host example.
  • Update Megatron cookbook launch/config: run server via python -m twinkle.server, add save-dir/server-config flags, and extend sampler engine args.
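For readers unfamiliar with the idea, a brevity reward of the kind mentioned in the list above might look like the sketch below. It is an assumption, not the cookbook's implementation: the function names, the 0.5 penalty weight, and the `max_chars` cap are all hypothetical. The only dataset fact relied on is that GSM8K reference solutions end with a `#### <number>` line.

```python
import re


def extract_final_number(text):
    """Return the last number appearing in `text` as a string, or None."""
    # Drop thousands separators so "1,000" is read as 1000.
    matches = re.findall(r'-?\d+(?:\.\d+)?', text.replace(',', ''))
    return matches[-1] if matches else None


def brevity_reward(completion, reference, max_chars=1024):
    """Hypothetical reward: 0.0 if wrong, else between 0.5 and 1.0.

    Correct answers earn more the shorter the completion is; the length
    penalty saturates once the completion reaches `max_chars` characters.
    """
    predicted = extract_final_number(completion)
    target = extract_final_number(reference)
    if predicted is None or target is None:
        return 0.0
    if float(predicted) != float(target):
        return 0.0
    length_penalty = min(len(completion), max_chars) / max_chars
    return 1.0 - 0.5 * length_penalty


short = brevity_reward('The answer is 42', '#### 42')
long_ = brevity_reward('x' * 2000 + ' 42', '#### 42')
wrong = brevity_reward('The answer is 41', '#### 42')
```

Gating the length bonus on correctness means a model cannot raise its reward by being short and wrong, which is the usual concern with length-based shaping.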

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/twinkle/template/base.py | Robustifies mm token-type concatenation and adjusts chat-template padding defaults via processor_kwargs. |
| src/twinkle/server/model/tinker_handlers.py | Adds a TODO note around LoRA config flexibility. |
| cookbook/client/twinkle/self_host/short_math_grpo.py | Switches to GSM8K processor + brevity reward; adds SwanLab logging; updates imports to twinkle.*. |
| cookbook/client/twinkle/self_host/self_congnition.py | Updates imports from twinkle_client.* to twinkle.*. |
| cookbook/client/twinkle/modelscope/self_congnition.py | Updates imports from twinkle_client.* to twinkle.*. |
| cookbook/client/tinker/self_host/short_math_grpo.py | Uses Qwen3_5Template explicitly and tweaks training config defaults. |
| cookbook/client/tinker/modelscope/short_math_grpo.py | Reworks example from “Math” to GSM8K with brevity reward + GSM8K processor and template updates. |
| cookbook/client/server/megatron/server.py | Removes legacy launcher script (replaced by module CLI usage in run.sh). |
| cookbook/client/server/megatron/server_config.yaml | Adds enable_tower_connector_lora to sampler engine args. |
| cookbook/client/server/megatron/server_config_4b.yaml | Adjusts sampler nproc_per_node and adds enable_tower_connector_lora. |
| cookbook/client/server/megatron/run.sh | Adds save-dir/server-config flags, exports TWINKLE_DEFAULT_SAVE_DIR, switches to python -m twinkle.server, and expands cleanup logic. |

Comments suppressed due to low confidence (1)

cookbook/client/twinkle/self_host/short_math_grpo.py:125

  • swanlab.login(api_key=os.environ.get('SWANLAB_API_KEY', '')) will attempt to log in with an empty API key when the env var is missing, which is likely to fail with a confusing error and prevents the script from running with USE_SWANLAB=True. Consider either (a) requiring the env var (like other cookbook scripts) and raising a clear error, or (b) skipping SwanLab initialization when the key is absent.
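Both options in the comment above can be covered by one small helper, sketched here under assumptions: `resolve_swanlab_api_key` and its `required` flag are hypothetical names, and the sketch deliberately does not import or call `swanlab` itself.

```python
import os


def resolve_swanlab_api_key(env=None, required=False):
    """Return the SwanLab API key, or None when it is absent.

    Hypothetical helper. With required=True a missing key raises a clear
    error (option a); otherwise the caller can skip logging (option b).
    """
    env = os.environ if env is None else env
    # Treat an unset variable and an empty/whitespace value the same way.
    key = (env.get('SWANLAB_API_KEY') or '').strip()
    if key:
        return key
    if required:
        raise RuntimeError(
            'SWANLAB_API_KEY is not set; export it or disable USE_SWANLAB.'
        )
    return None


present = resolve_swanlab_api_key({'SWANLAB_API_KEY': 'abc'})
absent = resolve_swanlab_api_key({})
```

The script would then call `swanlab.login(api_key=key)` only when the returned key is not `None`, so a missing variable never reaches the login call as an empty string.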

@Yunnglin Yunnglin merged commit c713631 into main Apr 10, 2026
6 of 8 checks passed
@Yunnglin Yunnglin deleted the update_cookbook_0410 branch April 10, 2026 10:14

4 participants