Conversation
There was a problem hiding this comment.
Code Review
This pull request transitions several math training examples to use the GSM8K dataset and GRPO, incorporating a new brevity reward and SwanLab logging. Key improvements include fixing a potential crash in the template base class when handling multi-modal token types as numpy arrays and correcting a logic error in how default padding values are applied. Additionally, feedback was provided to ensure consistency in the 'enable_thinking' configuration across different cookbook implementations.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the “short math GRPO” cookbooks and related serving/config helpers, aligning examples with the newer twinkle.* APIs and GSM8K-focused reward/preprocessing while making small robustness tweaks in the template layer.
Changes:
- Make
Template.concat_input_feature()more tolerant of non-tensormm_token_type_ids, and adjust_apply_chat_template()defaults viaprocessor_kwargs. - Revise GRPO cookbook examples to use GSM8K processing + a brevity reward, update imports to
twinklemodules, and add SwanLab logging in the Twinkle self-host example. - Update Megatron cookbook launch/config: run server via
python -m twinkle.server, add save-dir/server-config flags, and extend sampler engine args.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/twinkle/template/base.py | Robustifies mm token-type concatenation and adjusts chat-template padding defaults via processor_kwargs. |
| src/twinkle/server/model/tinker_handlers.py | Adds a TODO note around LoRA config flexibility. |
| cookbook/client/twinkle/self_host/short_math_grpo.py | Switches to GSM8K processor + brevity reward; adds SwanLab logging; updates imports to twinkle.*. |
| cookbook/client/twinkle/self_host/self_congnition.py | Updates imports from twinkle_client.* to twinkle.*. |
| cookbook/client/twinkle/modelscope/self_congnition.py | Updates imports from twinkle_client.* to twinkle.*. |
| cookbook/client/tinker/self_host/short_math_grpo.py | Uses Qwen3_5Template explicitly and tweaks training config defaults. |
| cookbook/client/tinker/modelscope/short_math_grpo.py | Reworks example from “Math” to GSM8K with brevity reward + GSM8K processor and template updates. |
| cookbook/client/server/megatron/server.py | Removes legacy launcher script (replaced by module CLI usage in run.sh). |
| cookbook/client/server/megatron/server_config.yaml | Adds enable_tower_connector_lora to sampler engine args. |
| cookbook/client/server/megatron/server_config_4b.yaml | Adjusts sampler nproc_per_node and adds enable_tower_connector_lora. |
| cookbook/client/server/megatron/run.sh | Adds save-dir/server-config flags, exports TWINKLE_DEFAULT_SAVE_DIR, switches to python -m twinkle.server, and expands cleanup logic. |
Comments suppressed due to low confidence (1)
cookbook/client/twinkle/self_host/short_math_grpo.py:125
swanlab.login(api_key=os.environ.get('SWANLAB_API_KEY', ''))will attempt to log in with an empty API key when the env var is missing, which is likely to fail with a confusing error and prevents the script from running withUSE_SWANLAB=True. Consider either (a) requiring the env var (like other cookbook scripts) and raising a clear error, or (b) skipping SwanLab initialization when the key is absent.
PR type
PR information
Write the detail information belongs to this PR.
Experiment results
Paste your experiment result here(if needed).