Fix bugs by tastelikefeet · Pull Request #139 · modelscope/twinkle

tastelikefeet · 2026-04-04T08:16:04Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Write the detailed information about this PR here.

Experiment results

Paste your experiment results here (if needed).

gemini-code-assist

Code Review

This pull request introduces a GRPO training script for the GSM8K dataset using Ray, incorporating a brevity reward mechanism to promote concise reasoning. It also adds a patch to the Megatron PEFT implementation to disable merge checks. Key feedback includes correcting the log probability extraction logic for vLLM outputs, moving initialization calls into the main function to prevent side effects, and optimizing performance by instantiating reward functions outside the training loop. Additionally, the commented-out LoRA adapter configuration should be addressed to ensure the intended training behavior.
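The review mentions a brevity reward for concise reasoning. It also recommends two things: build reward functions once, outside the training loop, and keep setup inside `main`. Below is a minimal sketch of that pattern. It rests on stated assumptions. The reward shape (full credit for a correct answer, minus a penalty linear in token count) is hypothetical, and so are all names (`BrevityReward`, `group_advantages`, `train`). None of this is the code in `cookbook/rl/short_math_grpo.py` or twinkle's API. The advantage step uses the group-normalized form common in GRPO, (r - mean) / std over the completions sampled for one prompt.

```python
import statistics
from typing import Iterable, List, Sequence, Tuple


class BrevityReward:
    """Hypothetical brevity-shaped reward (not twinkle's implementation).

    A correct answer earns 1.0 minus a penalty that grows linearly with
    length, capped at `max_tokens`; an incorrect answer earns 0.0.
    """

    def __init__(self, max_tokens: int = 512, penalty: float = 0.5):
        self.max_tokens = max_tokens
        self.penalty = penalty

    def __call__(self, num_tokens: int, correct: bool) -> float:
        if not correct:
            return 0.0
        frac = min(num_tokens, self.max_tokens) / self.max_tokens
        return 1.0 - self.penalty * frac


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


def train(
    batches: Iterable[List[Tuple[int, bool]]], reward_fn: BrevityReward
) -> List[List[float]]:
    """Hypothetical loop: each batch is one prompt's group of (num_tokens, correct)."""
    all_advantages = []
    for group in batches:
        # reward_fn is reused here rather than re-instantiated every step
        rewards = [reward_fn(n, ok) for n, ok in group]
        all_advantages.append(group_advantages(rewards))
    return all_advantages


def main() -> None:
    # Setup lives in main() so importing this module has no side effects.
    reward_fn = BrevityReward(max_tokens=512, penalty=0.5)
    groups = [[(100, True), (400, True), (300, False), (200, True)]]
    print(train(groups, reward_fn))


if __name__ == "__main__":
    main()
```

Within a group, a shorter correct completion gets a larger advantage than a longer correct one. Incorrect completions (reward 0) fall below the group mean whenever any completion in the group is correct. Under these assumptions, this is how a length penalty can steer the policy toward concise reasoning.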

cookbook/rl/short_math_grpo.py

wip

1e4f53e

gemini-code-assist bot reviewed Apr 4, 2026


cookbook/rl/short_math_grpo.py (resolved)
cookbook/rl/short_math_grpo.py (resolved)
cookbook/rl/short_math_grpo.py (resolved)
cookbook/rl/short_math_grpo.py (outdated, resolved)

tastelikefeet added 4 commits April 4, 2026 23:54

wip

d07d01b

wip

936a8e3

lint

d4bcbdd

fix

4158ccf

hjh0119 approved these changes Apr 5, 2026


tastelikefeet merged commit 01dc476 into modelscope:main Apr 5, 2026
1 of 3 checks passed

tastelikefeet merged 5 commits into modelscope:main from tastelikefeet:fix/0404-1