
Update sample #50

Merged
Yunnglin merged 9 commits into dev from update_sample on Feb 11, 2026
Conversation

@Yunnglin
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 11, 2026 02:56
@gemini-code-assist
Contributor

Summary of Changes

Hello @Yunnglin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a significant upgrade of the underlying large language model from a 3B to a 7B Qwen variant, impacting multiple training and sampling examples. It also includes a refactoring effort to reorganize client-side scripts and integrates a new experiment tracking system. Furthermore, the changes refine how training metrics are collected and reported within the server infrastructure, and introduce updated server configurations to support the new model.

Highlights

  • Model Upgrade: The base model used across various client scripts and server configurations has been upgraded from Qwen/Qwen2.5-3B-Instruct to Qwen/Qwen2.5-7B-Instruct to leverage a more capable model.
  • File Restructuring and Renames: Several Python scripts related to GRPO, LoRA, and sampling have been moved and renamed within the cookbook/client/tinker/ and cookbook/client/twinkle/ directories, streamlining the project structure.
  • Experiment Tracking Integration: SwanLab experiment tracking has been integrated into the gsm8k_grpo.py script, enabling better monitoring and logging of training metrics.
  • GRPO Training Parameter Tuning: Key training parameters for the GRPO example, such as MAX_NEW_TOKENS, LEARNING_RATE, and BATCH_SIZE, have been adjusted in gsm8k_grpo.py for potentially improved performance or stability.
  • Enhanced Metric Collection: Improved logic for collecting and cleaning metrics from optimization steps has been implemented across the Tinker server components, ensuring more accurate and usable metric reporting.
  • Server Configuration Updates: A new server configuration file (server_config_7b.yaml) has been added for the 7B model, and the default HTTP port in server_config.yaml was changed from 8000 to 9000.
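To make the highlights above concrete, here is a hedged sketch of roughly what the new 7B server configuration might contain, assembled only from the keys this PR mentions; the exact nesting and top-level layout are assumptions, not the repository's actual schema:

```yaml
# Hypothetical sketch, NOT the actual server_config_7b.yaml; nesting is assumed.
http:
  port: 9000                                 # new default port per this PR
model_service:
  name: models-Qwen2.5-7B-Instruct
  model_id: "ms://Qwen/Qwen2.5-7B-Instruct"
  max_length: 10240
  device_mesh:
    dp_size: 2
sampler_service:
  name: sampler-Qwen2.5-7B-Instruct
  model_id: "ms://Qwen/Qwen2.5-7B-Instruct"
  nproc_per_node: 2
  engine_args:
    logprobs_mode: processed_logprobs
  device_mesh:
    dp_size: 1
```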


Changelog
  • cookbook/client/tinker/grpo.py
    • Renamed file from cookbook/client/tinker/transformer/grpo.py.
    • Updated the BASE_MODEL configuration from Qwen/Qwen2.5-3B-Instruct to Qwen/Qwen2.5-7B-Instruct.
  • cookbook/client/tinker/gsm8k_grpo.py
    • Renamed file from cookbook/client/tinker/transformer/gsm8k.py.
    • Imported the os module.
    • Updated the BASE_MODEL configuration from Qwen/Qwen2.5-3B-Instruct to Qwen/Qwen2.5-7B-Instruct.
    • Adjusted MAX_NEW_TOKENS from 2048 to 1024, LEARNING_RATE from 1e-5 to 1e-4, and BATCH_SIZE from 2 to 4.
    • Integrated SwanLab for experiment tracking, including login and initialization with project and model configuration.
    • Added a weights field to the loss_fn_inputs in types.Datum.
    • Simplified the metric calculation and logging process by directly updating log_dict with optim_result.metrics.
  • cookbook/client/tinker/lora.py
    • Renamed file from cookbook/client/tinker/transformer/lora.py.
    • Updated references to the base model in comments and the base_model variable from Qwen/Qwen2.5-3B-Instruct to Qwen/Qwen2.5-7B-Instruct.
  • cookbook/client/tinker/megatron/server_config.yaml
    • Changed the HTTP listener port from 8000 to 9000.
  • cookbook/client/tinker/megatron/server_config_7b.yaml
    • Added a new server configuration file tailored for the Qwen2.5-7B-Instruct model, including model and sampler service definitions.
  • cookbook/client/tinker/sample.py
    • Renamed file from cookbook/client/tinker/transformer/sample.py.
    • Updated the base_model configuration from Qwen/Qwen2.5-3B-Instruct to Qwen/Qwen2.5-7B-Instruct.
  • cookbook/client/tinker/self_congnition.py
    • Renamed file from cookbook/client/tinker/transformer/self_congnition.py.
    • Updated the base_model configuration from Qwen/Qwen2.5-3B-Instruct to Qwen/Qwen2.5-7B-Instruct.
  • cookbook/client/tinker/transformer/server.py
    • Removed commented-out os.environ['RAY_DEBUG'] = '1' line.
  • cookbook/client/tinker/transformer/server_config.yaml
    • Updated the model service configuration to use models-Qwen2.5-7B-Instruct and model_id: "ms://Qwen/Qwen2.5-7B-Instruct".
    • Added max_length: 10240 to the model service arguments.
    • Adjusted device_mesh configuration to use dp_size: 2 instead of mesh: [0, 1] and mesh_dim_names: ['dp'].
    • Updated tps_limit for the model service from 10000 to 100000.
    • Updated the sampler service configuration to use sampler-Qwen2.5-7B-Instruct and model_id: "ms://Qwen/Qwen2.5-7B-Instruct".
    • Increased nproc_per_node for the sampler service from 1 to 2.
    • Updated ranks for the sampler service from [0] to [2].
    • Added logprobs_mode: processed_logprobs to sampler engine_args.
    • Adjusted device_mesh configuration for the sampler service to use dp_size: 1.
    • Added queue_config with rps_limit and tps_limit to the sampler service.
    • Added runtime_env with TWINKLE_TRUST_REMOTE_CODE and DEVICE_COUNT_PER_PHYSICAL_NODE environment variables to both model and sampler service deployments.
  • cookbook/client/twinkle/grpo.py
    • Renamed file from cookbook/client/twinkle/transformer/grpo.py.
  • cookbook/client/twinkle/sample.py
    • Renamed file from cookbook/client/twinkle/transformer/sampler.py.
  • cookbook/client/twinkle/self_congnition.py
    • Renamed file from cookbook/client/twinkle/transformer/lora.py.
  • src/twinkle/infra/_ray/resource_manager.py
    • Added a commented-out breakpoint() in the __init__ method.
  • src/twinkle/loss/grpo.py
    • Added a commented-out breakpoint() in the __call__ method.
  • src/twinkle/model/transformers/multi_lora_transformers.py
    • Added a new remote function calculate_metric to compute and return metrics.
  • src/twinkle/server/tinker/common/megatron_model.py
    • Added a _clean_metrics helper function to parse string-based metric values into floats.
    • Implemented a calculate_metric remote function that uses _clean_metrics to process results.
  • src/twinkle/server/tinker/common/transformers_model.py
    • Added a _clean_metrics helper function to parse string-based metric values into floats.
    • Added a commented-out breakpoint() in the forward_backward method.
    • Implemented a calculate_metric remote function that uses _clean_metrics to process results.
  • src/twinkle/server/tinker/model.py
    • Modified the _do_optim asynchronous function to call self.model.calculate_metric and return the collected metrics in the OptimStepResponse.
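The metric-cleaning behavior the changelog describes can be sketched as follows. This is a minimal, hypothetical illustration: the function name mirrors the `_clean_metrics` helper mentioned above, but its real signature and filtering rules in the repository may differ.

```python
# Hypothetical sketch of a _clean_metrics-style helper: metric values that
# arrive as strings are parsed into floats, and non-numeric entries are dropped.
def clean_metrics(raw: dict) -> dict:
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, bool):
            continue  # bools are ints in Python; skip them explicitly
        if isinstance(value, (int, float)):
            cleaned[key] = float(value)
        elif isinstance(value, str):
            try:
                cleaned[key] = float(value)
            except ValueError:
                pass  # drop values that cannot be parsed as numbers
    return cleaned

print(clean_metrics({"loss": "0.52", "lr": 1e-4, "note": "warmup"}))
```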


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates several sample scripts and server configurations, primarily to switch from a 3B to a 7B parameter model. It also introduces experiment tracking using SwanLab and refactors some metric calculation logic. The changes are generally positive, improving the examples and adding useful features. My review focuses on improving code robustness, removing leftover debugging code, and reducing code duplication for better maintainability.

Comment on lines +63 to +66
swanlab.login(api_key=os.environ['SWANLAB_API_KEY'])
swanlab.init(project="twinkle-gsm8k", config={
'model_id': BASE_MODEL,
})

Severity: high

Accessing os.environ['SWANLAB_API_KEY'] directly will raise a KeyError if the environment variable is not set, causing the script to crash. It's safer to use os.environ.get() and handle the case where the key is missing by raising a more informative error.

    api_key = os.environ.get('SWANLAB_API_KEY')
    if not api_key:
        raise ValueError("SWANLAB_API_KEY environment variable not set, but USE_SWANLAB is True.")
    swanlab.login(api_key=api_key)
    swanlab.init(project="twinkle-gsm8k", config={
        'model_id': BASE_MODEL,
    })

self.min_node_idx = 0
self.nnodes = math.ceil(cpu_proc_count / ncpu_proc_per_node)

# breakpoint()

Severity: medium

This breakpoint() seems to be a leftover from debugging and should be removed.

Returns:
loss: Scalar loss value
"""
# breakpoint()

Severity: medium

This breakpoint() seems to be a leftover from debugging and should be removed.

# Convert Datum to InputFeature
input_features = datum_to_input_feature(inputs, template)

# breakpoint()

Severity: medium

This breakpoint() seems to be a leftover from debugging and should be removed.


Copilot AI left a comment


Pull request overview

This PR updates the Tinker-compatible training flow to return training metrics from optim_step, adds server-side metric cleaning for compatibility, and refreshes cookbook examples/configs (notably moving several examples to Qwen2.5-7B-Instruct and adding GRPO demos).

Changes:

  • Return metrics in /optim_step responses by calling calculate_metric() after optimizer steps.
  • Add calculate_metric() wrappers in Tinker-compat model adapters and introduce metric “cleaning” for numeric logging.
  • Add/update cookbook client examples and server configs (Twinkle + Tinker, Transformers + Megatron).
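A minimal sketch of the flow the first two bullets describe, where raw (possibly string-valued) metrics from an optimizer step are cleaned and attached to the step response. All class and method names here are assumptions for illustration, not the repository's actual API:

```python
# Hypothetical illustration of returning cleaned metrics from an optim step.
from dataclasses import dataclass, field


@dataclass
class OptimStepResponse:
    metrics: dict = field(default_factory=dict)


class TinkerCompatModel:
    def optim_step(self) -> dict:
        # A real implementation would step the optimizer; here we just
        # return raw, string-valued metrics as a backend might report them.
        return {"loss": "0.31", "grad_norm": "1.70"}

    def calculate_metric(self, raw: dict) -> dict:
        # Parse string-valued metrics into floats for numeric logging.
        return {k: float(v) for k, v in raw.items()}


def do_optim(model: TinkerCompatModel) -> OptimStepResponse:
    raw = model.optim_step()
    return OptimStepResponse(metrics=model.calculate_metric(raw))


print(do_optim(TinkerCompatModel()).metrics)
```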

Reviewed changes

Copilot reviewed 13 out of 16 changed files in this pull request and generated 6 comments.

File Description
src/twinkle/server/tinker/model.py Returns metrics from optim_step by calling calculate_metric() post-step.
src/twinkle/server/tinker/common/transformers_model.py Adds _clean_metrics and a calculate_metric() remote wrapper; includes a debug artifact.
src/twinkle/server/tinker/common/megatron_model.py Adds _clean_metrics and a calculate_metric() remote wrapper.
src/twinkle/model/transformers/multi_lora_transformers.py Exposes calculate_metric() as a remote function with adapter validation.
src/twinkle/loss/grpo.py Adds a commented debug hook in GRPO loss.
src/twinkle/infra/_ray/resource_manager.py Adds a commented debug hook during resource manager init.
cookbook/client/twinkle/self_congnition.py New Twinkle client LoRA training example script.
cookbook/client/twinkle/sample.py New Twinkle client inference/sampler example script.
cookbook/client/twinkle/grpo.py New Twinkle client GRPO training example script.
cookbook/client/tinker/transformer/server_config.yaml Updates transformer backend server config to Qwen2.5-7B-Instruct and related settings.
cookbook/client/tinker/transformer/server.py Removes commented Ray debug env var setup.
cookbook/client/tinker/self_congnition.py Updates example base model to Qwen2.5-7B-Instruct.
cookbook/client/tinker/sample.py Updates example base model to Qwen2.5-7B-Instruct.
cookbook/client/tinker/megatron/server_config_7b.yaml Adds a new Megatron 7B server config example.
cookbook/client/tinker/megatron/server_config.yaml Changes Megatron example server port to 9000.
cookbook/client/tinker/lora.py Updates example base model and commented resume path to 7B.
cookbook/client/tinker/gsm8k_grpo.py Updates GSM8K GRPO example (incl. SwanLab logging and using returned optim metrics).
cookbook/client/tinker/grpo.py Updates example base model to Qwen2.5-7B-Instruct.
Comments suppressed due to low confidence (2)

cookbook/client/tinker/gsm8k_grpo.py:64
  • With USE_SWANLAB = True by default, this script will crash with KeyError if SWANLAB_API_KEY is not set. Consider defaulting USE_SWANLAB to False, or using os.getenv plus a guard that logs a warning and disables SwanLab when the key is missing.

cookbook/client/tinker/gsm8k_grpo.py:388
  • Variable fwdbwd_result is not used.



Copilot AI Feb 11, 2026


Leftover debug artifact: the commented # breakpoint() should be removed from the GRPO loss implementation before merging (it adds noise to a hot path and suggests interactive debugging in library code).


Copilot AI Feb 11, 2026


Leftover debug artifact: the commented # breakpoint() should be removed from ResourceManager initialization; keeping debugger hooks (even commented) in infra code makes future debugging harder and can slip into production changes.

@Yunnglin Yunnglin merged commit 8dec1d5 into dev Feb 11, 2026
0 of 4 checks passed
@tastelikefeet tastelikefeet deleted the update_sample branch February 13, 2026 09:46