
Conversation

@zijiexia
Contributor

Summary

  • Added evaluations and W&B links for every example.
  • Reorganized README.md into a table with W&B links and added the SWE-agent entry.
  • Standardized example scripts with clearer dataset download hints, W&B project names, and parameter tweaks for consistency.
  • Updated several run scripts for new defaults (paths, Ray GPU counts, rollout settings, and sglang CUDA graph usage) for easier reproduction.

Changes

  • README table layout + W&B links for each example (README.md).
  • W&B project naming normalized across example scripts (examples/**).
  • Default paths and runtime settings updated in multiple scripts (e.g., GPU counts, rollout length/temperature, CUDA graph disabling); a hedged sketch of the typical change is shown below.
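
For illustration, here is a minimal, hedged sketch of the kind of standardized run-script settings described above. The W&B project name is taken from the links in the README table; the entry point and the environment-variable mechanism are assumptions, not claims about this repository's CLI.

# Hedged sketch only: standardized W&B project naming plus rollout flags that
# appear in the diff hunks quoted later in this thread. `train.py` and the
# WANDB_PROJECT mechanism are illustrative assumptions.
export WANDB_PROJECT=miles-geo3k-vlm   # normalized per-example project name (assumed mechanism)

python train.py \
  --rollout-num-gpus-per-engine 1 \
  --sglang-mem-fraction-static 0.6 \
  --sglang-disable-cuda-graph          # see the CUDA-graph discussion later in this thread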

cc: @zhaochenyang20

@gemini-code-assist
Contributor

Summary of Changes

Hello @zijiexia, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the clarity, organization, and reproducibility of the example scripts and documentation. By standardizing experiment tracking with Weights & Biases, restructuring the main examples README, and refining various script parameters and paths, the changes aim to provide users with a more streamlined and reliable experience when working with the provided examples.

Highlights

  • Documentation Enhancement: The examples/README.md has been significantly updated to a table format, now including direct links to Weights & Biases (W&B) runs for most examples, and a new entry for SWE-agent.
  • Standardized W&B Integration: All example scripts have been updated to consistently use W&B for experiment tracking, with standardized project names for better organization and traceability across different examples.
  • Improved Example Reproducibility: Key parameters and paths in various example run scripts have been adjusted, including dataset download hints, default model paths, Ray GPU allocations, and SGLang CUDA graph settings, to ensure easier and more reliable reproduction of results.
  • Script Cleanup and Optimization: Several run scripts now include pre-execution cleanup commands for SGLang and Ray, and some rollout parameters (like max-response-len and temperature) have been tuned for specific examples to improve performance or consistency.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a series of valuable updates to the example scripts and README, significantly improving their clarity, organization, and reproducibility. The changes include reorganizing the README into a table with W&B links, standardizing W&B project names across scripts, and updating various runtime parameters like GPU counts and CUDA graph usage for easier execution. I've found one minor redundancy in a cleanup script, which I've commented on. Overall, these are great improvements.

Comment on lines +10 to +12
sleep 3
pkill -9 ray
pkill -9 python
Contributor

Severity: medium

This block of cleanup commands appears to be redundant. The pkill -9 ray and pkill -9 python commands are repeated. A single set of cleanup commands should be sufficient to ensure a clean environment.
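
For reference, a minimal sketch of what a single consolidated cleanup block could look like. The ray/python kills and the sleep appear in the hunk above; the SGLang-specific kill is an assumption based on the "cleanup commands for SGLang and Ray" mentioned in the PR summary.

# Hedged sketch: run the pre-launch cleanup exactly once.
pkill -9 -f sglang || true   # assumption: match stray SGLang server processes by command line
pkill -9 ray || true         # stop any leftover Ray processes
pkill -9 python || true      # stop any leftover training/rollout workers
sleep 3                      # give killed processes time to exit before relaunching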

--rollout-num-gpus-per-engine 1
--sglang-mem-fraction-static 0.6
--sglang-cuda-graph-bs 1 2 4 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160 168 176 184 192 200 208 216 224 232 240 248 256
--sglang-disable-cuda-graph
Collaborator

Why disable CUDA graph here?

Contributor Author

When CUDA Graph is enabled, VLM training runs into OOM after a certain number of steps. Reducing the maximum batch size does not resolve the issue, so I disabled CUDA Graph.
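
For reference, a hedged sketch of the two rollout configurations being discussed, using the flag values from the hunk above (the batch-size list is generated with seq rather than written out, and matches the 1 2 4 8 16 ... 256 list shown there):

# Before (sketch): CUDA graph enabled with an explicit capture batch-size list;
# this configuration hits OOM during VLM training after a number of steps.
SGLANG_ARGS_BEFORE="--rollout-num-gpus-per-engine 1 \
  --sglang-mem-fraction-static 0.6 \
  --sglang-cuda-graph-bs 1 2 4 8 $(seq -s ' ' 16 8 256)"

# After (sketch): CUDA graph disabled to avoid the OOM.
SGLANG_ARGS_AFTER="--rollout-num-gpus-per-engine 1 \
  --sglang-mem-fraction-static 0.6 \
  --sglang-disable-cuda-graph"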

"--sglang-mem-fraction-static 0.6 "
f"--sglang-cuda-graph-bs {' '.join(map(str, [1, 2, 4, 8] + list(range(16, 257, 8))))} "
)
sglang_args = "--rollout-num-gpus-per-engine 1 " "--sglang-mem-fraction-static 0.6 " "--sglang-disable-cuda-graph "
Collaborator

Same, why disable CUDA graph?

Contributor Author

Same reason as above.

| **[DrGRPO](./DrGRPO)** | Custom reducer for Dr.GRPO algorithm. | |
| **[eval](./eval)** | Documentation and setup for evaluation environments using NeMo-Skills. | [link](https://wandb.ai/zijie_xia-n-a/miles-eval) |
| **[eval_multi_task](./eval_multi_task)** | Example for supporting OOD evaluation tasks, e.g., GPQA, IFBench. | [link](https://wandb.ai/zijie_xia-n-a/miles-eval-multi-task) |
| **[formal_math](./formal_math)** | Examples related to formal math reasoning tasks, including a single round demo. | [link](https://wandb.ai/zijie_xia-n-a/miles-formal-math-run-minimal) |
Collaborator

This one shows no reward gain. Please double-check it.

Comment on lines +14 to +15
| **[geo3k_vlm](./geo3k_vlm)** | Training VLMs with FSDP on a single-turn reasoning task using GRPO on the GEO3K dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm) |
| **[geo3k_vlm_multi_turn](./geo3k_vlm_multi_turn)** | VLM multi-turn training (FSDP backend) on Geo3k dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm-multi-turn) |
Collaborator

For geo3k multi-turn, I think both FSDP and Megatron converge within 2 hours. Could you help check this?

https://github.com/THUDM/slime/pull/1378/changes

| **[fully_async](./fully_async)** | Demonstrates fully asynchronous rollout generation for higher efficiency. | [link](https://wandb.ai/zijie_xia-n-a/miles-fully-async) |
| **[geo3k_vlm](./geo3k_vlm)** | Training VLMs with FSDP on a single-turn reasoning task using GRPO on the GEO3K dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm) |
| **[geo3k_vlm_multi_turn](./geo3k_vlm_multi_turn)** | VLM multi-turn training (FSDP backend) on Geo3k dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm-multi-turn) |
| **[low_precision](./low_precision)** | Examples of FP8 training and inference for improved throughput and stability. | [link](https://wandb.ai/zijie_xia-n-a/miles-low-precision) |
Collaborator

Could you check with the author whether this is as expected? @GeLee-Q

| **[strands-agents](./strands-agents)** | Integration example with the Strands-Agents scaffolding framework. | [link](https://wandb.ai/zijie_xia-n-a/miles-strands-agents) |
| **[swe-agent](./swe-agent)** | Example of SWE-agent training using Nvidia's Nemo-Gym and SWE-Gym. | [link](https://wandb.ai/zijie_xia-n-a/miles-swe-agent) |
| **[tau-bench](./tau-bench)** | Training in an agentic multi-turn tool use environment (Tau-bench). | |
| **[train_infer_mismatch_helper](./train_infer_mismatch_helper)** | Algorithmic methods for rollout correction (e.g., TIS, MIS). | [link](https://wandb.ai/zijie_xia-n-a/miles-train-infer-mismatch-helper) |
Collaborator

This one is empty.

| **[swe-agent](./swe-agent)** | Example of SWE-agent training using Nvidia's Nemo-Gym and SWE-Gym. | [link](https://wandb.ai/zijie_xia-n-a/miles-swe-agent) |
| **[tau-bench](./tau-bench)** | Training in an agentic multi-turn tool use environment (Tau-bench). | |
| **[train_infer_mismatch_helper](./train_infer_mismatch_helper)** | Algorithmic methods for rollout correction (e.g., TIS, MIS). | [link](https://wandb.ai/zijie_xia-n-a/miles-train-infer-mismatch-helper) |
| **[true_on_policy](./true_on_policy)** | Ensures strictly equal log probabilities between inference (SGLang) and training engines. | [link](https://wandb.ai/zijie_xia-n-a/miles-true-on-policy) |
Collaborator

PPO KL is always zero, but the raw rewards are also 0. Does the raw reward follow our expectation?

| **[tau-bench](./tau-bench)** | Training in an agentic multi-turn tool use environment (Tau-bench). | |
| **[train_infer_mismatch_helper](./train_infer_mismatch_helper)** | Algorithmic methods for rollout correction (e.g., TIS, MIS). | [link](https://wandb.ai/zijie_xia-n-a/miles-train-infer-mismatch-helper) |
| **[true_on_policy](./true_on_policy)** | Ensures strictly equal log probabilities between inference (SGLang) and training engines. | [link](https://wandb.ai/zijie_xia-n-a/miles-true-on-policy) |
| **[true_on_policy_vlm](./true_on_policy_vlm)** | "True On-Policy" training demonstration for VLM (Qwen3-VL). | |
Collaborator

This has not been provided by Nan? @nanjiangwill

@zijiexia marked this pull request as draft on January 13, 2026, 18:59