
Conversation

@zijiexia
Contributor

Summary

  • Added evaluations and W&B links for every example.
  • Reorganized README.md into a table with W&B links and added the SWE-agent entry.
  • Standardized example scripts with clearer dataset download hints, W&B project names, and parameter tweaks for consistency.
  • Updated several run scripts for new defaults (paths, Ray GPU counts, rollout settings, and sglang CUDA graph usage) for easier reproduction.

Changes

  • README table layout + W&B links for each example (README.md).
  • W&B project naming normalized across example scripts (examples/**).
  • Default paths and runtime settings updated in multiple scripts (e.g., GPU counts, rollout length/temperature, CUDA graph disabling); a hedged sketch of the typical change is shown below.
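
For illustration, here is a minimal, hedged sketch of the kind of standardized run-script settings described above. The W&B project name is taken from the links in the README table; the entry point and the environment-variable mechanism are assumptions, not claims about this repository's CLI.

# Hedged sketch only: standardized W&B project naming plus rollout flags that
# appear in the diff hunks quoted later in this thread. `train.py` and the
# WANDB_PROJECT mechanism are illustrative assumptions.
export WANDB_PROJECT=miles-geo3k-vlm   # normalized per-example project name (assumed mechanism)

python train.py \
  --rollout-num-gpus-per-engine 1 \
  --sglang-mem-fraction-static 0.6 \
  --sglang-disable-cuda-graph          # see the CUDA-graph discussion later in this thread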

cc: @zhaochenyang20

@gemini-code-assist
Contributor

Summary of Changes

Hello @zijiexia, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the clarity, organization, and reproducibility of the example scripts and documentation. By standardizing experiment tracking with Weights & Biases, restructuring the main examples README, and refining various script parameters and paths, the changes aim to provide users with a more streamlined and reliable experience when working with the provided examples.

Highlights

  • Documentation Enhancement: The examples/README.md has been significantly updated to a table format, now including direct links to Weights & Biases (W&B) runs for most examples, and a new entry for SWE-agent.
  • Standardized W&B Integration: All example scripts have been updated to consistently use W&B for experiment tracking, with standardized project names for better organization and traceability across different examples.
  • Improved Example Reproducibility: Key parameters and paths in various example run scripts have been adjusted, including dataset download hints, default model paths, Ray GPU allocations, and SGLang CUDA graph settings, to ensure easier and more reliable reproduction of results.
  • Script Cleanup and Optimization: Several run scripts now include pre-execution cleanup commands for SGLang and Ray, and some rollout parameters (like max-response-len and temperature) have been tuned for specific examples to improve performance or consistency.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a series of valuable updates to the example scripts and README, significantly improving their clarity, organization, and reproducibility. The changes include reorganizing the README into a table with W&B links, standardizing W&B project names across scripts, and updating various runtime parameters like GPU counts and CUDA graph usage for easier execution. I've found one minor redundancy in a cleanup script, which I've commented on. Overall, these are great improvements.

Comment on lines +10 to +12
sleep 3
pkill -9 ray
pkill -9 python
Contributor

Severity: medium

This block of cleanup commands appears to be redundant. The pkill -9 ray and pkill -9 python commands are repeated. A single set of cleanup commands should be sufficient to ensure a clean environment.
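
For reference, a minimal sketch of what a single consolidated cleanup block could look like. The ray/python kills and the sleep appear in the hunk above; the SGLang-specific kill is an assumption based on the "cleanup commands for SGLang and Ray" mentioned in the PR summary.

# Hedged sketch: run the pre-launch cleanup exactly once.
pkill -9 -f sglang || true   # assumption: match stray SGLang server processes by command line
pkill -9 ray || true         # stop any leftover Ray processes
pkill -9 python || true      # stop any leftover training/rollout workers
sleep 3                      # give killed processes time to exit before relaunching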

--rollout-num-gpus-per-engine 1
--sglang-mem-fraction-static 0.6
--sglang-cuda-graph-bs 1 2 4 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160 168 176 184 192 200 208 216 224 232 240 248 256
--sglang-disable-cuda-graph
Collaborator

Why disable CUDA graph here?

Contributor Author

When CUDA Graph is enabled, VLM training runs into OOM after a certain number of steps. Reducing the maximum batch size does not resolve the issue, so I disabled CUDA Graph.
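
For reference, a hedged sketch of the two rollout configurations being discussed, using the flag values from the hunk above (the batch-size list is generated with seq rather than written out, and matches the 1 2 4 8 16 ... 256 list shown there):

# Before (sketch): CUDA graph enabled with an explicit capture batch-size list;
# this configuration hits OOM during VLM training after a number of steps.
SGLANG_ARGS_BEFORE="--rollout-num-gpus-per-engine 1 \
  --sglang-mem-fraction-static 0.6 \
  --sglang-cuda-graph-bs 1 2 4 8 $(seq -s ' ' 16 8 256)"

# After (sketch): CUDA graph disabled to avoid the OOM.
SGLANG_ARGS_AFTER="--rollout-num-gpus-per-engine 1 \
  --sglang-mem-fraction-static 0.6 \
  --sglang-disable-cuda-graph"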

"--sglang-mem-fraction-static 0.6 "
f"--sglang-cuda-graph-bs {' '.join(map(str, [1, 2, 4, 8] + list(range(16, 257, 8))))} "
)
sglang_args = "--rollout-num-gpus-per-engine 1 " "--sglang-mem-fraction-static 0.6 " "--sglang-disable-cuda-graph "
Collaborator

Same, why disable CUDA graph?

Contributor Author

Same reason as above.

| **[DrGRPO](./DrGRPO)** | Custom reducer for Dr.GRPO algorithm. | |
| **[eval](./eval)** | Documentation and setup for evaluation environments using NeMo-Skills. | [link](https://wandb.ai/zijie_xia-n-a/miles-eval) |
| **[eval_multi_task](./eval_multi_task)** | Example for supporting OOD evaluation tasks, e.g., GPQA, IFBench. | [link](https://wandb.ai/zijie_xia-n-a/miles-eval-multi-task) |
| **[formal_math](./formal_math)** | Examples related to formal math reasoning tasks, including a single round demo. | [link](https://wandb.ai/zijie_xia-n-a/miles-formal-math-run-minimal) |
Collaborator

This one shows no reward gain. Please double-check it.

Comment on lines +14 to +15
| **[geo3k_vlm](./geo3k_vlm)** | Training VLMs with FSDP on a single-turn reasoning task using GRPO on the GEO3K dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm) |
| **[geo3k_vlm_multi_turn](./geo3k_vlm_multi_turn)** | VLM multi-turn training (FSDP backend) on Geo3k dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm-multi-turn) |
Collaborator

For geo3k multi-turn, I think both FSDP and Megatron converge within 2 hours. Could you help check this?

https://github.com/THUDM/slime/pull/1378/changes

| **[fully_async](./fully_async)** | Demonstrates fully asynchronous rollout generation for higher efficiency. | [link](https://wandb.ai/zijie_xia-n-a/miles-fully-async) |
| **[geo3k_vlm](./geo3k_vlm)** | Training VLMs with FSDP on a single-turn reasoning task using GRPO on the GEO3K dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm) |
| **[geo3k_vlm_multi_turn](./geo3k_vlm_multi_turn)** | VLM multi-turn training (FSDP backend) on Geo3k dataset. | [link](https://wandb.ai/zijie_xia-n-a/miles-geo3k-vlm-multi-turn) |
| **[low_precision](./low_precision)** | Examples of FP8 training and inference for improved throughput and stability. | [link](https://wandb.ai/zijie_xia-n-a/miles-low-precision) |
Collaborator

Could you check with the author whether this is as expected? @GeLee-Q

| **[strands-agents](./strands-agents)** | Integration example with the Strands-Agents scaffolding framework. | [link](https://wandb.ai/zijie_xia-n-a/miles-strands-agents) |
| **[swe-agent](./swe-agent)** | Example of SWE-agent training using Nvidia's Nemo-Gym and SWE-Gym. | [link](https://wandb.ai/zijie_xia-n-a/miles-swe-agent) |
| **[tau-bench](./tau-bench)** | Training in an agentic multi-turn tool use environment (Tau-bench). | |
| **[train_infer_mismatch_helper](./train_infer_mismatch_helper)** | Algorithmic methods for rollout correction (e.g., TIS, MIS). | [link](https://wandb.ai/zijie_xia-n-a/miles-train-infer-mismatch-helper) |
Collaborator

This one is empty.

| **[swe-agent](./swe-agent)** | Example of SWE-agent training using Nvidia's Nemo-Gym and SWE-Gym. | [link](https://wandb.ai/zijie_xia-n-a/miles-swe-agent) |
| **[tau-bench](./tau-bench)** | Training in an agentic multi-turn tool use environment (Tau-bench). | |
| **[train_infer_mismatch_helper](./train_infer_mismatch_helper)** | Algorithmic methods for rollout correction (e.g., TIS, MIS). | [link](https://wandb.ai/zijie_xia-n-a/miles-train-infer-mismatch-helper) |
| **[true_on_policy](./true_on_policy)** | Ensures strictly equal log probabilities between inference (SGLang) and training engines. | [link](https://wandb.ai/zijie_xia-n-a/miles-true-on-policy) |
Collaborator

PPO KL is always zero, but the raw rewards are also 0. Does the raw reward follow our expectation?

| **[tau-bench](./tau-bench)** | Training in an agentic multi-turn tool use environment (Tau-bench). | |
| **[train_infer_mismatch_helper](./train_infer_mismatch_helper)** | Algorithmic methods for rollout correction (e.g., TIS, MIS). | [link](https://wandb.ai/zijie_xia-n-a/miles-train-infer-mismatch-helper) |
| **[true_on_policy](./true_on_policy)** | Ensures strictly equal log probabilities between inference (SGLang) and training engines. | [link](https://wandb.ai/zijie_xia-n-a/miles-true-on-policy) |
| **[true_on_policy_vlm](./true_on_policy_vlm)** | "True On-Policy" training demonstration for VLM (Qwen3-VL). | |
Collaborator

This has not been provided by Nan? @nanjiangwill

@zijiexia marked this pull request as draft on January 13, 2026, 18:59