Conversation
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
📝 Walkthrough

Configures Nemotron 3 Nano model pretraining across various hardware platforms by enabling MoE router load balancing, adjusting batch-size parameters across GPU variants (GB300/B300 and GB200/B200), adding CUDA graph optimization scope definitions, and disabling cuDNN LayerNorm environment-variable removal for this recipe.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py (1)
75-75: Redundant `cuda_graph_impl` override can be removed.

Line 75 repeats the same `cuda_graph_impl` already defined in `BASE_NEMOTRON_3_NANO_CONFIG` (Line 38). Dropping it here would reduce config drift risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py` at line 75, Remove the redundant cuda_graph_impl override in the specific workload config: delete the line setting cuda_graph_impl="transformer_engine" in the nemotron_3_nano workload so it inherits the value from BASE_NEMOTRON_3_NANO_CONFIG instead of shadowing it, ensuring the config no longer duplicates the same setting and reducing config drift risk.
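The cleanup suggested above relies on the workload config inheriting the value from the base config once the duplicate override is deleted. A minimal sketch of the idea, using plain dataclasses (the `NanoConfig` class and `dataclasses.replace` pattern here are illustrative stand-ins, not the repo's actual API):

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class NanoConfig:
    # Illustrative stand-in for the type of BASE_NEMOTRON_3_NANO_CONFIG.
    cuda_graph_impl: str = "transformer_engine"
    micro_batch_size: int = 1


BASE_NEMOTRON_3_NANO_CONFIG = NanoConfig()

# Before: re-stating cuda_graph_impl duplicates the base value,
# so a later change to the base silently stops applying here.
workload_before = dataclasses.replace(
    BASE_NEMOTRON_3_NANO_CONFIG,
    cuda_graph_impl="transformer_engine",
    micro_batch_size=2,
)

# After: omit the override; the value is inherited from the base config.
workload_after = dataclasses.replace(BASE_NEMOTRON_3_NANO_CONFIG, micro_batch_size=2)

assert workload_after.cuda_graph_impl == BASE_NEMOTRON_3_NANO_CONFIG.cuda_graph_impl
```

With the override removed, updating `cuda_graph_impl` in the base config propagates to every workload that does not deliberately shadow it.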
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`:
- Line 39: Change the type annotation for the cuda_graph_scope variable in
scripts/performance/utils/utils.py from Optional[str] to allow both strings and
lists by using the union type str | list[str] | None; update every occurrence of
the cuda_graph_scope annotation (the variable named cuda_graph_scope and any
function signatures or defaults referencing it) so the annotation matches actual
usage (accepting values like "full_iteration" or ["attn","mamba","moe_router"]).
Ensure imports or typing usage remain valid for the project's Python version
(use PEP 604 union syntax).
---
Nitpick comments:
In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`:
- Line 75: Remove the redundant cuda_graph_impl override in the specific
workload config: delete the line setting cuda_graph_impl="transformer_engine" in
the nemotron_3_nano workload so it inherits the value from
BASE_NEMOTRON_3_NANO_CONFIG instead of shadowing it, ensuring the config no
longer duplicates the same setting and reducing config drift risk.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7cca4e97-bce1-400e-b5ad-353938665028
📒 Files selected for processing (3)
- scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py
- scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py
- scripts/performance/perf_plugins.py
expert_model_parallel_size=8,
moe_flex_dispatcher_backend="hybridep",
cuda_graph_impl="transformer_engine",
cuda_graph_scope=["attn", "mamba", "moe_router", "moe_preprocess"],
🧩 Analysis chain
🏁 Script executed:
# Find the WorkloadBaseConfig class definition and cuda_graph_scope type annotation
rg -A 20 "class WorkloadBaseConfig" scripts/performance/utils/utils.py | head -50

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1046
🏁 Script executed:
# Find all cuda_graph_scope assignments in config files
rg -n "cuda_graph_scope\s*=" scripts/performance/configs/

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 8390
🏁 Script executed:
# Check the specific file and lines mentioned in the review
sed -n '35,90p' scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2564
🏁 Script executed:
# Search for cuda_graph_scope field in WorkloadBaseConfig class
rg -A 100 "class WorkloadBaseConfig" scripts/performance/utils/utils.py | grep -A 5 -B 5 "cuda_graph_scope"

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 440
🏁 Script executed:
# Alternative: search more broadly for all fields in the class
wc -l scripts/performance/utils/utils.py

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 109
🏁 Script executed:
# Read the entire utils.py file to see the full WorkloadBaseConfig definition
cat -n scripts/performance/utils/utils.py | head -200

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 9205
Fix the type annotation for `cuda_graph_scope` to accept both string and list values.

Line 55 in scripts/performance/utils/utils.py defines `cuda_graph_scope: Optional[str] = None`, but the actual usage across config files assigns both `list[str]` values (e.g., `["attn", "mamba"]`) and `str` values (e.g., `"full_iteration"`). Update the type annotation to `str | list[str] | None` to match the actual usage pattern. This also applies to lines 81 and 85 of the file under review.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`
at line 39, Change the type annotation for the cuda_graph_scope variable in
scripts/performance/utils/utils.py from Optional[str] to allow both strings and
lists by using the union type str | list[str] | None; update every occurrence of
the cuda_graph_scope annotation (the variable named cuda_graph_scope and any
function signatures or defaults referencing it) so the annotation matches actual
usage (accepting values like "full_iteration" or ["attn","mamba","moe_router"]).
Ensure imports or typing usage remain valid for the project's Python version
(use PEP 604 union syntax).
What does this PR do?
Updated configs for better perf
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit