
Set moe_router_force_load_balancing for Nemotron3 Nano #2639

Open
scsudhakaran wants to merge 1 commit into main from scsudhakaran/nano

Conversation

@scsudhakaran
Contributor

@scsudhakaran scsudhakaran commented Mar 4, 2026

Summary by CodeRabbit

  • New Features
    • Enabled load balancing optimization for the mixture-of-experts router in Nemotron 3 Nano model pretraining configuration.

@copy-pr-bot

copy-pr-bot bot commented Mar 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@scsudhakaran scsudhakaran marked this pull request as ready for review March 4, 2026 11:27
@scsudhakaran scsudhakaran requested a review from malay-nagda March 4, 2026 11:27
malay-nagda
malay-nagda previously approved these changes Mar 4, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7f7fc747-f4cb-4b01-bc28-acbbe0bc0863

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb1af6 and 7c5e8f7.

📒 Files selected for processing (1)
  • scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py

📝 Walkthrough


Adds the configuration flag cfg.model.moe_router_force_load_balancing = True to the Nemotron 3 Nano pretraining configuration to enable forced load balancing for the MoE router; the change affects initialization settings only.

Changes

  • MOE Router Configuration — scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py: Adds the configuration flag to enable load balancing for the MoE router in the pretraining setup.
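For context, the one-line change described above can be sketched as follows. The flag name and the cfg.model attribute path come from the PR summary; the helper function, the SimpleNamespace stand-in, and everything else here are hypothetical scaffolding, not the actual contents of nemotron_3_nano_llm_pretrain.py:

```python
# Sketch of the PR's change, assuming a recipe-style cfg object whose
# model section accepts attribute assignment (as the summary's
# `cfg.model.moe_router_force_load_balancing = True` line implies).
from types import SimpleNamespace


def apply_moe_router_override(cfg):
    """Force load balancing in the MoE router (the flag added by this PR)."""
    cfg.model.moe_router_force_load_balancing = True
    return cfg


# Minimal stand-in for the real pretraining config object.
cfg = SimpleNamespace(model=SimpleNamespace())
cfg = apply_moe_router_override(cfg)
print(cfg.model.moe_router_force_load_balancing)  # True
```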

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Test Results For Major Changes — ⚠️ Warning
  Explanation: The PR enables moe_router_force_load_balancing for Nemotron-3 Nano pretraining without providing test results, performance metrics, or convergence analysis to validate the change.
  Resolution: Add test results or validation data demonstrating that enabling moe_router_force_load_balancing does not cause training regressions, including loss curves, convergence metrics, or before-and-after comparisons.
✅ Passed checks (3 passed)
Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
Title Check — ✅ Passed: The title 'Set moe_router_force_load_balancing for Nemotron3 Nano' directly matches the changeset, which adds the moe_router_force_load_balancing configuration flag to the Nemotron 3 Nano pretraining config.
Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
@scsudhakaran
Contributor Author

/ok to test a745826
