Override moe_token_dispatcher_type to "alltoall" when export megatron… #2658
jaeminh wants to merge 1 commit into NVIDIA-NeMo:main from …
Conversation
📝 Walkthrough: The change adds an initial model load with MoE token dispatcher override settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/megatron/bridge/models/conversion/auto_bridge.py`:
- Around lines 838-844: The second call to `load_megatron_model` immediately overwrites the model that was loaded with `mp_overrides`, making the `{"moe_token_dispatcher_type": "alltoall"}` override ineffective. Remove the redundant call that reassigns `megatron_model` (the `load_megatron_model(..., wrap_with_ddp=False)` on the line after the override) so that `megatron_model` retains the `mp_overrides` before it is passed to `save_hf_pretrained`, and any assertions about the dispatcher type succeed.
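The flagged pattern can be illustrated with a minimal stand-in. The stub below only mimics how a second, override-free load discards the first; the real `load_megatron_model` in `auto_bridge.py` has a different signature and return type, so treat this purely as a sketch of the control flow:

```python
def load_megatron_model(ckpt_path, mp_overrides=None, wrap_with_ddp=False):
    # Stand-in: records which dispatcher type the returned "model" ends up with.
    overrides = dict(mp_overrides or {})
    return {
        "ckpt": ckpt_path,
        "moe_token_dispatcher_type": overrides.get("moe_token_dispatcher_type", "flex"),
    }

# Buggy flow flagged by the review: the second call reassigns megatron_model
# without mp_overrides, so the "alltoall" override is lost.
megatron_model = load_megatron_model(
    "ckpt", mp_overrides={"moe_token_dispatcher_type": "alltoall"}
)
megatron_model = load_megatron_model("ckpt", wrap_with_ddp=False)
assert megatron_model["moe_token_dispatcher_type"] == "flex"  # override gone

# Fixed flow: a single load that carries both the override and wrap_with_ddp=False.
megatron_model = load_megatron_model(
    "ckpt",
    mp_overrides={"moe_token_dispatcher_type": "alltoall"},
    wrap_with_ddp=False,
)
assert megatron_model["moe_token_dispatcher_type"] == "alltoall"
```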
📒 Files selected for processing (1)
src/megatron/bridge/models/conversion/auto_bridge.py
What does this PR do ?
When exporting a Megatron checkpoint to HF format, an error `AssertionError: Flex token dispatcher requires TPxEP > 1` occurs if `moe_token_dispatcher_type` is set to `flex`. This PR overrides `moe_token_dispatcher_type` during the export process and temporarily switches `flex` to `alltoall` only for export. This avoids the assertion without affecting the original configuration.
Changelog
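The export-time swap described above could be sketched as follows. `dispatcher_overrides` is a hypothetical helper, not part of the Bridge API; it only shows the intended decision (swap `flex` for `alltoall`, leave everything else alone, and never mutate the stored config):

```python
def dispatcher_overrides(config_dispatcher_type: str) -> dict:
    """Return model-parallel overrides to apply only while exporting.

    The "flex" dispatcher asserts TPxEP > 1, which a typical export job
    (TP=EP=1) cannot satisfy, so it is replaced with "alltoall" for the
    duration of the export. Any other dispatcher type passes through
    unchanged (empty override dict), so the saved checkpoint config is
    never modified.
    """
    if config_dispatcher_type == "flex":
        return {"moe_token_dispatcher_type": "alltoall"}
    return {}
```

A caller would merge the returned dict into whatever override mechanism the loader exposes (e.g. the `mp_overrides` argument mentioned in the review comments), leaving the original configuration on disk untouched.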
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information