[dev] refactor to support emerging optimizers beyond muon #3618
FDecaYed wants to merge 9 commits into NVIDIA:dev from
Conversation
skyw left a comment:
Looks good overall.
Terminology "plain" can be improved.
        setattr(optimizer, 'tp_group', tp_group)
        result = optimizer
    else:
        fallback_config = copy.copy(config)
Q: Does this need deepcopy? there could be very heavy structure in config.
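For reference, the difference only bites when the config holds nested mutable state; a minimal illustration (the `Config` fields here are hypothetical, not Megatron's actual `OptimizerConfig`):

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Config:
    lr: float = 1e-3
    overrides: dict = field(default_factory=dict)  # nested mutable state

cfg = Config()
shallow = copy.copy(cfg)    # shares the overrides dict with cfg
deep = copy.deepcopy(cfg)   # gets its own independent overrides dict

shallow.overrides["wd"] = 0.1   # also visible through cfg
deep.overrides["mom"] = 0.9     # invisible to cfg

print("wd" in cfg.overrides)    # True  -- shallow copy shared the dict
print("mom" in cfg.overrides)   # False -- deepcopy isolated it
```

So `copy.copy` is safe only if the code never mutates nested members of the fallback config; deepcopy trades memory for that safety.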
| "num_ns_steps": config.muon_num_ns_steps, | ||
| "scale_mode": config.muon_scale_mode, | ||
| "extra_scale_factor": config.muon_extra_scale_factor, | ||
| "mode": config.muon_tp_mode, |
There was a problem hiding this comment.
How about changing this to `tp_mode`? Then the contract is: everything with a `muon_` prefix in the config translates to kwargs by stripping the prefix. The code can also be generalized.
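A minimal sketch of that prefix-stripping contract (helper name and config fields are illustrative, not the actual Megatron code):

```python
from types import SimpleNamespace

def optimizer_kwargs_from_config(config, prefix):
    """Collect every config field starting with `prefix` into a kwargs
    dict, stripping the prefix (e.g. muon_tp_mode -> tp_mode)."""
    return {
        name[len(prefix):]: value
        for name, value in vars(config).items()
        if name.startswith(prefix)
    }

config = SimpleNamespace(
    lr=3e-4,                      # not muon-specific, ignored
    muon_num_ns_steps=5,
    muon_scale_mode="spectral",
    muon_tp_mode="duplicated",
)
kwargs = optimizer_kwargs_from_config(config, "muon_")
# {'num_ns_steps': 5, 'scale_mode': 'spectral', 'tp_mode': 'duplicated'}
```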
SG. We can do it in the next PR, when we bump `emerging_optimizers` and really add support for other optimizers.
  # Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

- """Megatron muon optimizer wrapper to handle tensor-parallel."""
+ """Backward-compatible shim — all code now lives in ``emerging_optimizers``."""
Q: Does everything exposed through muon.py still work? just with a deprecation warning?
I'll test it; haven't run the draft yet.
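For context, one common shape for such a shim is re-exporting relocated names behind a `DeprecationWarning`; a generic sketch (not the actual `muon.py` contents):

```python
import warnings

def _deprecated_alias(new_func, old_name):
    """Wrap a relocated function so the old import path still works,
    emitting a DeprecationWarning on each call."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} has moved to emerging_optimizers; update your imports.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper

# Example: re-export a relocated helper under its old name.
def _new_impl(x):
    return x * 2

old_helper = _deprecated_alias(_new_impl, "old_helper")
```

With this pattern, existing call sites keep working unchanged while the warning points users at the new module.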
- # Muon optimizer check
- if 'muon' in args.optimizer:
+ # Muon / emerging optimizer check
+ if args.optimizer in ('muon', 'dist_muon'):
Consider creating an emerging-optimizer group for everything with a `muon_`, `soap_`, or other prefix.
Same; we'll change these parts properly in the next step when we add support.
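For illustration, the pluggable-registry dispatch the PR moves toward could look roughly like this (all names hypothetical; `FakeMuon` stands in for the real optimizer class):

```python
class FakeMuon:
    """Stand-in for a real emerging optimizer; only stores what it is given."""
    def __init__(self, params, num_ns_steps):
        self.params = params
        self.num_ns_steps = num_ns_steps

# Hypothetical registry: each emerging optimizer maps to the four pieces
# the PR summary lists (optimizer_cls, config_to_kwargs,
# default_param_overrides, init_state_fn).
EMERGING_OPTIMIZERS = {
    "muon": {
        "optimizer_cls": FakeMuon,
        "config_to_kwargs": lambda cfg: {"num_ns_steps": cfg["muon_num_ns_steps"]},
        "default_param_overrides": {"min_ndim": 2},
        "init_state_fn": lambda opt: None,
    },
}

def build_emerging_optimizer(name, params, cfg):
    """Look up the registry entry and construct the optimizer from it."""
    entry = EMERGING_OPTIMIZERS[name]
    opt = entry["optimizer_cls"](params, **entry["config_to_kwargs"](cfg))
    entry["init_state_fn"](opt)
    return opt

opt = build_emerging_optimizer("muon", params=["w1", "w2"],
                               cfg={"muon_num_ns_steps": 5})
```

Adding a new optimizer (e.g. SOAP) would then just mean registering a new entry instead of adding another branch to the dispatch code.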
Signed-off-by: Hao Wu <skyw@nvidia.com>
Force-pushed from 1ab146d to 6b21428
/ok to test 3ce76a9
@FDecaYed: I think Megatron-Bridge needs some API support changes to cover this PR as well?

Yes, but I'm hoping this would be the last refactor.
Signed-off-by: Hao Wu <skyw@nvidia.com>
Co-authored-by: Hao Wu <skyw@nvidia.com>

🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22701547164

🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22701733585
## What does this PR do?

Refactor optimizer config, distributed-optimizer dispatch, and the Megatron optimizer `get()` in preparation for allowing more optimizers.

Built on top of #3325
Main PR: #3638

## Summary

- Unify optimizer creation so standard (Adam/SGD) and emerging (Muon) optimizers go through a single `get_megatron_optimizer()` entry point: `_get_megatron_emerging_optimizer()` in `__init__.py`. It reuses the same param-grouping and config-override mechanism as standard optimizers, removing the need to manually group parameters into separate optimizers with the freeze/unfreeze hack.
- Merge the `AdamOptimizerConfig`/`SGDOptimizerConfig` subclasses back into one `OptimizerConfig`. Since every param group can override via `config_overrides` anyway, one default config plus overrides is cleaner than passing multiple config objects.
- New `emerging_optimizers.py` with a pluggable registry. Each supported emerging optimizer maps to `{optimizer_cls, config_to_kwargs, default_param_overrides, init_state_fn}`. `TensorParallelMuon` and all Muon helpers moved here from `muon.py` (only a backward-compat shim is left in `muon.py`).
- `dist_muon` deprecated: `muon` + `--use-distributed-optimizer` is resolved at the argument level into a new `use_layer_wise_distributed_optimizer` flag, which replaces `dist_muon`. `use_distributed_optimizer` is reset to False to avoid side effects in the standard distributed-optimizer code path.
- `LayerWiseDistributedOptimizer` now expects plain torch optimizers (wrapping happens internally); cleaner than unwrapping Megatron optimizers.

## Contribution process
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

## Pre-checks
## Code review
The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.

### For MRs into `main` branch
Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
(Step 1): Add the `Expert Review` PR label
(Step 2): Collect the expert reviewers' reviews. Add the `Expert Review` label when your PR is ready for review. Final Review might get declined if these requirements are not fulfilled.
(Step 3): Final Review. Add the `Final Review` label.
(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into `core_r*` release branches, then after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.

### For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

## Merging your PR
Any member of core-adlr and core-nemo will be able to merge your PR.