Norm-Based Adaptive Moment Estimation with Orthogonalized Momentum by mkhona-nvidia · Pull Request #107 · NVIDIA-NeMo/Emerging-Optimizers

mkhona-nvidia · 2026-02-20T19:29:23Z

Add NAMO as another method for normalizing Muon updates.

Based on "Adam improves Muon" (https://arxiv.org/abs/2602.17080).

Signed-off-by: mikail <mkhona@nvidia.com>

copy-pr-bot · 2026-02-20T19:29:27Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-02-20T19:37:03Z

Greptile Summary

Implements NAMO (scalar adaptive scaling) as a third moment2_method option for AdaptiveMuon, following the approach from "Adam improves Muon" (arXiv:2602.17080). NAMO multiplies the orthogonalized momentum by a scalar: the Frobenius norm of the pre-orthogonalization gradient divided by the square root of an EMA of squared raw-gradient Frobenius norms.

Key Changes:

- Extended the moment2_method parameter to accept "namo" alongside the existing "adamuon" and "normuon"
- Added scalar buffer initialization for NAMO (EMA of ||G_t||_F^2)
- Implemented NAMO scaling logic: α_t = ||g_t^pre-orth||_F / (√v_t + ε)
- Captured gradient norms before and after momentum+Nesterov updates
- Added comprehensive docstrings with mathematical notation
- Updated tests to cover NAMO across all test cases
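
The scaling rule listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's code: the names `fro_norm_sq` and `namo_scale`, the `beta2` default, and the absence of bias correction are all hypothetical, and the real implementation works on torch tensors inside adaptive_muon.py. The sketch keeps the raw gradient and the pre-orthogonalization gradient as separate arguments, since the review flags that distinction.

```python
import math


def fro_norm_sq(mat):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in mat for x in row)


def namo_scale(orth_update, raw_grad, pre_orth_grad, v_prev, beta2=0.95, eps=1e-8):
    """One NAMO-style scaling step (illustrative; no bias correction assumed).

    v is a scalar EMA of ||G_t||_F^2 computed from the raw gradient.
    alpha = ||pre_orth_grad||_F / (sqrt(v) + eps) rescales the orthogonalized update.
    Returns (scaled_update, v_new).
    """
    v_new = beta2 * v_prev + (1.0 - beta2) * fro_norm_sq(raw_grad)
    alpha = math.sqrt(fro_norm_sq(pre_orth_grad)) / (math.sqrt(v_new) + eps)
    scaled = [[alpha * x for x in row] for row in orth_update]
    return scaled, v_new


# Toy usage: the orthogonalized update is taken to be the identity matrix,
# and the raw gradient doubles as the pre-orthogonalization tensor.
raw_grad = [[3.0, 0.0], [0.0, 4.0]]
orth = [[1.0, 0.0], [0.0, 1.0]]
scaled, v = namo_scale(orth, raw_grad, raw_grad, v_prev=0.0)
```

Because α_t is a single scalar per parameter matrix, the update keeps the direction produced by orthogonalization and only its overall magnitude changes.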

Confidence Score: 4/5

Safe to merge after a minor review of the gradient-norm capture logic.
The implementation follows established patterns and includes comprehensive tests and documentation. A small deduction reflects potential ambiguity about which gradient tensor is used for the pre_orth_norm calculation.
Pay close attention to adaptive_muon.py:280: verify that the gradient norm calculation uses the correct tensor.

Important Files Changed

| Filename | Overview |
| --- | --- |
| emerging_optimizers/orthogonalized_optimizers/adaptive_muon.py | Adds NAMO method for scalar adaptive scaling via Frobenius-norm ratio, extends moment2_method parameter, includes proper documentation |
| tests/test_adaptive_muon.py | Comprehensive test coverage for NAMO including smoke tests and shape validation |
| docs/apidocs/orthogonalized-optimizers.md | Documentation properly updated to include AdaptiveMuon in API docs |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start[Start step] --> CheckMethod{moment2_method?}
    
    CheckMethod -->|namo| CaptureNorm1[Capture grad_fro_sq = ‖G_t‖²_F<br/>before momentum]
    CheckMethod -->|adamuon/normuon| StandardPath[Standard path]
    
    CaptureNorm1 --> UpdateMomentum[Update momentum buffer<br/>exp_avg.lerp_]
    StandardPath --> UpdateMomentum
    
    UpdateMomentum --> Nesterov{use_nesterov?}
    Nesterov -->|Yes| NesterovGrad[grad = grad.lerp exp_avg]
    Nesterov -->|No| UseExpAvg[grad = exp_avg]
    
    NesterovGrad --> Orthogonalize[orth_grad = orthogonalize]
    UseExpAvg --> Orthogonalize
    
    Orthogonalize --> ApplyMethod{moment2_method?}
    
    ApplyMethod -->|namo| NAMOPath[NAMO: Update v_t with grad_fro_sq<br/>Compute α_t = ‖grad‖_F / √v_t + ε<br/>Return orth_grad * α_t]
    ApplyMethod -->|adamuon| AdamPath[AdamUon: Update elementwise v_t<br/>Return orth_grad / √v_t + ε]
    ApplyMethod -->|normuon| NorPath[NorMuon: Update row/col v_t<br/>Return orth_grad * rsqrt v_t]
    
    NAMOPath --> WeightUpdate[p.add_ update, alpha=-lr]
    AdamPath --> WeightUpdate
    NorPath --> WeightUpdate
    
    WeightUpdate --> End[End step]
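
The momentum and Nesterov branches of the flowchart, along with the second-moment buffer each method implies, can be sketched as follows. This is a hedged illustration in plain Python rather than the repository's torch code: the helper names, the `beta` default, the lerp weights, and the choice of a per-row buffer for "normuon" (the flowchart says "row/col") are assumptions.

```python
def lerp(a, b, w):
    """Elementwise a + w * (b - a), the same formula torch.lerp computes."""
    return [[x + w * (y - x) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def pre_orth_direction(grad, exp_avg, beta=0.9, use_nesterov=True):
    """Update the momentum buffer and pick the tensor passed to orthogonalization.

    Returns (new_exp_avg, direction). The weights (1 - beta, then beta) are
    assumed here; the flowchart only names the lerp calls.
    """
    new_exp_avg = lerp(exp_avg, grad, 1.0 - beta)  # momentum buffer update
    if use_nesterov:
        direction = lerp(grad, new_exp_avg, beta)  # blend gradient toward momentum
    else:
        direction = new_exp_avg                    # plain momentum
    return new_exp_avg, direction


def second_moment_shape(method, rows, cols):
    """Shape of the v_t buffer implied by each moment2_method (illustrative)."""
    if method == "namo":
        return ()              # one scalar per parameter matrix
    if method == "adamuon":
        return (rows, cols)    # elementwise statistics
    if method == "normuon":
        return (rows, 1)       # assumed per-row; per-column is also possible
    raise ValueError(f"unknown moment2_method: {method!r}")


new_exp_avg, direction = pre_orth_direction([[1.0, 2.0]], [[0.0, 0.0]])
```

The shape helper highlights the main practical difference between the three options: NAMO stores a single scalar per matrix, while the other two keep per-element or per-row state.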

Last reviewed commit: 6511afa

greptile-apps

3 files reviewed, no comments

mkhona-nvidia added 4 commits February 20, 2026 11:21

added namo to adaptive muon

88a876d

Signed-off-by: mikail <mkhona@nvidia.com>

added namo to adaptive muon

a64ec10

Signed-off-by: mikail <mkhona@nvidia.com>

added namo to adaptive muon

4983de7

Signed-off-by: mikail <mkhona@nvidia.com>

changed comment for update size

6511afa

Signed-off-by: mikail <mkhona@nvidia.com>

mkhona-nvidia self-assigned this Feb 20, 2026

greptile-apps bot reviewed Feb 20, 2026

mkhona-nvidia changed the title ~~Mkhona/namo~~ Namo Feb 20, 2026

mkhona-nvidia changed the title ~~Namo~~ Norm-Based Adaptive Moment Estimation with Orthogonalized Momentum Feb 20, 2026

skyw marked this pull request as draft February 23, 2026 16:13

mkhona-nvidia wants to merge 4 commits into NVIDIA-NeMo:main from mkhona-nvidia:mkhona/namo
