> [!WARNING]
> Only verified with `DDPStrategy`; FSDP and DeepSpeed are still untested. A known issue exists in `deepspeed_stage_2` due to sharded gradients.
This repo explores the Muon optimizer and its variants, including convenient mixtures with AdamW or Scion, as well as versions under constraints such as the Stiefel manifold and the spectral sphere.
- Roadmap: TODO.md
This project can be easily installed via `pip install -e .`. For developers, you may install via `pip install --no-build-isolation -e .[dev]` to obtain pytest and related toolkits such as tilelang for developing custom kernels.
- Official implementation: https://github.com/MoonshotAI/Moonlight
- Core contribution:
- An extra "update-RMS equalization" step so the per-parameter update RMS lines up across matrix vs. non-matrix params, allowing a more unified LR strategy across groups (Muon vs. AdamW).
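The equalization step can be sketched as follows (a minimal illustrative sketch, not the repo's actual code; the function name `match_update_rms` and the 0.2 target are assumptions based on the Moonlight recipe):

```python
import torch

def match_update_rms(update: torch.Tensor, target_rms: float = 0.2) -> torch.Tensor:
    # A semi-orthogonal Muon update of shape (m, n) has RMS 1/sqrt(max(m, n)),
    # so multiplying by target_rms * sqrt(max(m, n)) matches its RMS to a
    # typical AdamW update RMS, letting both groups share one learning rate.
    m, n = update.shape[-2], update.shape[-1]
    return update * (target_rms * max(m, n) ** 0.5)
```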
- [TODO] Unofficial implementation here
- Core contribution:
- QK-Clip for controlling max-logit explosion (reference)
- Source: update function
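In outline, QK-Clip rescales the query/key projection weights whenever the observed maximum attention logit exceeds a threshold (a hedged sketch; `qk_clip_`, the default threshold, and the square-root split are illustrative assumptions, not the repo's actual update function):

```python
import torch

@torch.no_grad()
def qk_clip_(w_q: torch.Tensor, w_k: torch.Tensor,
             max_logit: float, tau: float = 100.0) -> None:
    # If the largest pre-softmax logit this step exceeded tau, shrink both
    # projections in place; splitting the factor as a square root applies
    # the full correction tau / max_logit to the q·k product.
    if max_logit > tau:
        scale = (tau / max_logit) ** 0.5
        w_q.mul_(scale)
        w_k.mul_(scale)
```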
- Reference: 《流形上的最速下降:4. Muon + 谱球面》 (Steepest Descent on Manifolds: 4. Muon + Spectral Sphere)
> [!NOTE]
> This implementation is not exposed by default due to unsatisfactory speed and accuracy. However, you may import it via `from manifold_muon.stiefel.stiefel_moonlight import StiefelMoonlight` to try it out.
- Dual-ascent-based method (source code), adapted from modula's blog.
- Fixed-point-based method (source code), adapted from 《流形上的最速下降:3. Muon + Stiefel》 (Steepest Descent on Manifolds: 3. Muon + Stiefel).
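Both methods ultimately keep the weight on the Stiefel manifold (orthonormal columns); the projection itself can be sketched via the polar factor of an SVD (an illustrative sketch, not either method's actual implementation):

```python
import torch

def stiefel_project(w: torch.Tensor) -> torch.Tensor:
    # Polar projection: for W = U S V^T, the nearest point on the Stiefel
    # manifold in Frobenius norm is U V^T (orthonormal columns for tall W).
    u, _, vh = torch.linalg.svd(w, full_matrices=False)
    return u @ vh
```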
[!] Weight decay is still under development
Our ManifoldMoonlight optimizer uses a new parameter-grouping scheme, which differs from the classical Moonlight or Muon implementations.
The currently valid grouping keys are `["use_muon", "use_adamw", "use_spectral_muon"]`.
```python
from manifold_muon import ManifoldMoonlight, deduplicate_and_check_missing_params

# A "matrix" param is 2-D or higher and is neither the embedding nor the
# LM head; q/k projections get their own spectral group.
def is_matrix(name, p):
    return p.ndim >= 2 and "embed_tokens" not in name and "lm_head" not in name

def is_qk(name):
    return "q_proj" in name or "k_proj" in name

params = {
    "use_muon": [
        p for name, p in model.named_parameters()
        if is_matrix(name, p) and not is_qk(name)
    ],
    "use_adamw": [
        p for name, p in model.named_parameters()
        if not is_matrix(name, p)
    ],
    "use_spectral_muon": [
        p for name, p in model.named_parameters()
        if is_matrix(name, p) and is_qk(name)
    ],
}
# We highly suggest adding this line to check whether any missing or
# duplicate params exist among the groups.
deduplicate_and_check_missing_params(model, params)
optimizer = ManifoldMoonlight(
    grouped_params=params,
    ...
)
```
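As a quick sanity check of the grouping rule, the intended partition can be exercised on a tiny stand-in module (a standalone sketch that restates the predicates and does not import manifold_muon; the toy layer names are illustrative):

```python
import torch.nn as nn

def group_of(name: str, p) -> str:
    # Restates the intended partition: q/k projection matrices go to the
    # spectral group, other non-embedding/non-head matrices to Muon, and
    # everything else (biases, norms, embeddings, head) to AdamW.
    is_matrix = p.ndim >= 2 and "embed_tokens" not in name and "lm_head" not in name
    is_qk = "q_proj" in name or "k_proj" in name
    if is_matrix and is_qk:
        return "use_spectral_muon"
    if is_matrix:
        return "use_muon"
    return "use_adamw"

# Toy stand-in model covering all three cases.
model = nn.ModuleDict({
    "q_proj": nn.Linear(4, 4),
    "o_proj": nn.Linear(4, 4),
    "lm_head": nn.Linear(4, 4, bias=False),
})
groups = {k: [] for k in ("use_muon", "use_adamw", "use_spectral_muon")}
for name, p in model.named_parameters():
    groups[group_of(name, p)].append(name)
```

Every parameter lands in exactly one group, which is what `deduplicate_and_check_missing_params` verifies for a real model.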