Hi,
I would like to ask whether a pull request adding optional QK-Clip support (from the MuonClip optimizer in the Kimi K2 paper) would be welcome. I believe this feature could improve training stability for those who use Muon, since regular Muon is known to be unstable on attention layers.
Thank you