Spectron optimizer for low-rank LLM pretraining #104
mkhona-nvidia wants to merge 15 commits into NVIDIA-NeMo:main from
Conversation
Signed-off-by: mikail <mkhona@nvidia.com>
Greptile Summary

Adds Spectron, a low-rank spectral optimizer with orthogonalized momentum for LLM pretraining, based on https://arxiv.org/abs/2602.12429. The optimizer maintains low-rank factorizations as optimizer state, while the model weights themselves remain dense.

Major changes:
Critical issues preventing production use:
Confidence Score: 1/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([Optimizer Step]) --> CheckGrad{Gradient exists?}
    CheckGrad -->|No| End([Skip parameter])
    CheckGrad -->|Yes| InitCheck{First step?}
    InitCheck -->|Yes| SVDInit[SVD Initialization:<br/>W = U·S·V^T<br/>A = U·√S, B = V·√S]
    SVDInit --> InitState[Initialize:<br/>momentum_A, momentum_B<br/>u_A, u_B vectors]
    InitState --> Compute
    InitCheck -->|No| Compute[Compute factor gradients:<br/>grad_A = grad @ B<br/>grad_B = grad^T @ A]
    Compute --> WD[Apply weight decay<br/>to both factors]
    WD --> Momentum[Update momentum:<br/>momentum_A ← β·momentum_A + (1-β)·grad_A<br/>momentum_B ← β·momentum_B + (1-β)·grad_B]
    Momentum --> NS[Orthogonalize using<br/>Newton-Schulz iteration<br/>requires float32]
    NS --> PowerIter[Power iteration:<br/>estimate σ_A, σ_B<br/>spectral radii]
    PowerIter --> Scale[Scale learning rate:<br/>η_scaled = η / (σ_A + σ_B + 1)]
    Scale --> Update[Update factors:<br/>A ← A - η_scaled·orth_momentum_A<br/>B ← B - η_scaled·orth_momentum_B]
    Update --> Reconstruct[Reconstruct weight:<br/>W ← A @ B^T]
    Reconstruct --> End
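The flow above can be sketched end to end in NumPy. This is a minimal illustration, not the PR's implementation: the function names (`spectron_step`, `newton_schulz_orth`), the quintic Newton-Schulz coefficients (borrowed from the Muon optimizer family), and the use of an exact SVD norm in place of the PR's power iteration are all assumptions made for brevity.

```python
import numpy as np

def newton_schulz_orth(M, steps=5):
    """Approximate the orthogonal factor U·V^T of M via Newton-Schulz (sketch).

    Coefficients are the quintic ones used in the Muon family; assumed here.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = M / (np.linalg.norm(M) + 1e-7)  # normalize so singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def spectron_step(W, grad, state, lr=1e-2, beta=0.9, rank=4):
    """One hypothetical Spectron step following the PR's flowchart."""
    if "A" not in state:  # first step: SVD init, W ≈ A @ B^T
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        sqrt_S = np.sqrt(S[:rank])
        state["A"] = U[:, :rank] * sqrt_S
        state["B"] = Vt[:rank].T * sqrt_S
        state["mA"] = np.zeros_like(state["A"])
        state["mB"] = np.zeros_like(state["B"])
    A, B = state["A"], state["B"]
    grad_A = grad @ B        # shape (m, r)
    grad_B = grad.T @ A      # shape (n, r)
    # EMA momentum on the factor gradients
    state["mA"] = beta * state["mA"] + (1 - beta) * grad_A
    state["mB"] = beta * state["mB"] + (1 - beta) * grad_B
    oA = newton_schulz_orth(state["mA"])
    oB = newton_schulz_orth(state["mB"])
    # Spectral norms via exact SVD for brevity; the PR uses power iteration.
    sigma_A = np.linalg.norm(A, 2)
    sigma_B = np.linalg.norm(B, 2)
    scaled_lr = lr / (sigma_A + sigma_B + 1)  # scaling as read from the flowchart
    state["A"] = A - scaled_lr * oA
    state["B"] = B - scaled_lr * oB
    return state["A"] @ state["B"].T  # reconstruct the dense weight
```

Calling `spectron_step` repeatedly with the same `state` dict mimics the optimizer loop: the first call performs the SVD initialization, later calls reuse the stored factors and momenta.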
Last reviewed commit: d2686bb
    from emerging_optimizers.orthogonalized_optimizers.orthogonalized_optimizer import *
    from emerging_optimizers.orthogonalized_optimizers.scion import *
    from emerging_optimizers.orthogonalized_optimizers.spectral_clipping_utils import *
    from emerging_optimizers.orthogonalized_optimizers.spectron import *
    (No newline at end of file)
Missing trailing newline
The file is missing a trailing newline after the new import line. This is flagged by most linters and POSIX standards, and the previous version of the file had one.
Suggested change (same import line, with the trailing newline restored):

    from emerging_optimizers.orthogonalized_optimizers.spectron import *
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
/ok to test 326f3f6
    factor_B.add_(orth_momentum_B, alpha=-scaled_lr)

    # Reconstruct full weight matrix: W = A @ B^T
    p.copy_(factor_A @ factor_B.mT)
I am guessing this reconstruction is for compatibility with the rest of the library. Otherwise the whole implementation looks correct.
I leave the weights of the model as a single matrix, but do the low-rank decomposition as optimizer states (rather than having the low-rank factored weights as two separate matrices in the model, which makes them harder to access inside the optimizer). This is functionally identical but makes the software easier to use.
Pauljanson002 left a comment
This implementation is correct, with one minor difference: in our work we train the models with only the factors. In this implementation the model weights remain in dense form, but optimization happens with the low-rank factors, reducing optimizer state.
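A back-of-envelope comparison shows why keeping only the factors in optimizer state helps. This is a sketch with assumed accounting: Adam is modeled as two dense moment buffers per parameter matrix, and Spectron as two rank-r momentum buffers plus the two factors.

```python
def adam_state_elems(m, n):
    # Adam keeps two dense moment buffers of shape (m, n).
    return 2 * m * n

def spectron_state_elems(m, n, r):
    # Assumed accounting: two factors plus two momentum buffers,
    # each of shape (m, r) or (n, r).
    return 2 * (m * r + n * r)

# Example: a 4096x4096 projection with rank-256 factors.
m, n, r = 4096, 4096, 256
print(adam_state_elems(m, n) / spectron_state_elems(m, n, r))  # → 8.0
```

At rank 256 on a square 4096-wide layer, the low-rank state is 8x smaller under this accounting; the ratio grows as the rank shrinks relative to the matrix dimensions.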
    with utils.fp32_matmul_precision("highest"):
        grad_A = grad @ factor_B  # shape: (m, r)
        grad_B = grad.mT @ factor_A  # shape: (n, r)
Gradient dtype mismatch with non-fp32 parameters
grad = p.grad inherits p's dtype, but factor_B is always float32 (initialized from torch.linalg.svd(p.float(), ...)). When the parameter is bfloat16 — the standard dtype for LLM pretraining, which is the stated use case — the line grad @ factor_B will raise a RuntimeError at runtime:
RuntimeError: expected scalar type Float but found BFloat16
Even if PyTorch silently promotes the dtype in some contexts, momentum_A.lerp_(grad_A, ...) on line 187 will then fail because momentum_A is float32 but grad_A would be bfloat16.
The gradient should be explicitly cast to float32 before the matmul:
Suggested change:

     with utils.fp32_matmul_precision("highest"):
    -    grad_A = grad @ factor_B  # shape: (m, r)
    -    grad_B = grad.mT @ factor_A  # shape: (n, r)
    +    grad_A = grad.float() @ factor_B  # shape: (m, r)
    +    grad_B = grad.float().mT @ factor_A  # shape: (n, r)
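The mismatch is easy to reproduce in isolation. A minimal sketch, assuming a bfloat16 gradient (as in bf16 pretraining) against float32 factors; the tensor names are illustrative, not the PR's:

```python
import torch

# Hypothetical shapes: a bf16 parameter gradient against fp32 factors.
grad = torch.randn(8, 6, dtype=torch.bfloat16)
factor_B = torch.randn(6, 4, dtype=torch.float32)

try:
    grad @ factor_B  # mixed-dtype matmul is rejected, no implicit promotion
except RuntimeError as e:
    print(type(e).__name__)  # → RuntimeError

grad_A = grad.float() @ factor_B  # explicit cast, as in the suggested fix
print(grad_A.dtype)               # → torch.float32
```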
Added the Spectron optimizer

Also added a power iteration and Rayleigh quotient method for computing the spectral norm to utils/eig.py. Based on https://arxiv.org/abs/2602.12429
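The power iteration with a Rayleigh-quotient readout mentioned above can be sketched as follows. This is a NumPy illustration with an assumed function name; the PR's version in utils/eig.py may differ in details such as the stopping criterion and warm-starting from stored vectors.

```python
import numpy as np

def power_iteration_sigma_max(M, iters=20, eps=1e-8):
    """Estimate the largest singular value of M (a sketch).

    Power-iterates on M^T M, then reads sigma_max off via the
    Rayleigh quotient of M^T M at the converged vector.
    """
    v = np.random.default_rng(0).standard_normal(M.shape[1])
    v /= np.linalg.norm(v) + eps
    for _ in range(iters):
        u = M @ v          # one application of M
        v = M.T @ u        # ... and of M^T, i.e. one step on M^T M
        v /= np.linalg.norm(v) + eps
    # Rayleigh quotient v^T (M^T M) v approximates sigma_max^2.
    return float(np.sqrt(v @ (M.T @ (M @ v))))

M = np.diag([3.0, 1.0, 0.5])
print(power_iteration_sigma_max(M))  # ≈ 3.0
```

Convergence is geometric in the gap between the top two singular values, which is why in the optimizer a handful of iterations per step (warm-started from the previous step's vector) is usually enough.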