Improve test coverage and report #119

Open
skyw wants to merge 12 commits into main from skyw/ai_aided_test_improvement

Conversation

@skyw
Contributor

@skyw skyw commented Mar 6, 2026

Half done by Claude Code; I reviewed the AI-written code.

Some tiny bugs were fixed along the way.

skyw added 8 commits March 5, 2026 14:23
Signed-off-by: Hao Wu <skyw@nvidia.com>
@skyw skyw requested a review from a team as a code owner March 6, 2026 00:40
@copy-pr-bot

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR improves test coverage and CI reporting across several optimizer components: it fixes a shape bug (torch.empty(0) → torch.empty(0, 0)) in soap_utils.py, reformulates the met_approx_eigvals_criteria expression for clarity, and adds a broad set of new tests for SOAP schedule classes, all_eigenbases_met_criteria, Lion updates, and AdEMAMix. CI reporting is upgraded to emit per-test JUnit XML reports, and distributed tests now correctly produce per-rank output files.

Key points:

  • Bug fix: The empty Kronecker factor guard now returns a proper 2-D (0, 0) tensor rather than a 1-D (0,) tensor, ensuring downstream shape assumptions are met.
  • CI reporting: L0_Tests_CPU.sh is refactored to loop over test files and pass --xml_output_file to each; test_distributed_muon_utils_cpu.py appends the rank index to the output file name to prevent collisions.
  • New schedule tests: ScheduleTest is moved to its own class and extended with CosineSchedule and StepSchedule coverage, including error-path validation.
  • CI gap: tests/test_soap.py and tests/test_soap_utils.py — which contain the majority of the new tests — are not added to L0_Tests_CPU.sh, so those tests will not be run or reported in the CPU CI pipeline.
  • Negative eigenbasis test: test_all_eigenbases_met_criteria_random_eigenbasis_returns_false uses a random diagonal matrix as the eigenbasis rather than a random orthonormal matrix, which technically violates the function's documented preconditions, though it works in practice given the very tight default tolerance.
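The shape distinction behind the bug fix is easy to demonstrate in isolation. This standalone sketch (not code from the PR) shows why a numel() check cannot tell the old and new guard values apart, while a shape check can:

```python
import torch

# Both tensors are empty, but only one is a matrix.
q_old = torch.empty(0)      # 1-D, shape (0,)
q_new = torch.empty(0, 0)   # 2-D, shape (0, 0)

# numel() cannot distinguish them ...
assert q_old.numel() == 0 and q_new.numel() == 0

# ... but shape and ndim can, which is what downstream 2-D code relies on.
assert q_old.shape == (0,) and q_old.ndim == 1
assert q_new.shape == (0, 0) and q_new.ndim == 2

# Matrix operations that assume 2-D inputs work on the (0, 0) tensor.
assert (q_new @ q_new.T).shape == (0, 0)
```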

Confidence Score: 4/5

  • Safe to merge with minor follow-up: the production fixes are correct and the new tests are sound, but new test files are missing from the CI script.
  • The source-code changes (soap_utils.py shape fix, eig.py algebraic reformulation, pyproject.toml exclusion) are all correct and low-risk. New tests are well-structured and the previously-reported issues in test_scalar_optimizers.py have been fixed. The one gap is that test_soap.py and test_soap_utils.py are not wired into L0_Tests_CPU.sh, meaning the bulk of the new coverage is not exercised by CI. This doesn't block merging but should be addressed in a follow-up.
  • tests/ci/L0_Tests_CPU.sh — new test files should be added to the CI loop so the expanded coverage is actually enforced.

Important Files Changed

Filename Overview
emerging_optimizers/soap/soap_utils.py Fixes torch.empty(0) → torch.empty(0, 0) in the empty-factor guard clauses of both get_eigenbasis_eigh and get_eigenbasis_qr, ensuring the returned tensor has the correct 2-D shape expected by downstream code.
emerging_optimizers/utils/eig.py Algebraically equivalent reformulation of the met_approx_eigvals_criteria return expression (mathematically identical to the previous form) and minor docstring cleanups. No logic change.
tests/ci/L0_Tests_CPU.sh Adds XML report generation and refactors distributed test runs into a loop, but test_soap.py and test_soap_utils.py (which contain the bulk of the new tests in this PR) are not included, leaving new coverage unchecked by CI.
tests/test_scalar_optimizers.py Parameterizes AdEMAMix-equals-Adam test over correct_bias and num_beta_fast_warmup_steps, fixes centered=correct_bias propagation, and adds two new Lion update tests. The parameter num_beta_fast_warmup_steps is passed to calculate_ademamix_update as num_beta_slow_warmup_steps, which is semantically confusing (noted in a previous thread).
tests/test_soap_utils.py Adds empty-factor QR test, a zero-dim eigenbasis eigh case, and three all_eigenbases_met_criteria tests. The negative test uses a non-orthonormal diagonal matrix as the eigenbasis, which technically violates the function's preconditions, though it works in practice due to the tight tolerance.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["L0_Tests_CPU.sh"] -->|"torchrun n=8,4"| B["test_distributed_muon_utils_cpu.py"]
    B -->|"rank-suffix XML per process"| C["test-results/tests/\ntest_distributed_muon_utils_cpu_n8_rank0.xml\n..."]
    A -->|"coverage run loop"| D["test_scalar_optimizers.py\ntest_procrustes_step.py"]
    D -->|"XML report"| E["test-results/tests/\ntest_scalar_optimizers.py.xml\n..."]

    F["test_soap.py\n(ScheduleTest, SoapFunctionsTest)"] -.->|"NOT in CI loop"| A
    G["test_soap_utils.py\n(SoapUtilsTest)"] -.->|"NOT in CI loop"| A

    H["soap_utils.py\nget_eigenbasis_eigh / qr"] -->|"empty factor → torch.empty(0,0)"| I["Downstream code\nexpects 2-D tensor"]
    J["eig.py\nmet_approx_eigvals_criteria"] -->|"algebraically equivalent\nreformulation"| K["tolerance * ||K|| ≥ ||K|| - ||diag||"]

Last reviewed commit: b4ea359

Comment on lines +199 to +203
def test_all_eigenbases_met_criteria_true_eigenbasis_returns_true(self, N: int) -> None:
    kronecker_factor_list = [torch.randn(N, N, device=self.device)]

    eigenbasis_list = [torch.diag(torch.linalg.eigh(K).eigenvalues) for K in kronecker_factor_list]
    self.assertTrue(soap_utils.all_eigenbases_met_criteria(kronecker_factor_list, eigenbasis_list))

Wrong eigh attribute used — .eigenvalues instead of .eigenvectors

torch.linalg.eigh returns a named tuple with .eigenvalues (1-D vector λ) and .eigenvectors (N×N orthogonal matrix Q). The test wraps the 1-D eigenvalues in torch.diag(), producing a diagonal eigenvalue matrix D, and passes that to all_eigenbases_met_criteria.

However, the conjugate function (used internally) assumes its second argument is an orthogonal matrix. Passing a diagonal eigenvalue matrix instead breaks this invariant. The met_approx_eigvals_criteria check will compute a meaningless result and likely pass by chance, so the test does not validate the intended mathematical property.

Additionally, K = torch.randn(N, N) on line 200 is not symmetric; calling torch.linalg.eigh on it is undefined behaviour (PyTorch silently uses only the lower triangular part).

The test should construct a symmetric matrix and use the eigenvectors:

def test_all_eigenbases_met_criteria_true_eigenbasis_returns_true(self, N: int) -> None:
    g = torch.randn(N, N, device=self.device)
    K_sym = g @ g.T + torch.eye(N, device=self.device) * 1e-5  # symmetric PSD
    kronecker_factor_list = [K_sym]

    eigenbasis_list = [torch.linalg.eigh(K_sym).eigenvectors]
    self.assertTrue(soap_utils.all_eigenbases_met_criteria(kronecker_factor_list, eigenbasis_list))
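The distinction can be checked in isolation. This standalone snippet (assuming only PyTorch, not the repo's helpers) confirms that .eigenvectors of a symmetric matrix is orthonormal and diagonalizes it, while a diagonal matrix built from .eigenvalues is not orthonormal:

```python
import torch

torch.manual_seed(0)
N = 4
g = torch.randn(N, N)
K_sym = g @ g.T + torch.eye(N) * 1e-5  # symmetric PSD, as in the suggested test

res = torch.linalg.eigh(K_sym)
Q = res.eigenvectors             # (N, N) orthogonal matrix of eigenvectors
D = torch.diag(res.eigenvalues)  # diagonal eigenvalue matrix, NOT orthogonal

# Q is orthonormal: Q^T Q = I
assert torch.allclose(Q.T @ Q, torch.eye(N), atol=1e-5)
# The diagonal eigenvalue matrix generally is not
assert not torch.allclose(D.T @ D, torch.eye(N), atol=1e-5)
# And Q actually diagonalizes K: Q^T K Q ≈ D
assert torch.allclose(Q.T @ K_sym @ Q, D, atol=1e-4)
```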

Contributor Author

@mkhona-nvidia, should we use eigenvalues or eigenvectors?

skyw and others added 2 commits March 5, 2026 16:52
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
     exp_avg_sq_for_sim_ademamix,
-    num_beta_fast_warmup_steps=None,
+    num_beta_fast_warmup_steps=num_beta_fast_warmup_steps,
     min_beta_fast=0.0,

Parameterized correct_bias is shadowed by local reassignment

Same issue as in the test_calculate_ademamix_update_with_alpha_zero_equals_adam function: the parameterized correct_bias argument (with values True and False) is immediately overwritten with correct_bias = False on this line. Both parameterized variants therefore always run with correct_bias=False, so the correct_bias=True branch (which would set centered=True in the RMSProp reference) is never actually exercised.

Suggested change
     min_beta_fast=0.0,
-    correct_bias = False
     step = 10
     lr = 0.25
Remove the correct_bias = False override so the parameterized value flows through to both the calculate_sim_ademamix_update call (line 236) and the centered=correct_bias argument passed to torch.optim.RMSprop (line 252).
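Reduced to a minimal standalone sketch (hypothetical names, no dependency on the test file), the shadowing failure mode looks like this:

```python
# BUG: the parameterized argument is silently discarded by a local reassignment,
# so every parameterized variant exercises the same branch.
def run_case(correct_bias: bool) -> bool:
    correct_bias = False  # overrides whatever the parameterization passed in
    return correct_bias

assert run_case(True) is False   # the True variant never takes its branch
assert run_case(False) is False

# FIX: drop the override so the parameterized value flows through.
def run_case_fixed(correct_bias: bool) -> bool:
    return correct_bias

assert run_case_fixed(True) is True
assert run_case_fixed(False) is False
```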

Signed-off-by: Hao Wu <skyw@nvidia.com>

self.assertEqual(len(Q_new_list), 2)
self.assertEqual(Q_new_list[0].shape, (N, N))
self.assertEqual(Q_new_list[1].numel(), 0)

Test doesn't verify the actual (0, 0) shape fix

The assertion only checks numel() == 0, which would have passed even before the soap_utils.py fix that changed torch.empty(0) to torch.empty(0, 0). Both a 1-D empty tensor and a 2-D empty tensor have numel() == 0, so this test doesn't actually validate the fix it was written for.

Compare with test_get_eigenbasis_eigh which correctly asserts Q.shape == (0, 0) for the zero-dim case. This test should do the same:

Suggested change
-        self.assertEqual(Q_new_list[1].numel(), 0)
+        self.assertEqual(Q_new_list[1].shape, (0, 0))

Comment on lines +156 to +162
@parameterized.parameters(
    {"correct_bias": True, "num_beta_fast_warmup_steps": None},
    {"correct_bias": False, "num_beta_fast_warmup_steps": 2},
)
def test_calculate_ademamix_update_with_alpha_zero_equals_adam(
    self, correct_bias: bool, num_beta_fast_warmup_steps: int | None
) -> None:

Misleading parameter name num_beta_fast_warmup_steps in AdEMAMix test

The parameterized argument is named num_beta_fast_warmup_steps, but inside the function body it is passed to the calculate_ademamix_update call as num_beta_slow_warmup_steps=num_beta_fast_warmup_steps (line 181). "Fast" and "slow" are separate concepts in AdEMAMix — using the wrong name here can mislead a reader into thinking the wrong warmup schedule is being varied.

By contrast, the sibling test test_calculate_sim_ademamix_update_with_zero_momentum_and_alpha_equals_rmsprop (line 207) uses the same parameter name and passes it correctly as num_beta_fast_warmup_steps=. Renaming the parameter in this test to num_beta_slow_warmup_steps would make the intent clear:

Suggested change
-@parameterized.parameters(
-    {"correct_bias": True, "num_beta_fast_warmup_steps": None},
-    {"correct_bias": False, "num_beta_fast_warmup_steps": 2},
-)
-def test_calculate_ademamix_update_with_alpha_zero_equals_adam(
-    self, correct_bias: bool, num_beta_fast_warmup_steps: int | None
-) -> None:
+@parameterized.parameters(
+    {"correct_bias": True, "num_beta_slow_warmup_steps": None},
+    {"correct_bias": False, "num_beta_slow_warmup_steps": 2},
+)
+def test_calculate_ademamix_update_with_alpha_zero_equals_adam(
+    self, correct_bias: bool, num_beta_slow_warmup_steps: int | None
+) -> None:

And update the usage on line 181 accordingly:

num_beta_slow_warmup_steps=num_beta_slow_warmup_steps,

Signed-off-by: Hao Wu <skyw@nvidia.com>
@skyw
Contributor Author

skyw commented Mar 6, 2026

/ok to test b4ea359

Comment on lines +26 to +29
for test in "tests/test_scalar_optimizers.py" "tests/test_procrustes_step.py"; do
    report_name="test-results/${test}.xml"
    coverage run -p --source=emerging_optimizers $test --device=cpu -v -2 --xml_output_file="$report_name" || error=1
done

New test files excluded from CI

test_soap.py and test_soap_utils.py are not added to this script, so all the new test cases added in this PR — including ScheduleTest, test_all_eigenbases_met_criteria_*, and test_get_eigenbasis_qr_empty_factor — will not be executed in CI.

The for-loop only covers test_scalar_optimizers.py and test_procrustes_step.py. Since the PR's stated goal is to "improve test coverage and report", the new tests should also be wired into the CI script, for example:

for test in "tests/test_scalar_optimizers.py" "tests/test_procrustes_step.py" "tests/test_soap.py" "tests/test_soap_utils.py"; do
    report_name="test-results/${test}.xml"
    coverage run -p --source=emerging_optimizers $test --device=cpu -v -2 --xml_output_file="$report_name" || error=1
done

Comment on lines +189 to +192
def test_all_eigenbases_met_criteria_random_eigenbasis_returns_false(self, N: int) -> None:
    kronecker_factor_list = [torch.randn(N, N, device=self.device)]
    eigenbasis_list = [torch.diag(torch.randn(N, device=self.device))]
    self.assertFalse(soap_utils.all_eigenbases_met_criteria(kronecker_factor_list, eigenbasis_list))

Flaky negative test — eigenbasis is not a valid orthonormal matrix

torch.diag(torch.randn(N, device=self.device)) produces a random diagonal matrix, not a random orthonormal matrix. The all_eigenbases_met_criteria function documents that eigenbasis_list should contain "orthonormal eigenbases", and internally calls eig_utils.conjugate(kronecker_factor, eigenbasis, diag=True) which assumes p is orthogonal.

While the test is very unlikely to fail in practice (the 1e-7 tolerance is extremely tight), a semantically correct negative test should use a randomly-rotated but misaligned orthonormal basis — e.g. via QR — rather than a random diagonal matrix that violates the function's own preconditions:

Q_random = torch.linalg.qr(torch.randn(N, N, device=self.device)).Q
eigenbasis_list = [Q_random]

Using a proper orthonormal matrix makes the test verify the intended contract ("a correct orthonormal basis that doesn't diagonalize this particular matrix fails the criteria") rather than relying on an invalid input to produce the desired False return.
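The contrast between the two constructions can be verified standalone (assuming only PyTorch, not the repo's helpers):

```python
import torch

torch.manual_seed(0)
N = 4

# A random diagonal matrix is generally not orthonormal: D^T D = diag(d_i^2) != I.
D = torch.diag(torch.randn(N))
assert not torch.allclose(D.T @ D, torch.eye(N), atol=1e-5)

# The Q factor from QR of a random Gaussian matrix is orthonormal: Q^T Q = I.
Q_random = torch.linalg.qr(torch.randn(N, N)).Q
assert torch.allclose(Q_random.T @ Q_random, torch.eye(N), atol=1e-5)
```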

@github-actions

github-actions bot commented Mar 6, 2026

Test Results

   46 files  + 12     96 suites  +38   1m 27s ⏱️ +18s
  955 tests + 38    955 ✅ + 38  0 💤 ±0  0 ❌ ±0 
2 141 runs  +316  2 141 ✅ +316  0 💤 ±0  0 ❌ ±0 

Results for commit b4ea359. ± Comparison against base commit 7056267.

This pull request removes 2 and adds 40 tests. Note that renamed tests count towards both.
__main__.ScalarOptimizerTest ‑ test_calculate_ademamix_update_with_alpha_zero_equals_adam
__main__.SoapFunctionsTest ‑ test_soap_optimizer_class_based_schedule
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_close_to_non_distributed0 (shape=(3, 32))
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_close_to_non_distributed1 (shape=(5, 100))
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_with_partial_tp_close_to_non_distributed0 (shape=(32, 3), transpose=True, tp_size=2)
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_with_partial_tp_close_to_non_distributed1 (shape=(5, 100), transpose=False, tp_size=4)
__main__.DistributedNewtonSchulzCpuTest ‑ test_5steps_with_transpose_close_to_non_distributed0 (shape=(32, 3), transpose=True)
__main__.DistributedNewtonSchulzCpuTest ‑ test_5steps_with_transpose_close_to_non_distributed1 (shape=(5, 100), transpose=False)
__main__.DistributedNewtonSchulzCpuTest ‑ test_distributed_normalize_close_to_non_distributed0 (shape=(21, 16))
__main__.DistributedNewtonSchulzCpuTest ‑ test_distributed_normalize_close_to_non_distributed1 (shape=(16, 32))
__main__.DistributedNewtonSchulzStepCpuTest ‑ test_close_to_non_distributed0 (shape=(21, 16))
__main__.DistributedNewtonSchulzStepCpuTest ‑ test_close_to_non_distributed1 (shape=(16, 32))
…
