Improve test coverage and report #119

Open
skyw wants to merge 12 commits into main from skyw/ai_aided_test_improvement

Conversation

@skyw
Contributor

@skyw skyw commented Mar 6, 2026

Half done by Claude Code; I reviewed the AI-written code.

Some tiny bugs were fixed along the way.

skyw added 8 commits March 5, 2026 14:23
Signed-off-by: Hao Wu <skyw@nvidia.com>
@skyw skyw requested a review from a team as a code owner March 6, 2026 00:40
@copy-pr-bot

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR improves test coverage and CI reporting across several optimizer components: it fixes a shape bug (torch.empty(0) → torch.empty(0, 0)) in soap_utils.py, reformulates the met_approx_eigvals_criteria expression for clarity, and adds a broad set of new tests for SOAP schedule classes, all_eigenbases_met_criteria, Lion updates, and AdEMAMix. CI reporting is upgraded to emit per-test JUnit XML reports, and distributed tests now correctly produce per-rank output files.

Key points:

  • Bug fix: The empty Kronecker factor guard now returns a proper 2-D (0, 0) tensor rather than a 1-D (0,) tensor, ensuring downstream shape assumptions are met.
  • CI reporting: L0_Tests_CPU.sh is refactored to loop over test files and pass --xml_output_file to each; test_distributed_muon_utils_cpu.py appends the rank index to the output file name to prevent collisions.
  • New schedule tests: ScheduleTest is moved to its own class and extended with CosineSchedule and StepSchedule coverage, including error-path validation.
  • CI gap: tests/test_soap.py and tests/test_soap_utils.py — which contain the majority of the new tests — are not added to L0_Tests_CPU.sh, so those tests will not be run or reported in the CPU CI pipeline.
  • Negative eigenbasis test: test_all_eigenbases_met_criteria_random_eigenbasis_returns_false uses a random diagonal matrix as the eigenbasis rather than a random orthonormal matrix, which technically violates the function's documented preconditions, though it works in practice given the very tight default tolerance.
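The shape distinction behind the bug fix is easy to demonstrate in isolation. This standalone sketch (not code from the PR) shows why a numel() check cannot tell the old and new guard values apart, while a shape check can:

```python
import torch

# Both tensors are empty, but only one is a matrix.
q_old = torch.empty(0)      # 1-D, shape (0,)
q_new = torch.empty(0, 0)   # 2-D, shape (0, 0)

# numel() cannot distinguish them ...
assert q_old.numel() == 0 and q_new.numel() == 0

# ... but shape and ndim can, which is what downstream 2-D code relies on.
assert q_old.shape == (0,) and q_old.ndim == 1
assert q_new.shape == (0, 0) and q_new.ndim == 2

# Matrix operations that assume 2-D inputs work on the (0, 0) tensor.
assert (q_new @ q_new.T).shape == (0, 0)
```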

Confidence Score: 4/5

  • Safe to merge with minor follow-up: the production fixes are correct and the new tests are sound, but new test files are missing from the CI script.
  • The source-code changes (soap_utils.py shape fix, eig.py algebraic reformulation, pyproject.toml exclusion) are all correct and low-risk. New tests are well-structured and the previously-reported issues in test_scalar_optimizers.py have been fixed. The one gap is that test_soap.py and test_soap_utils.py are not wired into L0_Tests_CPU.sh, meaning the bulk of the new coverage is not exercised by CI. This doesn't block merging but should be addressed in a follow-up.
  • tests/ci/L0_Tests_CPU.sh — new test files should be added to the CI loop so the expanded coverage is actually enforced.

Important Files Changed

Filename Overview
emerging_optimizers/soap/soap_utils.py Fixes torch.empty(0) → torch.empty(0, 0) in the empty-factor guard clauses of both get_eigenbasis_eigh and get_eigenbasis_qr, ensuring the returned tensor has the correct 2-D shape expected by downstream code.
emerging_optimizers/utils/eig.py Algebraically equivalent reformulation of the met_approx_eigvals_criteria return expression (mathematically identical to the previous form) and minor docstring cleanups. No logic change.
tests/ci/L0_Tests_CPU.sh Adds XML report generation and refactors distributed test runs into a loop, but test_soap.py and test_soap_utils.py (which contain the bulk of the new tests in this PR) are not included, leaving new coverage unchecked by CI.
tests/test_scalar_optimizers.py Parameterizes AdEMAMix-equals-Adam test over correct_bias and num_beta_fast_warmup_steps, fixes centered=correct_bias propagation, and adds two new Lion update tests. The parameter num_beta_fast_warmup_steps is passed to calculate_ademamix_update as num_beta_slow_warmup_steps, which is semantically confusing (noted in a previous thread).
tests/test_soap_utils.py Adds empty-factor QR test, a zero-dim eigenbasis eigh case, and three all_eigenbases_met_criteria tests. The negative test uses a non-orthonormal diagonal matrix as the eigenbasis, which technically violates the function's preconditions, though it works in practice due to the tight tolerance.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["L0_Tests_CPU.sh"] -->|"torchrun n=8,4"| B["test_distributed_muon_utils_cpu.py"]
    B -->|"rank-suffix XML per process"| C["test-results/tests/\ntest_distributed_muon_utils_cpu_n8_rank0.xml\n..."]
    A -->|"coverage run loop"| D["test_scalar_optimizers.py\ntest_procrustes_step.py"]
    D -->|"XML report"| E["test-results/tests/\ntest_scalar_optimizers.py.xml\n..."]

    F["test_soap.py\n(ScheduleTest, SoapFunctionsTest)"] -.->|"NOT in CI loop"| A
    G["test_soap_utils.py\n(SoapUtilsTest)"] -.->|"NOT in CI loop"| A

    H["soap_utils.py\nget_eigenbasis_eigh / qr"] -->|"empty factor → torch.empty(0,0)"| I["Downstream code\nexpects 2-D tensor"]
    J["eig.py\nmet_approx_eigvals_criteria"] -->|"algebraically equivalent\nreformulation"| K["tolerance * ||K|| ≥ ||K|| - ||diag||"]

Last reviewed commit: b4ea359

Comment on lines +199 to +203
def test_all_eigenbases_met_criteria_true_eigenbasis_returns_true(self, N: int) -> None:
    kronecker_factor_list = [torch.randn(N, N, device=self.device)]

    eigenbasis_list = [torch.diag(torch.linalg.eigh(K).eigenvalues) for K in kronecker_factor_list]
    self.assertTrue(soap_utils.all_eigenbases_met_criteria(kronecker_factor_list, eigenbasis_list))

Wrong eigh attribute used — .eigenvalues instead of .eigenvectors

torch.linalg.eigh returns a named tuple with .eigenvalues (1-D vector λ) and .eigenvectors (N×N orthogonal matrix Q). The test wraps the 1-D eigenvalues in torch.diag(), producing a diagonal eigenvalue matrix D, and passes that to all_eigenbases_met_criteria.

However, the conjugate function (used internally) assumes its second argument is an orthogonal matrix. Passing a diagonal eigenvalue matrix instead breaks this invariant. The met_approx_eigvals_criteria check will compute a meaningless result and likely pass by chance, so the test does not validate the intended mathematical property.

Additionally, K = torch.randn(N, N) on line 200 is not symmetric; calling torch.linalg.eigh on it is undefined behaviour (PyTorch silently uses only the lower triangular part).

The test should construct a symmetric matrix and use the eigenvectors:

def test_all_eigenbases_met_criteria_true_eigenbasis_returns_true(self, N: int) -> None:
    g = torch.randn(N, N, device=self.device)
    K_sym = g @ g.T + torch.eye(N, device=self.device) * 1e-5  # symmetric PSD
    kronecker_factor_list = [K_sym]

    eigenbasis_list = [torch.linalg.eigh(K_sym).eigenvectors]
    self.assertTrue(soap_utils.all_eigenbases_met_criteria(kronecker_factor_list, eigenbasis_list))
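The distinction can be checked in isolation. This standalone snippet (assuming only PyTorch, not the repo's helpers) confirms that .eigenvectors of a symmetric matrix is orthonormal and diagonalizes it, while a diagonal matrix built from .eigenvalues is not orthonormal:

```python
import torch

torch.manual_seed(0)
N = 4
g = torch.randn(N, N)
K_sym = g @ g.T + torch.eye(N) * 1e-5  # symmetric PSD, as in the suggested test

res = torch.linalg.eigh(K_sym)
Q = res.eigenvectors             # (N, N) orthogonal matrix of eigenvectors
D = torch.diag(res.eigenvalues)  # diagonal eigenvalue matrix, NOT orthogonal

# Q is orthonormal: Q^T Q = I
assert torch.allclose(Q.T @ Q, torch.eye(N), atol=1e-5)
# The diagonal eigenvalue matrix generally is not
assert not torch.allclose(D.T @ D, torch.eye(N), atol=1e-5)
# And Q actually diagonalizes K: Q^T K Q ≈ D
assert torch.allclose(Q.T @ K_sym @ Q, D, atol=1e-4)
```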

Contributor Author

@mkhona-nvidia, should we use eigenvalues or eigenvectors?

skyw and others added 2 commits March 5, 2026 16:52
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
     exp_avg_sq_for_sim_ademamix,
-    num_beta_fast_warmup_steps=None,
+    num_beta_fast_warmup_steps=num_beta_fast_warmup_steps,
     min_beta_fast=0.0,

Parameterized correct_bias is shadowed by local reassignment

Same issue as in the test_calculate_ademamix_update_with_alpha_zero_equals_adam function: the parameterized correct_bias argument (with values True and False) is immediately overwritten with correct_bias = False on this line. Both parameterized variants therefore always run with correct_bias=False, so the correct_bias=True branch (which would set centered=True in the RMSProp reference) is never actually exercised.

Suggested change
     min_beta_fast=0.0,
-    correct_bias = False
     step = 10
     lr = 0.25
Remove the correct_bias = False override so the parameterized value flows through to both the calculate_sim_ademamix_update call (line 236) and the centered=correct_bias argument passed to torch.optim.RMSprop (line 252).
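Reduced to a minimal standalone sketch (hypothetical names, no dependency on the test file), the shadowing failure mode looks like this:

```python
# BUG: the parameterized argument is silently discarded by a local reassignment,
# so every parameterized variant exercises the same branch.
def run_case(correct_bias: bool) -> bool:
    correct_bias = False  # overrides whatever the parameterization passed in
    return correct_bias

assert run_case(True) is False   # the True variant never takes its branch
assert run_case(False) is False

# FIX: drop the override so the parameterized value flows through.
def run_case_fixed(correct_bias: bool) -> bool:
    return correct_bias

assert run_case_fixed(True) is True
assert run_case_fixed(False) is False
```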

Signed-off-by: Hao Wu <skyw@nvidia.com>

self.assertEqual(len(Q_new_list), 2)
self.assertEqual(Q_new_list[0].shape, (N, N))
self.assertEqual(Q_new_list[1].numel(), 0)

Test doesn't verify the actual (0, 0) shape fix

The assertion only checks numel() == 0, which would have passed even before the soap_utils.py fix that changed torch.empty(0) to torch.empty(0, 0). Both a 1-D empty tensor and a 2-D empty tensor have numel() == 0, so this test doesn't actually validate the fix it was written for.

Compare with test_get_eigenbasis_eigh which correctly asserts Q.shape == (0, 0) for the zero-dim case. This test should do the same:

Suggested change
-        self.assertEqual(Q_new_list[1].numel(), 0)
+        self.assertEqual(Q_new_list[1].shape, (0, 0))

Comment on lines +156 to +162
@parameterized.parameters(
    {"correct_bias": True, "num_beta_fast_warmup_steps": None},
    {"correct_bias": False, "num_beta_fast_warmup_steps": 2},
)
def test_calculate_ademamix_update_with_alpha_zero_equals_adam(
    self, correct_bias: bool, num_beta_fast_warmup_steps: int | None
) -> None:

Misleading parameter name num_beta_fast_warmup_steps in AdEMAMix test

The parameterized argument is named num_beta_fast_warmup_steps, but inside the function body it is passed to the calculate_ademamix_update call as num_beta_slow_warmup_steps=num_beta_fast_warmup_steps (line 181). "Fast" and "slow" are separate concepts in AdEMAMix — using the wrong name here can mislead a reader into thinking the wrong warmup schedule is being varied.

By contrast, the sibling test test_calculate_sim_ademamix_update_with_zero_momentum_and_alpha_equals_rmsprop (line 207) uses the same parameter name and passes it correctly as num_beta_fast_warmup_steps=. Renaming the parameter in this test to num_beta_slow_warmup_steps would make the intent clear:

Suggested change
-@parameterized.parameters(
-    {"correct_bias": True, "num_beta_fast_warmup_steps": None},
-    {"correct_bias": False, "num_beta_fast_warmup_steps": 2},
-)
-def test_calculate_ademamix_update_with_alpha_zero_equals_adam(
-    self, correct_bias: bool, num_beta_fast_warmup_steps: int | None
-) -> None:
+@parameterized.parameters(
+    {"correct_bias": True, "num_beta_slow_warmup_steps": None},
+    {"correct_bias": False, "num_beta_slow_warmup_steps": 2},
+)
+def test_calculate_ademamix_update_with_alpha_zero_equals_adam(
+    self, correct_bias: bool, num_beta_slow_warmup_steps: int | None
+) -> None:

And update the usage on line 181 accordingly:

num_beta_slow_warmup_steps=num_beta_slow_warmup_steps,

Signed-off-by: Hao Wu <skyw@nvidia.com>
@skyw
Contributor Author

skyw commented Mar 6, 2026

/ok to test b4ea359

Comment on lines +26 to +29
for test in "tests/test_scalar_optimizers.py" "tests/test_procrustes_step.py"; do
    report_name="test-results/${test}.xml"
    coverage run -p --source=emerging_optimizers $test --device=cpu -v -2 --xml_output_file="$report_name" || error=1
done

New test files excluded from CI

test_soap.py and test_soap_utils.py are not added to this script, so all the new test cases added in this PR — including ScheduleTest, test_all_eigenbases_met_criteria_*, and test_get_eigenbasis_qr_empty_factor — will not be executed in CI.

The for-loop only covers test_scalar_optimizers.py and test_procrustes_step.py. Since the PR's stated goal is to "improve test coverage and report", the new tests should also be wired into the CI script, for example:

for test in "tests/test_scalar_optimizers.py" "tests/test_procrustes_step.py" "tests/test_soap.py" "tests/test_soap_utils.py"; do
    report_name="test-results/${test}.xml"
    coverage run -p --source=emerging_optimizers $test --device=cpu -v -2 --xml_output_file="$report_name" || error=1
done

Comment on lines +189 to +192
def test_all_eigenbases_met_criteria_random_eigenbasis_returns_false(self, N: int) -> None:
    kronecker_factor_list = [torch.randn(N, N, device=self.device)]
    eigenbasis_list = [torch.diag(torch.randn(N, device=self.device))]
    self.assertFalse(soap_utils.all_eigenbases_met_criteria(kronecker_factor_list, eigenbasis_list))

Flaky negative test — eigenbasis is not a valid orthonormal matrix

torch.diag(torch.randn(N, device=self.device)) produces a random diagonal matrix, not a random orthonormal matrix. The all_eigenbases_met_criteria function documents that eigenbasis_list should contain "orthonormal eigenbases", and internally calls eig_utils.conjugate(kronecker_factor, eigenbasis, diag=True) which assumes p is orthogonal.

While the test is very unlikely to fail in practice (the 1e-7 tolerance is extremely tight), a semantically correct negative test should use a randomly-rotated but misaligned orthonormal basis — e.g. via QR — rather than a random diagonal matrix that violates the function's own preconditions:

Q_random = torch.linalg.qr(torch.randn(N, N, device=self.device)).Q
eigenbasis_list = [Q_random]

Using a proper orthonormal matrix makes the test verify the intended contract ("a correct orthonormal basis that doesn't diagonalize this particular matrix fails the criteria") rather than relying on an invalid input to produce the desired False return.
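The contrast between the two constructions can be verified standalone (assuming only PyTorch, not the repo's helpers):

```python
import torch

torch.manual_seed(0)
N = 4

# A random diagonal matrix is generally not orthonormal: D^T D = diag(d_i^2) != I.
D = torch.diag(torch.randn(N))
assert not torch.allclose(D.T @ D, torch.eye(N), atol=1e-5)

# The Q factor from QR of a random Gaussian matrix is orthonormal: Q^T Q = I.
Q_random = torch.linalg.qr(torch.randn(N, N)).Q
assert torch.allclose(Q_random.T @ Q_random, torch.eye(N), atol=1e-5)
```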

@github-actions

github-actions bot commented Mar 6, 2026

Test Results

   46 files  + 12     96 suites  +38   1m 27s ⏱️ +18s
  955 tests + 38    955 ✅ + 38  0 💤 ±0  0 ❌ ±0 
2 141 runs  +316  2 141 ✅ +316  0 💤 ±0  0 ❌ ±0 

Results for commit b4ea359. ± Comparison against base commit 7056267.

This pull request removes 2 and adds 40 tests. Note that renamed tests count towards both.
__main__.ScalarOptimizerTest ‑ test_calculate_ademamix_update_with_alpha_zero_equals_adam
__main__.SoapFunctionsTest ‑ test_soap_optimizer_class_based_schedule
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_close_to_non_distributed0 (shape=(3, 32))
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_close_to_non_distributed1 (shape=(5, 100))
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_with_partial_tp_close_to_non_distributed0 (shape=(32, 3), transpose=True, tp_size=2)
__main__.DistributedNewtonSchulzCpuTest ‑ test_1step_with_partial_tp_close_to_non_distributed1 (shape=(5, 100), transpose=False, tp_size=4)
__main__.DistributedNewtonSchulzCpuTest ‑ test_5steps_with_transpose_close_to_non_distributed0 (shape=(32, 3), transpose=True)
__main__.DistributedNewtonSchulzCpuTest ‑ test_5steps_with_transpose_close_to_non_distributed1 (shape=(5, 100), transpose=False)
__main__.DistributedNewtonSchulzCpuTest ‑ test_distributed_normalize_close_to_non_distributed0 (shape=(21, 16))
__main__.DistributedNewtonSchulzCpuTest ‑ test_distributed_normalize_close_to_non_distributed1 (shape=(16, 32))
__main__.DistributedNewtonSchulzStepCpuTest ‑ test_close_to_non_distributed0 (shape=(21, 16))
__main__.DistributedNewtonSchulzStepCpuTest ‑ test_close_to_non_distributed1 (shape=(16, 32))
…
