Skip to content

Conversation

@isayev
Copy link
Contributor

@isayev isayev commented Dec 16, 2025

Summary

  • Add 6 new test files covering previously untested modules
  • Expand existing calculator tests with edge cases and validation
  • Increase test count from 18 to 152 tests (all passing)
  • Target coverage improvement from ~39% toward 65%

New Test Files

File Tests Coverage
tests/conftest.py - Shared fixtures
tests/test_nbops.py 25 Neighbor operations (modes 0, 1, 2)
tests/test_ops.py 34 Math ops, cutoffs, Coulomb matrices
tests/test_config.py 28 YAML/Jinja2 loading, module building
tests/test_lr.py 21 Long-range Coulomb (simple, DSF, Ewald)
tests/test_calculator_gpu.py 12 GPU inference, memory, consistency

Expanded Calculator Tests

44 new tests in test_calculator.py:

  • Input validation and error handling
  • Coulomb method switching
  • Batch processing (2D/3D formats)
  • Force, Hessian calculations
  • Translation/rotation invariance

Test plan

  • Run pytest tests/ - 152 tests passing
  • Run make check - linting passes
  • Run GPU tests on CUDA machine: pytest -m gpu

Add 6 new test files and expand existing calculator tests to improve
coverage from ~39% toward target of 65%:

New test files:
- tests/conftest.py: Shared fixtures for molecules, models, GPU checks
- tests/test_nbops.py: Neighbor operations (25 tests)
  - set_nb_mode, calc_masks, mask_ij_, mask_i_, get_ij, mol_sum
  - All three neighbor modes (0, 1, 2) with gradient flow tests
- tests/test_ops.py: Mathematical operations (34 tests)
  - Cutoff functions, distance calculations, Coulomb matrices
  - NSE charge equilibration, transition functions
- tests/test_config.py: Configuration system (28 tests)
  - get_module, load_yaml with Jinja2, build_module
  - dict_to_dotted/dotted_to_dict conversions
- tests/test_lr.py: Long-range Coulomb module (21 tests)
  - Simple, DSF, and Ewald methods with gradient tests
- tests/test_calculator_gpu.py: GPU-specific tests (12 tests)
  - CUDA inference, GPU/CPU consistency, memory management

Expanded tests/test_calculator.py (44 new tests):
- Input validation and error handling
- Coulomb method switching
- Batch processing (2D and 3D formats)
- Force, stress, Hessian calculations
- Translation and rotation invariance

Total: 152 tests passing (up from 18)
@claude
Copy link

claude bot commented Dec 16, 2025

Code Review - PR #31: Comprehensive Test Coverage

Overall Assessment

This is an excellent PR that significantly improves test coverage from ~39% to ~65% by adding 134 new tests (18→152). The tests are well-structured, comprehensive, and follow good testing practices. Great work! 🎉

Strengths

Test Organization

  • ✅ Well-organized into logical test classes (e.g., TestInputValidation, TestCoulombMethods)
  • ✅ Clear, descriptive test names following the pattern test_<what>_<condition>
  • ✅ Good use of docstrings to explain test purpose
  • ✅ Excellent fixtures in conftest.py promoting code reuse

Test Coverage

  • ✅ Comprehensive edge case testing (padding, batching, various input formats)
  • ✅ Good validation testing (missing keys, invalid inputs)
  • ✅ Physics-based correctness checks (translation/rotation invariance, symmetry)
  • ✅ GPU-specific tests properly isolated with markers

Code Quality

  • ✅ Consistent coding style throughout
  • ✅ Proper use of pytest features (fixtures, marks, parametrization)
  • ✅ Type hints in fixture signatures

Issues Found

🔴 Critical: Code Duplication in GPU Tests

Location: tests/test_calculator_gpu.py:80-95 and tests/test_calculator_gpu.py:113-128

The CPU calculator initialization code is duplicated across multiple tests. This is a maintenance burden and violates DRY principle.

Recommendation: Extract to a helper function or fixture:

@pytest.fixture
def cpu_calculator():
    """Create CPU calculator instance for comparison tests."""
    calc_cpu = AIMNet2Calculator.__new__(AIMNet2Calculator)
    calc_cpu.device = "cpu"
    from aimnet.calculators.model_registry import get_model_path
    
    p = get_model_path("aimnet2")
    calc_cpu.model = torch.jit.load(p, map_location="cpu")
    calc_cpu.cutoff = calc_cpu.model.cutoff
    calc_cpu.lr = hasattr(calc_cpu.model, "cutoff_lr")
    calc_cpu.cutoff_lr = getattr(calc_cpu.model, "cutoff_lr", float("inf")) if calc_cpu.lr else None
    calc_cpu.max_density = 0.2
    calc_cpu.nb_threshold = 0
    calc_cpu._batch = None
    calc_cpu._max_mol_size = 0
    calc_cpu._saved_for_grad = {}
    calc_cpu._coulomb_method = "simple"
    return calc_cpu

🟡 Medium: Potential Missing Attribute Initialization

Location: tests/test_calculator_gpu.py:81-95

Using __new__ to bypass __init__ is risky. If the AIMNet2Calculator.__init__ adds new attributes in the future, these tests will break or behave unexpectedly.

Recommendation: Either:

  1. Use a proper factory method in the calculator class
  2. Add a comment explaining why __new__ is necessary
  3. Add validation that all required attributes are set

🟡 Medium: Hard-coded Magic Numbers

Location: tests/test_calculator.py:49-50, tests/test_calculator_gpu.py:225-226

assert memory_allocated < 2e9  # Less than 2GB allocated
assert memory_reserved < 4e9  # Less than 4GB reserved
assert growth < 100e6  # Less than 100MB growth

These thresholds may fail on different GPU models or under different conditions.

Recommendation: Make these configurable or calculate dynamically based on available memory.

🟢 Minor: Inconsistent Error Message Matching

Location: tests/test_calculator.py:43-46

Some tests use specific error message matching (match="Missing key coord"), while others don't specify messages.

Recommendation: Consistently verify error messages to catch regression in error reporting quality.

🟢 Minor: Missing Device Fixture in Some Tests

Location: tests/test_calculator_gpu.py:140

test_nbmat_cuda_vs_cpu doesn't use the device fixture, instead hard-coding device selection.

Recommendation: Use the device fixture for consistency.

Performance Considerations

✅ Good Practices

  • Tests use torch.manual_seed(42) for reproducibility
  • Small test molecules used where appropriate (H2O instead of caffeine for expensive operations like Hessian)
  • GPU tests properly skip when CUDA unavailable

💡 Suggestions

  1. Parametrize similar tests: Tests like test_set_coulomb_simple, test_set_coulomb_dsf, test_set_coulomb_ewald could be parametrized:

    @pytest.mark.parametrize("method,kwargs", [
        ("simple", {}),
        ("dsf", {"cutoff": 12.0, "dsf_alpha": 0.25}),
        ("ewald", {"cutoff": 10.0}),
    ])
    def test_set_coulomb_method(self, method, kwargs):
        ...
  2. GPU memory tests: Consider using @pytest.mark.slow for memory leak tests since they run 10 iterations.

Security Considerations

✅ No security issues identified. The tests:

  • Don't execute arbitrary code
  • Don't access network resources
  • Use safe file operations with tempfile
  • Don't expose sensitive information

Test Coverage Quality

Excellent Coverage Of:

  • ✅ Input validation (missing keys, wrong types)
  • ✅ Edge cases (zero atoms, padding, batching)
  • ✅ Physical correctness (invariances, symmetry)
  • ✅ Multiple code paths (different nb_modes, Coulomb methods)
  • ✅ Error conditions

Potential Gaps:

  1. Numerical stability: No tests for extreme values (very large/small coordinates, charges)
  2. Concurrency: No tests for thread-safety if the calculator is used in multi-threaded context
  3. Resource cleanup: Limited tests for proper cleanup of GPU memory in error paths

Recommendations Summary

Must Fix (Before Merge):

  1. ❗ Refactor duplicated CPU calculator initialization code

Should Fix (High Priority):

  1. 🔧 Add validation/comments for __new__ usage pattern
  2. 🔧 Make memory threshold assertions more robust

Nice to Have:

  1. ✨ Parametrize similar test cases to reduce code
  2. ✨ Add tests for numerical edge cases
  3. ✨ Consistent error message validation

Conclusion

This is production-ready code with just one required refactoring (deduplication). The test suite is comprehensive, well-organized, and significantly improves the project's reliability. Once the duplication is addressed, this should be merged.

Approval Status:Approved with minor changes requested

Great work on improving test coverage! 🚀


Review generated by Claude Code

Addresses code review feedback by extracting duplicated CPU calculator
initialization code into a reusable create_cpu_calculator() helper function.
@claude
Copy link

claude bot commented Dec 16, 2025

Comprehensive Test Coverage Review

Thank you for this substantial contribution! This PR significantly improves test coverage from ~39% toward 65% with 152 tests.

Strengths

Code Quality

  • Excellent test organization with well-structured classes and clear docstrings
  • Comprehensive fixtures in conftest.py covering multiple test scenarios
  • Good coverage breadth across calculator, GPU, config, neighbor ops, and math ops
  • Proper use of pytest markers, fixtures, and parametrization

Test Design

  • Physical correctness tests verify translation/rotation invariance
  • Good edge case coverage for padding, batching, zero charges
  • Gradient flow verification
  • Proper error handling validation

Issues & Recommendations

1. GPU Test CPU Calculator Creation (High Priority)

Location: tests/test_calculator_gpu.py:36-59

The create_cpu_calculator() function uses new to bypass init, manually setting attributes. This is fragile and will break if the calculator structure changes.

Recommendation: Use unittest.mock.patch to control device or add a device parameter to init.

2. Code Duplication

The load_mol helper appears in both test_calculator.py and test_calculator_gpu.py. Move to conftest.py to maintain DRY principle.

3. Test Assertions

  • Error message matching is too strict - use flexible regex
  • Memory thresholds (2GB, 4GB) are arbitrary - make relative to model size
  • Tolerance values vary without rationale - document choices

4. Missing Test Coverage

Consider adding:

  1. Negative test cases for malformed inputs
  2. PBC edge cases
  3. Stress tensor tests
  4. Performance regression tests
  5. Numerical stability tests

5. Performance Considerations

  • Use scope=module for expensive fixtures like random_coords_100
  • Add GPU memory cleanup in teardown fixtures

6. Best Practices

  • Use pytest tmp_path fixture instead of manual tempfile cleanup
  • Document random seeds or use global pytest fixture

Priority Fixes

  1. High: Fix create_cpu_calculator() fragility
  2. Medium: Remove code duplication
  3. Medium: Add negative test cases
  4. Low: Improve documentation

Conclusion

This is an excellent contribution that substantially improves test coverage and reliability. The tests are well-structured and comprehensive.

The issues identified are mostly minor refinements. After addressing the high-priority create_cpu_calculator() issue, this PR will be in great shape to merge.

Recommendation: Approve after addressing Issue 1

Great work!

Signed-off-by: isayev <olexandr@olexandrisayev.com>
Signed-off-by: isayev <olexandr@olexandrisayev.com>
Signed-off-by: isayev <olexandr@olexandrisayev.com>
@claude
Copy link

claude bot commented Dec 31, 2025

Code Review: PR #31 - Comprehensive Test Coverage

Summary

This PR significantly improves test coverage from ~39% to ~65% by adding 134 new tests across 6 new test files. The changes include both new test modules and important bug fixes in the core codebase. Overall, this is an excellent contribution that will greatly improve code reliability.

Positive Highlights ✅

Test Organization

  • Excellent fixture design: tests/conftest.py provides well-structured, reusable fixtures with clear documentation
  • Comprehensive coverage: Tests cover critical modules (nbops, ops, config, lr, calculator, GPU functionality)
  • Good test structure: Tests are organized into logical classes (e.g., TestInputValidation, TestCoulombMethods)
  • Proper markers: GPU tests are correctly marked with @pytest.mark.gpu for selective execution

Code Quality

  • Clear test names: Descriptive test method names like test_translation_invariance, test_forces_sum_approximately_zero
  • Good documentation: Docstrings explain what each test validates
  • Helper functions: Reusable helpers like load_mol() and create_cpu_calculator() reduce duplication

Bug Fixes

  1. Critical fix in nbmat.py: Resolves CPU fallback issue by always importing CPU kernels and selecting appropriate kernels at runtime (lines 11-58)
  2. Typo fix: impemented_speciesimplemented_species in pt2jpt.py:72

Issues & Recommendations

1. GPU Test CPU Fallback Implementation ⚠️

Location: tests/test_calculator_gpu.py:42-59

The create_cpu_calculator() function bypasses normal initialization using __new__ to force CPU execution. While this works, it's fragile:

def create_cpu_calculator():
    calc = AIMNet2Calculator.__new__(AIMNet2Calculator)
    calc.device = "cpu"
    # ... manual attribute initialization

Recommendation: Consider adding a formal device parameter to AIMNet2Calculator.__init__() to avoid this workaround:

def __init__(self, model, device=None, **kwargs):
    self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")

2. Removed OOM Test 🔍

Location: Commit message mentions "remove test_handles_cuda_oom_gracefully"

Question: Why was the OOM test removed? This seems like important functionality to test. If it's flaky, consider using @pytest.mark.skipif with a condition or mocking the OOM condition rather than removing it entirely.

3. Test Coverage Gaps

While coverage improved significantly, consider adding tests for:

  • Error recovery scenarios (what happens after TooManyNeighborsError?)
  • Edge cases: empty molecules, single-atom molecules, extremely large molecules
  • Concurrent/parallel execution (if supported)
  • Model loading failures and error messages

4. Tolerance Values 📊

Location: Multiple test files

Several tests use hard-coded tolerances that may be too tight or too loose:

  • test_calculator.py:188: assert diff.item() < 1e-4 (Hessian symmetry)
  • test_calculator.py:244: assert abs(e1 - e2) < 1e-5 (translation invariance)
  • test_calculator_gpu.py:110: rtol=1e-5, atol=1e-6 (GPU/CPU energy consistency)

Recommendation: Document why these specific tolerances were chosen, or extract them as module-level constants with explanatory comments.

5. Type Annotations

Location: tests/conftest.py and test files

Great use of type hints in fixtures! Consider adding them consistently across all test functions for better IDE support and documentation.

6. Missing Assertions

Location: tests/test_calculator.py:147-160

The test_batched_input_3d test only checks shape but doesn't verify the energies are reasonable or compare them:

res = calc(data)
assert "energy" in res
assert res["energy"].shape == (2,)
# Consider adding: assert torch.isfinite(res["energy"]).all()
# Or: assert (res["energy"] < 0).all()  # if energies should be negative

7. Memory Leak Test Reliability ⚠️

Location: tests/test_calculator_gpu.py:241-261

The memory leak test uses hard-coded thresholds (100MB growth). This could be flaky on different hardware or CUDA versions. Consider:

  • Using percentage-based growth limits instead
  • Running more iterations to make growth more apparent
  • Adding explanatory comments about expected memory patterns

8. PBC Cell Fixture

Location: tests/conftest.py:152-155

The pbc_cell fixture creates a simple cubic cell but isn't used in any tests shown in the diff. If it's for future tests, consider adding a TODO comment. If it's unused, consider removing it to avoid confusion.

Security Considerations

✅ No security concerns identified:

  • Tests use safe synthetic data
  • No exposure of sensitive information
  • Proper use of temporary files where needed

Performance Considerations

  • GPU tests properly isolated with markers to avoid running on CPU-only systems
  • Test fixtures are efficiently scoped (function-level is appropriate for these tests)
  • Memory tests help catch performance regressions

Documentation

  • Consider adding a tests/README.md explaining:
    • How to run different test suites (pytest -m gpu, etc.)
    • What each test module covers
    • How to add new tests following the established patterns

Final Assessment

Overall Rating: ⭐⭐⭐⭐ (4/5)

This is a high-quality PR that significantly improves the project's test coverage and reliability. The test organization is excellent, and the bug fixes are important. The main concerns are around the CPU calculator workaround and the removed OOM test, which should be addressed before merging.

Recommendation: Approve with minor revisions

Before Merging:

  1. Address or document the create_cpu_calculator() workaround
  2. Clarify why the OOM test was removed
  3. Add finite energy checks to batch processing tests
  4. Consider adding tolerance value documentation

Nice to Have (Future PRs):

  • Add tests/README.md
  • Increase edge case coverage
  • Add parallel execution tests if applicable

Great work on this comprehensive test suite! The code quality and organization are excellent. 🎉

Reviewed by Claude Code

@isayev isayev merged commit 4c4f657 into main Dec 31, 2025
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants