try automated benchmarking #84
Conversation
Walkthrough

Adds a PR-focused GitHub Actions benchmarking workflow, a pytest-benchmark test module with fixtures and parametric tests, a dev dependency for pytest-benchmark, defensive OBO parse error handling in proforma, and conditional test skips for known XLMOD/OBO SyntaxError cases.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant Runner as Benchmark Runner
    participant PR as PR Branch
    participant Main as Main Branch
    participant Script as Comment Builder
    participant GHPR as PR Comment
    GHA->>PR: Checkout PR code
    GHA->>Runner: Install deps & run benchmarks (PR)
    Runner-->>GHA: pr_results.json
    GHA->>Main: Checkout main (spectrum_utils/)
    alt main checkout OK
        GHA->>Runner: Install main & run benchmarks
        Runner-->>GHA: main_results.json
    else checkout failed
        GHA-->>GHA: create empty main_results.json
    end
    GHA->>Script: Build comparison markdown (PR vs Main)
    Script-->>GHA: markdown payload
    GHA->>GHPR: Post comment on PR
    GHPR-->>GHA: posted
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
Tip 📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 10
🧹 Nitpick comments (4)
setup.cfg (1)
39-39: LGTM! Consider pinning the version.

The addition of `pytest-benchmark` to dev dependencies is appropriate for the benchmarking infrastructure. However, consider pinning to a specific version or minimum version to ensure reproducible builds.

Optional: Apply this diff to pin to a minimum version:

```diff
-    pytest-benchmark
+    pytest-benchmark>=4.0.0
```

pytest-benchmark.ini (1)
33-33: Consider if 10% regression threshold is appropriate.

A 10% regression threshold (`compare-fail=mean:10%`) may be too strict for initial benchmarking setup, potentially causing false positives due to system variance. Consider starting with a higher threshold (e.g., 20-30%) and tightening it once baseline stability is established.

benchmarks/test_spectrum_benchmarks.py (2)
130-146: Inconsistent use of benchmark.extra_info.

Line 136 sets `benchmark.extra_info['spectrum_size']`, but line 154 also adds `'implementation'`. For consistency, both test methods should include the same metadata fields.

Apply this diff to add implementation info to the regular test:

```diff
 def test_creation_performance_comparison(self, benchmark, comparison_data, size):
     """Compare creation performance across different spectrum sizes."""
     data = comparison_data[f"size_{size}"]
     # This will show in benchmark results which size is being tested
     benchmark.extra_info['spectrum_size'] = size
+    benchmark.extra_info['implementation'] = 'regular'
     # Benchmark regular spectrum creation
```
170-180: Prefer explicit operation chaining for clarity.

The `create_and_process` helper uses method chaining but reassigns the spectrum variable multiple times. While functionally correct, this mixed style can be confusing.

Consider using pure method chaining for clarity:

```diff
 def test_memory_efficiency_regular(self, benchmark, sample_data):
     """Test memory usage of regular spectrum operations."""

     def create_and_process():
-        spectrum = MsmsSpectrum(**sample_data)
-        spectrum = spectrum.filter_intensity(min_intensity=100)
-        spectrum = spectrum.scale_intensity("sqrt")
-        spectrum = spectrum.round(decimals=2)
-        return spectrum
+        return (MsmsSpectrum(**sample_data)
+                .filter_intensity(min_intensity=0.01)
+                .scale_intensity(scaling="root")
+                .round(decimals=2))

     result = benchmark(create_and_process)
     assert result is not None
```

Note: This also incorporates the parameter fixes from previous comments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/benchmarks.yml (1 hunks)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
- pytest-benchmark.ini (1 hunks)
- setup.cfg (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (217-808)
- MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: ruff format check failed. 1 file would be reformatted (benchmarks/test_spectrum_benchmarks.py). Run 'ruff format' to fix.
🪛 GitHub Actions: Performance Benchmarks
benchmarks/test_spectrum_benchmarks.py
[error] 79-79: Test 'test_spectrum_round_jit' failed: TypeError: some keyword arguments unexpected.
[error] 91-91: Test 'test_spectrum_filter_intensity_jit' failed: TypeError: some keyword arguments unexpected.
[error] 103-103: Test 'test_spectrum_scale_intensity_jit' failed: TypeError: some keyword arguments unexpected.
[error] 170-170: Fixture 'sample_data' not found for test 'test_memory_efficiency_regular'.
[error] 182-182: Fixture 'sample_data' not found for test 'test_memory_efficiency_jit'.
🔇 Additional comments (2)
pytest-benchmark.ini (1)
30-30: I'll search for information about pytest-benchmark's baseline behavior with the `compare` setting.

Let me search for the full pytest-benchmark.ini configuration and any CI workflows:
Based on my verification using web search of pytest-benchmark documentation:
No actionable issues found; review comment should be removed.
To create baseline '0001', the workflow needs the `--benchmark-autosave` or `--benchmark-save` option, which can be added to the pytest configuration. After a baseline is saved, you can compare against it with `--benchmark-compare=0001`. The ini setting `compare = 0001` is valid; it does not cause errors on first run—it simply won't perform comparison if the baseline doesn't exist yet, which is expected and not a problem that requires handling.

If the workflow includes `--benchmark-autosave` in its pytest configuration or CI setup, the baseline is created automatically on the first benchmark run, making this a non-issue. The review concern assumes a missing baseline is a problem, but this is standard pytest-benchmark workflow: save baselines first, then compare subsequent runs against them.

.github/workflows/benchmarks.yml (1)
44-54: Configuration is correct and compatible.

The workflow properly generates pytest-benchmark JSON output via the `--benchmark-json` flag (line 35) and the benchmark-action is correctly configured to consume it. The tool parameter is set to `'pytest'` and the output path matches between pytest generation and action input. The github-action-benchmark action supports pytest-benchmark as documented, and this workflow implements it correctly.
Actionable comments posted: 1
♻️ Duplicate comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
1-195: Run ruff format to fix formatting issues.

The pipeline reports that ruff would reformat this file. This is consistent with a past review comment about formatting issues that still needs to be addressed.
Run the following to fix:
```bash
#!/bin/bash
ruff format benchmarks/test_spectrum_benchmarks.py
```
105-115: Critical: Fix min_intensity parameter causing segmentation fault.

The pipeline reports a segmentation fault at line 108. The `min_intensity` parameter should be a relative value between 0.0 and 1.0 (representing a fraction of the maximum intensity), not an absolute value like 100. Using 100 causes all peaks to be filtered out, which leads to the crash.

Apply this diff to fix both tests:

```diff
 def test_spectrum_filter_intensity_regular(self, benchmark, sample_data):
     """Benchmark intensity filtering on regular spectrum."""
     spectrum = MsmsSpectrum(**sample_data)
-    result = benchmark(spectrum.filter_intensity, min_intensity=100)
+    result = benchmark(spectrum.filter_intensity, min_intensity=0.01)
     assert result is not None

 def test_spectrum_filter_intensity_jit(self, benchmark, sample_data):
     """Benchmark intensity filtering on JIT spectrum."""
     spectrum = MsmsSpectrumJit(**sample_data)
-    result = benchmark(spectrum.filter_intensity, 100.0)
+    result = benchmark(spectrum.filter_intensity, 0.01)
     assert result is not None
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (217-808)
- MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: Ruff format check failed. 1 file would be reformatted by 'ruff format --check'. Run 'ruff format' to fix formatting.
🪛 GitHub Actions: Performance Benchmarks
benchmarks/test_spectrum_benchmarks.py
[error] 108-108: Segmentation fault during benchmark test 'test_spectrum_filter_intensity_regular' triggered by pytest-benchmark (exit code 139).
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
130-167: LGTM! Well-structured parametrized benchmarks.

The use of `pytest.mark.parametrize` with `benchmark.pedantic` provides controlled, reproducible performance comparisons across different spectrum sizes. The `extra_info` metadata will help track results effectively.
Actionable comments posted: 0
♻️ Duplicate comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
1-195: Fix ruff formatting so the lint pipeline passes.

The lint job reports that `ruff format --check .` would reformat this file (likely due to trailing spaces and spacing around decorators/blocks). Please run ruff's formatter and commit the result so CI passes:

```bash
#!/bin/bash
ruff format benchmarks/test_spectrum_benchmarks.py
ruff format --check benchmarks/test_spectrum_benchmarks.py
```
🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
15-67: Avoid repeatedly resetting the global NumPy RNG in fixtures.

Each session-scoped fixture calls `np.random.seed(42)`, which repeatedly resets the global RNG state and can surprise other tests that rely on NumPy randomness. Consider using a local generator per fixture instead (e.g., `rng = np.random.default_rng(42)` and then `rng.uniform`/`rng.exponential`) so the benchmark data remain reproducible without mutating global RNG state.
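As an illustration, a fixture along these lines would keep the data reproducible without touching the global RNG. This is a minimal sketch: the fixture name, dict keys, and value ranges are assumed to mirror the existing fixtures rather than taken verbatim from the file.

```python
import numpy as np
import pytest


@pytest.fixture(scope="session")
def sample_data():
    """Synthetic spectrum data built from a local RNG (sketch)."""
    # Local generator: reproducible, and the global NumPy RNG state is untouched.
    rng = np.random.default_rng(42)
    size = 1_000
    return {
        # Keyword names assumed to match the MsmsSpectrum constructor.
        "identifier": "benchmark_spectrum",
        "precursor_mz": 500.0,
        "precursor_charge": 2,
        "mz": np.sort(rng.uniform(100, 2000, size)),
        "intensity": rng.exponential(1000, size),
    }
```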
170-195: Clarify that these “memory usage” tests are time benchmarks, or add real memory measurement.

The `test_memory_efficiency_*` tests wrap `create_and_process` with the `benchmark` fixture, which measures execution time rather than memory. Either adjust the docstrings/names to reflect timing-focused benchmarks or augment them with an actual memory-measurement tool if memory profiling is the goal.
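If actual memory profiling is wanted, the standard-library tracemalloc module is one option that can live alongside the timing benchmarks. A rough sketch only; the processing chain below mirrors the operations already used in this file and is not the file's current helper:

```python
import tracemalloc

from spectrum_utils.spectrum import MsmsSpectrum


def measure_peak_memory(sample_data):
    """Return peak bytes allocated while building and processing one spectrum."""
    tracemalloc.start()
    spectrum = (
        MsmsSpectrum(**sample_data)
        .filter_intensity(min_intensity=0.01)
        .scale_intensity(scaling="root")
        .round(decimals=2)
    )
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    assert spectrum is not None
    return peak
```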
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (217-808)
- MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: ruff format --check . detected that 1 file would be reformatted. Run 'ruff format' to fix formatting.
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
70-168: Benchmarks correctly exercise regular vs JIT spectra and fix earlier TypeErrors.

The creation/operation and comparison benchmarks use `MsmsSpectrum` with keyword args and `MsmsSpectrumJit` with positional args (`round`, `filter_intensity`, `scale_intensity`), which aligns with the library signatures and avoids the previous keyword-argument TypeErrors. Use of `benchmark` and `benchmark.pedantic` (including `extra_info` metadata) looks sound and should give useful performance data across sizes.
Actionable comments posted: 0
🧹 Nitpick comments (2)
.github/workflows/benchmarks.yml (1)
84-105: Clarify what “previous benchmark results” actually are and whether that matches your intent.
`actions/download-artifact@v4` without extra configuration only fetches artifacts from this same workflow run, so `results_3.11.json` is the output of the matrix `benchmark` job for the current commit, not a baseline from a prior run on `main`/`dev`. The `pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true` call itself is fine, but today it's comparing two runs of the same code, not PR vs base branch.

If your goal is true cross-commit comparison (PR vs main), consider instead:

- Pulling baseline JSON from the data stored by `github-action-benchmark` on `gh-pages`, or
- Using a dedicated PR benchmark action (e.g., github-action-pull-request-benchmark) that knows how to fetch base-branch results.
As-is, this step gives useful run-to-run noise information but not regression vs base.
benchmarks/test_spectrum_benchmarks.py (1)
95-197: JIT benchmarks now align with the APIs and avoid the previous argument issues.

Using positional arguments for the JIT methods (`round`, `filter_intensity`, `scale_intensity`) and switching to relative `min_intensity=0.01` plus valid `scaling="root"` brings these tests in line with the implementations in `spectrum_utils.spectrum` and should eliminate the earlier TypeErrors. The memory-usage benchmarks also mirror realistic operation chains on both regular and JIT spectra.

If you care about isolating steady-state JIT performance, you might optionally add a one-time warmup call for `MsmsSpectrumJit` operations outside the measured `benchmark(...)` call so compilation cost doesn't skew the measurements.
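For example, something along these lines would keep Numba compilation out of the timed region. A sketch only: the test name is illustrative, and the argument values follow the ones already used in this file.

```python
from spectrum_utils.spectrum import MsmsSpectrumJit


def test_spectrum_round_jit_warm(benchmark, sample_data):
    """Time round() on the JIT spectrum without the one-off compilation cost."""
    # Untimed warmup call: triggers Numba JIT compilation before measurement.
    MsmsSpectrumJit(**sample_data).round(2)

    def run():
        # Fresh instance per call so in-place mutation does not accumulate.
        return MsmsSpectrumJit(**sample_data).round(2)

    result = benchmark(run)
    assert result is not None
```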
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/benchmarks.yml (1 hunks)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
- pytest-benchmark.ini (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pytest-benchmark.ini
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build (macos-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (ubuntu-latest, 3.11)
- GitHub Check: build (macos-latest, 3.10)
🔇 Additional comments (2)
.github/workflows/benchmarks.yml (1)
38-61: Benchmark output paths and artifacts are now wired consistently.

Running pytest from the repo root with `--benchmark-json=benchmarks/results_${{ matrix.python-version }}.json` and reusing that same path for both `github-action-benchmark` and `upload-artifact` keeps all consumers in sync and fixes the earlier directory mismatch. No further changes needed here.

benchmarks/test_spectrum_benchmarks.py (1)
15-68: Fixtures and synthetic datasets look consistent and reusable.

The three session-scoped fixtures (`sample_data`, `large_sample_data`, `comparison_data`) generate well-structured dicts with consistent keys, making it easy to construct both `MsmsSpectrum` and `MsmsSpectrumJit` across benchmarks and parameterized tests. Nothing blocking here.
🚀 Performance Benchmark Results
Times are in milliseconds. Lower is better.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/benchmarks.yml (1)
107-107: Incorrect pytest-benchmark executable name—should still be fixed.Per the past review, line 107 uses
pytest-benchmark compare, but the correct executable per pytest-benchmark documentation ispy.test-benchmark compare(or alternativelypytest --benchmark-compare). The current command will likely fail at runtime.Apply this diff:
- pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true + py.test-benchmark compare --group-by=name current_results.json results_3.11.json || trueAlternatively, if you prefer to run through pytest:
- pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true + pytest --benchmark-compare=current_results.json --group-by=name results_3.11.json || true
🧹 Nitpick comments (1)
.github/workflows/benchmarks.yml (1)
3-13: Path filter includes benchmark-adjacent files correctly.

The trigger conditions appropriately filter to run only on code and configuration changes relevant to benchmarking. However, if `pyproject.toml` (line 13) is not actually used by this project, consider removing it from the path filter to avoid unnecessary CI runs.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/benchmarks.yml (1)
3-7: Remove unused `push` trigger or separate jobs by event type.

The workflow triggers on both `push` (to main/dev) and `pull_request` (to main/dev), but the only job (`benchmark-comparison`) is conditioned on `github.event_name == 'pull_request'` (line 23), making the `push` trigger unused. Either remove the `push` trigger or add a separate job to handle push events.

```diff
 on:
-  push:
-    branches: [ main, dev ]
   pull_request:
     branches: [ main, dev ]
     # Only run on performance-related changes
     paths:
```

Alternatively, if push events are intentional (e.g., for tracking performance on main), keep the trigger and add a separate job.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/workflows/benchmarks.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (macos-latest, 3.11)
🔇 Additional comments (3)
.github/workflows/benchmarks.yml (3)
36-43: Installation and PR benchmark execution look good.

The dependency setup correctly installs dev dependencies, and the benchmark command uses correct paths with output directed to `benchmarks/pr_results.json`.
45-74: Verify cross-branch comparison behavior for `dev` branch PRs.

Line 51 always checks out and compares against `origin/main`, regardless of the target branch. For PRs to the `dev` branch, this compares performance against `main` instead of `dev`, which may not reflect the intended baseline. Consider using the target branch dynamically:

```diff
- if git checkout origin/main -- spectrum_utils 2>/dev/null; then
+ TARGET_BRANCH="${{ github.base_ref }}"
+ if git checkout "origin/${TARGET_BRANCH}" -- spectrum_utils 2>/dev/null; then
```

This ensures comparisons are against the target branch (main for main PRs, dev for dev PRs). Verify this matches the PR objectives and is the desired behavior.
76-169: Comparison logic and error handling are sound.

The benchmark comparison script correctly:

- Calculates percentage changes (line 131): `(delta / mainMean) * 100` ✅
- Applies 5% significance threshold (line 134): `Math.abs(changePercent) < 5` ✅
- Handles missing main results (lines 92-105) and new benchmarks (lines 146-149) ✅
- Provides detailed comparison table with delta, percentage, and status indicators ✅
Error handling at lines 170–172 intentionally catches errors and logs without failing the workflow, preserving PR functionality even if benchmarking fails.
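For reference, the same delta and threshold math can be reproduced outside the workflow to sanity-check a run locally. A minimal sketch, assuming pytest-benchmark's standard JSON layout (a top-level "benchmarks" list with per-test "stats"); the helper name and paths are illustrative:

```python
import json


def compare_results(pr_path, main_path, threshold=5.0):
    """Print the per-benchmark mean change (%) between two pytest-benchmark JSON files."""
    with open(pr_path) as fh:
        pr = {b["name"]: b["stats"]["mean"] for b in json.load(fh)["benchmarks"]}
    with open(main_path) as fh:
        main = {b["name"]: b["stats"]["mean"] for b in json.load(fh)["benchmarks"]}

    for name, pr_mean in pr.items():
        main_mean = main.get(name)
        if main_mean is None:
            print(f"{name}: new benchmark (no baseline)")
            continue
        change = (pr_mean - main_mean) / main_mean * 100  # same formula as the workflow
        status = "not significant" if abs(change) < threshold else "significant"
        print(f"{name}: {change:+.1f}% ({status})")


# Example usage (paths are assumed; adjust to the workflow's actual output locations):
# compare_results("benchmarks/pr_results.json", "benchmarks/main_results.json")
```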
Actionable comments posted: 0
♻️ Duplicate comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
1-221: Fix formatting to pass ruff checks.The pipeline reports that ruff would reformat this file. Run
ruff format benchmarks/test_spectrum_benchmarks.pylocally to fix the formatting issues and ensure the Lint workflow passes.#!/bin/bash # Check what formatting changes ruff would make ruff format --check --diff benchmarks/test_spectrum_benchmarks.py
🧹 Nitpick comments (2)
.github/workflows/benchmarks.yml (2)
45-74: Consider adding error handling and cleanup to the branch-switching logic.

The current approach of manually swapping directories and reinstalling works, but could be more robust. Consider:
- Add cleanup at the start to handle previous failed runs:
rm -rf spectrum_utils_pr 2>/dev/null || true
- Add error handling to restore PR code if any step fails:
trap 'rm -rf spectrum_utils; mv spectrum_utils_pr spectrum_utils 2>/dev/null || true' ERR
- Consider using git worktrees instead of manual directory swapping for cleaner isolation.
76-174: Comment generation logic is well-structured.

The JavaScript code handles the comparison comprehensively with good error handling. The 5% threshold and status categorization provide clear, actionable feedback.
One optional enhancement: consider adding validation for the JSON structure before processing:
```js
if (!prResults.benchmarks || !Array.isArray(prResults.benchmarks)) {
  throw new Error('Invalid PR results structure');
}
```

This would provide clearer error messages if the benchmark JSON format changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/benchmarks.yml (1 hunks)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
- pytest-benchmark.ini (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pytest-benchmark.ini
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: ruff format --check . failed with exit code 1. 1 file would be reformatted: benchmarks/test_spectrum_benchmarks.py. Run 'ruff format .' to fix.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (macos-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (windows-latest, 3.11)
Actionable comments posted: 3
🧹 Nitpick comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
15-67: Consider consistent intensity distributions across fixtures.

The `sample_data` and `large_sample_data` fixtures use `np.random.exponential(1000)` for intensity values, while `comparison_data` uses `np.random.uniform(0, 1)`. This inconsistency could make cross-fixture performance comparisons less meaningful, though each fixture may serve different benchmark purposes.

If the fixtures are intended to be comparable, consider using the same intensity distribution:

```diff
 for size in sizes:
     mz = np.sort(np.random.uniform(100, 2000, size))
-    intensity = np.random.uniform(0, 1, size)
+    intensity = np.random.exponential(1000, size)
     datasets[f"size_{size}"] = {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
1-221: Well-structured benchmark suite with good practices.

The overall benchmark structure is solid:

- Session-scoped fixtures for efficiency
- Proper JIT warmup strategy (create separate warmup instances)
- Good use of `benchmark.pedantic` with explicit rounds in comparison tests
- Effective use of `benchmark.extra_info` for annotating results with metadata
- The `TestMemoryUsage` tests demonstrate the correct pattern for benchmarking in-place operations

Once the mutation issues in the JIT operation tests are addressed, this will provide reliable performance metrics for spectrum_utils.
Actionable comments posted: 0
🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
15-67: Fixtures are clear; consider avoiding global RNG seeding side effects

The session-scoped fixtures produce reproducible spectra, which is good for benchmarks. To avoid mutating NumPy's global RNG state (and potentially affecting other tests), consider switching from repeated `np.random.seed(42)` calls to a local generator (e.g., `rng = np.random.default_rng(42)` and then `rng.uniform(...)`, `rng.exponential(...)`) inside each fixture.
188-226: Unify benchmark.extra_info metadata for easier result comparison

In `test_creation_performance_comparison` you only set `benchmark.extra_info["spectrum_size"]`, while the JIT variant also tags `"implementation": "jit"`. For symmetric reporting and simpler downstream analysis, consider tagging the regular implementation as well:

```diff
 def test_creation_performance_comparison(
     self, benchmark, comparison_data, size
 ):
     """Compare creation performance across different spectrum sizes."""
     data = comparison_data[f"size_{size}"]

     # This will show in benchmark results which size is being tested
-    benchmark.extra_info["spectrum_size"] = size
+    benchmark.extra_info["spectrum_size"] = size
+    benchmark.extra_info["implementation"] = "regular"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (windows-latest, 3.10)
🔇 Additional comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
73-107: Creation benchmarks and JIT warmup usage look correct

Using `benchmark.pedantic(..., rounds=50, iterations=1)` with `kwargs=sample_data`/`large_sample_data` is idiomatic, and the separate `MsmsSpectrumJit(**...)` warmup before timing avoids contaminating the measured constructor cost. The simple length assertions keep overhead low while still validating results.

109-185: Operation benchmarks now correctly avoid in-place mutation skew

Wrapping `round`, `filter_intensity`, and `scale_intensity` in small helpers that construct a fresh `MsmsSpectrum`/`MsmsSpectrumJit` per iteration, plus doing JIT warmup on separate instances, fixes the earlier “benchmark on already-mutated spectrum” problem and uses positional args where required for jitclasses. The assertions and parameters (`decimals=2`, `min_intensity=0.01`, `scaling="root"`) align with the spectrum_utils APIs.
Actionable comments posted: 0
🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
15-67: Consider documenting the rationale for different intensity distributions.

The `sample_data` and `large_sample_data` fixtures use exponential distribution for intensities, while `comparison_data` uses uniform distribution (0 to 1). Different distributions may affect benchmark results due to varying data characteristics (e.g., number of peaks filtered, computational complexity). If this is intentional for testing different scenarios, consider adding comments to document the rationale.
109-191: Consider using positional arguments consistently for clarity.

The regular spectrum tests use keyword arguments (e.g., `decimals=2`, `min_intensity=0.01`, `scaling="root"`), while JIT tests use positional arguments. Although technically correct (MsmsSpectrum supports keywords, but MsmsSpectrumJit requires positional due to Numba jitclass constraints), using positional arguments consistently in both variants would improve clarity and make the API constraints more obvious.

Example for consistency:

```diff
 def test_spectrum_round_regular(self, benchmark, sample_data):
     """Benchmark rounding operation on regular spectrum."""

     def setup():
         return (MsmsSpectrum(**sample_data),), {}

     result = benchmark.pedantic(
-        lambda s: s.round(decimals=2), setup=setup, rounds=50, iterations=1
+        lambda s: s.round(2), setup=setup, rounds=50, iterations=1
     )
     assert result is not None
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (macos-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.10)
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
194-232: LGTM! Well-structured parametric benchmarks.

The comparison tests effectively leverage pytest's parametrization to benchmark across multiple spectrum sizes. The use of `benchmark.extra_info` to track size and implementation type will help with result analysis.
🚀 Performance Benchmark Results (Python 3.11)
Comparing PR branch vs main branch
Summary
Changes smaller than ±5% are not considered significant.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/proforma_test.py (1)
2370-2395: Skip logic is reasonable for known upstream issues.

The try-except with conditional skip handles the known XLMOD OBO parsing issue without masking other errors. The dual condition check (`"XLMOD"` and `"OBO file"` in the error message) ensures only the specific known issue triggers the skip.

Consider whether `pytest.xfail` would be more semantically appropriate than `pytest.skip` to indicate this is an expected failure rather than a test that should be skipped. However, the current approach is acceptable.
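If the xfail route is preferred, the same guard can call pytest.xfail instead of pytest.skip, so the run is reported as an expected failure rather than skipped. A sketch only: the test name is illustrative, and the parsed string and assertion mirror the XLMOD example used later in this review.

```python
import pytest

from spectrum_utils import proforma


def test_xlink_known_issue_xfail():
    try:
        proteoform = proforma.parse("EMEVTK[X:DSS#XL1]SESPEK")[0]
    except SyntaxError as e:
        if "XLMOD" in str(e) and "OBO file" in str(e):
            # Imperative xfail: reported as an expected failure, not a skip.
            pytest.xfail(f"Known upstream XLMOD OBO parsing issue: {e}")
        raise
    assert proteoform.modifications[0].source[0].controlled_vocabulary == "XLMOD"
```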
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- spectrum_utils/proforma.py (2 hunks)
- tests/proforma_test.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/proforma_test.py (1)
spectrum_utils/proforma.py (9)
- proteoform (373-395)
- proforma (366-371)
- parse (642-691)
- mass (334-342)
- accession (118-121)
- accession (124-125)
- name (128-133)
- name (136-137)
- _import_cv (695-823)
🪛 GitHub Actions: Run tests
tests/proforma_test.py
[error] 1-1: pytest failed: test_proforma_name failed, test_proforma_accession failed, test_proforma_xlink failed in tests/proforma_test.py
[error] 1-1: Multiple Proforma-related tests failed during pytest run.
spectrum_utils/proforma.py
[error] 858-865: SyntaxError: Failed to parse XLMOD controlled vocabulary OBO file. This is likely due to a format issue in the upstream vocabulary file. Original error: expected QuotedString. Command: pytest --cov=spectrum_utils --verbose tests/.
🔇 Additional comments (3)
spectrum_utils/proforma.py (2)
186-188: LGTM - Documentation updated correctly.

The addition of `SyntaxError` to the docstring accurately documents the new error handling behavior for OBO parsing failures.
857-867: The error handling is correct — no changes needed.

`fastobo.load()` returns an OboDoc (the OBO document deserialized into an Abstract Syntax Tree), not a generator. Since it's eagerly evaluated, any `SyntaxError` is raised when `fastobo.load(obo_fh)` is called on line 858, which is inside the try-except block. The error will be caught and re-raised with your custom message as intended.

tests/proforma_test.py (1)
2453-2467: Consistent and correct skip logic for CV imports.

This skip block follows the same pattern as the earlier change, properly scoping the skip to XLMOD with OBO parsing errors. The implementation correctly re-raises errors for other vocabularies.
🚀 Performance Benchmark Results (Python 3.11)
Comparing PR branch vs main branch
Summary
Changes smaller than ±5% are not considered significant.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/proforma_test.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/proforma_test.py (1)
spectrum_utils/proforma.py (8)
- proforma (366-371)
- parse (642-691)
- mass (334-342)
- accession (118-121)
- accession (124-125)
- name (128-133)
- name (136-137)
- _import_cv (695-823)
🔇 Additional comments (1)
tests/proforma_test.py (1)
208-232: Pattern for handling known XLMOD issue is reasonable.

The try-except pattern used here appropriately handles the known upstream XLMOD/OBO parsing issue by:

- Catching `SyntaxError` specifically
- Checking for diagnostic strings before skipping
- Re-raising unexpected errors
- Providing clear skip messages

The string matching on `"XLMOD" in str(e) and "OBO file" in str(e)` is pragmatic for tests, though it's fragile to upstream error message changes.

Consider adding a test that verifies the XLMOD issue resolution once upstream fixes are available:

```python
@pytest.mark.xfail(reason="XLMOD OBO parsing issue - remove xfail when upstream is fixed")
def test_xlmod_parsing_recovered():
    """Verify XLMOD parsing works again after upstream fix."""
    proteoform = proforma.parse("EMEVTK[X:DSS#XL1]SESPEK")[0]
    assert proteoform.modifications[0].source[0].controlled_vocabulary == "XLMOD"
```

Also applies to: 403-415, 2451-2476, 2534-2548
🚀 Performance Benchmark Results (Python 3.11)
Comparing PR branch vs main branch
Summary
Changes smaller than ±5% are not considered significant.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.