try automated benchmarking #84
Conversation
Walkthrough

Adds a PR-focused GitHub Actions benchmarking workflow, a pytest-benchmark test module with fixtures and parametric tests, a dev dependency for pytest-benchmark, defensive OBO parse error handling in proforma, and conditional test skips for known XLMOD/OBO SyntaxError cases.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant Runner as Benchmark Runner
    participant PR as PR Branch
    participant Main as Main Branch
    participant Script as Comment Builder
    participant GHPR as PR Comment
    GHA->>PR: Checkout PR code
    GHA->>Runner: Install deps & run benchmarks (PR)
    Runner-->>GHA: pr_results.json
    GHA->>Main: Checkout main (spectrum_utils/)
    alt main checkout OK
        GHA->>Runner: Install main & run benchmarks
        Runner-->>GHA: main_results.json
    else checkout failed
        GHA-->>GHA: create empty main_results.json
    end
    GHA->>Script: Build comparison markdown (PR vs Main)
    Script-->>GHA: markdown payload
    GHA->>GHPR: Post comment on PR
    GHPR-->>GHA: posted
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
Tip 📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 10
🧹 Nitpick comments (4)
setup.cfg (1)
39-39: LGTM! Consider pinning the version.

The addition of `pytest-benchmark` to dev dependencies is appropriate for the benchmarking infrastructure. However, consider pinning to a specific version or minimum version to ensure reproducible builds.

Optional: Apply this diff to pin to a minimum version:

```diff
-    pytest-benchmark
+    pytest-benchmark>=4.0.0
```

pytest-benchmark.ini (1)
33-33: Consider if 10% regression threshold is appropriate.

A 10% regression threshold (`compare-fail=mean:10%`) may be too strict for initial benchmarking setup, potentially causing false positives due to system variance. Consider starting with a higher threshold (e.g., 20-30%) and tightening it once baseline stability is established.

benchmarks/test_spectrum_benchmarks.py (2)
130-146: Inconsistent use of benchmark.extra_info.

Line 136 sets `benchmark.extra_info['spectrum_size']`, but line 154 also adds `'implementation'`. For consistency, both test methods should include the same metadata fields.

Apply this diff to add implementation info to the regular test:

```diff
 def test_creation_performance_comparison(self, benchmark, comparison_data, size):
     """Compare creation performance across different spectrum sizes."""
     data = comparison_data[f"size_{size}"]
     # This will show in benchmark results which size is being tested
     benchmark.extra_info['spectrum_size'] = size
+    benchmark.extra_info['implementation'] = 'regular'
     # Benchmark regular spectrum creation
```
170-180: Prefer explicit operation chaining for clarity.

The `create_and_process` helper uses method chaining but reassigns the spectrum variable multiple times. While functionally correct, this mixed style can be confusing.

Consider using pure method chaining for clarity:

```diff
 def test_memory_efficiency_regular(self, benchmark, sample_data):
     """Test memory usage of regular spectrum operations."""

     def create_and_process():
-        spectrum = MsmsSpectrum(**sample_data)
-        spectrum = spectrum.filter_intensity(min_intensity=100)
-        spectrum = spectrum.scale_intensity("sqrt")
-        spectrum = spectrum.round(decimals=2)
-        return spectrum
+        return (MsmsSpectrum(**sample_data)
+                .filter_intensity(min_intensity=0.01)
+                .scale_intensity(scaling="root")
+                .round(decimals=2))

     result = benchmark(create_and_process)
     assert result is not None
```

Note: This also incorporates the parameter fixes from previous comments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/benchmarks.yml (1 hunks)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
- pytest-benchmark.ini (1 hunks)
- setup.cfg (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (217-808)
- MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: ruff format check failed. 1 file would be reformatted (benchmarks/test_spectrum_benchmarks.py). Run 'ruff format' to fix.
🪛 GitHub Actions: Performance Benchmarks
benchmarks/test_spectrum_benchmarks.py
[error] 79-79: Test 'test_spectrum_round_jit' failed: TypeError: some keyword arguments unexpected.
[error] 91-91: Test 'test_spectrum_filter_intensity_jit' failed: TypeError: some keyword arguments unexpected.
[error] 103-103: Test 'test_spectrum_scale_intensity_jit' failed: TypeError: some keyword arguments unexpected.
[error] 170-170: Fixture 'sample_data' not found for test 'test_memory_efficiency_regular'.
[error] 182-182: Fixture 'sample_data' not found for test 'test_memory_efficiency_jit'.
🔇 Additional comments (2)
pytest-benchmark.ini (1)
30-30: I'll search for information about pytest-benchmark's baseline behavior with the `compare` setting.

Let me search for the full pytest-benchmark.ini configuration and any CI workflows:
Based on my verification using web search of pytest-benchmark documentation:
No actionable issues found; review comment should be removed.
To create baseline '0001', the workflow needs the `--benchmark-autosave` or `--benchmark-save` option, which can be added to the pytest configuration. After a baseline is saved, you can compare against it with `--benchmark-compare=0001`. The ini setting `compare = 0001` is valid; it does not cause errors on first run—it simply won't perform comparison if the baseline doesn't exist yet, which is expected and not a problem that requires handling.

If the workflow includes `--benchmark-autosave` in its pytest configuration or CI setup, the baseline is created automatically on the first benchmark run, making this a non-issue. The review concern assumes a missing baseline is a problem, but this is standard pytest-benchmark workflow: save baselines first, then compare subsequent runs against them.

.github/workflows/benchmarks.yml (1)
44-54: Configuration is correct and compatible.

The workflow properly generates pytest-benchmark JSON output via the `--benchmark-json` flag (line 35) and the benchmark-action is correctly configured to consume it. The tool parameter is set to `'pytest'` and the output path matches between pytest generation and action input. The github-action-benchmark action supports pytest-benchmark as documented, and this workflow implements it correctly.
Actionable comments posted: 1
♻️ Duplicate comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
1-195: Run ruff format to fix formatting issues.

The pipeline reports that ruff would reformat this file. This is consistent with a past review comment about formatting issues that still needs to be addressed.
Run the following to fix:
```bash
#!/bin/bash
ruff format benchmarks/test_spectrum_benchmarks.py
```
105-115: Critical: Fix min_intensity parameter causing segmentation fault.

The pipeline reports a segmentation fault at line 108. The `min_intensity` parameter should be a relative value between 0.0 and 1.0 (representing a fraction of the maximum intensity), not an absolute value like 100. Using 100 causes all peaks to be filtered out, which leads to the crash.

Apply this diff to fix both tests:

```diff
 def test_spectrum_filter_intensity_regular(self, benchmark, sample_data):
     """Benchmark intensity filtering on regular spectrum."""
     spectrum = MsmsSpectrum(**sample_data)
-    result = benchmark(spectrum.filter_intensity, min_intensity=100)
+    result = benchmark(spectrum.filter_intensity, min_intensity=0.01)
     assert result is not None

 def test_spectrum_filter_intensity_jit(self, benchmark, sample_data):
     """Benchmark intensity filtering on JIT spectrum."""
     spectrum = MsmsSpectrumJit(**sample_data)
-    result = benchmark(spectrum.filter_intensity, 100.0)
+    result = benchmark(spectrum.filter_intensity, 0.01)
     assert result is not None
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (217-808)
- MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: Ruff format check failed. 1 file would be reformatted by 'ruff format --check'. Run 'ruff format' to fix formatting.
🪛 GitHub Actions: Performance Benchmarks
benchmarks/test_spectrum_benchmarks.py
[error] 108-108: Segmentation fault during benchmark test 'test_spectrum_filter_intensity_regular' triggered by pytest-benchmark (exit code 139).
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
130-167: LGTM! Well-structured parametrized benchmarks.

The use of `pytest.mark.parametrize` with `benchmark.pedantic` provides controlled, reproducible performance comparisons across different spectrum sizes. The `extra_info` metadata will help track results effectively.
Actionable comments posted: 0
♻️ Duplicate comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
1-195: Fix ruff formatting so the lint pipeline passes.

The lint job reports that `ruff format --check .` would reformat this file (likely due to trailing spaces and spacing around decorators/blocks). Please run ruff's formatter and commit the result so CI passes:

```bash
#!/bin/bash
ruff format benchmarks/test_spectrum_benchmarks.py
ruff format --check benchmarks/test_spectrum_benchmarks.py
```
🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
15-67: Avoid repeatedly resetting the global NumPy RNG in fixtures.

Each session-scoped fixture calls `np.random.seed(42)`, which repeatedly resets the global RNG state and can surprise other tests that rely on NumPy randomness. Consider using a local generator per fixture instead (e.g., `rng = np.random.default_rng(42)` and then `rng.uniform`/`rng.exponential`) so the benchmark data remain reproducible without mutating global RNG state.
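As an illustration, a fixture along these lines would keep the data reproducible without touching the global RNG. This is a minimal sketch: the fixture name, dict keys, and value ranges are assumed to mirror the existing fixtures rather than taken verbatim from the file.

```python
import numpy as np
import pytest


@pytest.fixture(scope="session")
def sample_data():
    """Synthetic spectrum data built from a local RNG (sketch)."""
    # Local generator: reproducible, and the global NumPy RNG state is untouched.
    rng = np.random.default_rng(42)
    size = 1_000
    return {
        # Keyword names assumed to match the MsmsSpectrum constructor.
        "identifier": "benchmark_spectrum",
        "precursor_mz": 500.0,
        "precursor_charge": 2,
        "mz": np.sort(rng.uniform(100, 2000, size)),
        "intensity": rng.exponential(1000, size),
    }
```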
170-195: Clarify that these “memory usage” tests are time benchmarks, or add real memory measurement.

The `test_memory_efficiency_*` tests wrap `create_and_process` with the `benchmark` fixture, which measures execution time rather than memory. Either adjust the docstrings/names to reflect timing-focused benchmarks or augment them with an actual memory-measurement tool if memory profiling is the goal.
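If actual memory profiling is wanted, the standard-library tracemalloc module is one option that can live alongside the timing benchmarks. A rough sketch only; the processing chain below mirrors the operations already used in this file and is not the file's current helper:

```python
import tracemalloc

from spectrum_utils.spectrum import MsmsSpectrum


def measure_peak_memory(sample_data):
    """Return peak bytes allocated while building and processing one spectrum."""
    tracemalloc.start()
    spectrum = (
        MsmsSpectrum(**sample_data)
        .filter_intensity(min_intensity=0.01)
        .scale_intensity(scaling="root")
        .round(decimals=2)
    )
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    assert spectrum is not None
    return peak
```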
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (217-808)
- MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: ruff format --check . detected that 1 file would be reformatted. Run 'ruff format' to fix formatting.
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
70-168: Benchmarks correctly exercise regular vs JIT spectra and fix earlier TypeErrors.

The creation/operation and comparison benchmarks use `MsmsSpectrum` with keyword args and `MsmsSpectrumJit` with positional args (`round`, `filter_intensity`, `scale_intensity`), which aligns with the library signatures and avoids the previous keyword-argument TypeErrors. Use of `benchmark` and `benchmark.pedantic` (including `extra_info` metadata) looks sound and should give useful performance data across sizes.
Actionable comments posted: 0
🧹 Nitpick comments (2)
.github/workflows/benchmarks.yml (1)
84-105: Clarify what “previous benchmark results” actually are and whether that matches your intent.
`actions/download-artifact@v4` without extra configuration only fetches artifacts from this same workflow run, so `results_3.11.json` is the output of the matrix `benchmark` job for the current commit, not a baseline from a prior run on `main`/`dev`. The `pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true` call itself is fine, but today it's comparing two runs of the same code, not PR vs base branch.

If your goal is true cross-commit comparison (PR vs main), consider instead:

- Pulling baseline JSON from the data stored by `github-action-benchmark` on `gh-pages`, or
- Using a dedicated PR benchmark action (e.g., github-action-pull-request-benchmark) that knows how to fetch base-branch results.
As-is, this step gives useful run-to-run noise information but not regression vs base.
benchmarks/test_spectrum_benchmarks.py (1)
95-197: JIT benchmarks now align with the APIs and avoid the previous argument issues.

Using positional arguments for the JIT methods (`round`, `filter_intensity`, `scale_intensity`) and switching to relative `min_intensity=0.01` plus valid `scaling="root"` brings these tests in line with the implementations in `spectrum_utils.spectrum` and should eliminate the earlier TypeErrors. The memory-usage benchmarks also mirror realistic operation chains on both regular and JIT spectra.

If you care about isolating steady-state JIT performance, you might optionally add a one-time warmup call for `MsmsSpectrumJit` operations outside the measured `benchmark(...)` call so compilation cost doesn't skew the measurements.
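For example, something along these lines would keep Numba compilation out of the timed region. A sketch only: the test name is illustrative, and the argument values follow the ones already used in this file.

```python
from spectrum_utils.spectrum import MsmsSpectrumJit


def test_spectrum_round_jit_warm(benchmark, sample_data):
    """Time round() on the JIT spectrum without the one-off compilation cost."""
    # Untimed warmup call: triggers Numba JIT compilation before measurement.
    MsmsSpectrumJit(**sample_data).round(2)

    def run():
        # Fresh instance per call so in-place mutation does not accumulate.
        return MsmsSpectrumJit(**sample_data).round(2)

    result = benchmark(run)
    assert result is not None
```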
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/benchmarks.yml (1 hunks)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
- pytest-benchmark.ini (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pytest-benchmark.ini
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build (macos-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (ubuntu-latest, 3.11)
- GitHub Check: build (macos-latest, 3.10)
🔇 Additional comments (2)
.github/workflows/benchmarks.yml (1)
38-61: Benchmark output paths and artifacts are now wired consistently.

Running pytest from the repo root with `--benchmark-json=benchmarks/results_${{ matrix.python-version }}.json` and reusing that same path for both `github-action-benchmark` and `upload-artifact` keeps all consumers in sync and fixes the earlier directory mismatch. No further changes needed here.

benchmarks/test_spectrum_benchmarks.py (1)
15-68: Fixtures and synthetic datasets look consistent and reusable.

The three session-scoped fixtures (`sample_data`, `large_sample_data`, `comparison_data`) generate well-structured dicts with consistent keys, making it easy to construct both `MsmsSpectrum` and `MsmsSpectrumJit` across benchmarks and parameterized tests. Nothing blocking here.
🚀 Performance Benchmark Results
Times are in milliseconds. Lower is better.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/benchmarks.yml (1)
107-107: Incorrect pytest-benchmark executable name—should still be fixed.Per the past review, line 107 uses
pytest-benchmark compare, but the correct executable per pytest-benchmark documentation ispy.test-benchmark compare(or alternativelypytest --benchmark-compare). The current command will likely fail at runtime.Apply this diff:
- pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true + py.test-benchmark compare --group-by=name current_results.json results_3.11.json || trueAlternatively, if you prefer to run through pytest:
- pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true + pytest --benchmark-compare=current_results.json --group-by=name results_3.11.json || true
🧹 Nitpick comments (1)
.github/workflows/benchmarks.yml (1)
3-13: Path filter includes benchmark-adjacent files correctly.

The trigger conditions appropriately filter to run only on code and configuration changes relevant to benchmarking. However, if `pyproject.toml` (line 13) is not actually used by this project, consider removing it from the path filter to avoid unnecessary CI runs.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/benchmarks.yml (1)
3-7: Remove unused `push` trigger or separate jobs by event type.

The workflow triggers on both `push` (to main/dev) and `pull_request` (to main/dev), but the only job (`benchmark-comparison`) is conditioned on `github.event_name == 'pull_request'` (line 23), making the `push` trigger unused. Either remove the `push` trigger or add a separate job to handle push events.

```diff
 on:
-  push:
-    branches: [ main, dev ]
   pull_request:
     branches: [ main, dev ]
     # Only run on performance-related changes
     paths:
```

Alternatively, if push events are intentional (e.g., for tracking performance on main), keep the trigger and add a separate job.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/workflows/benchmarks.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (macos-latest, 3.11)
🔇 Additional comments (3)
.github/workflows/benchmarks.yml (3)
36-43: Installation and PR benchmark execution look good.

The dependency setup correctly installs dev dependencies, and the benchmark command uses correct paths with output directed to `benchmarks/pr_results.json`.
45-74: Verify cross-branch comparison behavior for `dev` branch PRs.

Line 51 always checks out and compares against `origin/main`, regardless of the target branch. For PRs to the `dev` branch, this compares performance against `main` instead of `dev`, which may not reflect the intended baseline. Consider using the target branch dynamically:

```diff
- if git checkout origin/main -- spectrum_utils 2>/dev/null; then
+ TARGET_BRANCH="${{ github.base_ref }}"
+ if git checkout "origin/${TARGET_BRANCH}" -- spectrum_utils 2>/dev/null; then
```

This ensures comparisons are against the target branch (main for main PRs, dev for dev PRs). Verify this matches the PR objectives and is the desired behavior.
76-169: Comparison logic and error handling are sound.

The benchmark comparison script correctly:

- Calculates percentage changes (line 131): `(delta / mainMean) * 100` ✅
- Applies 5% significance threshold (line 134): `Math.abs(changePercent) < 5` ✅
- Handles missing main results (lines 92-105) and new benchmarks (lines 146-149) ✅
- Provides detailed comparison table with delta, percentage, and status indicators ✅
Error handling at lines 170–172 intentionally catches errors and logs without failing the workflow, preserving PR functionality even if benchmarking fails.
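For reference, the same delta and threshold math can be reproduced outside the workflow to sanity-check a run locally. A minimal sketch, assuming pytest-benchmark's standard JSON layout (a top-level "benchmarks" list with per-test "stats"); the helper name and paths are illustrative:

```python
import json


def compare_results(pr_path, main_path, threshold=5.0):
    """Print the per-benchmark mean change (%) between two pytest-benchmark JSON files."""
    with open(pr_path) as fh:
        pr = {b["name"]: b["stats"]["mean"] for b in json.load(fh)["benchmarks"]}
    with open(main_path) as fh:
        main = {b["name"]: b["stats"]["mean"] for b in json.load(fh)["benchmarks"]}

    for name, pr_mean in pr.items():
        main_mean = main.get(name)
        if main_mean is None:
            print(f"{name}: new benchmark (no baseline)")
            continue
        change = (pr_mean - main_mean) / main_mean * 100  # same formula as the workflow
        status = "not significant" if abs(change) < threshold else "significant"
        print(f"{name}: {change:+.1f}% ({status})")


# Example usage (paths are assumed; adjust to the workflow's actual output locations):
# compare_results("benchmarks/pr_results.json", "benchmarks/main_results.json")
```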
Actionable comments posted: 0
♻️ Duplicate comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
1-221: Fix formatting to pass ruff checks.The pipeline reports that ruff would reformat this file. Run
ruff format benchmarks/test_spectrum_benchmarks.pylocally to fix the formatting issues and ensure the Lint workflow passes.#!/bin/bash # Check what formatting changes ruff would make ruff format --check --diff benchmarks/test_spectrum_benchmarks.py
🧹 Nitpick comments (2)
.github/workflows/benchmarks.yml (2)
45-74: Consider adding error handling and cleanup to the branch-switching logic.

The current approach of manually swapping directories and reinstalling works, but could be more robust. Consider:
- Add cleanup at the start to handle previous failed runs:
rm -rf spectrum_utils_pr 2>/dev/null || true
- Add error handling to restore PR code if any step fails:
trap 'rm -rf spectrum_utils; mv spectrum_utils_pr spectrum_utils 2>/dev/null || true' ERR
- Consider using git worktrees instead of manual directory swapping for cleaner isolation.
76-174: Comment generation logic is well-structured.

The JavaScript code handles the comparison comprehensively with good error handling. The 5% threshold and status categorization provide clear, actionable feedback.
One optional enhancement: consider adding validation for the JSON structure before processing:
```js
if (!prResults.benchmarks || !Array.isArray(prResults.benchmarks)) {
  throw new Error('Invalid PR results structure');
}
```

This would provide clearer error messages if the benchmark JSON format changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/benchmarks.yml (1 hunks)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
- pytest-benchmark.ini (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pytest-benchmark.ini
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py
[error] 1-1: ruff format --check . failed with exit code 1. 1 file would be reformatted: benchmarks/test_spectrum_benchmarks.py. Run 'ruff format .' to fix.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (macos-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (windows-latest, 3.11)
Actionable comments posted: 3
🧹 Nitpick comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
15-67: Consider consistent intensity distributions across fixtures.

The `sample_data` and `large_sample_data` fixtures use `np.random.exponential(1000)` for intensity values, while `comparison_data` uses `np.random.uniform(0, 1)`. This inconsistency could make cross-fixture performance comparisons less meaningful, though each fixture may serve different benchmark purposes.

If the fixtures are intended to be comparable, consider using the same intensity distribution:

```diff
 for size in sizes:
     mz = np.sort(np.random.uniform(100, 2000, size))
-    intensity = np.random.uniform(0, 1, size)
+    intensity = np.random.exponential(1000, size)
     datasets[f"size_{size}"] = {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
1-221: Well-structured benchmark suite with good practices.

The overall benchmark structure is solid:

- Session-scoped fixtures for efficiency
- Proper JIT warmup strategy (create separate warmup instances)
- Good use of `benchmark.pedantic` with explicit rounds in comparison tests
- Effective use of `benchmark.extra_info` for annotating results with metadata
- The `TestMemoryUsage` tests demonstrate the correct pattern for benchmarking in-place operations

Once the mutation issues in the JIT operation tests are addressed, this will provide reliable performance metrics for spectrum_utils.
Actionable comments posted: 0
🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
15-67: Fixtures are clear; consider avoiding global RNG seeding side effects

The session-scoped fixtures produce reproducible spectra, which is good for benchmarks. To avoid mutating NumPy's global RNG state (and potentially affecting other tests), consider switching from repeated `np.random.seed(42)` calls to a local generator (e.g., `rng = np.random.default_rng(42)` and then `rng.uniform(...)`, `rng.exponential(...)`) inside each fixture.
188-226: Unify benchmark.extra_info metadata for easier result comparison

In `test_creation_performance_comparison` you only set `benchmark.extra_info["spectrum_size"]`, while the JIT variant also tags `"implementation": "jit"`. For symmetric reporting and simpler downstream analysis, consider tagging the regular implementation as well:

```diff
 def test_creation_performance_comparison(
     self, benchmark, comparison_data, size
 ):
     """Compare creation performance across different spectrum sizes."""
     data = comparison_data[f"size_{size}"]

     # This will show in benchmark results which size is being tested
-    benchmark.extra_info["spectrum_size"] = size
+    benchmark.extra_info["spectrum_size"] = size
+    benchmark.extra_info["implementation"] = "regular"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (windows-latest, 3.10)
🔇 Additional comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
73-107: Creation benchmarks and JIT warmup usage look correct

Using `benchmark.pedantic(..., rounds=50, iterations=1)` with `kwargs=sample_data`/`large_sample_data` is idiomatic, and the separate `MsmsSpectrumJit(**...)` warmup before timing avoids contaminating the measured constructor cost. The simple length assertions keep overhead low while still validating results.

109-185: Operation benchmarks now correctly avoid in-place mutation skew

Wrapping `round`, `filter_intensity`, and `scale_intensity` in small helpers that construct a fresh `MsmsSpectrum`/`MsmsSpectrumJit` per iteration, plus doing JIT warmup on separate instances, fixes the earlier “benchmark on already-mutated spectrum” problem and uses positional args where required for jitclasses. The assertions and parameters (`decimals=2`, `min_intensity=0.01`, `scaling="root"`) align with the spectrum_utils APIs.
Actionable comments posted: 0
🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)
15-67: Consider documenting the rationale for different intensity distributions.

The `sample_data` and `large_sample_data` fixtures use exponential distribution for intensities, while `comparison_data` uses uniform distribution (0 to 1). Different distributions may affect benchmark results due to varying data characteristics (e.g., number of peaks filtered, computational complexity). If this is intentional for testing different scenarios, consider adding comments to document the rationale.
109-191: Consider using positional arguments consistently for clarity.

The regular spectrum tests use keyword arguments (e.g., `decimals=2`, `min_intensity=0.01`, `scaling="root"`), while JIT tests use positional arguments. Although technically correct (MsmsSpectrum supports keywords, but MsmsSpectrumJit requires positional due to Numba jitclass constraints), using positional arguments consistently in both variants would improve clarity and make the API constraints more obvious.

Example for consistency:

```diff
 def test_spectrum_round_regular(self, benchmark, sample_data):
     """Benchmark rounding operation on regular spectrum."""

     def setup():
         return (MsmsSpectrum(**sample_data),), {}

     result = benchmark.pedantic(
-        lambda s: s.round(decimals=2), setup=setup, rounds=50, iterations=1
+        lambda s: s.round(2), setup=setup, rounds=50, iterations=1
     )
     assert result is not None
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
- MsmsSpectrum (224-868)
- MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (macos-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.10)
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)
194-232: LGTM! Well-structured parametric benchmarks.

The comparison tests effectively leverage pytest's parametrization to benchmark across multiple spectrum sizes. The use of `benchmark.extra_info` to track size and implementation type will help with result analysis.
🚀 Performance Benchmark Results (Python 3.11)
Comparing PR branch vs main branch
Summary
Changes smaller than ±5% are not considered significant.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/proforma_test.py (1)
2370-2395: Skip logic is reasonable for known upstream issues.

The try-except with conditional skip handles the known XLMOD OBO parsing issue without masking other errors. The dual condition check (`"XLMOD"` and `"OBO file"` in the error message) ensures only the specific known issue triggers the skip.

Consider whether `pytest.xfail` would be more semantically appropriate than `pytest.skip` to indicate this is an expected failure rather than a test that should be skipped. However, the current approach is acceptable.
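If the xfail route is preferred, the same guard can call pytest.xfail instead of pytest.skip, so the run is reported as an expected failure rather than skipped. A sketch only: the test name is illustrative, and the parsed string and assertion mirror the XLMOD example used later in this review.

```python
import pytest

from spectrum_utils import proforma


def test_xlink_known_issue_xfail():
    try:
        proteoform = proforma.parse("EMEVTK[X:DSS#XL1]SESPEK")[0]
    except SyntaxError as e:
        if "XLMOD" in str(e) and "OBO file" in str(e):
            # Imperative xfail: reported as an expected failure, not a skip.
            pytest.xfail(f"Known upstream XLMOD OBO parsing issue: {e}")
        raise
    assert proteoform.modifications[0].source[0].controlled_vocabulary == "XLMOD"
```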
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- spectrum_utils/proforma.py (2 hunks)
- tests/proforma_test.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/proforma_test.py (1)
spectrum_utils/proforma.py (9)
- proteoform (373-395)
- proforma (366-371)
- parse (642-691)
- mass (334-342)
- accession (118-121)
- accession (124-125)
- name (128-133)
- name (136-137)
- _import_cv (695-823)
🪛 GitHub Actions: Run tests
tests/proforma_test.py
[error] 1-1: pytest failed: test_proforma_name failed, test_proforma_accession failed, test_proforma_xlink failed in tests/proforma_test.py
[error] 1-1: Multiple Proforma-related tests failed during pytest run.
spectrum_utils/proforma.py
[error] 858-865: SyntaxError: Failed to parse XLMOD controlled vocabulary OBO file. This is likely due to a format issue in the upstream vocabulary file. Original error: expected QuotedString. Command: pytest --cov=spectrum_utils --verbose tests/.
🔇 Additional comments (3)
spectrum_utils/proforma.py (2)
186-188: LGTM - Documentation updated correctly.

The addition of `SyntaxError` to the docstring accurately documents the new error handling behavior for OBO parsing failures.
857-867: The error handling is correct — no changes needed.

`fastobo.load()` returns an OboDoc (the OBO document deserialized into an Abstract Syntax Tree), not a generator. Since it's eagerly evaluated, any `SyntaxError` is raised when `fastobo.load(obo_fh)` is called on line 858, which is inside the try-except block. The error will be caught and re-raised with your custom message as intended.

tests/proforma_test.py (1)
2453-2467: Consistent and correct skip logic for CV imports.

This skip block follows the same pattern as the earlier change, properly scoping the skip to XLMOD with OBO parsing errors. The implementation correctly re-raises errors for other vocabularies.
🚀 Performance Benchmark Results (Python 3.11)
Comparing PR branch vs main branch
Summary
Changes smaller than ±5% are not considered significant.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/proforma_test.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/proforma_test.py (1)
spectrum_utils/proforma.py (8)
- proforma (366-371)
- parse (642-691)
- mass (334-342)
- accession (118-121)
- accession (124-125)
- name (128-133)
- name (136-137)
- _import_cv (695-823)
🔇 Additional comments (1)
tests/proforma_test.py (1)
208-232: Pattern for handling known XLMOD issue is reasonable.

The try-except pattern used here appropriately handles the known upstream XLMOD/OBO parsing issue by:

- Catching `SyntaxError` specifically
- Checking for diagnostic strings before skipping
- Re-raising unexpected errors
- Providing clear skip messages

The string matching on `"XLMOD" in str(e) and "OBO file" in str(e)` is pragmatic for tests, though it's fragile to upstream error message changes.

Consider adding a test that verifies the XLMOD issue resolution once upstream fixes are available:

```python
@pytest.mark.xfail(reason="XLMOD OBO parsing issue - remove xfail when upstream is fixed")
def test_xlmod_parsing_recovered():
    """Verify XLMOD parsing works again after upstream fix."""
    proteoform = proforma.parse("EMEVTK[X:DSS#XL1]SESPEK")[0]
    assert proteoform.modifications[0].source[0].controlled_vocabulary == "XLMOD"
```

Also applies to: 403-415, 2451-2476, 2534-2548
🚀 Performance Benchmark Results (Python 3.11)
Comparing PR branch vs main branch
Summary
Changes smaller than ±5% are not considered significant.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.