
Conversation

@rukubrakov (Contributor) commented Nov 18, 2025

Summary by CodeRabbit

  • Chores
    • Added automated performance benchmarking to CI that runs on PRs and compares results with main; introduced a dev dependency for benchmarking.
  • Tests
    • Added benchmark tests measuring spectrum creation and common operations across multiple data sizes and implementations.
  • Bug Fixes
    • Improved parsing error handling and made tests resilient to a known upstream parsing/format issue by conditionally skipping affected scenarios.


@coderabbitai bot (Contributor) commented Nov 18, 2025

Walkthrough

Adds a PR-focused GitHub Actions benchmarking workflow, a pytest-benchmark test module with fixtures and parametric tests, a dev dependency for pytest-benchmark, defensive OBO parse error handling in proforma, and conditional test skips for known XLMOD/OBO SyntaxError cases.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CI/CD Benchmark Workflow**<br>`.github/workflows/benchmarks.yml` | New "Performance Benchmarks" workflow that runs on pushes to main/dev and on PRs (path-filtered). It runs benchmarks on the PR branch, attempts to check out and run benchmarks on main (creating empty main results if the checkout fails), and posts a markdown comparison comment with deltas and significance logic. |
| **Benchmark Tests**<br>`benchmarks/test_spectrum_benchmarks.py` | New pytest-benchmark module adding fixtures (sample_data, large_sample_data, comparison_data) and two test classes (TestSpectrumPerformance, TestSpectrumComparison) with pedantic/parametric benchmarks for regular vs. JIT spectrum creation and operations (round, filter_intensity, scale_intensity). |
| **Dev Dependency Update**<br>`setup.cfg` | Adds pytest-benchmark to the development extras. |
| **ProForma parsing robustness**<br>`spectrum_utils/proforma.py` | Wraps fastobo.load in try/except to catch SyntaxError and re-raise it with a contextual message including cv_id; assigns frames before iterating to avoid direct generator iteration (see the sketch below the table). |
| **Tests: conditional skip on upstream parse errors**<br>`tests/proforma_test.py` | Surrounds XLMOD/OBO parsing blocks with try/except to detect SyntaxError mentioning XLMOD/OBO and conditionally skip those tests while re-raising other exceptions. |
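
To make the ProForma change concrete, here is a minimal sketch of the defensive parsing pattern described in the table above; the helper name and arguments (`obo_fh`, `cv_id`) are assumptions rather than the verbatim spectrum_utils code.

```python
# Minimal sketch of the defensive OBO parsing described above; the helper name
# and arguments are assumptions, not the verbatim spectrum_utils implementation.
import fastobo


def _load_obo_frames(obo_fh, cv_id):
    try:
        # fastobo.load parses the whole OBO document eagerly, so a malformed
        # upstream file surfaces here as a SyntaxError.
        frames = fastobo.load(obo_fh)
    except SyntaxError as e:
        raise SyntaxError(
            f"Failed to parse the {cv_id} controlled vocabulary OBO file. "
            f"This is likely due to a format issue in the upstream vocabulary "
            f"file. Original error: {e}"
        ) from e
    # Assign the parsed document and let the caller iterate over its frames,
    # rather than looping over a bare generator.
    return frames
```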

Sequence Diagram(s)

sequenceDiagram
    participant GHA as GitHub Actions
    participant Runner as Benchmark Runner
    participant PR as PR Branch
    participant Main as Main Branch
    participant Script as Comment Builder
    participant GHPR as PR Comment

    GHA->>PR: Checkout PR code
    GHA->>Runner: Install deps & run benchmarks (PR)
    Runner-->>GHA: pr_results.json

    GHA->>Main: Checkout main (spectrum_utils/)
    alt main checkout OK
        GHA->>Runner: Install main & run benchmarks
        Runner-->>GHA: main_results.json
    else checkout failed
        GHA-->>GHA: create empty main_results.json
    end

    GHA->>Script: Build comparison markdown (PR vs Main)
    Script-->>GHA: markdown payload
    GHA->>GHPR: Post comment on PR
    GHPR-->>GHA: posted

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus review on:
    • spectrum_utils/proforma.py — SyntaxError capture, message clarity, and preserved parsing semantics.
    • tests/proforma_test.py — correctness of conditional skips (only XLMOD/OBO SyntaxError).
    • .github/workflows/benchmarks.yml — checkout/restore logic and PR comment content/formatting.
    • benchmarks/test_spectrum_benchmarks.py — correctness of fixtures, benchmark configuration, and JIT vs regular comparisons.

Possibly related PRs

  • fix tests and docs #83 — Changes also modify ProForma OBO parsing and caching; strong overlap with spectrum_utils/proforma.py.

Poem

🐰 I hopped through code to time each run,
Benchmarks humming under morning sun,
PR and main I gently test,
OBO guards keep parsing blessed,
Little rabbit grins — CI's at rest.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.27%, which is below the required threshold of 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'try automated benchmarking' accurately describes the main objective of the pull request, which adds automated performance benchmarking infrastructure via a GitHub Actions workflow and a benchmark test suite. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f941067 and 66581b7.

📒 Files selected for processing (1)
  • tests/proforma_test.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/proforma_test.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (windows-latest, 3.10)
  • GitHub Check: build (windows-latest, 3.11)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (windows-latest, 3.12)


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 10

🧹 Nitpick comments (4)
setup.cfg (1)

39-39: LGTM! Consider pinning the version.

The addition of pytest-benchmark to dev dependencies is appropriate for the benchmarking infrastructure. However, consider pinning to a specific version or minimum version to ensure reproducible builds.

Optional: Apply this diff to pin to a minimum version:

-    pytest-benchmark
+    pytest-benchmark>=4.0.0
pytest-benchmark.ini (1)

33-33: Consider if 10% regression threshold is appropriate.

A 10% regression threshold (compare-fail=mean:10%) may be too strict for initial benchmarking setup, potentially causing false positives due to system variance. Consider starting with a higher threshold (e.g., 20-30%) and tightening it once baseline stability is established.

benchmarks/test_spectrum_benchmarks.py (2)

130-146: Inconsistent use of benchmark.extra_info.

Line 136 sets only benchmark.extra_info['spectrum_size'], whereas line 154 additionally sets 'implementation'. For consistency, both test methods should include the same metadata fields.

Apply this diff to add implementation info to the regular test:

     def test_creation_performance_comparison(self, benchmark, comparison_data, size):
         """Compare creation performance across different spectrum sizes."""
         data = comparison_data[f"size_{size}"]
         
         # This will show in benchmark results which size is being tested
         benchmark.extra_info['spectrum_size'] = size
+        benchmark.extra_info['implementation'] = 'regular'
         
         # Benchmark regular spectrum creation

170-180: Prefer explicit operation chaining for clarity.

The create_and_process helper uses method chaining but reassigns the spectrum variable multiple times. While functionally correct, this mixed style can be confusing.

Consider using pure method chaining for clarity:

     def test_memory_efficiency_regular(self, benchmark, sample_data):
         """Test memory usage of regular spectrum operations."""
         def create_and_process():
-            spectrum = MsmsSpectrum(**sample_data)
-            spectrum = spectrum.filter_intensity(min_intensity=100)
-            spectrum = spectrum.scale_intensity("sqrt") 
-            spectrum = spectrum.round(decimals=2)
-            return spectrum
+            return (MsmsSpectrum(**sample_data)
+                    .filter_intensity(min_intensity=0.01)
+                    .scale_intensity(scaling="root")
+                    .round(decimals=2))
             
         result = benchmark(create_and_process)
         assert result is not None

Note: This also incorporates the parameter fixes from previous comments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a12f453 and 5ec48a4.

📒 Files selected for processing (4)
  • .github/workflows/benchmarks.yml (1 hunks)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
  • pytest-benchmark.ini (1 hunks)
  • setup.cfg (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (217-808)
  • MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py

[error] 1-1: ruff format check failed. 1 file would be reformatted (benchmarks/test_spectrum_benchmarks.py). Run 'ruff format' to fix.

🪛 GitHub Actions: Performance Benchmarks
benchmarks/test_spectrum_benchmarks.py

[error] 79-79: Test 'test_spectrum_round_jit' failed: TypeError: some keyword arguments unexpected.


[error] 91-91: Test 'test_spectrum_filter_intensity_jit' failed: TypeError: some keyword arguments unexpected.


[error] 103-103: Test 'test_spectrum_scale_intensity_jit' failed: TypeError: some keyword arguments unexpected.


[error] 170-170: Fixture 'sample_data' not found for test 'test_memory_efficiency_regular'.


[error] 182-182: Fixture 'sample_data' not found for test 'test_memory_efficiency_jit'.

🔇 Additional comments (2)
pytest-benchmark.ini (1)

30-30: Verified against the pytest-benchmark documentation: no actionable issues found, and the original review concern can be dismissed.

To create baseline '0001', the workflow needs the --benchmark-autosave or --benchmark-save option, which can be added to the pytest configuration. After a baseline is saved, subsequent runs can compare against it with --benchmark-compare=0001. The ini setting compare = 0001 is valid; it does not cause errors on the first run, it simply skips the comparison if the baseline does not exist yet, which is expected and not a problem that requires handling.

If the workflow includes --benchmark-autosave in its pytest configuration or CI setup, the baseline is created automatically on the first benchmark run, making this a non-issue. The review concern assumes a missing baseline is a problem, but this is the standard pytest-benchmark workflow: save baselines first, then compare subsequent runs against them.

.github/workflows/benchmarks.yml (1)

44-54: Configuration is correct and compatible.

The workflow properly generates pytest-benchmark JSON output via the --benchmark-json flag (line 35) and the benchmark-action is correctly configured to consume it. The tool parameter is set to 'pytest' and the output path matches between pytest generation and action input. The github-action-benchmark action supports pytest-benchmark as documented, and this workflow implements it correctly.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
benchmarks/test_spectrum_benchmarks.py (2)

1-195: Run ruff format to fix formatting issues.

The pipeline reports that ruff would reformat this file. This is consistent with a past review comment about formatting issues that still needs to be addressed.

Run the following to fix:

#!/bin/bash
ruff format benchmarks/test_spectrum_benchmarks.py

105-115: Critical: Fix min_intensity parameter causing segmentation fault.

The pipeline reports a segmentation fault at line 108. The min_intensity parameter should be a relative value between 0.0 and 1.0 (representing a fraction of the maximum intensity), not an absolute value like 100. Using 100 causes all peaks to be filtered out, which leads to the crash.

Apply this diff to fix both tests:

     def test_spectrum_filter_intensity_regular(self, benchmark, sample_data):
         """Benchmark intensity filtering on regular spectrum."""
         spectrum = MsmsSpectrum(**sample_data)
-        result = benchmark(spectrum.filter_intensity, min_intensity=100)
+        result = benchmark(spectrum.filter_intensity, min_intensity=0.01)
         assert result is not None
 
     def test_spectrum_filter_intensity_jit(self, benchmark, sample_data):
         """Benchmark intensity filtering on JIT spectrum."""
         spectrum = MsmsSpectrumJit(**sample_data)
-        result = benchmark(spectrum.filter_intensity, 100.0)
+        result = benchmark(spectrum.filter_intensity, 0.01)
         assert result is not None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ec48a4 and 0a8623a.

📒 Files selected for processing (1)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (217-808)
  • MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py

[error] 1-1: Ruff format check failed. 1 file would be reformatted by 'ruff format --check'. Run 'ruff format' to fix formatting.

🪛 GitHub Actions: Performance Benchmarks
benchmarks/test_spectrum_benchmarks.py

[error] 108-108: Segmentation fault during benchmark test 'test_spectrum_filter_intensity_regular' triggered by pytest-benchmark (exit code 139).

🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)

130-167: LGTM! Well-structured parametrized benchmarks.

The use of pytest.mark.parametrize with benchmark.pedantic provides controlled, reproducible performance comparisons across different spectrum sizes. The extra_info metadata will help track results effectively.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
benchmarks/test_spectrum_benchmarks.py (1)

1-195: Fix ruff formatting so the lint pipeline passes.

The lint job reports that ruff format --check . would reformat this file (likely due to trailing spaces and spacing around decorators/blocks). Please run ruff’s formatter and commit the result so CI passes:

#!/bin/bash
ruff format benchmarks/test_spectrum_benchmarks.py
ruff format --check benchmarks/test_spectrum_benchmarks.py
🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)

15-67: Avoid repeatedly resetting the global NumPy RNG in fixtures.

Each session-scoped fixture calls np.random.seed(42), which repeatedly resets the global RNG state and can surprise other tests that rely on NumPy randomness. Consider using a local generator per fixture instead (e.g., rng = np.random.default_rng(42) and then rng.uniform / rng.exponential) so the benchmark data remain reproducible without mutating global RNG state.
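
A minimal sketch of this suggestion, assuming a simplified comparison_data fixture (the real fixture also carries identifier and precursor fields):

```python
# Sketch of the suggested change: a local Generator keeps the benchmark data
# reproducible without mutating NumPy's global RNG state. The fixture shape is
# simplified relative to the actual benchmark module.
import numpy as np
import pytest


@pytest.fixture(scope="session")
def comparison_data():
    rng = np.random.default_rng(42)
    datasets = {}
    for size in (100, 1000, 5000, 10000):
        datasets[f"size_{size}"] = {
            "mz": np.sort(rng.uniform(100, 2000, size)),
            "intensity": rng.uniform(0, 1, size),
        }
    return datasets
```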


170-195: Clarify that these “memory usage” tests are time benchmarks, or add real memory measurement.

The test_memory_efficiency_* tests wrap create_and_process with the benchmark fixture, which measures execution time rather than memory. Either adjust the docstrings/names to reflect timing-focused benchmarks or augment them with an actual memory-measurement tool if memory profiling is the goal.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a8623a and 0ce6b71.

📒 Files selected for processing (1)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (217-808)
  • MsmsSpectrumJit (30-214)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py

[error] 1-1: ruff format --check . detected that 1 file would be reformatted. Run 'ruff format' to fix formatting.

🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)

70-168: Benchmarks correctly exercise regular vs JIT spectra and fix earlier TypeErrors.

The creation/operation and comparison benchmarks use MsmsSpectrum with keyword args and MsmsSpectrumJit with positional args (round, filter_intensity, scale_intensity), which aligns with the library signatures and avoids the previous keyword-argument TypeErrors. Use of benchmark and benchmark.pedantic (including extra_info metadata) looks sound and should give useful performance data across sizes.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
.github/workflows/benchmarks.yml (1)

84-105: Clarify what “previous benchmark results” actually are and whether that matches your intent.

actions/download-artifact@v4 without extra configuration only fetches artifacts from this same workflow run, so results_3.11.json is the output of the matrix benchmark job for the current commit, not a baseline from a prior run on main/dev. The pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true call itself is fine, but today it’s comparing two runs of the same code, not PR vs base branch.

If your goal is true cross-commit comparison (PR vs main), consider instead:

  • Pulling baseline JSON from the data stored by github-action-benchmark on gh-pages, or
  • Using a dedicated PR benchmark action (e.g., github-action-pull-request-benchmark) that knows how to fetch base-branch results.

As-is, this step gives useful run-to-run noise information but not regression vs base.

benchmarks/test_spectrum_benchmarks.py (1)

95-197: JIT benchmarks now align with the APIs and avoid the previous argument issues.

Using positional arguments for the JIT methods (round, filter_intensity, scale_intensity) and switching to relative min_intensity=0.01 plus valid scaling="root" brings these tests in line with the implementations in spectrum_utils.spectrum and should eliminate the earlier TypeErrors. The memory-usage benchmarks also mirror realistic operation chains on both regular and JIT spectra.

If you care about isolating steady-state JIT performance, you might optionally add a one-time warmup call for MsmsSpectrumJit operations outside the measured benchmark(...) call so compilation cost doesn’t skew the measurements.
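
A sketch of that optional warmup, assuming a sample_data fixture whose keys match the constructor arguments; the warmup instance is thrown away so only steady-state calls are timed:

```python
# Illustrative only: warm up numba compilation on a throwaway instance so the
# timed calls measure steady-state performance. The sample_data fixture is an
# assumption and must match the MsmsSpectrumJit constructor arguments.
from spectrum_utils.spectrum import MsmsSpectrumJit


def test_spectrum_filter_intensity_jit(benchmark, sample_data):
    # One-time warmup outside the measured region triggers JIT compilation.
    MsmsSpectrumJit(**sample_data).filter_intensity(0.01)

    def filter_fresh():
        # Build a fresh spectrum per call; the jitclass method takes its
        # argument positionally.
        return MsmsSpectrumJit(**sample_data).filter_intensity(0.01)

    result = benchmark(filter_fresh)
    assert result is not None
```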

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 999d5d6 and 82f9232.

📒 Files selected for processing (3)
  • .github/workflows/benchmarks.yml (1 hunks)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
  • pytest-benchmark.ini (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pytest-benchmark.ini
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (224-868)
  • MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (windows-latest, 3.11)
  • GitHub Check: build (windows-latest, 3.12)
  • GitHub Check: build (windows-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: build (macos-latest, 3.10)
🔇 Additional comments (2)
.github/workflows/benchmarks.yml (1)

38-61: Benchmark output paths and artifacts are now wired consistently.

Running pytest from the repo root with --benchmark-json=benchmarks/results_${{ matrix.python-version }}.json and reusing that same path for both github-action-benchmark and upload-artifact keeps all consumers in sync and fixes the earlier directory mismatch. No further changes needed here.

benchmarks/test_spectrum_benchmarks.py (1)

15-68: Fixtures and synthetic datasets look consistent and reusable.

The three session-scoped fixtures (sample_data, large_sample_data, comparison_data) generate well-structured dicts with consistent keys, making it easy to construct both MsmsSpectrum and MsmsSpectrumJit across benchmarks and parameterized tests. Nothing blocking here.

@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 20, 2025
@github-actions bot commented

🚀 Performance Benchmark Results

| Benchmark | Mean | Min | Max | Rounds |
| --- | --- | --- | --- | --- |
| test_spectrum_creation_regular | 0.053ms | 0.037ms | 0.103ms | 5 |
| test_spectrum_creation_jit | 0.040ms | 0.030ms | 0.072ms | 5 |
| test_spectrum_creation_large_regular | 0.162ms | 0.156ms | 0.282ms | 4600 |
| test_spectrum_creation_large_jit | 0.157ms | 0.151ms | 0.295ms | 5557 |
| test_spectrum_round_regular | 0.063ms | 0.048ms | 0.109ms | 5 |
| test_spectrum_round_jit | 0.021ms | 0.019ms | 0.027ms | 5 |
| test_spectrum_filter_intensity_regular | 0.036ms | 0.019ms | 0.078ms | 5 |
| test_spectrum_filter_intensity_jit | 0.035ms | 0.019ms | 0.074ms | 5 |
| test_spectrum_scale_intensity_regular | 0.068ms | 0.047ms | 0.132ms | 5 |
| test_spectrum_scale_intensity_jit | 0.025ms | 0.016ms | 0.058ms | 5 |
| test_creation_performance_comparison[100] | 0.032ms | 0.024ms | 0.081ms | 10 |
| test_creation_performance_comparison[1000] | 0.040ms | 0.034ms | 0.056ms | 10 |
| test_creation_performance_comparison[5000] | 0.096ms | 0.088ms | 0.121ms | 10 |
| test_creation_performance_comparison[10000] | 0.169ms | 0.160ms | 0.192ms | 10 |
| test_jit_creation_performance_comparison[100] | 0.024ms | 0.018ms | 0.045ms | 10 |
| test_jit_creation_performance_comparison[1000] | 0.033ms | 0.028ms | 0.047ms | 10 |
| test_jit_creation_performance_comparison[5000] | 0.088ms | 0.082ms | 0.114ms | 10 |
| test_jit_creation_performance_comparison[10000] | 0.161ms | 0.155ms | 0.186ms | 10 |
| test_memory_efficiency_regular | 0.171ms | 0.162ms | 0.390ms | 3219 |
| test_memory_efficiency_jit | 0.091ms | 0.087ms | 0.173ms | 6336 |

Times are in milliseconds. Lower is better.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/benchmarks.yml (1)

107-107: Incorrect pytest-benchmark executable name—should still be fixed.

Per the past review, line 107 uses pytest-benchmark compare, but the correct executable per pytest-benchmark documentation is py.test-benchmark compare (or alternatively pytest --benchmark-compare). The current command will likely fail at runtime.

Apply this diff:

-          pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true
+          py.test-benchmark compare --group-by=name current_results.json results_3.11.json || true

Alternatively, if you prefer to run through pytest:

-          pytest-benchmark compare --group-by=name current_results.json results_3.11.json || true
+          pytest --benchmark-compare=current_results.json --group-by=name results_3.11.json || true
🧹 Nitpick comments (1)
.github/workflows/benchmarks.yml (1)

3-13: Path filter includes benchmark-adjacent files correctly.

The trigger conditions appropriately filter runs to code and configuration changes relevant to benchmarking. However, if pyproject.toml is not actually used by this project, consider removing it from the path filter (line 13) to avoid unnecessary CI runs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82f9232 and 4975d4a.

📒 Files selected for processing (1)
  • .github/workflows/benchmarks.yml (1 hunks)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/benchmarks.yml (1)

3-7: Remove unused push trigger or separate jobs by event type.

The workflow triggers on both push (to main/dev) and pull_request (to main/dev), but the only job (benchmark-comparison) is conditioned on github.event_name == 'pull_request' (line 23), making the push trigger unused. Either remove the push trigger or add a separate job to handle push events.

 on:
-  push:
-    branches: [ main, dev ]
   pull_request:
     branches: [ main, dev ]
     # Only run on performance-related changes
     paths:

Alternatively, if push events are intentional (e.g., for tracking performance on main), keep the trigger and add a separate job.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4975d4a and cb222ed.

📒 Files selected for processing (1)
  • .github/workflows/benchmarks.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (windows-latest, 3.12)
  • GitHub Check: build (windows-latest, 3.11)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (windows-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.11)
🔇 Additional comments (3)
.github/workflows/benchmarks.yml (3)

36-43: Installation and PR benchmark execution look good.

The dependency setup correctly installs dev dependencies, and the benchmark command uses correct paths with output directed to benchmarks/pr_results.json.


45-74: Verify cross-branch comparison behavior for dev branch PRs.

Line 51 always checks out and compares against origin/main, regardless of the target branch. For PRs to the dev branch, this compares performance against main instead of dev, which may not reflect the intended baseline. Consider using the target branch dynamically:

-        if git checkout origin/main -- spectrum_utils 2>/dev/null; then
+        TARGET_BRANCH="${{ github.base_ref }}"
+        if git checkout "origin/${TARGET_BRANCH}" -- spectrum_utils 2>/dev/null; then

This ensures comparisons are against the target branch (main for main PRs, dev for dev PRs). Verify this matches the PR objectives and is the desired behavior.


76-169: Comparison logic and error handling are sound.

The benchmark comparison script correctly:

  • Calculates percentage changes (line 131): (delta / mainMean) * 100
  • Applies 5% significance threshold (line 134): Math.abs(changePercent) < 5
  • Handles missing main results (lines 92-105) and new benchmarks (lines 146-149) ✅
  • Provides detailed comparison table with delta, percentage, and status indicators ✅

Error handling at lines 170–172 intentionally catches errors and logs without failing the workflow, preserving PR functionality even if benchmarking fails.
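
For reference, the same delta and significance computation can be expressed in a few lines of Python. This is an illustrative re-implementation of the workflow's JavaScript logic, with assumed file names, not code from the PR:

```python
# Illustrative Python version of the comparison logic (the workflow itself does
# this in JavaScript); the result file names are assumptions.
import json


def build_rows(pr_path="benchmarks/pr_results.json",
               main_path="benchmarks/main_results.json"):
    with open(pr_path) as fh:
        pr_benchmarks = json.load(fh)["benchmarks"]
    with open(main_path) as fh:
        main_stats = {b["name"]: b["stats"]
                      for b in json.load(fh).get("benchmarks", [])}

    rows = []
    for bench in pr_benchmarks:
        pr_ms = bench["stats"]["mean"] * 1000  # pytest-benchmark reports seconds
        base = main_stats.get(bench["name"])
        if base is None:
            rows.append((bench["name"], pr_ms, None, "new benchmark"))
            continue
        main_ms = base["mean"] * 1000
        change = (pr_ms - main_ms) / main_ms * 100  # "Change %" column
        if abs(change) < 5:  # 5% significance threshold
            status = ""
        elif change > 0:
            status = f"⚠️ {change:.1f}% slower"
        else:
            status = f"✅ {abs(change):.1f}% faster"
        rows.append((bench["name"], pr_ms, main_ms, status))
    return rows
```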

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
benchmarks/test_spectrum_benchmarks.py (1)

1-221: Fix formatting to pass ruff checks.

The pipeline reports that ruff would reformat this file. Run ruff format benchmarks/test_spectrum_benchmarks.py locally to fix the formatting issues and ensure the Lint workflow passes.

#!/bin/bash
# Check what formatting changes ruff would make
ruff format --check --diff benchmarks/test_spectrum_benchmarks.py
🧹 Nitpick comments (2)
.github/workflows/benchmarks.yml (2)

45-74: Consider adding error handling and cleanup to the branch-switching logic.

The current approach of manually swapping directories and reinstalling works, but could be more robust. Consider:

  1. Add cleanup at the start to handle previous failed runs:
     rm -rf spectrum_utils_pr 2>/dev/null || true
  2. Add error handling to restore PR code if any step fails:
     trap 'rm -rf spectrum_utils; mv spectrum_utils_pr spectrum_utils 2>/dev/null || true' ERR
  3. Consider using git worktrees instead of manual directory swapping for cleaner isolation.

76-174: Comment generation logic is well-structured.

The JavaScript code handles the comparison comprehensively with good error handling. The 5% threshold and status categorization provide clear, actionable feedback.

One optional enhancement: consider adding validation for the JSON structure before processing:

if (!prResults.benchmarks || !Array.isArray(prResults.benchmarks)) {
  throw new Error('Invalid PR results structure');
}

This would provide clearer error messages if the benchmark JSON format changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb222ed and 554929a.

📒 Files selected for processing (3)
  • .github/workflows/benchmarks.yml (1 hunks)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
  • pytest-benchmark.ini (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pytest-benchmark.ini
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (224-868)
  • MsmsSpectrumJit (37-221)
🪛 GitHub Actions: Lint
benchmarks/test_spectrum_benchmarks.py

[error] 1-1: ruff format --check . failed with exit code 1. 1 file would be reformatted: benchmarks/test_spectrum_benchmarks.py. Run 'ruff format .' to fix.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: build (windows-latest, 3.12)
  • GitHub Check: build (windows-latest, 3.10)
  • GitHub Check: build (windows-latest, 3.11)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
benchmarks/test_spectrum_benchmarks.py (1)

15-67: Consider consistent intensity distributions across fixtures.

The sample_data and large_sample_data fixtures use np.random.exponential(1000) for intensity values, while comparison_data uses np.random.uniform(0, 1). This inconsistency could make cross-fixture performance comparisons less meaningful, though each fixture may serve different benchmark purposes.

If the fixtures are intended to be comparable, consider using the same intensity distribution:

     for size in sizes:
         mz = np.sort(np.random.uniform(100, 2000, size))
-        intensity = np.random.uniform(0, 1, size)
+        intensity = np.random.exponential(1000, size)
         datasets[f"size_{size}"] = {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 554929a and 5bbe668.

📒 Files selected for processing (1)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (224-868)
  • MsmsSpectrumJit (37-221)
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)

1-221: Well-structured benchmark suite with good practices.

The overall benchmark structure is solid:

  • Session-scoped fixtures for efficiency
  • Proper JIT warmup strategy (create separate warmup instances)
  • Good use of benchmark.pedantic with explicit rounds in comparison tests
  • Effective use of benchmark.extra_info for annotating results with metadata
  • The TestMemoryUsage tests demonstrate the correct pattern for benchmarking in-place operations

Once the mutation issues in the JIT operation tests are addressed, this will provide reliable performance metrics for spectrum_utils.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)

15-67: Fixtures are clear; consider avoiding global RNG seeding side effects

The session-scoped fixtures produce reproducible spectra, which is good for benchmarks. To avoid mutating NumPy’s global RNG state (and potentially affecting other tests), consider switching from repeated np.random.seed(42) calls to a local generator (e.g., rng = np.random.default_rng(42) and then rng.uniform(...), rng.exponential(...)) inside each fixture.


188-226: Unify benchmark.extra_info metadata for easier result comparison

In test_creation_performance_comparison you only set benchmark.extra_info["spectrum_size"], while the JIT variant also tags "implementation": "jit". For symmetric reporting and simpler downstream analysis, consider tagging the regular implementation as well:

     def test_creation_performance_comparison(
         self, benchmark, comparison_data, size
     ):
         """Compare creation performance across different spectrum sizes."""
         data = comparison_data[f"size_{size}"]

         # This will show in benchmark results which size is being tested
-        benchmark.extra_info["spectrum_size"] = size
+        benchmark.extra_info["spectrum_size"] = size
+        benchmark.extra_info["implementation"] = "regular"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8398d14 and 34921ab.

📒 Files selected for processing (1)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (224-868)
  • MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (windows-latest, 3.11)
  • GitHub Check: build (windows-latest, 3.12)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (windows-latest, 3.10)
🔇 Additional comments (2)
benchmarks/test_spectrum_benchmarks.py (2)

73-107: Creation benchmarks and JIT warmup usage look correct

Using benchmark.pedantic(..., rounds=50, iterations=1) with kwargs=sample_data/large_sample_data is idiomatic, and the separate MsmsSpectrumJit(**...) warmup before timing avoids contaminating the measured constructor cost. The simple length assertions keep overhead low while still validating results.


109-185: Operation benchmarks now correctly avoid in-place mutation skew

Wrapping round, filter_intensity, and scale_intensity in small helpers that construct a fresh MsmsSpectrum/MsmsSpectrumJit per iteration, plus doing JIT warmup on separate instances, fixes the earlier “benchmark on already-mutated spectrum” problem and uses positional args where required for jitclasses. The assertions and parameters (decimals=2, min_intensity=0.01, scaling="root") align with the spectrum_utils APIs.
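
For readers skimming the thread, here is a minimal sketch of the setup-based pattern being praised here, assuming a sample_data fixture with the constructor's keyword arguments (not the verbatim benchmark module):

```python
# Minimal sketch of the fresh-instance pattern: setup() builds a new spectrum
# for every round so the in-place round() never reruns on already-mutated data.
# The sample_data fixture is assumed to match the MsmsSpectrum constructor.
from spectrum_utils.spectrum import MsmsSpectrum


def test_spectrum_round_regular(benchmark, sample_data):
    def setup():
        return (MsmsSpectrum(**sample_data),), {}

    result = benchmark.pedantic(
        lambda s: s.round(decimals=2),
        setup=setup,
        rounds=50,
        iterations=1,  # pytest-benchmark requires iterations=1 with a setup function
    )
    assert result is not None
```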

@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
benchmarks/test_spectrum_benchmarks.py (2)

15-67: Consider documenting the rationale for different intensity distributions.

The sample_data and large_sample_data fixtures use exponential distribution for intensities, while comparison_data uses uniform distribution (0 to 1). Different distributions may affect benchmark results due to varying data characteristics (e.g., number of peaks filtered, computational complexity). If this is intentional for testing different scenarios, consider adding comments to document the rationale.


109-191: Consider using positional arguments consistently for clarity.

The regular spectrum tests use keyword arguments (e.g., decimals=2, min_intensity=0.01, scaling="root"), while JIT tests use positional arguments. Although technically correct (MsmsSpectrum supports keywords, but MsmsSpectrumJit requires positional due to Numba jitclass constraints), using positional arguments consistently in both variants would improve clarity and make the API constraints more obvious.

Example for consistency:

     def test_spectrum_round_regular(self, benchmark, sample_data):
         """Benchmark rounding operation on regular spectrum."""

         def setup():
             return (MsmsSpectrum(**sample_data),), {}

         result = benchmark.pedantic(
-            lambda s: s.round(decimals=2), setup=setup, rounds=50, iterations=1
+            lambda s: s.round(2), setup=setup, rounds=50, iterations=1
         )
         assert result is not None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34921ab and 983bf8f.

📒 Files selected for processing (1)
  • benchmarks/test_spectrum_benchmarks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/test_spectrum_benchmarks.py (1)
spectrum_utils/spectrum.py (2)
  • MsmsSpectrum (224-868)
  • MsmsSpectrumJit (37-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (windows-latest, 3.11)
  • GitHub Check: build (windows-latest, 3.12)
  • GitHub Check: build (windows-latest, 3.10)
🔇 Additional comments (1)
benchmarks/test_spectrum_benchmarks.py (1)

194-232: LGTM! Well-structured parametric benchmarks.

The comparison tests effectively leverage pytest's parametrization to benchmark across multiple spectrum sizes. The use of benchmark.extra_info to track size and implementation type will help with result analysis.

@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@github-actions bot commented

🚀 Performance Benchmark Results (Python 3.11)

Comparing PR branch vs main branch

| Benchmark | PR (ms) | Main (ms) | Δ (ms) | Change % | Rounds | Status |
| --- | --- | --- | --- | --- | --- | --- |
| test_spectrum_creation_regular | 0.037 | 0.035 | +0.001 | +3.7% | 50 | |
| test_spectrum_creation_jit | 0.030 | 0.029 | +0.001 | +2.9% | 50 | |
| test_spectrum_creation_large_regular | 0.165 | 0.164 | +0.002 | +1.0% | 50 | |
| test_spectrum_creation_large_jit | 0.159 | 0.159 | -0.001 | -0.4% | 50 | |
| test_spectrum_round_regular | 0.059 | 0.054 | +0.006 | +10.3% | 5 | ⚠️ 10.3% slower |
| test_spectrum_round_jit | 0.018 | 0.018 | +0.000 | +1.3% | 34479 | |
| test_spectrum_filter_intensity_regular | 0.025 | 0.024 | +0.001 | +4.9% | 5 | |
| test_spectrum_filter_intensity_jit | 0.019 | 0.018 | +0.000 | +1.3% | 13110 | |
| test_spectrum_scale_intensity_regular | 0.056 | 0.054 | +0.002 | +3.7% | 5 | |
| test_spectrum_scale_intensity_jit | 0.013 | 0.013 | +0.000 | +2.2% | 15761 | |
| test_creation_performance_comparison[100] | 0.025 | 0.023 | +0.002 | +8.5% | 50 | ⚠️ 8.5% slower |
| test_creation_performance_comparison[1000] | 0.035 | 0.034 | +0.001 | +3.6% | 50 | |
| test_creation_performance_comparison[5000] | 0.090 | 0.089 | +0.001 | +1.6% | 50 | |
| test_creation_performance_comparison[10000] | 0.165 | 0.162 | +0.003 | +1.6% | 50 | |
| test_jit_creation_performance_comparison[100] | 0.019 | 0.018 | +0.001 | +4.9% | 50 | |
| test_jit_creation_performance_comparison[1000] | 0.029 | 0.028 | +0.001 | +2.4% | 50 | |
| test_jit_creation_performance_comparison[5000] | 0.085 | 0.083 | +0.001 | +1.7% | 50 | |
| test_jit_creation_performance_comparison[10000] | 0.158 | 0.156 | +0.002 | +1.2% | 50 | |

Summary

  • 0 improvements (>5% faster)
  • ⚠️ 2 regressions (>5% slower)
  • 16 unchanged (within ±5%)

Changes smaller than ±5% are not considered significant.
Lower times are better.

@bittremieuxlab bittremieuxlab deleted a comment from github-actions bot Nov 21, 2025
@github-actions bot commented

🚀 Performance Benchmark Results (Python 3.11)

Comparing PR branch vs main branch

| Benchmark | PR (ms) | Main (ms) | Δ (ms) | Change % | Rounds | Status |
| --- | --- | --- | --- | --- | --- | --- |
| test_spectrum_creation_regular | 0.035 | 0.035 | +0.000 | +1.2% | 50 | |
| test_spectrum_creation_jit | 0.030 | 0.029 | +0.000 | +1.1% | 50 | |
| test_spectrum_creation_large_regular | 0.162 | 0.163 | -0.000 | -0.3% | 50 | |
| test_spectrum_creation_large_jit | 0.155 | 0.156 | -0.001 | -0.4% | 50 | |
| test_spectrum_round_regular | 0.080 | 0.050 | +0.029 | +58.6% | 5 | ⚠️ 58.6% slower |
| test_spectrum_round_jit | 0.019 | 0.018 | +0.000 | +1.8% | 34658 | |
| test_spectrum_filter_intensity_regular | 0.033 | 0.023 | +0.010 | +42.5% | 5 | ⚠️ 42.5% slower |
| test_spectrum_filter_intensity_jit | 0.018 | 0.018 | -0.000 | -1.3% | 10001 | |
| test_spectrum_scale_intensity_regular | 0.048 | 0.056 | -0.008 | -14.4% | 5 | ✅ 14.4% faster |
| test_spectrum_scale_intensity_jit | 0.013 | 0.013 | +0.000 | +0.6% | 16683 | |
| test_creation_performance_comparison[100] | 0.024 | 0.023 | +0.001 | +5.3% | 50 | ⚠️ 5.3% slower |
| test_creation_performance_comparison[1000] | 0.034 | 0.034 | +0.000 | +0.3% | 50 | |
| test_creation_performance_comparison[5000] | 0.089 | 0.088 | +0.001 | +0.8% | 50 | |
| test_creation_performance_comparison[10000] | 0.163 | 0.161 | +0.002 | +1.2% | 50 | |
| test_jit_creation_performance_comparison[100] | 0.019 | 0.018 | +0.001 | +5.2% | 50 | ⚠️ 5.2% slower |
| test_jit_creation_performance_comparison[1000] | 0.030 | 0.028 | +0.001 | +4.2% | 50 | |
| test_jit_creation_performance_comparison[5000] | 0.083 | 0.082 | +0.001 | +1.2% | 50 | |
| test_jit_creation_performance_comparison[10000] | 0.157 | 0.155 | +0.002 | +1.0% | 50 | |

Summary

  • 1 improvement (>5% faster)
  • ⚠️ 4 regressions (>5% slower)
  • 13 unchanged (within ±5%)

Changes smaller than ±5% are not considered significant.
Lower times are better.

@github-actions bot commented

🚀 Performance Benchmark Results (Python 3.11)

Comparing PR branch vs main branch

| Benchmark | PR (ms) | Main (ms) | Δ (ms) | Change % | Rounds | Status |
| --- | --- | --- | --- | --- | --- | --- |
| test_spectrum_creation_regular | 0.036 | 0.035 | +0.001 | +1.6% | 50 | |
| test_spectrum_creation_jit | 0.030 | 0.029 | +0.001 | +2.7% | 50 | |
| test_spectrum_creation_large_regular | 0.162 | 0.161 | +0.001 | +0.6% | 50 | |
| test_spectrum_creation_large_jit | 0.158 | 0.155 | +0.002 | +1.5% | 50 | |
| test_spectrum_round_regular | 0.061 | 0.052 | +0.009 | +17.3% | 5 | ⚠️ 17.3% slower |
| test_spectrum_round_jit | 0.018 | 0.018 | +0.000 | +0.1% | 35046 | |
| test_spectrum_filter_intensity_regular | 0.023 | 0.034 | -0.012 | -33.7% | 5 | ✅ 33.7% faster |
| test_spectrum_filter_intensity_jit | 0.018 | 0.018 | +0.000 | +0.2% | 13122 | |
| test_spectrum_scale_intensity_regular | 0.051 | 0.049 | +0.003 | +5.5% | 5 | ⚠️ 5.5% slower |
| test_spectrum_scale_intensity_jit | 0.013 | 0.013 | +0.000 | +3.2% | 17171 | |
| test_creation_performance_comparison[100] | 0.024 | 0.023 | +0.002 | +6.6% | 50 | ⚠️ 6.6% slower |
| test_creation_performance_comparison[1000] | 0.034 | 0.034 | +0.001 | +2.2% | 50 | |
| test_creation_performance_comparison[5000] | 0.089 | 0.088 | +0.001 | +1.1% | 50 | |
| test_creation_performance_comparison[10000] | 0.162 | 0.162 | +0.000 | +0.2% | 50 | |
| test_jit_creation_performance_comparison[100] | 0.019 | 0.018 | +0.000 | +2.6% | 50 | |
| test_jit_creation_performance_comparison[1000] | 0.029 | 0.029 | +0.000 | +0.9% | 50 | |
| test_jit_creation_performance_comparison[5000] | 0.083 | 0.082 | +0.001 | +0.9% | 50 | |
| test_jit_creation_performance_comparison[10000] | 0.156 | 0.155 | +0.001 | +0.8% | 50 | |

Summary

  • 1 improvement (>5% faster)
  • ⚠️ 3 regressions (>5% slower)
  • 14 unchanged (within ±5%)

Changes smaller than ±5% are not considered significant.
Lower times are better.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/proforma_test.py (1)

2370-2395: Skip logic is reasonable for known upstream issues.

The try-except with conditional skip handles the known XLMOD OBO parsing issue without masking other errors. The dual condition check ("XLMOD" and "OBO file" in error message) ensures only the specific known issue triggers the skip.

Consider whether pytest.xfail would be more semantically appropriate than pytest.skip to indicate this is an expected failure rather than a test that should be skipped. However, the current approach is acceptable.
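
A condensed sketch of the pattern under discussion, assuming a ProForma string with an XLMOD cross-linker modification (illustrative, not the exact test from the repository):

```python
# Condensed sketch of the conditional-skip pattern; the ProForma string and
# assertion are illustrative, not the exact repository test.
import pytest

from spectrum_utils import proforma


def test_proforma_xlink():
    try:
        proteoform = proforma.parse("EMEVTK[XLMOD:02001#XL1]SESPEK")[0]
    except SyntaxError as e:
        # Skip only the known upstream XLMOD/OBO parsing failure.
        if "XLMOD" in str(e) and "OBO file" in str(e):
            pytest.skip(f"Known upstream XLMOD OBO parsing issue: {e}")
        raise
    source = proteoform.modifications[0].source[0]
    assert source.controlled_vocabulary == "XLMOD"
```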

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13345bb and 7470b81.

📒 Files selected for processing (2)
  • spectrum_utils/proforma.py (2 hunks)
  • tests/proforma_test.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/proforma_test.py (1)
spectrum_utils/proforma.py (9)
  • proteoform (373-395)
  • proforma (366-371)
  • parse (642-691)
  • mass (334-342)
  • accession (118-121)
  • accession (124-125)
  • name (128-133)
  • name (136-137)
  • _import_cv (695-823)
🪛 GitHub Actions: Run tests
tests/proforma_test.py

[error] 1-1: pytest failed: test_proforma_name failed, test_proforma_accession failed, test_proforma_xlink failed in tests/proforma_test.py


[error] 1-1: Multiple Proforma-related tests failed during pytest run.

spectrum_utils/proforma.py

[error] 858-865: SyntaxError: Failed to parse XLMOD controlled vocabulary OBO file. This is likely due to a format issue in the upstream vocabulary file. Original error: expected QuotedString. Command: pytest --cov=spectrum_utils --verbose tests/.

🔇 Additional comments (3)
spectrum_utils/proforma.py (2)

186-188: LGTM - Documentation updated correctly.

The addition of SyntaxError to the docstring accurately documents the new error handling behavior for OBO parsing failures.


857-867: The error handling is correct — no changes needed.

fastobo.load() returns an OboDoc (the OBO document deserialized into an Abstract Syntax Tree), not a generator. Since it's eagerly evaluated, any SyntaxError is raised when fastobo.load(obo_fh) is called on line 858, which is inside the try-except block. The error will be caught and re-raised with your custom message as intended.

tests/proforma_test.py (1)

2453-2467: Consistent and correct skip logic for CV imports.

This skip block follows the same pattern as the earlier change, properly scoping the skip to XLMOD with OBO parsing errors. The implementation correctly re-raises errors for other vocabularies.

@github-actions bot commented

🚀 Performance Benchmark Results (Python 3.11)

Comparing PR branch vs main branch

| Benchmark | PR (ms) | Main (ms) | Δ (ms) | Change % | Rounds | Status |
| --- | --- | --- | --- | --- | --- | --- |
| test_spectrum_creation_regular | 0.034 | 0.035 | -0.001 | -2.0% | 50 | |
| test_spectrum_creation_jit | 0.029 | 0.030 | -0.000 | -1.1% | 50 | |
| test_spectrum_creation_large_regular | 0.164 | 0.163 | +0.001 | +0.4% | 50 | |
| test_spectrum_creation_large_jit | 0.156 | 0.156 | +0.000 | +0.0% | 50 | |
| test_spectrum_round_regular | 0.049 | 0.050 | -0.001 | -1.9% | 5 | |
| test_spectrum_round_jit | 0.018 | 0.018 | +0.000 | +1.1% | 35571 | |
| test_spectrum_filter_intensity_regular | 0.024 | 0.023 | +0.001 | +5.7% | 5 | ⚠️ 5.7% slower |
| test_spectrum_filter_intensity_jit | 0.018 | 0.018 | -0.000 | -0.2% | 13068 | |
| test_spectrum_scale_intensity_regular | 0.049 | 0.048 | +0.001 | +1.8% | 5 | |
| test_spectrum_scale_intensity_jit | 0.013 | 0.012 | +0.000 | +3.2% | 18132 | |
| test_creation_performance_comparison[100] | 0.023 | 0.023 | -0.000 | -1.0% | 50 | |
| test_creation_performance_comparison[1000] | 0.034 | 0.034 | -0.000 | -0.6% | 50 | |
| test_creation_performance_comparison[5000] | 0.087 | 0.088 | -0.001 | -0.6% | 50 | |
| test_creation_performance_comparison[10000] | 0.162 | 0.162 | -0.000 | -0.2% | 50 | |
| test_jit_creation_performance_comparison[100] | 0.019 | 0.018 | +0.000 | +1.6% | 50 | |
| test_jit_creation_performance_comparison[1000] | 0.028 | 0.028 | +0.000 | +0.2% | 50 | |
| test_jit_creation_performance_comparison[5000] | 0.082 | 0.082 | -0.001 | -0.6% | 50 | |
| test_jit_creation_performance_comparison[10000] | 0.155 | 0.156 | -0.001 | -0.6% | 50 | |

Summary

  • 0 improvements (>5% faster)
  • ⚠️ 1 regression (>5% slower)
  • 17 unchanged (within ±5%)

Changes smaller than ±5% are not considered significant.
Lower times are better.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7470b81 and f941067.

📒 Files selected for processing (1)
  • tests/proforma_test.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/proforma_test.py (1)
spectrum_utils/proforma.py (8)
  • proforma (366-371)
  • parse (642-691)
  • mass (334-342)
  • accession (118-121)
  • accession (124-125)
  • name (128-133)
  • name (136-137)
  • _import_cv (695-823)
🔇 Additional comments (1)
tests/proforma_test.py (1)

208-232: Pattern for handling known XLMOD issue is reasonable.

The try-except pattern used here appropriately handles the known upstream XLMOD/OBO parsing issue by:

  • Catching SyntaxError specifically
  • Checking for diagnostic strings before skipping
  • Re-raising unexpected errors
  • Providing clear skip messages

The string matching on "XLMOD" in str(e) and "OBO file" in str(e) is pragmatic for tests, though it's fragile to upstream error message changes.

Consider adding a test that verifies the XLMOD issue resolution once upstream fixes are available:

@pytest.mark.xfail(reason="XLMOD OBO parsing issue - remove xfail when upstream is fixed")
def test_xlmod_parsing_recovered():
    """Verify XLMOD parsing works again after upstream fix."""
    proteoform = proforma.parse("EMEVTK[X:DSS#XL1]SESPEK")[0]
    assert proteoform.modifications[0].source[0].controlled_vocabulary == "XLMOD"

Also applies to: 403-415, 2451-2476, 2534-2548

@rukubrakov rukubrakov self-assigned this Nov 25, 2025
@github-actions bot commented

🚀 Performance Benchmark Results (Python 3.11)

Comparing PR branch vs main branch

| Benchmark | PR (ms) | Main (ms) | Δ (ms) | Change % | Rounds | Status |
| --- | --- | --- | --- | --- | --- | --- |
| test_spectrum_creation_regular | 0.035 | 0.037 | -0.002 | -5.4% | 50 | ✅ 5.4% faster |
| test_spectrum_creation_jit | 0.029 | 0.029 | -0.000 | -0.1% | 50 | |
| test_spectrum_creation_large_regular | 0.162 | 0.163 | -0.001 | -0.4% | 50 | |
| test_spectrum_creation_large_jit | 0.156 | 0.156 | -0.000 | -0.3% | 50 | |
| test_spectrum_round_regular | 0.052 | 0.051 | +0.002 | +3.6% | 5 | |
| test_spectrum_round_jit | 0.018 | 0.018 | +0.000 | +0.1% | 35295 | |
| test_spectrum_filter_intensity_regular | 0.024 | 0.023 | +0.000 | +2.0% | 5 | |
| test_spectrum_filter_intensity_jit | 0.018 | 0.018 | -0.000 | -0.2% | 13074 | |
| test_spectrum_scale_intensity_regular | 0.051 | 0.049 | +0.001 | +2.8% | 5 | |
| test_spectrum_scale_intensity_jit | 0.013 | 0.012 | +0.000 | +3.5% | 16633 | |
| test_creation_performance_comparison[100] | 0.024 | 0.024 | -0.000 | -0.9% | 50 | |
| test_creation_performance_comparison[1000] | 0.035 | 0.035 | -0.000 | -0.7% | 50 | |
| test_creation_performance_comparison[5000] | 0.090 | 0.090 | +0.001 | +0.8% | 50 | |
| test_creation_performance_comparison[10000] | 0.161 | 0.163 | -0.002 | -1.2% | 50 | |
| test_jit_creation_performance_comparison[100] | 0.019 | 0.018 | +0.001 | +3.1% | 50 | |
| test_jit_creation_performance_comparison[1000] | 0.030 | 0.028 | +0.002 | +5.9% | 50 | ⚠️ 5.9% slower |
| test_jit_creation_performance_comparison[5000] | 0.083 | 0.083 | -0.001 | -1.0% | 50 | |
| test_jit_creation_performance_comparison[10000] | 0.156 | 0.157 | -0.001 | -0.7% | 50 | |

Summary

  • 1 improvement (>5% faster)
  • ⚠️ 1 regression (>5% slower)
  • 16 unchanged (within ±5%)

Changes smaller than ±5% are not considered significant.
Lower times are better.

@bittremieux bittremieux merged commit b6e5d48 into main Nov 25, 2025
21 of 22 checks passed
@bittremieux bittremieux deleted the feature/automated-benchmarking branch November 25, 2025 16:52