fix(incidents): compute resolution correctly in metric issue detector #112623

Merged: kcons merged 2 commits into master from kcons/resolt on Apr 13, 2026

Conversation

kcons (Member) commented Apr 9, 2026

This updates the Detector implementation of metric alerts to scale resolution based on query frequency, which manages Snuba capacity the same way we do for AlertRules.
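To make the idea concrete, here is a minimal sketch of scaling resolution with the time window. It is not the code from this PR: the function name `scaled_resolution` and every threshold in `_HYPOTHETICAL_STEPS` are assumptions made up for illustration, and it assumes that "query frequency" tracks the size of the time window. Only the one-minute default comes from the diff in this PR.

```python
from datetime import timedelta

# Hypothetical thresholds, for illustration only. The real values live in
# Sentry's codebase and are not quoted anywhere in this PR conversation.
# Each entry is (minimum time window, resolution to use), largest first.
_HYPOTHETICAL_STEPS: list[tuple[timedelta, timedelta]] = [
    (timedelta(days=1), timedelta(minutes=15)),
    (timedelta(hours=4), timedelta(minutes=5)),
    (timedelta(hours=1), timedelta(minutes=2)),
]

# The one-minute default matches the fallback shown in the PR diff.
DEFAULT_RESOLUTION = timedelta(minutes=1)


def scaled_resolution(time_window: timedelta) -> timedelta:
    """Return how often a query covering `time_window` should run.

    Larger windows are evaluated less often, which reduces query load.
    """
    for min_window, resolution in _HYPOTHETICAL_STEPS:
        if time_window >= min_window:
            return resolution
    return DEFAULT_RESOLUTION
```

With these made-up steps, a 30-minute window keeps the one-minute default, while a multi-day window is evaluated every 15 minutes.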

We had to do a few type clean-ups in the process, primarily a switch to timedelta, since we've had a few bugs caused by inconsistent interpretation of int durations.
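As a rough illustration of why bare ints are risky here, the sketch below shows the same integer meaning two different durations depending on which unit the reader assumes. The helper `window_from_api` is hypothetical. It mirrors the `timedelta(seconds=validated_data["time_window"])` conversion visible in the diff, assuming the incoming value is in seconds.

```python
from datetime import timedelta


def window_from_api(seconds: int) -> timedelta:
    """Convert an int number of seconds to a timedelta once, at the boundary.

    Hypothetical helper: after this point no caller has to guess the unit.
    """
    return timedelta(seconds=seconds)


raw = 60  # ambiguous on its own: 60 seconds or 60 minutes?

as_seconds = timedelta(seconds=raw)
as_minutes = timedelta(minutes=raw)

# The same int yields two different durations depending on the assumed unit.
assert as_seconds != as_minutes

window = window_from_api(3600)
assert window == timedelta(hours=1)
```

Converting once at the API boundary and passing `timedelta` values everywhere else means the unit travels with the value.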

Fixes ISWF-2127.

github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Apr 9, 2026
github-actions bot (Contributor) commented Apr 9, 2026

Backend Test Failures

Failures on 8decb0a in this run:

tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py::OrganizationDetectorDetailsPutTest::test_update_config_valid
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py:834: in test_update_config_valid
    response = self.get_success_response(
src/sentry/testutils/cases.py:629: in get_success_response
    assert_status_code(response, status_code)
src/sentry/testutils/asserts.py:46: in assert_status_code
    assert minimum <= response.status_code < maximum, response
E   AssertionError: <Response status_code=500, "application/json">
E   assert 500 < 201
E    +  where 500 = <Response status_code=500, "application/json">.status_code
tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py::OrganizationDetectorDetailsPutTest::test_anomaly_detection_to_static
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py:950: in test_anomaly_detection_to_static
    response = self.get_success_response(
src/sentry/testutils/cases.py:629: in get_success_response
    assert_status_code(response, status_code)
src/sentry/testutils/asserts.py:46: in assert_status_code
    assert minimum <= response.status_code < maximum, response
E   AssertionError: <Response status_code=500, "application/json">
E   assert 500 < 201
E    +  where 500 = <Response status_code=500, "application/json">.status_code
tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py::OrganizationDetectorDetailsPutCacheInvalidationTest::test_put_invalidates_cache
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py:1158: in test_put_invalidates_cache
    self.get_success_response(
src/sentry/testutils/cases.py:629: in get_success_response
    assert_status_code(response, status_code)
src/sentry/testutils/asserts.py:46: in assert_status_code
    assert minimum <= response.status_code < maximum, response
E   AssertionError: <Response status_code=500, "application/json">
E   assert 500 < 201
E    +  where 500 = <Response status_code=500, "application/json">.status_code

kcons changed the title from "fix(metric detectors): Scale time windows" to "fix(incidents): compute resolution correctly in metric issue detector" Apr 10, 2026
Comment thread src/sentry/incidents/metric_issue_detector.py
linear-code bot commented Apr 10, 2026

kcons marked this pull request as ready for review April 10, 2026 23:11
kcons requested review from a team as code owners April 10, 2026 23:11
cursor bot (Contributor) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 518a1a6.

Comment thread src/sentry/incidents/metric_issue_detector.py
saponifi3d (Contributor) left a comment

lgtm!

Random thought while looking at this PR; in the future we should also revisit the resolution a little to make sure the jitter is working as expected so we can clean up some saw-toothing on evaluations.

  aggregate=validated_data["aggregate"],
  time_window=timedelta(seconds=validated_data["time_window"]),
- resolution=timedelta(minutes=1),
+ resolution=validated_data.get("resolution", timedelta(minutes=1)),
Contributor

👀 - should we do anything to see if the existing metric alerts have the correct resolutions?

Member, Author

In theory, but by my count there are fewer than 87 potentially impacted cases so far in US (and probably far fewer than that), so I'm not entirely sure it's worth pursuing. This is more of a thing where we want to close the door before we let in the crowds.
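For readers following this thread, the fallback in the diff above can be sketched roughly as follows. The helper name `resolve_resolution` and the plain-dict `validated_data` are assumptions for illustration. Only the `.get("resolution", timedelta(minutes=1))` expression is taken from the diff.

```python
from datetime import timedelta
from typing import Any

# Same default as the fallback expression in the diff.
DEFAULT_RESOLUTION = timedelta(minutes=1)


def resolve_resolution(validated_data: dict[str, Any]) -> timedelta:
    """Hypothetical helper: prefer a provided resolution, else the default."""
    return validated_data.get("resolution", DEFAULT_RESOLUTION)


# A payload that carries its own resolution keeps it.
explicit = resolve_resolution({"resolution": timedelta(minutes=5)})

# A payload without the key falls back to one minute.
fallback = resolve_resolution({"aggregate": "count()"})
```

Note that `dict.get` only falls back when the key is absent. A key explicitly set to `None` is returned as `None`, so this pattern assumes upstream validation never stores a `None` resolution.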

Comment on lines +382 to +383
query_type=data_source.get("query_type", SnubaQuery.Type(snuba_query.type)),
dataset=data_source.get("dataset", Dataset(snuba_query.dataset)),
Contributor

<3

kcons merged commit e186fae into master Apr 13, 2026
65 checks passed
kcons deleted the kcons/resolt branch April 13, 2026 15:54
wedamija pushed a commit that referenced this pull request Apr 13, 2026
…#112623)
