Skip to content

feat(autofix): Update seer explorer autofix last triggered on completion#111663

Merged
Zylphrex merged 9 commits intomasterfrom
txiao/feat/update-seer-explorer-autofix-last-triggered-on-completion
Mar 30, 2026
Merged

feat(autofix): Update seer explorer autofix last triggered on completion#111663
Zylphrex merged 9 commits intomasterfrom
txiao/feat/update-seer-explorer-autofix-last-triggered-on-completion

Conversation

@Zylphrex
Copy link
Copy Markdown
Member

Instead of updating the timestamp when the job is triggered, wait until the completion webhook fires. This ensured that it only updates when we know there is at least a root cause.

Instead of updating the timestamp when the job is triggered, wait until the
completion webhook fires. This ensured that it only updates when we know there
is at least a root cause.
@Zylphrex Zylphrex requested a review from a team as a code owner March 26, 2026 18:38
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 26, 2026
Copy link
Copy Markdown
Contributor

@sentry-warden sentry-warden bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Organization scoping removed from Group fetch - defense-in-depth weakened (src/sentry/seer/autofix/on_completion_hook.py:238)

The code change removes organization scoping from the Group.objects.get() call. Previously in _maybe_continue_pipeline, the group was fetched with Group.objects.get(id=group_id, project__organization=organization). Now in execute() at line 86, it uses Group.objects.get(id=group_id) without verifying the group belongs to the organization. While this is an internal Seer webhook (not user-facing), removing the org scope weakens defense-in-depth. If Seer's metadata were ever corrupted or buggy, operations (webhook broadcasts, group timestamp updates, autofix triggers) could affect groups from other organizations.

Identified by Warden sentry-security

Comment thread src/sentry/seer/autofix/on_completion_hook.py Outdated
Comment thread src/sentry/seer/autofix/on_completion_hook.py Outdated
Comment thread src/sentry/seer/autofix/on_completion_hook.py
if group_id is None:
return

group = Group.objects.get(id=group_id, project__organization_id=organization.id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The Group.objects.get() call lacks exception handling. If a group is deleted mid-process, the hook will fail and abort subsequent steps.
Severity: MEDIUM

Suggested Fix

Wrap the Group.objects.get(id=group_id, ...) call in a try/except Group.DoesNotExist block. Inside the except block, log a warning and return gracefully, similar to how the previous version of the code handled this case.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/seer/autofix/on_completion_hook.py#L85

Potential issue: The `Group.objects.get()` call at line 85 is not wrapped in a
`try/except` block. If a group is deleted between when an autofix is triggered and when
the completion webhook fires, a `Group.DoesNotExist` exception will be raised. This
unhandled exception propagates up and causes the entire hook to abort. As a result,
subsequent actions like sending completion webhooks, continuing the autofix pipeline,
and triggering supergroup embeddings will not execute. This is a regression from the
previous implementation, which defensively handled this known scenario.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Copy Markdown
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Unhandled Group.DoesNotExist exception in execute
    • Added try-except block around Group.objects.get() with proper logging and early return to handle cases where group is deleted between job start and webhook completion.
  • ✅ Fixed: Test passes None for group, causing AttributeError
    • Updated _maybe_trigger_supergroups_embedding signature to accept Group | None and added None check before accessing group.project to handle the edge case tested in test_skips_embedding_when_no_group.

Create PR

Or push these changes by commenting:

@cursor push 0d5bd0eb8b
Preview (0d5bd0eb8b)
diff --git a/src/sentry/seer/autofix/on_completion_hook.py b/src/sentry/seer/autofix/on_completion_hook.py
--- a/src/sentry/seer/autofix/on_completion_hook.py
+++ b/src/sentry/seer/autofix/on_completion_hook.py
@@ -82,7 +82,15 @@
         if group_id is None:
             return
 
-        group = Group.objects.get(id=group_id, project__organization_id=organization.id)
+        try:
+            group = Group.objects.get(id=group_id, project__organization_id=organization.id)
+        except Group.DoesNotExist:
+            logger.info(
+                "autofix.on_completion_hook.group_not_found",
+                extra={"run_id": run_id, "organization_id": organization.id, "group_id": group_id},
+            )
+            return
+
         group.update(seer_explorer_autofix_last_triggered=timezone.now())
 
         # Send webhook for the completed step
@@ -227,13 +235,16 @@
         organization: Organization,
         run_id: int,
         state: SeerRunState,
-        group: Group,
+        group: Group | None,
     ) -> None:
         """Trigger supergroups embedding if feature flag is enabled."""
         current_step = cls._get_current_step(state)
         if current_step != AutofixStep.ROOT_CAUSE:
             return
 
+        if group is None:
+            return
+
         if not features.has("projects:supergroup-embeddings-explorer", group.project):
             return

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Comment thread src/sentry/seer/autofix/on_completion_hook.py
Comment thread tests/sentry/seer/autofix/test_autofix_on_completion_hook.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Backend Test Failures

Failures on 719468a in this run:

tests/sentry/seer/autofix/test_autofix_on_completion_hook.py::TestAutofixOnCompletionHookSupergroups::test_skips_embedding_when_no_grouplog
tests/sentry/seer/autofix/test_autofix_on_completion_hook.py:418: in test_skips_embedding_when_no_group
    AutofixOnCompletionHook._maybe_trigger_supergroups_embedding(
src/sentry/seer/autofix/on_completion_hook.py:237: in _maybe_trigger_supergroups_embedding
    if not features.has("projects:supergroup-embeddings-explorer", group.project):
E   AttributeError: 'NoneType' object has no attribute 'project'

Comment thread src/sentry/seer/autofix/on_completion_hook.py
Copy link
Copy Markdown
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment thread src/sentry/seer/autofix/on_completion_hook.py
Copy link
Copy Markdown
Contributor

@kddubey kddubey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this change cause multiple autofix runs to be triggered at the same time? IIRC there's a 5 min guard for preventing duplicate autofix runs in the automation flow, but i'm not sure it covers everything

one concern might be that on new events, if explorer/autofix is breaking on some issue for a systematic reason, e.g., we add some buggy tool, we'll keep triggering failing runs after the 5 min TTL expires. or if b/c of an inc our queue is backed up, we'll pile on to it

@Mihir-Mavalankar is the most familiar w/ how we use last_triggered_at

@kddubey
Copy link
Copy Markdown
Contributor

kddubey commented Mar 27, 2026

could it make sense to update the value both when it's triggered and when it completes?

@Mihir-Mavalankar
Copy link
Copy Markdown
Contributor

can this change cause multiple autofix runs to be triggered at the same time? IIRC there's a 5 min guard for preventing duplicate autofix runs in the automation flow, but i'm not sure it covers everything

one concern might be that on new events, if explorer/autofix is breaking on some issue for a systematic reason, e.g., we add some buggy tool, we'll keep triggering failing runs after the 5 min TTL expires. or if b/c of an inc our queue is backed up, we'll pile on to it

@Mihir-Mavalankar is the most familiar w/ how we use last_triggered_at

We do check for this here to not trigger duplicate runs -

if group.seer_explorer_autofix_last_triggered is not None:
return "already_triggered_explorer"

This prevents multiple task workers from starting the automation task. Since automations can still trigger runs we should check this even with autofix explorer right?

@Zylphrex Zylphrex requested review from a team as code owners March 27, 2026 21:23
Copy link
Copy Markdown
Contributor

@Mihir-Mavalankar Mihir-Mavalankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense with new increased timeout.

@Zylphrex Zylphrex merged commit 4ab83b7 into master Mar 30, 2026
107 checks passed
@Zylphrex Zylphrex deleted the txiao/feat/update-seer-explorer-autofix-last-triggered-on-completion branch March 30, 2026 17:18
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants