Conversation
Instead of updating the timestamp when the job is triggered, wait until the completion webhook fires. This ensured that it only updates when we know there is at least a root cause.
There was a problem hiding this comment.
Organization scoping removed from Group fetch - defense-in-depth weakened (src/sentry/seer/autofix/on_completion_hook.py:238)
The code change removes organization scoping from the Group.objects.get() call. Previously in _maybe_continue_pipeline, the group was fetched with Group.objects.get(id=group_id, project__organization=organization). Now in execute() at line 86, it uses Group.objects.get(id=group_id) without verifying the group belongs to the organization. While this is an internal Seer webhook (not user-facing), removing the org scope weakens defense-in-depth. If Seer's metadata were ever corrupted or buggy, operations (webhook broadcasts, group timestamp updates, autofix triggers) could affect groups from other organizations.
Identified by Warden sentry-security
…st-triggered-on-completion
| if group_id is None: | ||
| return | ||
|
|
||
| group = Group.objects.get(id=group_id, project__organization_id=organization.id) |
There was a problem hiding this comment.
Bug: The Group.objects.get() call lacks exception handling. If a group is deleted mid-process, the hook will fail and abort subsequent steps.
Severity: MEDIUM
Suggested Fix
Wrap the Group.objects.get(id=group_id, ...) call in a try/except Group.DoesNotExist block. Inside the except block, log a warning and return gracefully, similar to how the previous version of the code handled this case.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/seer/autofix/on_completion_hook.py#L85
Potential issue: The `Group.objects.get()` call at line 85 is not wrapped in a
`try/except` block. If a group is deleted between when an autofix is triggered and when
the completion webhook fires, a `Group.DoesNotExist` exception will be raised. This
unhandled exception propagates up and causes the entire hook to abort. As a result,
subsequent actions like sending completion webhooks, continuing the autofix pipeline,
and triggering supergroup embeddings will not execute. This is a regression from the
previous implementation, which defensively handled this known scenario.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Unhandled
Group.DoesNotExistexception inexecute- Added try-except block around Group.objects.get() with proper logging and early return to handle cases where group is deleted between job start and webhook completion.
- ✅ Fixed: Test passes
Noneforgroup, causingAttributeError- Updated _maybe_trigger_supergroups_embedding signature to accept Group | None and added None check before accessing group.project to handle the edge case tested in test_skips_embedding_when_no_group.
Or push these changes by commenting:
@cursor push 0d5bd0eb8b
Preview (0d5bd0eb8b)
diff --git a/src/sentry/seer/autofix/on_completion_hook.py b/src/sentry/seer/autofix/on_completion_hook.py
--- a/src/sentry/seer/autofix/on_completion_hook.py
+++ b/src/sentry/seer/autofix/on_completion_hook.py
@@ -82,7 +82,15 @@
if group_id is None:
return
- group = Group.objects.get(id=group_id, project__organization_id=organization.id)
+ try:
+ group = Group.objects.get(id=group_id, project__organization_id=organization.id)
+ except Group.DoesNotExist:
+ logger.info(
+ "autofix.on_completion_hook.group_not_found",
+ extra={"run_id": run_id, "organization_id": organization.id, "group_id": group_id},
+ )
+ return
+
group.update(seer_explorer_autofix_last_triggered=timezone.now())
# Send webhook for the completed step
@@ -227,13 +235,16 @@
organization: Organization,
run_id: int,
state: SeerRunState,
- group: Group,
+ group: Group | None,
) -> None:
"""Trigger supergroups embedding if feature flag is enabled."""
current_step = cls._get_current_step(state)
if current_step != AutofixStep.ROOT_CAUSE:
return
+ if group is None:
+ return
+
if not features.has("projects:supergroup-embeddings-explorer", group.project):
returnThis Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
Backend Test FailuresFailures on
|
…st-triggered-on-completion
…st-triggered-on-completion
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
There was a problem hiding this comment.
can this change cause multiple autofix runs to be triggered at the same time? IIRC there's a 5 min guard for preventing duplicate autofix runs in the automation flow, but i'm not sure it covers everything
one concern might be that on new events, if explorer/autofix is breaking on some issue for a systematic reason, e.g., we add some buggy tool, we'll keep triggering failing runs after the 5 min TTL expires. or if b/c of an inc our queue is backed up, we'll pile on to it
@Mihir-Mavalankar is the most familiar w/ how we use last_triggered_at
|
could it make sense to update the value both when it's triggered and when it completes? |
We do check for this here to not trigger duplicate runs - sentry/src/sentry/seer/autofix/trigger.py Lines 152 to 153 in e3cd415 This prevents multiple task workers from starting the automation task. Since automations can still trigger runs we should check this even with autofix explorer right? |
Mihir-Mavalankar
left a comment
There was a problem hiding this comment.
Makes sense with new increased timeout.
…st-triggered-on-completion


Instead of updating the timestamp when the job is triggered, wait until the completion webhook fires. This ensured that it only updates when we know there is at least a root cause.