-
-
Notifications
You must be signed in to change notification settings - Fork 4.7k
feat(seer): Add lightweight RCA clustering endpoint integration #112229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1b7c082
98a5f16
a3bbef2
6f9d2d1
4905a86
c984fcf
2b84529
2b9b855
87ecf74
ce103c2
aa18de9
4270ced
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -376,6 +376,14 @@ class SupergroupsEmbeddingRequest(TypedDict): | |||||
| artifact_data: dict[str, Any] | ||||||
|
|
||||||
|
|
||||||
| class LightweightRCAClusterRequest(TypedDict): | ||||||
| group_id: int | ||||||
| issue: dict[str, Any] | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seer has this thing where issue is the word used in APIs, I am mimicking IssueSummary endpoint here, and its in other places as well, the model used in code there is IssueDetails, and so theres this weird thing of like Issue is the model, group_id is the number it gets... |
||||||
| organization_slug: str | ||||||
| organization_id: int | ||||||
| project_id: int | ||||||
|
|
||||||
|
|
||||||
| class SupergroupsGetRequest(TypedDict): | ||||||
| organization_id: int | ||||||
| supergroup_id: int | ||||||
|
|
@@ -485,6 +493,20 @@ def make_supergroups_embedding_request( | |||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def make_lightweight_rca_cluster_request( | ||||||
| body: LightweightRCAClusterRequest, | ||||||
| timeout: int | float | None = None, | ||||||
| viewer_context: SeerViewerContext | None = None, | ||||||
| ) -> BaseHTTPResponse: | ||||||
| return make_signed_seer_api_request( | ||||||
| seer_autofix_default_connection_pool, | ||||||
| "/v0/issues/supergroups/cluster-lightweight", | ||||||
| body=orjson.dumps(body, option=orjson.OPT_NON_STR_KEYS), | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. curious, what is this orjson option?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. its to allow dict keys that are non string, in this case integers - from what I understand when we send event data it contains these kind of keys and its requried that we allow it, the issue summary endpoint does the same thing for the same reason I believe |
||||||
| timeout=timeout, | ||||||
| viewer_context=viewer_context, | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def make_supergroups_get_request( | ||||||
| body: SupergroupsGetRequest, | ||||||
| viewer_context: SeerViewerContext, | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import logging | ||
|
|
||
| from sentry.api.serializers import EventSerializer, serialize | ||
| from sentry.eventstore import backend as eventstore | ||
| from sentry.models.group import Group | ||
| from sentry.seer.models import SeerApiError | ||
|
sentry[bot] marked this conversation as resolved.
|
||
| from sentry.seer.signed_seer_api import ( | ||
| LightweightRCAClusterRequest, | ||
| SeerViewerContext, | ||
| make_lightweight_rca_cluster_request, | ||
| ) | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def trigger_lightweight_rca_cluster(group: Group) -> None: | ||
| """ | ||
| Call Seer's lightweight RCA clustering endpoint for the given group. | ||
|
|
||
| Sends issue event data to Seer, which generates a lightweight root cause analysis | ||
| and clusters the issue into supergroups based on embedding similarity. | ||
| """ | ||
| event = group.get_latest_event() | ||
| if not event: | ||
| logger.info( | ||
| "lightweight_rca_cluster.no_event", | ||
| extra={"group_id": group.id}, | ||
| ) | ||
| return | ||
|
|
||
| ready_event = eventstore.get_event_by_id(group.project.id, event.event_id, group_id=group.id) | ||
| if not ready_event: | ||
| logger.info( | ||
| "lightweight_rca_cluster.event_not_ready", | ||
| extra={"group_id": group.id, "event_id": event.event_id}, | ||
| ) | ||
| return | ||
|
|
||
| serialized_event = serialize(ready_event, None, EventSerializer()) | ||
|
|
||
| body = LightweightRCAClusterRequest( | ||
| group_id=group.id, | ||
| issue={ | ||
| "id": group.id, | ||
| "title": group.title, | ||
| "short_id": group.qualified_short_id, | ||
| "events": [serialized_event], | ||
| }, | ||
| organization_slug=group.organization.slug, | ||
| organization_id=group.organization.id, | ||
| project_id=group.project.id, | ||
| ) | ||
| viewer_context = SeerViewerContext(organization_id=group.organization.id) | ||
|
sentry-warden[bot] marked this conversation as resolved.
|
||
|
|
||
| response = make_lightweight_rca_cluster_request(body, timeout=30, viewer_context=viewer_context) | ||
| if response.status >= 400: | ||
| raise SeerApiError("Lightweight RCA cluster request failed", response.status) | ||
|
|
||
| logger.info( | ||
| "lightweight_rca_cluster.success", | ||
| extra={ | ||
| "group_id": group.id, | ||
| "project_id": group.project.id, | ||
| "organization_id": group.organization.id, | ||
| }, | ||
| ) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| import logging | ||
|
|
||
| from sentry.models.group import Group | ||
| from sentry.seer.supergroups.lightweight_rca_cluster import trigger_lightweight_rca_cluster | ||
| from sentry.tasks.base import instrumented_task | ||
| from sentry.taskworker.namespaces import ingest_errors_tasks | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| @instrumented_task( | ||
| name="sentry.tasks.seer.lightweight_rca_cluster.trigger_lightweight_rca_cluster_task", | ||
| namespace=ingest_errors_tasks, | ||
| ) | ||
| def trigger_lightweight_rca_cluster_task(group_id: int, **kwargs) -> None: | ||
| try: | ||
| group = Group.objects.get(id=group_id) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't we have the group in
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. its standard for these tasks, you cant pass a model into them, they gotta refetch |
||
| except Group.DoesNotExist: | ||
| logger.info( | ||
| "lightweight_rca_cluster_task.group_not_found", | ||
| extra={"group_id": group_id}, | ||
| ) | ||
| return | ||
|
|
||
| try: | ||
| trigger_lightweight_rca_cluster(group) | ||
| except Exception: | ||
| logger.exception( | ||
| "lightweight_rca_cluster_task.failed", | ||
| extra={"group_id": group_id}, | ||
| ) | ||
| raise | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably name this lightweight_rca_embedding or just lightweight rca?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its like the command - to trigger clustering, because I basically treat the task as not a task to generate lgithweight-rca and a side effect of clustering, but instead of purposely triggering clustering, because before we didnt even save the lightweight-rca.
The endpoint I added is even called
/cluster-lightweight, so thats like the command here, so I think the name fits. Does that make sense? I dont feel strongly about it though, just want it all to be coherentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it should be more explicit like just "trigger_supergroup_clustering_lightweight"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i also don't feel too strongly, i think a slightly more consistent name would be 'lightweight rca embedding' / 'lightweight rca generation' but i think cluster also fits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer cluster to both of these I think, like embedding is kinda technical, its not what the caller really intends, and generation is sort of inaccurate because of the way we set it up where the point is to cluster by lightweightRCA - not to generate it, we didnt even save the generated RCA until we realized we need it for resummarization, so its like a side effect now.
Just so were all on the same page - I am just trying to be consistent with the way I phrased and treated it up until now, I even leaned in the direction of making it all about lightweight RCA generation and the clustering being a side effect, but went the other way around in the API and flow, so I think we should stay consistent.