feat: Add repo indexing job by shruthilayaj · Pull Request #112136 · getsentry/sentry

shruthilayaj · 2026-04-02T19:00:58Z

Schedule a repo indexing job for the context engine. The job is behind a
new "experimental" feature flag so we can see how this context works
out on Sentry Seer Explorer runs. The index job runs only on Sundays
so that it does not eat into GitHub API quotas or interfere with
code review and autofix.
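
The PR does not show how the Sunday restriction is implemented. As a minimal sketch only, a weekday gate could look like the following; the function name `should_run_repo_index` and the choice of UTC are assumptions for illustration, not taken from the change itself.

```python
from datetime import datetime, timezone

# datetime.weekday() numbers Monday as 0 and Sunday as 6.
SUNDAY = 6


def should_run_repo_index(now: datetime) -> bool:
    """Return True only when `now` falls on a Sunday in UTC (assumed timezone).

    `now` is expected to be timezone-aware.
    """
    return now.astimezone(timezone.utc).weekday() == SUNDAY
```

A scheduler entry that fires only on Sundays would achieve the same effect without a runtime check; which approach the PR takes is not visible in this thread.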

Depends on: https://github.com/getsentry/seer/pull/5594

Add test coverage for the new index_repos task including early return conditions, correct payload construction, and repo deduplication across projects. Also fix broken import path and add missing response status check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-02T19:22:49Z

Backend Test Failures

Failures on 82293b8 in this run:

tests/sentry/autofix/test_utils.py::TestGetRepoFromCodeMappings::test_get_repos_from_project_code_mappings_with_data — log

[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/autofix/test_utils.py:49: in test_get_repos_from_project_code_mappings_with_data
    assert repos == expected_repos
E   AssertionError: assert [{'external_i...sentry', ...}] == [{'external_i...2577440, ...}]
E     
E     At index 0 diff: {'repository_id': 6, 'organization_id': 4557900082577440, 'integration_id': '234', 'provider': 'github', 'owner': 'getsentry', 'name': 'sentry', 'external_id': '123', 'languages': []} != {'repository_id': 6, 'integration_id': '234', 'organization_id': 4557900082577440, 'provider': 'github', 'owner': 'getsentry', 'name': 'sentry', 'external_id': '123'}
E     
E     Full diff:
E       [
E           {
E               'external_id': '123',
E               'integration_id': '234',
E     +         'languages': [],
E               'name': 'sentry',
E               'organization_id': 4557900082577440,
E               'owner': 'getsentry',
E               'provider': 'github',
E               'repository_id': 6,
E           },
E       ]

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: NoneType AttributeError when project has no preferences
- Updated the preferences lookup to default to an empty dict so projects without Seer preferences no longer raise an AttributeError.

Or push these changes by commenting:

@cursor push 11d75ba62c

Preview (11d75ba62c)

diff --git a/src/sentry/tasks/seer/context_engine_index.py b/src/sentry/tasks/seer/context_engine_index.py
--- a/src/sentry/tasks/seer/context_engine_index.py
+++ b/src/sentry/tasks/seer/context_engine_index.py
@@ -259,7 +259,7 @@
     preferences_by_id = bulk_get_project_preferences(organization_id, list(project_map.keys()))
 
     for project_id, project in project_map.items():
-        existing_pref = preferences_by_id.get(str(project_id))
+        existing_pref = preferences_by_id.get(str(project_id), {})
         project_pref_repos = existing_pref.get("repositories") or []
         autofix_repos = get_autofix_repos_from_project_code_mappings(project_map[project_id])

cursor · 2026-04-07T15:04:42Z

+
+    for project_id, project in project_map.items():
+        existing_pref = preferences_by_id.get(str(project_id))
+        project_pref_repos = existing_pref.get("repositories") or []


NoneType AttributeError when project has no preferences

High Severity

preferences_by_id.get(str(project_id)) returns None when a project has no Seer preferences, then existing_pref.get("repositories") raises AttributeError: 'NoneType' object has no attribute 'get'. The bulk_get_project_preferences function returns a sparse dict — only projects with configured preferences appear as keys. Using .get(str(project_id), {}) as the default would prevent the crash.

^{Reviewed by Cursor Bugbot for commit 3243fdb. Configure here.}

We should skip projects that don't have preferences set up. If a project does not have preferences, then customers basically can't use Seer for that project.
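
The two suggestions above differ: Bugbot proposes defaulting to an empty dict, while the reviewer asks to skip such projects entirely. A minimal sketch of the skip variant follows, assuming only what the thread states (a sparse dict keyed by stringified project ID, with an optional `repositories` list). The helper name `repos_by_project` is hypothetical.

```python
from typing import Any


def repos_by_project(
    project_ids: list[int],
    preferences_by_id: dict[str, dict[str, Any]],
) -> dict[int, list[dict[str, Any]]]:
    """Collect preference repos per project, skipping projects with no preferences."""
    result: dict[int, list[dict[str, Any]]] = {}
    for project_id in project_ids:
        existing_pref = preferences_by_id.get(str(project_id))
        if existing_pref is None:
            # No Seer preferences configured: skip rather than crash on None.get().
            continue
        # "repositories" may be missing or None; normalize both to an empty list.
        result[project_id] = existing_pref.get("repositories") or []
    return result
```

Skipping also keeps projects that cannot use Seer out of the indexing payload, which a `{}` default would not do on its own.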

github-actions · 2026-04-07T15:17:24Z

Backend Test Failures

Failures on 955b73a in this run:

tests/sentry/tasks/seer/test_context_engine_index.py::TestIndexRepos::test_deduplicates_repos_across_projects — log

[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
.venv/lib/python3.13/site-packages/urllib3/connection.py:204: in _new_conn
    sock = connection.create_connection(
.venv/lib/python3.13/site-packages/urllib3/util/connection.py:85: in create_connection
    raise err
.venv/lib/python3.13/site-packages/urllib3/util/connection.py:73: in create_connection
    sock.connect(sa)
E   ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:787: in urlopen
    response = self._make_request(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:493: in _make_request
    conn.request(
.venv/lib/python3.13/site-packages/urllib3/connection.py:500: in request
    self.endheaders()
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/http/client.py:1331: in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/http/client.py:1091: in _send_output
    self.send(msg)
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/http/client.py:1035: in send
    self.connect()
.venv/lib/python3.13/site-packages/urllib3/connection.py:331: in connect
    self.sock = self._new_conn()
.venv/lib/python3.13/site-packages/urllib3/connection.py:219: in _new_conn
    raise NewConnectionError(
E   urllib3.exceptions.NewConnectionError: HTTPConnection(host='127.0.0.1', port=9091): Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:
tests/sentry/tasks/seer/test_context_engine_index.py:320: in test_deduplicates_repos_across_projects
    index_repos(self.org.id)
.venv/lib/python3.13/site-packages/taskbroker_client/task.py:92: in __call__
    return self._func(*args, **kwargs)
src/sentry/tasks/seer/context_engine_index.py:259: in index_repos
    preferences_by_id = bulk_get_project_preferences(organization_id, list(project_map.keys()))
src/sentry/seer/autofix/utils.py:735: in bulk_get_project_preferences
    response = make_bulk_get_project_preferences_request(
src/sentry/seer/autofix/utils.py:258: in make_bulk_get_project_preferences_request
    return make_signed_seer_api_request(
.venv/lib/python3.13/site-packages/sentry_sdk/tracing_utils.py:916: in sync_wrapper
    result = f(*args, **kwargs)
src/sentry/seer/signed_seer_api.py:164: in make_signed_seer_api_request
    return connection_pool.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:871: in urlopen
    return self.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:871: in urlopen
    return self.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:871: in urlopen
    return self.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:841: in urlopen
... (4 more lines)

tests/sentry/tasks/seer/test_context_engine_index.py::TestIndexRepos::test_calls_seer_with_correct_org_and_repos — log

[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
.venv/lib/python3.13/site-packages/urllib3/connection.py:204: in _new_conn
    sock = connection.create_connection(
.venv/lib/python3.13/site-packages/urllib3/util/connection.py:85: in create_connection
    raise err
.venv/lib/python3.13/site-packages/urllib3/util/connection.py:73: in create_connection
    sock.connect(sa)
E   ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:787: in urlopen
    response = self._make_request(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:493: in _make_request
    conn.request(
.venv/lib/python3.13/site-packages/urllib3/connection.py:500: in request
    self.endheaders()
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/http/client.py:1331: in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/http/client.py:1091: in _send_output
    self.send(msg)
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/http/client.py:1035: in send
    self.connect()
.venv/lib/python3.13/site-packages/urllib3/connection.py:331: in connect
    self.sock = self._new_conn()
.venv/lib/python3.13/site-packages/urllib3/connection.py:219: in _new_conn
    raise NewConnectionError(
E   urllib3.exceptions.NewConnectionError: HTTPConnection(host='127.0.0.1', port=9091): Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:
tests/sentry/tasks/seer/test_context_engine_index.py:281: in test_calls_seer_with_correct_org_and_repos
    index_repos(self.org.id)
.venv/lib/python3.13/site-packages/taskbroker_client/task.py:92: in __call__
    return self._func(*args, **kwargs)
src/sentry/tasks/seer/context_engine_index.py:259: in index_repos
    preferences_by_id = bulk_get_project_preferences(organization_id, list(project_map.keys()))
src/sentry/seer/autofix/utils.py:735: in bulk_get_project_preferences
    response = make_bulk_get_project_preferences_request(
src/sentry/seer/autofix/utils.py:258: in make_bulk_get_project_preferences_request
    return make_signed_seer_api_request(
.venv/lib/python3.13/site-packages/sentry_sdk/tracing_utils.py:916: in sync_wrapper
    result = f(*args, **kwargs)
src/sentry/seer/signed_seer_api.py:164: in make_signed_seer_api_request
    return connection_pool.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:871: in urlopen
    return self.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:871: in urlopen
    return self.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:871: in urlopen
    return self.urlopen(
.venv/lib/python3.13/site-packages/urllib3/connectionpool.py:841: in urlopen
... (4 more lines)
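
Both tracebacks end in a refused connection to 127.0.0.1:9091 from inside `bulk_get_project_preferences`, which suggests the tests reached the real HTTP call instead of a stub. The sketch below shows the general stubbing pattern with `unittest.mock`. `SeerClient` and this simplified `index_repos` are hypothetical stand-ins so the example is self-contained; they are not the code under review.

```python
from unittest import mock


class SeerClient:
    """Hypothetical stand-in for the code that talks to Seer."""

    def bulk_get_project_preferences(self, organization_id, project_ids):
        # The real helper issues a signed HTTP request. Here it always fails,
        # mimicking a test environment with no Seer service listening.
        raise ConnectionRefusedError("[Errno 111] Connection refused")


def index_repos(client, organization_id, project_ids):
    """Simplified stand-in: returns the project IDs that have preferences."""
    preferences_by_id = client.bulk_get_project_preferences(organization_id, project_ids)
    return sorted(preferences_by_id)


def run_index_repos_with_stub():
    canned = {"1": {"repositories": []}}
    with mock.patch.object(
        SeerClient, "bulk_get_project_preferences", return_value=canned
    ) as stub:
        result = index_repos(SeerClient(), 42, [1])
    # The stub records the call, so the arguments can be asserted as well.
    stub.assert_called_once_with(42, [1])
    return result
```

In the actual test module the patch target would presumably be the name as imported by the task module, since `mock.patch` replaces the name where it is looked up, not where it is defined.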

cursor · 2026-04-07T15:42:37Z

+            key = (repo["provider"], repo["owner"], repo["name"])
+            if key in org_repo_definitions:
+                repo_definition = org_repo_definitions[key]
+                repo_definition["project_ids"].append(project_id)


Repo languages lost during cross-project deduplication

Low Severity

When a repo already exists in org_repo_definitions, only project_ids is appended — the languages field is never backfilled. If the first project to register a repo uses seer preferences (where the repo isn't in that project's autofix code mappings), languages is set to [] via language_map.get(key, []). When a later project encounters the same repo from its autofix repos (which do have language data), the existing entry's empty languages is never updated, permanently losing that information.

Additional Locations (1)

src/sentry/tasks/seer/context_engine_index.py#L285-L286

^{Reviewed by Cursor Bugbot for commit e1ab127. Configure here.}
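
One possible way to address the report above is to backfill `languages` when a later project supplies them for a repo that is already registered. This is a sketch under assumptions: the helper name `merge_repo` and the exact shape of each definition are illustrative, based only on the keys visible in the quoted diff.

```python
from typing import Any

RepoKey = tuple[str, str, str]


def merge_repo(
    org_repo_definitions: dict[RepoKey, dict[str, Any]],
    repo: dict[str, Any],
    project_id: int,
    languages: list[str],
) -> None:
    """Register a repo for a project, deduplicating by (provider, owner, name)."""
    key = (repo["provider"], repo["owner"], repo["name"])
    existing = org_repo_definitions.get(key)
    if existing is None:
        org_repo_definitions[key] = {
            **repo,
            "languages": list(languages),
            "project_ids": [project_id],
        }
        return
    existing["project_ids"].append(project_id)
    # Backfill: keep known languages, but fill an empty list from a later project.
    if not existing["languages"] and languages:
        existing["languages"] = list(languages)
```

The sketch never overwrites a non-empty list, so the first project that reports languages wins; merging the lists would be an alternative design.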

sentry · 2026-04-07T15:49:28Z

+                    "provider": repo["provider"],
+                    "owner": repo["owner"],
+                    "name": repo["name"],
+                    "external_id": repo["external_id"],


Bug: The code unsafely accesses keys on a raw dictionary from an API response, which will raise a KeyError if the response is malformed or missing expected keys.
_{Severity: HIGH}

Suggested Fix

Use the safe .get() method when accessing keys from the repo dictionary to prevent KeyError exceptions. For a more robust solution, validate the raw API response with a Pydantic model before processing the data to ensure the data structure is correct.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/tasks/seer/context_engine_index.py#L285 Potential issue: The function `index_repos` processes repository data fetched from the Seer API via `bulk_get_project_preferences()`. The code directly accesses dictionary keys like `repo["external_id"]`, `repo["provider"]`, `repo["owner"]`, and `repo["name"]` without using safe access methods like `.get()`. The API response is not validated against a schema. If the Seer API returns a malformed response object that is missing one of these required keys, the operation will fail with a `KeyError`. This will cause the `index_repos` background task to crash, preventing repository indexing for the affected organization.
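
The suggested fix mentions validating the response before use. As a dependency-free sketch (the report suggests Pydantic; plain Python is used here to stay self-contained), malformed repo entries could be filtered out before the keys are accessed. The helper name `valid_repos` and the decision to skip rather than raise are assumptions.

```python
from typing import Any

# Keys that the quoted diff reads directly from each repo dictionary.
REQUIRED_REPO_KEYS = ("provider", "owner", "name", "external_id")


def valid_repos(raw_repos: list[Any]) -> list[dict[str, Any]]:
    """Return only the entries that are dicts carrying every required key."""
    valid = []
    for repo in raw_repos:
        if not isinstance(repo, dict):
            continue
        if all(repo.get(key) is not None for key in REQUIRED_REPO_KEYS):
            valid.append(repo)
    return valid
```

Skipping bad entries lets the rest of the organization's repos be indexed; raising a descriptive error instead would surface the malformed response sooner.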

sentry · 2026-04-07T16:02:08Z

+        language_map: dict[tuple[str, str, str], list[str]] = {}
+        for autofix_repo in autofix_repos:
+            key = (autofix_repo["provider"], autofix_repo["owner"], autofix_repo["name"])
+            language_map[key] = autofix_repo["languages"]


Bug: The index_repos task will crash with a KeyError if a repository configured via SEER_AUTOFIX_FORCE_USE_REPOS is missing the languages key.
_{Severity: MEDIUM}

Suggested Fix

Use the .get() method with a default value when accessing the languages key to prevent a KeyError. Change autofix_repo["languages"] to autofix_repo.get("languages", []). This will provide a safe fallback to an empty list if the key is not present in the repository configuration.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/tasks/seer/context_engine_index.py#L271 Potential issue: When the `SEER_AUTOFIX_FORCE_USE_REPOS` setting is used, for example in testing or staging environments, the `index_repos` task can fail. The code iterates through the configured repositories and directly accesses the `languages` key from each repository dictionary. However, unlike the standard code path, the logic for this setting does not ensure the `languages` key is present. If a repository is configured without this key, the task will raise a `KeyError` and fail, as this exception is not configured for retries. This will halt the repository indexing process in environments that use this override.
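
The suggested change can be sketched as a small helper that builds the language map with a safe default. The function name `build_language_map` is hypothetical; the key tuple and the `.get("languages", [])` fallback follow the snippet and suggestion above.

```python
from typing import Any

RepoKey = tuple[str, str, str]


def build_language_map(autofix_repos: list[dict[str, Any]]) -> dict[RepoKey, list[str]]:
    """Map (provider, owner, name) to languages, defaulting to an empty list."""
    language_map: dict[RepoKey, list[str]] = {}
    for autofix_repo in autofix_repos:
        key = (autofix_repo["provider"], autofix_repo["owner"], autofix_repo["name"])
        # .get() avoids a KeyError when a repo entry has no "languages" key.
        language_map[key] = autofix_repo.get("languages", [])
    return language_map
```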

Mihir-Mavalankar · 2026-04-07T17:30:59Z

+
+    for project_id, project in project_map.items():
+        existing_pref = preferences_by_id.get(str(project_id))
+        project_pref_repos = existing_pref.get("repositories") or []


We should skip projects that don't have preferences set up. If a project does not have preferences, then customers basically can't use Seer for that project.

cursor · 2026-04-07T18:01:02Z

+    )
+
+    if response.status >= 400:
+        raise SeerApiError("Seer request failed", response.status)


Missing early return when no repos are collected

Low Severity

The index_repos function makes a Seer API call even when org_repo_definitions is empty (e.g., when all projects lack preferences or have empty/None repository lists). Other similar tasks like build_service_map and index_org_project_knowledge include early returns for analogous "no data" scenarios (no nodes, no high-volume projects). Adding an early return when org_repo_definitions is empty would avoid unnecessary API calls, which can add up since this runs across many orgs.

^{Reviewed by Cursor Bugbot for commit cf5c5a4. Configure here.}
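
The early return described above could look roughly like this. It is a sketch only: `send_index_request`, the injected `send` callable, and the payload field names are hypothetical and stand in for the real Seer request, which is not shown in this thread.

```python
from typing import Any, Callable

RepoKey = tuple[str, str, str]


def send_index_request(
    organization_id: int,
    org_repo_definitions: dict[RepoKey, dict[str, Any]],
    send: Callable[[dict[str, Any]], None],
) -> bool:
    """Send the indexing payload unless there is nothing to index.

    Returns True when a request was sent and False on the early return.
    """
    if not org_repo_definitions:
        # No repos collected for this org: skip the API call entirely.
        return False
    send({"org_id": organization_id, "repos": list(org_repo_definitions.values())})
    return True
```

Passing `send` as a parameter keeps the sketch testable without a network; the real task would call its own request helper at that point.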

shruthilayaj and others added 2 commits April 2, 2026 14:42

feat: Add repo indexing job

ebc8938

github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Apr 2, 2026

vercel bot deployed to Preview April 2, 2026 19:02

log

8ece1bf

sentry-warden bot reviewed Apr 2, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py Outdated

typing

b7850b5

vercel bot deployed to Preview April 2, 2026 19:08

catch org does not exist

a5ed0a4

vercel bot deployed to Preview April 2, 2026 19:14

fix failing test

1d53649

vercel bot deployed to Preview April 2, 2026 19:31

shruthilayaj marked this pull request as ready for review April 2, 2026 20:04

shruthilayaj requested a review from a team as a code owner April 2, 2026 20:04

sentry bot reviewed Apr 2, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py Outdated

cursor bot reviewed Apr 2, 2026

Comment thread src/sentry/seer/signed_seer_api.py

Comment thread tests/sentry/tasks/seer/test_context_engine_index.py

Mihir-Mavalankar reviewed Apr 2, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py Outdated

shruthilayaj added 2 commits April 7, 2026 11:00

use project_pref_repos and autofix repos as fallback

3243fdb

comment

4abf5aa

vercel bot deployed to Preview April 7, 2026 15:03

sentry bot reviewed Apr 7, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py

cursor bot reviewed Apr 7, 2026

tests

f8a59de

shruthilayaj requested a review from Mihir-Mavalankar April 7, 2026 15:25

vercel bot deployed to Preview April 7, 2026 15:28

sentry bot reviewed Apr 7, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py Outdated

cursor bot reviewed Apr 7, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py

key error, viewer context

58b0a70

typing anotations

e1ab127

sentry-warden bot reviewed Apr 7, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py

vercel bot deployed to Preview April 7, 2026 15:37

sentry bot reviewed Apr 7, 2026

Comment thread src/sentry/tasks/seer/context_engine_index.py

cursor bot reviewed Apr 7, 2026

test

f63942d

vercel bot deployed to Preview April 7, 2026 15:48

sentry bot reviewed Apr 7, 2026

add retry for transient errors

f3b33d2

vercel bot deployed to Preview April 7, 2026 16:01

sentry bot reviewed Apr 7, 2026

Mihir-Mavalankar reviewed Apr 7, 2026

skip projects that don't have seer pref

cf5c5a4

vercel bot deployed to Preview April 7, 2026 17:57

cursor bot reviewed Apr 7, 2026

Mihir-Mavalankar approved these changes Apr 7, 2026

shruthilayaj merged commit d5c1cfe into master Apr 7, 2026
80 checks passed

shruthilayaj deleted the shruthi/feat/add-repo-indexing-job branch April 7, 2026 18:33
