perf(preprod): Parallelize and deduplicate snapshot image fetches #111755

Merged
NicoHinderling merged 2 commits into master from
nicohinderling/perf/parallel-snapshot-image-fetch-16
Mar 30, 2026

@NicoHinderling NicoHinderling commented Mar 27, 2026

Replaces sequential session.get() calls in compare_snapshots with a
ContextPropagatingThreadPoolExecutor (8 workers) that prefetches all unique
content hashes for each pixel batch concurrently. A per-batch dict cache
keyed by content hash deduplicates fetches — if the same blob appears as
head_hash for one pair and base_hash for another, it's only downloaded
once.
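
A minimal sketch of the prefetch-and-cache idea described above. It assumes the stdlib `ThreadPoolExecutor` as a stand-in for `ContextPropagatingThreadPoolExecutor`, and `prefetch_unique` and `fetch` are hypothetical names for illustration, not the code in this PR:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def prefetch_unique(
    pairs: Iterable[tuple[str, str]],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> dict[str, bytes]:
    """Fetch each distinct content hash once; return a hash -> bytes cache.

    `fetch` is a hypothetical stand-in for a thread-safe session.get().
    """
    # Collect hashes from both sides of every pair; the set drops repeats,
    # so a blob used as head in one pair and base in another appears once.
    unique_hashes = sorted({h for pair in pairs for h in pair})
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # zip() consumes the map iterator, so every fetch has completed
        # (and any worker exception has been re-raised) before returning.
        return dict(zip(unique_hashes, executor.map(fetch, unique_hashes)))
```

Each pair would then read its head and base images from the returned dict instead of issuing its own requests.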

Previously, each image pair fetched its head and base images sequentially,
with no dedup across pairs. For a batch of 50 pairs sharing some hashes,
this could mean 100 individual blocking HTTP round-trips when only ~70
unique hashes exist.
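
The arithmetic can be checked with a synthetic batch. The hash names and the overlap pattern below are made up for illustration; only the 50-pair, roughly 70-unique shape comes from the description:

```python
# Hypothetical batch of 50 (head_hash, base_hash) pairs. The first 20 pairs
# have their own base image; the remaining 30 reuse another pair's head image.
pairs = [
    (f"head-{i}", f"base-{i}" if i < 20 else f"head-{i - 20}")
    for i in range(50)
]

# Old behaviour: one blocking request per side of every pair.
sequential_fetches = 2 * len(pairs)

# New behaviour: one request per distinct content hash in the batch.
unique_fetches = len({h for pair in pairs for h in pair})

print(sequential_fetches, unique_fetches)  # 100 70
```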

The objectstore Session.get() is thread-safe (no mutable instance state,
urllib3 pool handles concurrency internally), so no changes to the
objectstore client are needed.

Closes EME-828

Fetch unique image hashes concurrently using a thread pool instead of
sequential HTTP calls. A per-batch cache keyed by content hash avoids
re-downloading the same blob when it appears across multiple pairs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 27, 2026

linear-code bot commented Mar 27, 2026

@NicoHinderling NicoHinderling marked this pull request as ready for review March 27, 2026 20:22
@NicoHinderling NicoHinderling requested a review from a team as a code owner March 27, 2026 20:22

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Unconsumed executor.map iterator cancels pending fetch futures
    • The executor.map result is now eagerly consumed with list(...), ensuring all hash-fetch futures run to completion and preventing missing cache entries.

To push these changes, comment:

@cursor push 2c8ccbee25
Preview (2c8ccbee25)
diff --git a/src/sentry/preprod/snapshots/tasks.py b/src/sentry/preprod/snapshots/tasks.py
--- a/src/sentry/preprod/snapshots/tasks.py
+++ b/src/sentry/preprod/snapshots/tasks.py
@@ -375,7 +375,7 @@
 
                 # Fetch unique hashes in parallel; session.get() is thread-safe
                 with ContextPropagatingThreadPoolExecutor(max_workers=8) as executor:
-                    executor.map(_fetch_hash, unique_hashes)
+                    list(executor.map(_fetch_hash, unique_hashes))
 
                 for candidate in batch:
                     if candidate.head_hash in failed_hashes or candidate.base_hash in failed_hashes:
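
For context, the diff only shows a few lines around the executor call. A rough sketch of how the surrounding bookkeeping might fit together follows. `Candidate`, `load_batch`, and `fetch` are hypothetical, the stdlib `ThreadPoolExecutor` stands in for `ContextPropagatingThreadPoolExecutor`, and what the real task does with a skipped candidate is not visible in the diff:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:  # hypothetical stand-in for the real batch item
    head_hash: str
    base_hash: str


def load_batch(batch, fetch, max_workers=8):
    fetch_cache: dict[str, bytes] = {}
    failed_hashes: set[str] = set()

    def _fetch_hash(content_hash: str) -> None:
        # Single-key dict stores and set.add are atomic under CPython's GIL,
        # so this sketch uses no explicit lock.
        try:
            fetch_cache[content_hash] = fetch(content_hash)
        except Exception:
            failed_hashes.add(content_hash)

    unique_hashes = {h for c in batch for h in (c.head_hash, c.base_hash)}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Consume the iterator so all fetches finish before the loop below.
        list(executor.map(_fetch_hash, unique_hashes))

    ready, skipped = [], []
    for candidate in batch:
        if candidate.head_hash in failed_hashes or candidate.base_hash in failed_hashes:
            skipped.append(candidate)  # assumption: failed pairs are set aside
            continue
        ready.append(
            (candidate, fetch_cache[candidate.head_hash], fetch_cache[candidate.base_hash])
        )
    return ready, skipped
```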

Comment thread src/sentry/preprod/snapshots/tasks.py Outdated
…ation

The discarded iterator from executor.map() gets garbage-collected in
CPython, triggering its finally clause which cancels pending futures.
This silently skips fetches for hashes not yet picked up by workers,
causing KeyErrors when looking up results in fetch_cache.

Wrapping in list() forces full consumption before proceeding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
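
One further effect of consuming the iterator, shown here with the stdlib `ThreadPoolExecutor` rather than `ContextPropagatingThreadPoolExecutor`: `Executor.map()` only re-raises a worker's exception when the corresponding result is pulled from the iterator. The sketch below uses a hypothetical `fetch_or_fail` helper and does not try to reproduce the cancellation behaviour described in the commit message:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_or_fail(content_hash: str) -> bytes:
    # Hypothetical fetcher that fails for one particular hash.
    if content_hash == "missing":
        raise KeyError(content_hash)
    return content_hash.encode()


hashes = ["a", "missing", "b"]


def run(consume: bool) -> str:
    try:
        with ThreadPoolExecutor(max_workers=2) as executor:
            results = executor.map(fetch_or_fail, hashes)
            if consume:
                # Iterating retrieves each result in order and re-raises
                # the first exception raised inside a worker.
                list(results)
    except KeyError:
        return "raised"
    return "silent"


print(run(consume=True))   # raised
print(run(consume=False))  # silent
```

With `list(...)`, a failed fetch surfaces at the call site instead of showing up later as a missing cache entry.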
@NicoHinderling NicoHinderling merged commit eb411f8 into master Mar 30, 2026
63 of 64 checks passed
@NicoHinderling NicoHinderling deleted the nicohinderling/perf/parallel-snapshot-image-fetch-16 branch March 30, 2026 16:35
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2026