perf(preprod): Parallelize and deduplicate snapshot image fetches #111755
Merged
NicoHinderling merged 2 commits into master on Mar 30, 2026
Conversation
Fetch unique image hashes concurrently using a thread pool instead of sequential HTTP calls. A per-batch cache keyed by content hash avoids re-downloading the same blob when it appears across multiple pairs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Unconsumed `executor.map` iterator cancels pending fetch futures. The `executor.map` result is now eagerly consumed with `list(...)`, ensuring all hash-fetch futures run to completion and preventing missing cache entries.
Or push these changes by commenting:
@cursor push 2c8ccbee25
Preview (2c8ccbee25)
```diff
diff --git a/src/sentry/preprod/snapshots/tasks.py b/src/sentry/preprod/snapshots/tasks.py
--- a/src/sentry/preprod/snapshots/tasks.py
+++ b/src/sentry/preprod/snapshots/tasks.py
@@ -375,7 +375,7 @@
         # Fetch unique hashes in parallel; session.get() is thread-safe
         with ContextPropagatingThreadPoolExecutor(max_workers=8) as executor:
-            executor.map(_fetch_hash, unique_hashes)
+            list(executor.map(_fetch_hash, unique_hashes))
         for candidate in batch:
             if candidate.head_hash in failed_hashes or candidate.base_hash in failed_hashes:
```
…ation

The discarded iterator from `executor.map()` gets garbage-collected in CPython, triggering its finally clause, which cancels pending futures. This silently skips fetches for hashes not yet picked up by workers, causing KeyErrors when looking up results in `fetch_cache`. Wrapping in `list()` forces full consumption before proceeding. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
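The fixed pattern can be sketched in isolation. This is a minimal illustration, not the actual task code: a plain `ThreadPoolExecutor` stands in for Sentry's `ContextPropagatingThreadPoolExecutor`, and `fetch_all`/`fetch` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(hashes, fetch):
    """Fetch every hash concurrently and return a {hash: result} cache.

    executor.map() returns a lazy iterator. If that iterator is discarded,
    CPython may garbage-collect it before the workers finish, cancelling
    futures that have not started yet. Wrapping it in list(...) forces
    full consumption, so every fetch runs to completion before the pool
    shuts down.
    """
    cache = {}

    def _fetch_hash(h):
        cache[h] = fetch(h)

    with ThreadPoolExecutor(max_workers=8) as executor:
        # Eagerly consume the iterator; a bare `executor.map(...)` here
        # would reproduce the bug Bugbot flagged.
        list(executor.map(_fetch_hash, hashes))
    return cache
```

With `list(...)` in place, every key is guaranteed to be present in the returned cache, so downstream lookups cannot raise `KeyError`.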
rbro112 approved these changes on Mar 30, 2026


Replaces sequential `session.get()` calls in `compare_snapshots` with a `ContextPropagatingThreadPoolExecutor` (8 workers) that prefetches all unique content hashes for each pixel batch concurrently. A per-batch `dict` cache keyed by content hash deduplicates fetches: if the same blob appears as `head_hash` for one pair and `base_hash` for another, it's only downloaded once.
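A minimal sketch of that dedup, under assumed names (`prefetch_batch` and `fetch_blob` are illustrative; the real task code differs):

```python
from concurrent.futures import ThreadPoolExecutor

def prefetch_batch(batch, fetch_blob, max_workers=8):
    """Download each unique content hash in `batch` exactly once.

    `batch` is a list of (head_hash, base_hash) pairs. Collecting the
    hashes into a set first means a blob shared across pairs (or used
    as both head and base) costs a single round-trip.
    """
    fetch_cache = {}
    unique_hashes = {h for pair in batch for h in pair}

    def _fetch_hash(h):
        # In the real task this would hit the objectstore via
        # session.get(); any callable works for the sketch.
        fetch_cache[h] = fetch_blob(h)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list(...) forces full consumption so no fetch is skipped.
        list(executor.map(_fetch_hash, unique_hashes))
    return fetch_cache
```

For a batch like `[("a", "b"), ("b", "c")]`, `fetch_blob` is called three times rather than four, since `"b"` is fetched once and served from the cache for both pairs.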
Previously, each image pair fetched its head and base images sequentially,
with no dedup across pairs. For a batch of 50 pairs sharing some hashes,
this could mean 100 individual blocking HTTP round-trips when only ~70
unique hashes exist.
The objectstore `Session.get()` is thread-safe (no mutable instance state; the urllib3 pool handles concurrency internally), so no changes to the objectstore client are needed.
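The concurrent-fetch shape can be demonstrated end-to-end with only the standard library. This is a stand-in demo, not the PR's code: `EchoHandler` fakes the objectstore (GET `/<hash>` echoes the hash back), and `urlopen` replaces the shared objectstore session.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

class EchoHandler(BaseHTTPRequestHandler):
    """Fake objectstore: GET /<hash> returns <hash> as the response body."""

    def do_GET(self):
        body = self.path.strip("/").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

# Bind to an ephemeral port and serve in the background.
server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def fetch(h):
    with urlopen(f"http://127.0.0.1:{port}/{h}") as resp:
        return resp.read().decode()

hashes = ["aaa", "bbb", "ccc"]
with ThreadPoolExecutor(max_workers=8) as executor:
    # dict(zip(...)) fully consumes the map iterator; map preserves order.
    results = dict(zip(hashes, executor.map(fetch, hashes)))
server.shutdown()
```

The eight workers issue their GETs concurrently against the threaded server, which is the same fan-out the PR applies to the objectstore.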
Closes EME-828