improve cache feature #61

Merged
rfievet merged 1 commit into main from 825-Robin--cache-copernicus-fetch
Apr 8, 2026

Collaborator

@rfievet rfievet commented Apr 7, 2026

Product-level deduplication for Copernicus downloads

Problem

The old cache was keyed on exact request parameters (bbox, dates, etc.). Shifting the bbox by even 1 meter produced a different cache key and re-downloaded the same Copernicus tiles.

Solution

Deduplication now happens at the product level instead of the request level. The Copernicus product UUID is embedded directly in the filename:

s2/a8dd0899-7a3b-4e4b-9b3a-5e7f1234abcd__S2A_MSIL1C_20220101_R10m.zip
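For illustration, the naming scheme can be sketched roughly as below. The helper names `build_cache_filename` and `product_id_from_filename` are hypothetical and not taken from this PR; only the `{product_id}__{safe_name}.ext` layout comes from the description.

```python
from pathlib import Path

# Double underscore separates the product UUID from the human-readable name.
SEPARATOR = "__"


def build_cache_filename(product_id: str, safe_name: str, ext: str = ".zip") -> str:
    """Hypothetical helper: compose '{product_id}__{safe_name}{ext}'."""
    return f"{product_id}{SEPARATOR}{safe_name}{ext}"


def product_id_from_filename(filename: str) -> str:
    """Hypothetical helper: recover the product UUID from a cached path."""
    # Split on the first separator only, in case the safe name contains '__'.
    return Path(filename).name.split(SEPARATOR, 1)[0]


name = build_cache_filename(
    "a8dd0899-7a3b-4e4b-9b3a-5e7f1234abcd", "S2A_MSIL1C_20220101_R10m"
)
```

Because the UUID comes first, a prefix match on the filename is enough to tell whether a product is already cached.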

Before downloading, process_products() globs the cache directory for {product_id}__*. If a match is found, the download is skipped. The filesystem is the registry, so there are no extra state files to manage.
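A minimal sketch of the lookup-then-skip idea, not the actual code in common.py. The names `find_cached_product` and `fetch_once`, their signatures, and the `download` callback are assumptions made for this example.

```python
import glob
from pathlib import Path
from typing import Callable, Optional


def find_cached_product(cache_dir: Path, product_id: str) -> Optional[Path]:
    """Return a cached file whose name starts with '{product_id}__', if any."""
    # glob.escape guards against glob metacharacters in the ID (UUIDs have none).
    pattern = f"{glob.escape(product_id)}__*"
    matches = sorted(cache_dir.glob(pattern))
    return matches[0] if matches else None


def fetch_once(
    cache_dir: Path,
    product_id: str,
    safe_name: str,
    download: Callable[[Path], None],  # hypothetical downloader callback
) -> Path:
    """Call download() only when no file for this product ID is on disk."""
    cached = find_cached_product(cache_dir, product_id)
    if cached is not None:
        return cached  # same product, regardless of which query returned it
    target = cache_dir / f"{product_id}__{safe_name}.zip"
    download(target)
    return target
```

Two requests with slightly different bboxes that resolve to the same product ID would hit the same file here, so the second call returns the cached path without downloading.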

Zip files are also validated on lookup (ZipFile.testzip()), so truncated or corrupted downloads from interrupted connections are detected, cleaned up, and re-downloaded automatically.
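The integrity check could look something like the sketch below. It is an assumption about the approach, not the PR's `_is_valid_zip()` itself; `is_intact_zip` and `discard_if_corrupt` are hypothetical names. `ZipFile.testzip()` returns the name of the first bad member, or `None` when every member passes its CRC check.

```python
import zipfile
from pathlib import Path


def is_intact_zip(path: Path) -> bool:
    """True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns None when no corrupt member is found.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # Truncated files usually lack the end-of-central-directory record,
        # so opening them fails before testzip() is reached.
        return False


def discard_if_corrupt(path: Path) -> bool:
    """Delete a corrupt archive so a later lookup misses and re-downloads."""
    if is_intact_zip(path):
        return False
    path.unlink(missing_ok=True)  # requires Python 3.8+
    return True
```

Deleting the bad file keeps the "filesystem is the registry" property: once the file is gone, the next lookup for that product ID finds nothing and the download runs again.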

What changed

  • common.py — added find_product_on_disk(), _is_valid_zip(), updated process_products() with dedup logic
  • s1.py / s2.py — filename format changed to {product_id}__{safe_name}.ext
  • tests/test_product_dedup.py — 16 tests covering dedup, corruption detection, cross-bbox scenarios

Embed Copernicus product UUID in filenames ({product_id}__{safe_name}.ext)
so that different queries returning the same tile share the download.

- Add find_product_on_disk() to detect already-downloaded products by UUID
- Add zip integrity check to catch corrupted/truncated downloads
- Update process_products() to skip downloads for existing products
- Update S1/S2 filename format to include product ID
- Add 16 tests covering dedup, corruption detection, and cross-bbox scenarios
@rfievet rfievet requested a review from gabrieltseng April 7, 2026 17:24
@rfievet rfievet marked this pull request as ready for review April 8, 2026 19:47
@rfievet rfievet merged commit 2582af2 into main Apr 8, 2026
1 check passed