Add archival sidecar for dataset mirroring

## Summary

Build an archival service that watches for new dataset records, downloads and mirrors the underlying data shards, and publishes mirror records with provenance tracking. This prevents link rot for datasets stored on ephemeral or unreliable storage.

## Architecture

The sidecar runs alongside the AppView (as a separate process or an integrated worker):

1. Watches for new dataset records via the indexed database
2. Downloads shard data from the storage location (HTTP, S3, or PDS blobs)
3. Stores mirrored data in a centrally hosted archive (object storage)
4. Publishes `ac.foundation.dataset.mirror` records linking originals to mirrors
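
A rough sketch of the four steps above, to make the data flow concrete. Everything here is an assumption for illustration: `DatasetRecord`, `mirror_dataset`, `run_once`, the content-addressed archive key, and the field names in the mirror record are hypothetical, since the `ac.foundation.dataset.mirror` lexicon is not defined yet. The archive is a plain dict standing in for object storage, and `fetch` and `publish` are injected callables so the transport (HTTP, S3, PDS blobs) stays open.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass
class DatasetRecord:
    """Hypothetical shape of an indexed dataset record."""
    uri: str                # URI of the original record
    shard_urls: list[str]   # where the shards currently live


def mirror_dataset(
    record: DatasetRecord,
    fetch: Callable[[str], bytes],
    archive: dict[str, bytes],
) -> dict:
    """Download each shard, store it in the archive, and build a mirror record."""
    shards = []
    for url in record.shard_urls:
        data = fetch(url)                          # step 2: download the shard
        digest = hashlib.sha256(data).hexdigest()
        key = f"sha256/{digest}"                   # assumed content-addressed key
        archive[key] = data                        # step 3: store the copy
        shards.append(
            {"source": url, "mirror": key, "sha256": digest, "size": len(data)}
        )
    # Step 4: link the original to its mirror. Field names are assumptions.
    return {
        "$type": "ac.foundation.dataset.mirror",
        "original": record.uri,
        "isMirror": True,                          # provenance: a copy, not the original
        "shards": shards,
        "mirroredAt": datetime.now(timezone.utc).isoformat(),
    }


def run_once(
    records: Iterable[DatasetRecord],
    fetch: Callable[[str], bytes],
    archive: dict[str, bytes],
    publish: Callable[[dict], None],
) -> None:
    """Process a batch of newly indexed records (step 1 supplies `records`)."""
    for record in records:
        publish(mirror_dataset(record, fetch, archive))
```

Keying the archive by SHA-256 would deduplicate identical shards and lets consumers verify a mirror against the recorded digest, but that is a design choice to discuss, not a decision.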

## Design considerations

- Storage costs scale with dataset size, so we need a policy for what to mirror (e.g. only datasets under a size threshold, or only from verified publishers)
- Provenance tracking: mirrors should clearly indicate they're copies, not originals
- May require a new `ac.foundation.dataset.mirror` lexicon (see forecast-bio/atdata#33)
- Graceful handling of unreachable storage (retry, mark as unavailable)

- Ref: forecast-bio/atdata#33 (non-MVP: "archival sidecar with shard mirroring")
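
To make the mirroring-policy and unreachable-storage points above concrete, here is a minimal sketch. The names `should_mirror`, `fetch_with_retry`, and `MAX_MIRROR_BYTES`, the 5 GiB threshold, and the choice to combine the two policy options with "or" are all assumptions for discussion, not decisions.

```python
import time
from typing import Callable, Optional

MAX_MIRROR_BYTES = 5 * 1024**3  # assumed 5 GiB threshold; tune to the storage budget


def should_mirror(
    total_bytes: int,
    publisher_verified: bool,
    max_bytes: int = MAX_MIRROR_BYTES,
) -> bool:
    """Mirror datasets of at most `max_bytes`, or any dataset from a verified publisher."""
    return publisher_verified or total_bytes <= max_bytes


def fetch_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Return the shard bytes, or None if every attempt failed.

    A None result is the signal for the caller to mark the shard as unavailable.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:  # includes ConnectionError and TimeoutError
            if attempt < attempts - 1:
                sleep(base_delay * 2**attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return None
```

Returning `None` instead of raising keeps one dead shard from aborting a whole batch; whether an unavailable shard should block publishing the mirror record is still an open question.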
