Add archival sidecar for dataset mirroring #5

@maxine-at-forecast

Summary

Build an archival service that watches for new dataset records, downloads and mirrors the underlying data shards, and publishes mirror records with provenance tracking. This prevents link rot for datasets hosted on ephemeral or unreliable storage.

Architecture

The service runs alongside the AppView, either as a separate process or as an integrated worker:

  1. Watches for new dataset records via the indexed database
  2. Downloads shard data from the storage location (HTTP, S3, or PDS blobs)
  3. Stores mirrored data in a centrally-hosted archive (object storage)
  4. Publishes ac.foundation.dataset.mirror records linking originals to mirrors
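
As a rough illustration of steps 2 to 4, here is a minimal sketch of mirroring one dataset record. Only the record type `ac.foundation.dataset.mirror` comes from this issue; the `DatasetRecord` and `MirrorRecord` shapes, the `mirror_dataset` function, the content-addressed key layout, and the injected `fetch` callable are all assumptions made for the example. Step 1 (watching the indexed database) is assumed to produce `DatasetRecord` objects, and the archive is modeled as a plain dict standing in for object storage.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical shape of an indexed dataset record."""
    uri: str                # URI of the original dataset record
    shard_urls: List[str]   # where the shards currently live (HTTP, S3, PDS blob, ...)


@dataclass
class MirrorRecord:
    """Hypothetical shape of the record published in step 4."""
    record_type: str
    original_uri: str
    shards: List[dict]
    mirrored_at: str


def mirror_dataset(
    record: DatasetRecord,
    fetch: Callable[[str], bytes],
    archive: Dict[str, bytes],
    now: Optional[datetime] = None,
) -> MirrorRecord:
    """Download each shard, store it in the archive, and build a mirror record.

    `fetch` is injected so the transport (HTTP, S3, PDS blobs) stays pluggable.
    """
    now = now or datetime.now(timezone.utc)
    shards = []
    for url in record.shard_urls:
        data = fetch(url)
        digest = hashlib.sha256(data).hexdigest()
        # Content-addressed key (an assumption): identical shards are stored once.
        key = f"sha256/{digest}"
        archive.setdefault(key, data)
        # Provenance: where the bytes came from, where they went, and their hash.
        shards.append(
            {"source": url, "mirror": key, "sha256": digest, "size": len(data)}
        )
    return MirrorRecord(
        record_type="ac.foundation.dataset.mirror",
        original_uri=record.uri,
        shards=shards,
        mirrored_at=now.isoformat(),
    )
```

Recording the SHA-256 digest with each shard would let consumers verify that a mirror matches the original, and keying the archive by digest avoids storing duplicate shards. Whether the real lexicon should carry these fields is an open design question.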

Design considerations
