Design decision: should backfill ingestion publish to the change stream? #40

@maxine-at-forecast

Description

Context

Raised during the adversarial review of v0.4.0b1 (W11).

Problem

Records ingested via the backfill path (ingestion/backfill.py) do not emit change events to the ChangeStream. Any client subscribed to subscribeChanges will miss records that arrive through backfill.
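A minimal sketch of the gap, assuming hypothetical stand-ins (ChangeEvent, a ChangeStream with subscribe/publish, ingest_live, ingest_backfill) rather than the project's actual code. The live path publishes after persisting. The backfill path only persists, so a subscriber never learns about the backfilled record even though it is in the store.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the real ChangeStream and event types;
# names and signatures are assumptions for illustration only.
@dataclass
class ChangeEvent:
    record_id: str
    payload: dict

@dataclass
class ChangeStream:
    subscribers: list[Callable[[ChangeEvent], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[ChangeEvent], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, event: ChangeEvent) -> None:
        # Fan the event out to every registered subscriber.
        for callback in self.subscribers:
            callback(event)

store: dict[str, dict] = {}
stream = ChangeStream()

def ingest_live(record_id: str, payload: dict) -> None:
    # Live path: persist the record, then emit a change event.
    store[record_id] = payload
    stream.publish(ChangeEvent(record_id, payload))

def ingest_backfill(record_id: str, payload: dict) -> None:
    # Backfill path as described in this issue: persists but never publishes.
    store[record_id] = payload

seen: list[str] = []
stream.subscribe(lambda event: seen.append(event.record_id))
ingest_live("r1", {"v": 1})
ingest_backfill("r2", {"v": 2})
print(sorted(store), seen)  # ['r1', 'r2'] ['r1']
```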

Considerations

Arguments for publishing backfill events:

  • Subscribers get a complete view of all data changes regardless of ingestion path
  • Simplifies client logic — no need for separate backfill awareness

Arguments against:

  • Backfill is historical data, not real-time changes — semantically different
  • Backfill can produce thousands of events in rapid succession, overwhelming subscriber queues and triggering backpressure disconnects
  • Clients that care about historical completeness should use query endpoints, not the live stream
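To make the backpressure concern concrete, here is a minimal sketch assuming a bounded per-subscriber buffer that disconnects the client on overflow. The Subscriber class, publish_burst helper, capacity of 1000, and disconnect policy are all hypothetical, not the project's actual settings. A backfill burst that outpaces the client fills the buffer and the subscriber is dropped.

```python
import queue

# Hypothetical per-subscriber buffer with a disconnect-on-overflow policy;
# capacity and policy are assumptions, not the project's real configuration.
class Subscriber:
    def __init__(self, capacity: int) -> None:
        self.buffer: queue.Queue = queue.Queue(maxsize=capacity)
        self.connected = True

    def offer(self, event: dict) -> None:
        if not self.connected:
            return
        try:
            self.buffer.put_nowait(event)
        except queue.Full:
            # Backpressure: a subscriber that cannot keep up is disconnected.
            self.connected = False

def publish_burst(sub: Subscriber, count: int) -> None:
    # Simulates a backfill emitting `count` events before the client drains any.
    for i in range(count):
        sub.offer({"record_id": f"r{i}", "source": "backfill"})

sub = Subscriber(capacity=1000)
publish_burst(sub, 5000)
print(sub.connected, sub.buffer.qsize())  # False 1000
```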

Decision needed

  • Should backfill publish to the change stream?
  • If yes, should events be tagged with a source: "backfill" field so clients can filter?
  • If no, should this be documented explicitly in the subscribeChanges endpoint docs?
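If the answer is "yes, with a tag", a client-side filter could look like the sketch below. It assumes a hypothetical event shape in which source defaults to "live" and backfill events carry source: "backfill" as proposed above; neither the field nor the live_only helper exists in the current subscribeChanges payload.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Literal

# Hypothetical event shape: `source` is the tag proposed in this issue,
# not an existing field of the subscribeChanges payload.
Source = Literal["live", "backfill"]

@dataclass(frozen=True)
class ChangeEvent:
    record_id: str
    source: Source = "live"

def live_only(events: Iterable[ChangeEvent]) -> Iterator[ChangeEvent]:
    # Client-side filter for subscribers that only want real-time changes.
    return (e for e in events if e.source != "backfill")

events = [
    ChangeEvent("r1"),
    ChangeEvent("r2", source="backfill"),
    ChangeEvent("r3"),
]
print([e.record_id for e in live_only(events)])  # ['r1', 'r3']
```

Note that filtering on the client does not relieve the backpressure concern, since the backfill events are still delivered before being discarded. A server-side opt-in at subscription time would avoid sending them at all, at the cost of a slightly larger subscribeChanges surface.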

Metadata

Assignees: No one assigned
Labels: question (Further information is requested)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
