sendInteractions: validate that referenced datasets exist before recording analytics #41

@maxine-at-forecast

Context

From adversarial review of v0.4.0b1 (W7).

Problem

The sendInteractions endpoint validates that datasetUri is a syntactically valid AT-URI with the correct collection (science.alt.dataset.entry), but it never checks that the referenced dataset actually exists in the entries table. Compare with publishLabel, which calls query_get_entry(pool, d_did, d_rkey) and returns 400 if the entry is not found.

Without this check, the analytics tables accumulate orphan events for nonexistent datasets, which could pollute analytics dashboards and waste storage.

Trade-offs

  • Adding an existence check means a DB query per interaction item (up to 100 per batch), which increases latency for a fire-and-forget endpoint
  • Could batch the existence checks with a single query_get_entries call for the whole batch instead of per-item lookups
  • Alternatively, could do a soft check (log a warning but still record) to avoid rejecting valid interactions for recently-deleted datasets
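The batch option could look roughly like the sketch below. It parses each datasetUri into a (did, rkey) pair, deduplicates the pairs, and issues one lookup for the whole batch. Everything here is an assumption for illustration: parse_dataset_uri, find_missing_datasets, and the lookup_existing callable are hypothetical names, and lookup_existing merely stands in for whatever a batched query_get_entries would return. The in-memory ENTRIES set replaces the real entries table.

```python
from typing import Callable, Iterable

DATASET_COLLECTION = "science.alt.dataset.entry"

Key = tuple[str, str]  # (did, rkey)


def parse_dataset_uri(uri: str) -> Key:
    """Split at://<did>/<collection>/<rkey> into (did, rkey)."""
    if not uri.startswith("at://"):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len("at://"):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed AT-URI: {uri!r}")
    did, collection, rkey = parts
    if collection != DATASET_COLLECTION:
        raise ValueError(f"unexpected collection: {collection!r}")
    return did, rkey


def find_missing_datasets(
    uris: Iterable[str],
    lookup_existing: Callable[[set[Key]], set[Key]],
) -> set[str]:
    """Return the URIs whose dataset is absent, using one lookup per batch."""
    keys_by_uri = {uri: parse_dataset_uri(uri) for uri in uris}
    unique_keys = set(keys_by_uri.values())
    # Hypothetical batched lookup: takes all keys, returns those that exist.
    existing = lookup_existing(unique_keys) if unique_keys else set()
    return {uri for uri, key in keys_by_uri.items() if key not in existing}


# In-memory stand-in for the entries table (illustrative data only).
ENTRIES: set[Key] = {("did:plc:alice", "3kabc")}


def fake_lookup(keys: set[Key]) -> set[Key]:
    return keys & ENTRIES


batch = [
    "at://did:plc:alice/science.alt.dataset.entry/3kabc",
    "at://did:plc:alice/science.alt.dataset.entry/3kabc",  # duplicate item
    "at://did:plc:bob/science.alt.dataset.entry/3kxyz",    # not in ENTRIES
]
missing = find_missing_datasets(batch, fake_lookup)
```

Deduplicating before the lookup keeps the query size bounded by the number of distinct datasets in the batch rather than the number of items, which matters when many interactions in a batch of up to 100 point at the same dataset.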

Acceptance criteria

  • Interactions referencing nonexistent datasets are either rejected or flagged
  • Performance impact is minimal (batch lookup preferred over per-item)
  • Tests cover both existing and nonexistent dataset URIs
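The first criterion leaves the choice between rejecting and flagging open. One way to keep both options available is a small policy switch, sketched below. MissingPolicy, Outcome, and apply_policy are hypothetical names, not part of the existing codebase, and the set of missing URIs is assumed to come from a prior batch existence check.

```python
from dataclasses import dataclass
from enum import Enum


class MissingPolicy(Enum):
    REJECT = "reject"  # drop the item and report it to the caller
    FLAG = "flag"      # record the item but mark the dataset as unverified


@dataclass
class Outcome:
    recorded: list[tuple[str, bool]]  # (datasetUri, dataset_exists)
    rejected: list[str]


def apply_policy(
    uris: list[str], missing: set[str], policy: MissingPolicy
) -> Outcome:
    """Partition a batch according to how missing datasets should be handled."""
    outcome = Outcome(recorded=[], rejected=[])
    for uri in uris:
        exists = uri not in missing
        if exists or policy is MissingPolicy.FLAG:
            outcome.recorded.append((uri, exists))
        else:
            outcome.rejected.append(uri)
    return outcome


known = "at://did:plc:alice/science.alt.dataset.entry/3kabc"
unknown = "at://did:plc:bob/science.alt.dataset.entry/3kxyz"

strict = apply_policy([known, unknown], {unknown}, MissingPolicy.REJECT)
soft = apply_policy([known, unknown], {unknown}, MissingPolicy.FLAG)
```

The same two URIs, one present and one absent, also cover the existing and nonexistent cases the tests need.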

Labels

enhancement