-
Notifications
You must be signed in to change notification settings - Fork 14
Description
Problem
The schema does not prevent inserting multiple task_run rows that represent the same logical run (same task same inputs same VCS state). Because task_run.id is a surrogate key and there is no UNIQUE constraint over the “run identity”, duplicates are allowed.
This breaks idempotency and makes queries like “give me the run for this input digest” ambiguous.
$$ mpact
Duplicate runs inflate metrics (success rate, run count, etc.).
Downstream relations (task_run_output, task_run_*_input) become ambiguous: multiple task_run.id can correspond to the same logical run.
Run these two inserts:
INSERT INTO task_run (vcs_id, task_id, total_input_digest, start_timestamp, stop_timestamp, result)
VALUES (5, 12, 'abc', now(), now(), 'success');
INSERT INTO task_run (vcs_id, task_id, total_input_digest, start_timestamp, stop_timestamp, result)
VALUES (5, 12, 'abc', now(), now(), 'success');
Expected behavior
The second insert should fail with a uniqueness violation, because it represents the same logical run identity.
Root cause
No uniqueness constraint/index on the logical identity of a run, typically:
(task_id, vcs_id, total_input_digest
Also: vcs_id is nullable, and Postgres UNIQUE treats NULL as “not equal”, so even adding UNIQUE(task_id, vcs_id, total_input_digest) won’t prevent duplicates where vcs_id IS NULL.
Clarification:
if reruns are valid then issue is that we lack a stable external execution identifier (idempotency key), so accidental duplicate inserts are indistinguishable from legitimate reruns.