Skip to content

Issue: DB allows duplicate task_run rows for the same logical run (missing idempotency constraint) #688

@syedasmarhaider

Description

@syedasmarhaider

Problem

The schema does not prevent inserting multiple task_run rows that represent the same logical run (same task same inputs same VCS state). Because task_run.id is a surrogate key and there is no UNIQUE constraint over the “run identity”, duplicates are allowed.
This breaks idempotency and makes queries like “give me the run for this input digest” ambiguous.

$$ mpact
Duplicate runs inflate metrics (success rate, run count, etc.).
Downstream relations (task_run_output, task_run_*_input) become ambiguous: multiple task_run.id can correspond to the same logical run.

Run these two inserts:

INSERT INTO task_run (vcs_id, task_id, total_input_digest, start_timestamp, stop_timestamp, result)
VALUES (5, 12, 'abc', now(), now(), 'success');

INSERT INTO task_run (vcs_id, task_id, total_input_digest, start_timestamp, stop_timestamp, result)
VALUES (5, 12, 'abc', now(), now(), 'success');

Expected behavior

The second insert should fail with a uniqueness violation, because it represents the same logical run identity.

Root cause

No uniqueness constraint/index on the logical identity of a run, typically:
(task_id, vcs_id, total_input_digest

Also: vcs_id is nullable, and Postgres UNIQUE treats NULL as “not equal”, so even adding UNIQUE(task_id, vcs_id, total_input_digest) won’t prevent duplicates where vcs_id IS NULL.

Clarification:

if reruns are valid then issue is that we lack a stable external execution identifier (idempotency key), so accidental duplicate inserts are indistinguishable from legitimate reruns.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions