Skip to content

feat: cross-run regression detection #3

@jpleva91

Description

@jpleva91

Problem

When a task that previously solved starts failing, there's no automatic alert. Regressions hide in the noise.

Acceptance Criteria

  • Compare current run results against bench_history.json
  • Flag tasks that solved in previous run but failed in current run
  • Add "Regressions" section to sentinel report with task names and what changed

Hints

  • evolve/history.py:detect_trends() already has improving/regressing logic
  • Wire into sentinel_analyze.py after the per-task breakdown

Generated by /forge cascade

Metadata

Metadata

Assignees

No one assigned

    Labels

    agent:claimedAgent dispatched — do not re-dispatchenhancementNew feature or requestsprintCurrent sprint priority

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions