Add Audacity eval harness and regression reporting by sehawq · Pull Request #96 · HKUDS/CLI-Anything

sehawq · 2026-03-17T16:23:53Z

Summary

Add a task-based evaluation harness for Audacity CLI with discovery, reporting, and baseline regression checks.
Introduce 4 stdlib-only eval tasks:
- Project roundtrip (create → save → open → info)
- Track + clip flow (add track, add clip, split)
- Effects registry validation (normalize/fade_in)
- WAV export (render_mix to file)
Add an eval command to the CLI with the options --out, --baseline, --update-baseline, and --fail-on-regression.
Document eval usage and outputs in the Audacity README.
Add pytest coverage for task discovery, report generation, and baseline regression detection.
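
To make the four eval options concrete, here is a minimal sketch of how they could be declared. It uses argparse purely for illustration; the actual harness may use a different CLI framework, and the argument types, defaults, and help texts below are assumptions rather than the PR's code. Only the flag names come from the description above.

```python
import argparse


def build_eval_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four eval options named in the PR.

    Types and defaults are assumptions made for this sketch.
    """
    parser = argparse.ArgumentParser(prog="eval", description="Run eval tasks")
    # Assumed: --out is a directory that receives the reports and artifacts.
    parser.add_argument("--out", default="eval_out", help="output directory")
    # Assumed: --baseline is a path to a previously saved JSON report.
    parser.add_argument("--baseline", default=None, help="baseline report path")
    # Assumed: both of these are boolean switches.
    parser.add_argument("--update-baseline", action="store_true",
                        help="overwrite the baseline with the current report")
    parser.add_argument("--fail-on-regression", action="store_true",
                        help="exit non-zero when a regression is detected")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_eval_parser().parse_args(
    ["--out", "reports", "--baseline", "baseline.json", "--fail-on-regression"]
)
```

argparse maps the dashed option names to attributes with underscores, so the switches are read as args.update_baseline and args.fail_on_regression.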

New eval runner under audacity/agent-harness/cli_anything/audacity/eval/.
Task modules live in audacity/agent-harness/cli_anything/audacity/eval/tasks/.
Reports:
- eval_report.json (machine-readable)
- eval_report.md (human-readable)
- artifacts/ for generated outputs
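
The three outputs listed above could be produced along these lines. This is a stdlib-only sketch, not the PR's implementation: the function name write_reports, the report schema (a tasks mapping plus success_rate), and the Markdown layout are all assumptions. Only the file and directory names come from the description.

```python
import json
from pathlib import Path


def write_reports(results: dict, out_dir: str) -> dict:
    """Write eval_report.json, eval_report.md, and an artifacts/ directory.

    `results` maps task name -> "pass" or "fail" (an assumed schema).
    """
    out = Path(out_dir)
    # artifacts/ holds generated outputs such as exported WAV files.
    (out / "artifacts").mkdir(parents=True, exist_ok=True)

    passed = sum(1 for status in results.values() if status == "pass")
    success_rate = passed / len(results) if results else 0.0
    report = {"tasks": results, "success_rate": success_rate}

    # Machine-readable report.
    (out / "eval_report.json").write_text(
        json.dumps(report, indent=2), encoding="utf-8"
    )

    # Human-readable report: a short summary and one table row per task.
    lines = [
        "# Eval report",
        "",
        f"Success rate: {success_rate:.0%}",
        "",
        "| Task | Status |",
        "| --- | --- |",
    ]
    lines += [f"| {name} | {status} |" for name, status in results.items()]
    (out / "eval_report.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return report
```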
Baseline regression rules:
- A Pass → Fail transition on any task is a regression
- Any decrease in overall success_rate is a regression
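
The two baseline rules can be sketched as a small comparison function. The report schema used here (a tasks mapping of name to "pass" or "fail", plus a success_rate float) and the function name find_regressions are assumptions for illustration; the PR's actual report format may differ.

```python
from typing import List


def find_regressions(baseline: dict, current: dict) -> List[str]:
    """Return one message per detected regression (empty list means none).

    Both arguments are assumed to look like:
        {"tasks": {"task_name": "pass" | "fail"}, "success_rate": float}
    """
    messages = []
    base_tasks = baseline.get("tasks", {})
    cur_tasks = current.get("tasks", {})

    # Rule 1: a task that passed in the baseline and fails now.
    for name, base_status in sorted(base_tasks.items()):
        if base_status == "pass" and cur_tasks.get(name) == "fail":
            messages.append(f"task {name}: pass -> fail")

    # Rule 2: the overall success rate went down.
    base_rate = baseline.get("success_rate", 0.0)
    cur_rate = current.get("success_rate", 0.0)
    if cur_rate < base_rate:
        messages.append(f"success_rate: {base_rate:.2f} -> {cur_rate:.2f}")

    return messages
```

A caller honoring --fail-on-regression would exit with a non-zero status whenever the returned list is non-empty. Tasks missing from the current report are not flagged by rule 1 in this sketch; whether the real harness treats them as failures is not stated in the PR.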

Eval tasks are Audacity-only (other harnesses not yet wired).
Baseline comparison is MVP-level (status + success_rate only, no metric thresholds).
No CI integration in this PR (manual execution).
SoX-backed tests require a local SoX binary on PATH for full E2E coverage.

Add Audacity eval harness

1cdc4e2

sehawq force-pushed the codex/audacity-eval-harness branch from e9a5fae to 1cdc4e2 on March 17, 2026 16:25

sehawq requested a review from yuh-yang on March 17, 2026 16:53