CLI for agent evaluation. Capture trajectories, run trials with pass@k metrics, and score with polyglot graders (TypeScript, Python, any language).
cli typescript acp grader ai-agents bun jsonl llm-evaluation agent-evaluation agent-client-protocol trajectory-capture eval-harness pass-at-k
Updated Jan 21, 2026 - TypeScript
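
The description mentions pass@k metrics without saying how they are computed. As a rough illustration only, the sketch below uses the widely cited unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k), where n is the number of trials and c the number that passed. Whether this CLI uses that estimator is an assumption, and the function name `passAtK` is hypothetical, not part of the tool's API.

```typescript
// Hypothetical helper: not taken from the CLI described above.
// Estimates pass@k from n trials of which c passed, using
// 1 - C(n - c, k) / C(n, k), computed as a running product
// so that large binomial coefficients are never formed.
function passAtK(n: number, c: number, k: number): number {
  const allIntegers = [n, c, k].every(Number.isInteger);
  if (!allIntegers || n < 1 || k < 1 || k > n || c < 0 || c > n) {
    throw new RangeError("expected integers with 1 <= k <= n and 0 <= c <= n");
  }
  // Fewer than k failures: any sample of k trials contains at least one pass.
  if (n - c < k) return 1;

  // C(n - c, k) / C(n, k) equals the product of (1 - k / i) for i in (n - c, n].
  let allFail = 1;
  for (let i = n - c + 1; i <= n; i++) {
    allFail *= 1 - k / i;
  }
  return 1 - allFail;
}

// Example: 10 trials, 3 passed.
console.log(passAtK(10, 3, 1));
console.log(passAtK(10, 3, 5));
```

With k = 1 the estimate reduces to the plain pass rate c / n. Larger k values answer the question "how likely is at least one pass among k attempts".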