Evaluate AI agents with Unix-style pipeline commands. Schema-driven adapters for any CLI agent, trajectory capture, pass@k metrics, and multi-run comparison.
cli typescript grader ai-agents bun jsonl llm-evaluation agent-evaluation agent-client-protocol unix-pipeline agent-comparison trajectory-capture eval-harness pass-at-k headless-adapter
Updated Jan 22, 2026 - TypeScript
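
For readers unfamiliar with the pass@k metric named in the description, here is a minimal sketch of how it is commonly computed. It assumes the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of runs for a task and c is the number that passed. The function name `passAtK` and its signature are hypothetical and are not taken from this project, which may compute the metric differently.

```typescript
// Hypothetical helper, not this tool's API.
// Estimates pass@k: the probability that at least one of k runs,
// sampled without replacement from n recorded runs, passed,
// given that c of the n runs passed.
function passAtK(n: number, c: number, k: number): number {
  if (![n, c, k].every(Number.isInteger)) {
    throw new RangeError("n, c and k must be integers");
  }
  if (n < 1 || k < 1 || k > n || c < 0 || c > n) {
    throw new RangeError("require 1 <= k <= n and 0 <= c <= n");
  }
  // Fewer than k failures: every sample of size k contains a pass.
  if (n - c < k) return 1;
  // C(n-c, k) / C(n, k) rewritten as a product to avoid large factorials:
  // prod over i = n-c+1 .. n of (1 - k / i).
  let allFail = 1;
  for (let i = n - c + 1; i <= n; i++) {
    allFail *= 1 - k / i;
  }
  return 1 - allFail;
}

// Example: 10 runs of one task, 3 passed, estimate pass@1.
console.log(passAtK(10, 3, 1));
```

The product form is used instead of computing binomial coefficients directly so that larger run counts do not overflow.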