eval-harness

Here is 1 public repository matching this topic...

plaited / agent-eval-harness

Evaluate AI agents with Unix-style pipeline commands. Schema-driven adapters for any CLI agent, trajectory capture, pass@k metrics, and multi-run comparison.

Topics: cli, typescript, grader, ai-agents, bun, jsonl, llm-evaluation, agent-evaluation, agent-client-protocol, unix-pipeline, agent-comparison, trajectory-capture, eval-harness, pass-at-k, headless-adapter

Updated Jan 22, 2026
TypeScript
