@plaited/acp-harness

CLI tool for capturing agent trajectories from ACP-compatible agents. Execute prompts, capture full trajectories (tools, thoughts, plans), and output structured JSONL for downstream scoring. Available as both a CLI tool and as installable skills for AI coding agents.

CLI Tool

Use these tools directly via the CLI without installation:

# Using built-in headless adapter (recommended - no extra install needed)
export ANTHROPIC_API_KEY=sk-...
bunx @plaited/acp-harness capture prompts.jsonl \
  bunx @plaited/acp-harness headless --schema ./schemas/claude-headless.json \
  -o results.jsonl

# Or with an external ACP adapter
bunx @plaited/acp-harness capture prompts.jsonl bunx claude-code-acp -o results.jsonl

Prerequisite: Set your API key. The headless command works with any CLI agent that supports JSON output - no adapter installation required:

export ANTHROPIC_API_KEY=sk-...   # For Claude
export GEMINI_API_KEY=...         # For Gemini

Pre-built schemas are available in .claude/skills/acp-adapters/schemas/ for Claude and Gemini.

Commands

Command	Description
`capture <prompts> <cmd>`	Trajectory capture (full JSONL)
`trials <prompts> <cmd>`	Multi-run with pass@k metrics
`summarize <results>`	Derive compact views from results
`calibrate <results>`	Sample failures for review
`validate-refs <prompts>`	Check reference solutions
`balance <prompts>`	Analyze test set coverage
`schemas [name]`	Export JSON schemas
`headless --schema <path>`	Schema-driven adapter for any CLI agent
`adapter:check <cmd>`	Validate adapter ACP compliance

Examples

# Capture trajectories using headless adapter (recommended)
bunx @plaited/acp-harness capture prompts.jsonl \
  bunx @plaited/acp-harness headless --schema ./schemas/claude-headless.json \
  -o results.jsonl

# Run trials for pass@k analysis
bunx @plaited/acp-harness trials prompts.jsonl \
  bunx @plaited/acp-harness headless --schema ./schemas/claude-headless.json \
  -k 5 --grader ./grader.ts

# Summarize results
bunx @plaited/acp-harness summarize results.jsonl -o summary.jsonl

# Export schemas
bunx @plaited/acp-harness schemas CaptureResult --json

# Validate adapter compliance
bunx @plaited/acp-harness adapter:check \
  bunx @plaited/acp-harness headless --schema ./schemas/claude-headless.json

Skills for AI Agents

Install skills for use with AI coding agents:

curl -fsSL https://raw.githubusercontent.com/plaited/skills-installer/main/install.sh | bash -s -- --agent <agent-name> --project acp-harness

Replace <agent-name> with your agent: claude, cursor, copilot, opencode, amp, goose, factory

Update skills:

curl -fsSL https://raw.githubusercontent.com/plaited/skills-installer/main/install.sh | bash -s -- update --agent <agent-name> --project acp-harness

Available Skills

ACP Harness

CLI tool for capturing agent trajectories, optimized for TypeScript/JavaScript projects using Bun.

Commands:

Command	Description
`capture`	Execute prompts and capture full trajectories
`trials`	Multi-run trials with pass@k/pass^k metrics
`summarize`	Derive compact views from trajectory results
`calibrate`	Sample failures for grader calibration
`validate-refs`	Validate reference solutions against graders
`balance`	Analyze test set coverage distribution
`schemas`	Export Zod schemas as JSON Schema

Use cases:

Capturing trajectories for downstream evaluation (Braintrust, custom scorers)
Generating training data (SFT/DPO) with full context
Building regression test fixtures for agent behavior
Comparing agent responses across configurations

ACP Adapters

Discover, create, and validate ACP adapters for agent integration.

Commands:

Command	Description
`headless`	Schema-driven adapter for any CLI agent
`adapter:scaffold`	Generate new adapter project with handlers
`adapter:check`	Validate ACP protocol compliance

Use cases:

Wrapping headless CLI agents with schema-driven adapter
Finding existing adapters for your agent
Building custom ACP adapters from scratch
Validating adapter implementations

Input Format

{"id":"test-001","input":"Create a primary button","hint":"should contain <button>","metadata":{"category":"ui"}}
{"id":"test-002","input":["Create a component","Now add tests"],"metadata":{"category":"multi-turn"}}

Field	Required	Description
`id`	Yes	Unique identifier
`input`	Yes	Single prompt (string) or conversation turns (string[])
`hint`	No	Grader context - what to look for
`reference`	No	Reference solution (for validate-refs)
`metadata`	No	Tags, category, difficulty for filtering

Output Format

The harness outputs full trajectory JSONL (CaptureResult schema):

{
  "id": "test-001",
  "input": "Create a primary button",
  "output": "Here's a button component...",
  "hint": "should contain <button>",
  "trajectory": [...],
  "metadata": {"category": "ui", "agent": "bunx claude-code-acp", "trajectoryRichness": "full", "turnCount": 1},
  "timing": {"start": 1234567890, "end": 1234567900, "sessionCreation": 234, "total": 10},
  "toolErrors": false,
  "score": {"pass": true, "score": 1.0, "reasoning": "Contains hint"}
}

Key fields:

toolErrors: Boolean indicating if any tool calls failed
score: Grader result (only if --grader provided)
trajectory: Full execution trace (thoughts, messages, tool calls, plans)
metadata.trajectoryRichness: "full" | "messages-only" | "minimal"
timing.sessionCreation: Time to initialize session (ms)
timing.total: End-to-end duration (ms)

Graders

Graders score agent outputs. The harness supports two types:

TypeScript/JavaScript Graders

Export a grade function:

import type { Grader } from '@plaited/acp-harness/schemas'

export const grade: Grader = async ({ input, output, hint, trajectory }) => {
  const pass = output.toLowerCase().includes(hint?.toLowerCase() ?? '')
  return {
    pass,
    score: pass ? 1.0 : 0.0,
    reasoning: pass ? 'Contains hint content' : 'Missing hint content'
  }
}

acp-harness capture prompts.jsonl bunx claude-code-acp --grader ./grader.ts

Polyglot Graders (Python, etc.)

Any executable script using stdin/stdout JSON protocol:

#!/usr/bin/env python3
import json
import sys

data = json.load(sys.stdin)
output = data["output"].lower()
hint = (data.get("hint") or "").lower()

pass_result = hint in output if hint else True
print(json.dumps({
    "pass": pass_result,
    "score": 1.0 if pass_result else 0.0,
    "reasoning": "Contains hint" if pass_result else "Missing hint"
}))

chmod +x grader.py
acp-harness capture prompts.jsonl bunx claude-code-acp --grader ./grader.py

Protocol:

Input (stdin): {"input": "...", "output": "...", "hint": "...", "trajectory": [...]}
Output (stdout): {"pass": true, "score": 1.0, "reasoning": "..."}

Downstream Integration

# Filter failures
cat results.jsonl | jq 'select(.score.pass == false)'

# Extract tool usage patterns
cat results.jsonl | jq '.trajectory[] | select(.type == "tool_call") | .name'

# Use with your scoring pipeline
cat results.jsonl | your-scoring-script.ts

Development

bun install          # Install dependencies
bun run check        # Type check + lint + format
bun test             # Run unit tests

# Run integration tests in Docker (requires API keys)
ANTHROPIC_API_KEY=sk-... docker compose -f docker-compose.test.yml run --rm acp-test

Requirements

Runtime: Bun >= 1.2.9
ACP Adapter: Built-in headless command (recommended) or external adapter
API Key: ANTHROPIC_API_KEY for Claude, GEMINI_API_KEY for Gemini

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
.claude		.claude
.gemini		.gemini
.github		.github
.plaited/rules		.plaited/rules
bin		bin
scripts		scripts
src		src
.env.example		.env.example
.gitignore		.gitignore
.mcp.json		.mcp.json
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Dockerfile.test		Dockerfile.test
LICENSE		LICENSE
README.md		README.md
biome.json		biome.json
bun.lock		bun.lock
docker-compose.test.yml		docker-compose.test.yml
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

@plaited/acp-harness

CLI Tool

Commands

Examples

Skills for AI Agents

Available Skills

ACP Harness

ACP Adapters

Input Format

Output Format

Graders

TypeScript/JavaScript Graders

Polyglot Graders (Python, etc.)

Downstream Integration

Development

Requirements

License

About

Uh oh!

Releases 17

Uh oh!

Contributors 2

Uh oh!

Languages

License

plaited/acp-harness

Folders and files

Latest commit

History

Repository files navigation

@plaited/acp-harness

CLI Tool

Commands

Examples

Skills for AI Agents

Available Skills

ACP Harness

ACP Adapters

Input Format

Output Format

Graders

TypeScript/JavaScript Graders

Polyglot Graders (Python, etc.)

Downstream Integration

Development

Requirements

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 17

Uh oh!

Contributors 2

Uh oh!

Languages