# repo-harness

A CLI tool that scans repositories, infers their structure, and generates agent harness artifacts — AGENTS.md files, wrapper scripts, and tracking state — so AI coding agents can work effectively in any codebase.
repo-harness operates in two phases:

1. **Bootstrap** (one-shot): Scans a repo, optionally runs LLM-assisted architecture inference, asks a few clarifying questions, and writes initial harness artifacts.
2. **Maintain** (recurring): Re-scans the repo, detects drift from the stored model, and updates artifacts to stay in sync. Failures from agent runs can be recorded and used to improve documentation over time.
## Installation

```shell
go install github.com/pmiddleton/repo-harness/cmd/repo-harness@latest
```
Or build from source:

```shell
git clone https://github.com/pmiddleton/repo-harness.git
cd repo-harness
go build -o repo-harness ./cmd/repo-harness
```
## Quick start

```shell
# Scan a repo and generate all artifacts (deterministic, no API key needed)
repo-harness bootstrap --skip-inference --non-interactive /path/to/repo

# Full bootstrap with Vertex AI (Google Cloud auth)
gcloud auth application-default login
repo-harness bootstrap --vertex --gcp-project my-project --gcp-region us-east5 /path/to/repo

# Full bootstrap with Anthropic API key
export ANTHROPIC_API_KEY=sk-ant-...
repo-harness bootstrap /path/to/repo

# Just scan — see what the tool detects without writing anything
repo-harness scan /path/to/repo
repo-harness scan --json /path/to/repo
```

## Commands

| Command | Description |
|---|---|
| `scan [path]` | Deterministic scan — outputs the repo's structural model |
| `infer [path]` | LLM-assisted inference — architecture summary, command matrix, risks, constraints |
| `bootstrap [path]` | Full pipeline: scan → infer → interview → generate → validate → score → remediate |
| `refresh [path]` | Re-scan and update artifacts if drift is detected |
| `check [path]` | CI-friendly drift check — exits non-zero if artifacts are stale |
| `learn [path]` | Record a failure from stdin JSON for future remediation |
| `score [path]` | Score harness effectiveness (0-100) across 7 dimensions |
| `mcp` | Start MCP server on stdio (Claude Code plugin) |
## Flags

```
--skip-inference       Skip LLM inference (deterministic output only, no auth needed)
--non-interactive      Skip the clarifying interview, use defaults
--skip-validation      Skip post-generation validation
--min-score int        Minimum acceptable score (0-100); artifacts are removed if below (default 60)
--vertex               Use Vertex AI with Google Cloud auth
--gcp-project string   GCP project ID (required with --vertex, or set GOOGLE_CLOUD_PROJECT / ANTHROPIC_VERTEX_PROJECT_ID)
--gcp-region string    GCP region (required with --vertex, or set GOOGLE_CLOUD_REGION / CLOUD_ML_REGION)
--api-key string       Anthropic API key (defaults to ANTHROPIC_API_KEY env)
--model string         Claude model for inference (defaults to claude-sonnet-4-5)
--json                 Output structured JSON
--details              Show per-item breakdown under each dimension
```
## What the scanner detects

The scanner is deterministic — no LLM calls, just file existence checks and light parsing.

- **Monorepo tools:** pnpm, yarn workspaces, npm workspaces, nx, turborepo, cargo workspaces, go workspaces, bazel, pants, buck2, lerna
- **Languages:** TypeScript, JavaScript, Go, Rust, Python, Ruby, Java, Kotlin, Zig, Elixir, Swift, C#, PHP
- **Build systems:** npm/yarn/pnpm/bun, go, cargo, make, gradle, maven, zig, mix, bundler, pip, cmake, swift
- **Test frameworks:** jest, vitest, mocha, pytest, rspec, minitest, go test, cargo test, zig test, playwright, cypress
- **CI providers:** GitHub Actions, GitLab CI, CircleCI, Jenkins, Buildkite, Travis CI
- **Conventions:** eslint, prettier, biome, golangci-lint, clippy, rubocop, ruff, flake8, dep-cruiser, editorconfig, and more
- **Entry points:** Go cmd/ pattern, Node.js bin/main, Dockerfile, docker-compose, serverless/SAM/CDK, Terraform, Pulumi, Helm, Kubernetes manifests
## Generated artifacts

```
AGENTS.md                             # Root: commands, constraints, repo map, sources of truth
packages/*/AGENTS.md                  # Scoped: per-package commands and notes (monorepos)
scripts/agent-test                    # Wrapper: runs tests with deterministic output
scripts/agent-lint                    # Wrapper: runs linter
scripts/agent-build                   # Wrapper: runs build
agent/feature_status.json             # Tracking: feature/task status across sessions
agent/progress.md                     # Tracking: append-only progress log
.repo-harness/repo_model.json         # Internal: stored scan for drift detection
.repo-harness/interview_answers.json  # Internal: cached interview answers
```
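For illustration, a generated `scripts/agent-test` for a Go repo might look roughly like the sketch below (written to a temp path here so it is easy to inspect; the actual contents depend on the detected toolchain, so treat this as an assumption, not the tool's real output):

```shell
# Write an illustrative agent-test wrapper to a temp path. The real generated
# script is toolchain-specific; this sketch shows the deterministic-output
# idea: no color, no prompts, a single meaningful exit code.
cat <<'EOF' > "${TMPDIR:-/tmp}/agent-test-sketch"
#!/bin/sh
set -eu
export NO_COLOR=1   # suppress ANSI color so output is stable across runs
export CI=1         # many test runners disable interactive/watch modes under CI
exec go test ./... "$@"
EOF
chmod +x "${TMPDIR:-/tmp}/agent-test-sketch"
```

The point of wrapping is that an agent always invokes the same path (`scripts/agent-test`) regardless of the underlying toolchain, and the wrapper itself can be validated.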
## LLM inference

The inference engine enhances generated artifacts with:
- Architecture summary — systems, domains, data flow described for agent consumption
- Command matrix — verified build/test/lint/format/dev commands per scope
- Risk list — ambiguities, conflicts, and potential issues with severity ratings
- Inferred constraints — architectural rules with enforceability analysis
The inference call sends the structural model JSON plus a curated selection of representative files (README, build configs, CI workflows, key source modules) to Claude via tool_use for structured output. Total context is capped at ~20k tokens.
## Authentication

Two authentication methods are supported:

**Vertex AI (Google Cloud)** — uses Application Default Credentials from gcloud auth:

```shell
gcloud auth login --update-adc
repo-harness infer --vertex --gcp-project my-project --gcp-region us-east5 /path/to/repo

# Or use environment variables (also picks up ANTHROPIC_VERTEX_PROJECT_ID and CLOUD_ML_REGION)
export GOOGLE_CLOUD_PROJECT=my-project
export GOOGLE_CLOUD_REGION=us-east5
repo-harness infer --vertex /path/to/repo
```

If you already have ANTHROPIC_VERTEX_PROJECT_ID and CLOUD_ML_REGION set (e.g. for Claude Code), just pass `--vertex`:

```shell
repo-harness bootstrap --vertex /path/to/repo
```

**Anthropic API** — uses an API key directly:

```shell
export ANTHROPIC_API_KEY=sk-ant-...
repo-harness infer /path/to/repo

# Or pass inline
repo-harness infer --api-key sk-ant-... /path/to/repo
```

```shell
# Standalone inference with JSON output
repo-harness infer --json /path/to/repo

# Override model
repo-harness infer --model claude-sonnet-4-5 /path/to/repo

# Full bootstrap with Vertex AI
repo-harness bootstrap --vertex --gcp-project my-project --gcp-region us-east5 /path/to/repo
```

## Scoring

The score command evaluates generated artifacts across 7 dimensions and produces a composite 0-100 score:
| Dimension | Max | What it measures |
|---|---|---|
| Command coverage | 25 | build/test/lint/format commands present across all scopes (root + packages) |
| Script coverage | 15 | agent-{test,lint,build} scripts: missing (0), stub (half), functional (full) |
| Reference quality | 15 | Config files (CI, conventions, build, test) referenced in AGENTS.md |
| Freshness | 20 | No drift = full score, -4 per drift item, floor at 0 |
| Constraint enforcement | 10 | % of detected conventions that are mechanically enforced |
| Monorepo coverage | 10 | % of packages with scoped AGENTS.md (non-monorepo = full score) |
| Ambiguity resolution | 5 | % of detected ambiguities with matching interview answers |
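As a quick sanity check, the per-dimension maxima in the table above add up to the 100-point composite ceiling:

```shell
# 25 + 15 + 15 + 20 + 10 + 10 + 5 = 100, the composite score ceiling
echo $((25 + 15 + 15 + 20 + 10 + 10 + 5))   # prints 100
```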
```shell
# Standalone scoring
repo-harness score /path/to/repo
repo-harness score --json /path/to/repo
repo-harness score --details /path/to/repo
```

Scoring also runs automatically at the end of bootstrap. If the score falls below the minimum threshold (default: 60), all generated artifacts are removed and the command exits with an error. Use `--min-score` to adjust:

```shell
# Accept lower-quality output
repo-harness bootstrap --min-score 40 /path/to/repo

# Require high quality
repo-harness bootstrap --min-score 80 /path/to/repo
```

## Remediation

When bootstrap scores below the minimum threshold (default: 60) and LLM inference is enabled, the tool automatically attempts to fix deficiencies before giving up. The remediation loop:
1. Parses the score breakdown to identify actionable gaps (missing commands, stub scripts, unreferenced configs, unresolved ambiguities, uncovered packages)
2. Sends all deficiencies to Claude in a single API call with structured tools
3. Applies the suggested fixes to disk (rewrites scripts, updates AGENTS.md, creates scoped docs, merges interview answers)
4. Re-scores and repeats if still below threshold (max 2 attempts)
Non-remediable dimensions (Freshness and Constraint enforcement) are skipped — Freshness is handled by re-scanning, and constraint enforcement requires actual tool config files.
Remediation is skipped when --skip-inference is set (no LLM access available).
```shell
# Bootstrap with remediation (default behavior when LLM is available)
repo-harness bootstrap --min-score 60 /path/to/repo

# Skip remediation (deterministic only)
repo-harness bootstrap --skip-inference /path/to/repo
```

## Drift detection

After an initial bootstrap, `refresh` and `check` compare the current repo state against the stored model:
```shell
# Re-scan and update artifacts
repo-harness refresh /path/to/repo

# CI check — exits non-zero on drift
repo-harness check /path/to/repo
```

Detected drift categories: added/removed packages, changed commands, CI changes, toolchain changes.
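Because `check` signals drift purely through its exit status, a CI gate needs nothing more than a conditional around it. A minimal sketch (the `drift_gate` helper name and the message are illustrative, not part of the tool):

```shell
#!/bin/sh
# Hypothetical CI gate around `repo-harness check`. The command exits
# non-zero on drift, so we surface a fix-it hint and fail the job.
drift_gate() {
  repo_dir="${1:-.}"
  if ! repo-harness check "$repo_dir"; then
    echo "harness artifacts are stale; run: repo-harness refresh $repo_dir" >&2
    return 1
  fi
}
```

A pipeline step would then call `drift_gate .` and let the non-zero return status fail the build.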
## Failure learning

When an agent run fails, pipe the failure details to `learn` so the harness can improve over time:

```shell
echo '{"command":"go test ./...", "exit_code":1, "stderr":"missing import", "context":"ran during feature X"}' \
  | repo-harness learn /path/to/repo
```

Failures are appended to .repo-harness/failure_log.jsonl and can inform future doc updates.
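One way to automate this is a small wrapper that runs a command and, on failure, builds the JSON record shown above. The `record_failure` helper below is a sketch, not part of the tool, and its escaping is naive: stderr containing quotes or backslashes would need real JSON escaping (e.g. via jq) before being piped to `repo-harness learn`:

```shell
#!/bin/sh
# Sketch: run a command and, on failure, emit a learn-style JSON record.
# record_failure is a hypothetical helper; escaping here is naive.
record_failure() {
  cmd="$1"; ctx="$2"
  # Capture stderr only: 2>&1 routes stderr into the substitution,
  # then >/dev/null discards the command's stdout.
  err=$(sh -c "$cmd" 2>&1 >/dev/null)
  code=$?
  if [ "$code" -ne 0 ]; then
    printf '{"command":"%s","exit_code":%d,"stderr":"%s","context":"%s"}\n' \
      "$cmd" "$code" "$err" "$ctx"
    # For real use, pipe the record in:  ... | repo-harness learn /path/to/repo
  fi
  return "$code"
}
```

An agent loop could call `record_failure './scripts/agent-test' 'feature X'` after each step and pipe any output to `repo-harness learn`.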
## MCP server

repo-harness can run as an MCP server, allowing Claude Code to call it as a tool during conversations.

```shell
repo-harness mcp
```

This starts an MCP server on stdio, speaking the Model Context Protocol.

Add to your Claude Code settings (.claude/settings.json or project settings):

```json
{
  "mcpServers": {
    "repo-harness": {
      "command": "repo-harness",
      "args": ["mcp"]
    }
  }
}
```

| Tool | Description | Input |
|---|---|---|
| `repo_scan` | Scan a repository and return its structural model | `{path}` |
| `repo_score` | Score harness effectiveness (0-100) | `{path}` |
| `repo_bootstrap` | Full bootstrap pipeline (always non-interactive) | `{path, min_score?, skip_inference?}` |
| `repo_check` | Check for drift between artifacts and repo | `{path}` |
| `repo_remediate` | Fix scoring deficiencies using LLM remediation | `{path}` |
All tools return JSON. Authentication uses environment variables (same as CLI: ANTHROPIC_API_KEY or Google Cloud ADC for Vertex AI).
## Project layout

```
cmd/repo-harness/   CLI entrypoint (cobra) + MCP server
internal/
  model/            Core data types (RepoModel, FeatureStatus, etc.)
  scanner/          Deterministic repo scanner (7 detector modules)
  inference/        LLM-assisted architecture inference (Claude tool_use)
  interview/        Interactive clarifying interview
  generator/        Artifact generation (AGENTS.md, scripts, tracking files)
  validator/        Post-generation validation (commands, refs, secrets, constraints)
  scorer/           Harness effectiveness scoring (7 dimensions, 0-100)
  remediation/      LLM-powered self-healing (fix deficiencies, re-score)
  maintainer/       Drift detection and failure learning
```
## Design principles

- **Deterministic first:** The scanner and generator work without any LLM. Inference is additive.
- **Pointers over prose:** AGENTS.md links to source-of-truth files (CI configs, linter configs) rather than duplicating their content.
- **Executable instructions:** If a command can't be wrapped in a script and validated, it will rot. Wrapper scripts make commands testable.
- **Structured state for continuity:** JSON for machine-readable status, Markdown for human-readable progress.
- **Mechanical guardrails:** Where possible, constraints map to enforceable checks rather than prose-only rules.
- **Don't overwrite:** Existing scoped AGENTS.md files are never overwritten. The tool respects human edits.
## License

MIT