# repo-harness

A CLI tool that scans repositories, infers their structure, and generates agent harness artifacts — AGENTS.md files, wrapper scripts, and tracking state — so AI coding agents can work effectively in any codebase.
repo-harness operates in two phases:

1. **Bootstrap** (one-shot): Scans a repo, optionally runs LLM-assisted architecture inference, asks a few clarifying questions, and writes initial harness artifacts.
2. **Maintain** (recurring): Re-scans the repo, detects drift from the stored model, and updates artifacts to stay in sync. Failures from agent runs can be recorded and used to improve documentation over time.
## Installation

```shell
go install github.com/pmiddleton/repo-harness/cmd/repo-harness@latest
```
Or build from source:

```shell
git clone https://github.com/pmiddleton/repo-harness.git
cd repo-harness
go build -o repo-harness ./cmd/repo-harness
```
## Quick start

```shell
# Scan a repo and generate all artifacts (deterministic, no API key needed)
repo-harness bootstrap --skip-inference --non-interactive /path/to/repo

# Full bootstrap with Vertex AI (Google Cloud auth)
gcloud auth application-default login
repo-harness bootstrap --vertex --gcp-project my-project --gcp-region us-east5 /path/to/repo

# Full bootstrap with Anthropic API key
export ANTHROPIC_API_KEY=sk-ant-...
repo-harness bootstrap /path/to/repo

# Just scan — see what the tool detects without writing anything
repo-harness scan /path/to/repo
repo-harness scan --json /path/to/repo
```

## Commands

| Command | Description |
|---|---|
| `scan [path]` | Deterministic scan — outputs the repo's structural model |
| `infer [path]` | LLM-assisted inference — architecture summary, command matrix, risks, constraints |
| `bootstrap [path]` | Full pipeline: scan → infer → interview → generate → validate → score → remediate |
| `refresh [path]` | Re-scan and update artifacts if drift is detected |
| `check [path]` | CI-friendly drift check — exits non-zero if artifacts are stale |
| `learn [path]` | Record a failure from stdin JSON for future remediation |
| `score [path]` | Score harness effectiveness (0-100) across 7 dimensions |
| `mcp` | Start MCP server on stdio (Claude Code plugin) |
## Flags

```
--skip-inference       Skip LLM inference (deterministic output only, no auth needed)
--non-interactive      Skip the clarifying interview, use defaults
--skip-validation      Skip post-generation validation
--min-score int        Minimum acceptable score (0-100); artifacts are removed if below (default 60)
--vertex               Use Vertex AI with Google Cloud auth
--gcp-project string   GCP project ID (required with --vertex, or set GOOGLE_CLOUD_PROJECT / ANTHROPIC_VERTEX_PROJECT_ID)
--gcp-region string    GCP region (required with --vertex, or set GOOGLE_CLOUD_REGION / CLOUD_ML_REGION)
--api-key string       Anthropic API key (defaults to ANTHROPIC_API_KEY env)
--model string         Claude model for inference (defaults to claude-sonnet-4-5)
--json                 Output structured JSON
--details              Show per-item breakdown under each dimension
```
## What the scanner detects

The scanner is deterministic — no LLM calls, just file existence checks and light parsing.

- **Monorepo tools:** pnpm, yarn workspaces, npm workspaces, nx, turborepo, cargo workspaces, go workspaces, bazel, pants, buck2, lerna
- **Languages:** TypeScript, JavaScript, Go, Rust, Python, Ruby, Java, Kotlin, Zig, Elixir, Swift, C#, PHP
- **Build systems:** npm/yarn/pnpm/bun, go, cargo, make, gradle, maven, zig, mix, bundler, pip, cmake, swift
- **Test frameworks:** jest, vitest, mocha, pytest, rspec, minitest, go test, cargo test, zig test, playwright, cypress
- **CI providers:** GitHub Actions, GitLab CI, CircleCI, Jenkins, Buildkite, Travis CI
- **Conventions:** eslint, prettier, biome, golangci-lint, clippy, rubocop, ruff, flake8, dep-cruiser, editorconfig, and more
- **Entry points:** Go cmd/ pattern, Node.js bin/main, Dockerfile, docker-compose, serverless/SAM/CDK, Terraform, Pulumi, Helm, Kubernetes manifests
## Generated artifacts

```
AGENTS.md                             # Root: commands, constraints, repo map, sources of truth
packages/*/AGENTS.md                  # Scoped: per-package commands and notes (monorepos)
scripts/agent-test                    # Wrapper: runs tests with deterministic output
scripts/agent-lint                    # Wrapper: runs linter
scripts/agent-build                   # Wrapper: runs build
agent/feature_status.json             # Tracking: feature/task status across sessions
agent/progress.md                     # Tracking: append-only progress log
.repo-harness/repo_model.json         # Internal: stored scan for drift detection
.repo-harness/interview_answers.json  # Internal: cached interview answers
```
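For illustration, a generated `scripts/agent-test` for a Go repo might look roughly like the sketch below (written to a temp path here so it is easy to inspect; the actual contents depend on the detected toolchain, so treat this as an assumption, not the tool's real output):

```shell
# Write an illustrative agent-test wrapper to a temp path. The real generated
# script is toolchain-specific; this sketch shows the deterministic-output
# idea: no color, no prompts, a single meaningful exit code.
cat <<'EOF' > "${TMPDIR:-/tmp}/agent-test-sketch"
#!/bin/sh
set -eu
export NO_COLOR=1   # suppress ANSI color so output is stable across runs
export CI=1         # many test runners disable interactive/watch modes under CI
exec go test ./... "$@"
EOF
chmod +x "${TMPDIR:-/tmp}/agent-test-sketch"
```

The point of wrapping is that an agent always invokes the same path (`scripts/agent-test`) regardless of the underlying toolchain, and the wrapper itself can be validated.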
## LLM inference

The inference engine enhances generated artifacts with:
- Architecture summary — systems, domains, data flow described for agent consumption
- Command matrix — verified build/test/lint/format/dev commands per scope
- Risk list — ambiguities, conflicts, and potential issues with severity ratings
- Inferred constraints — architectural rules with enforceability analysis
The inference call sends the structural model JSON plus a curated selection of representative files (README, build configs, CI workflows, key source modules) to Claude via tool_use for structured output. Total context is capped at ~20k tokens.
## Authentication

Two authentication methods are supported:

**Vertex AI (Google Cloud)** — uses Application Default Credentials from gcloud auth:

```shell
gcloud auth login --update-adc
repo-harness infer --vertex --gcp-project my-project --gcp-region us-east5 /path/to/repo

# Or use environment variables (also picks up ANTHROPIC_VERTEX_PROJECT_ID and CLOUD_ML_REGION)
export GOOGLE_CLOUD_PROJECT=my-project
export GOOGLE_CLOUD_REGION=us-east5
repo-harness infer --vertex /path/to/repo
```

If you already have ANTHROPIC_VERTEX_PROJECT_ID and CLOUD_ML_REGION set (e.g. for Claude Code), just pass `--vertex`:

```shell
repo-harness bootstrap --vertex /path/to/repo
```

**Anthropic API** — uses an API key directly:

```shell
export ANTHROPIC_API_KEY=sk-ant-...
repo-harness infer /path/to/repo

# Or pass inline
repo-harness infer --api-key sk-ant-... /path/to/repo
```

```shell
# Standalone inference with JSON output
repo-harness infer --json /path/to/repo

# Override model
repo-harness infer --model claude-sonnet-4-5 /path/to/repo

# Full bootstrap with Vertex AI
repo-harness bootstrap --vertex --gcp-project my-project --gcp-region us-east5 /path/to/repo
```

## Scoring

The score command evaluates generated artifacts across 7 dimensions and produces a composite 0-100 score:
| Dimension | Max | What it measures |
|---|---|---|
| Command coverage | 25 | build/test/lint/format commands present across all scopes (root + packages) |
| Script coverage | 15 | agent-{test,lint,build} scripts: missing (0), stub (half), functional (full) |
| Reference quality | 15 | Config files (CI, conventions, build, test) referenced in AGENTS.md |
| Freshness | 20 | No drift = full score, -4 per drift item, floor at 0 |
| Constraint enforcement | 10 | % of detected conventions that are mechanically enforced |
| Monorepo coverage | 10 | % of packages with scoped AGENTS.md (non-monorepo = full score) |
| Ambiguity resolution | 5 | % of detected ambiguities with matching interview answers |
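As a quick sanity check, the per-dimension maxima in the table above add up to the 100-point composite ceiling:

```shell
# 25 + 15 + 15 + 20 + 10 + 10 + 5 = 100, the composite score ceiling
echo $((25 + 15 + 15 + 20 + 10 + 10 + 5))   # prints 100
```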
```shell
# Standalone scoring
repo-harness score /path/to/repo
repo-harness score --json /path/to/repo
repo-harness score --details /path/to/repo
```

Scoring also runs automatically at the end of bootstrap. If the score falls below the minimum threshold (default: 60), all generated artifacts are removed and the command exits with an error. Use `--min-score` to adjust:

```shell
# Accept lower-quality output
repo-harness bootstrap --min-score 40 /path/to/repo

# Require high quality
repo-harness bootstrap --min-score 80 /path/to/repo
```

## Remediation

When bootstrap scores below the minimum threshold (default: 60) and LLM inference is enabled, the tool automatically attempts to fix deficiencies before giving up. The remediation loop:
1. Parses the score breakdown to identify actionable gaps (missing commands, stub scripts, unreferenced configs, unresolved ambiguities, uncovered packages)
2. Sends all deficiencies to Claude in a single API call with structured tools
3. Applies the suggested fixes to disk (rewrites scripts, updates AGENTS.md, creates scoped docs, merges interview answers)
4. Re-scores and repeats if still below threshold (max 2 attempts)
Non-remediable dimensions (Freshness and Constraint enforcement) are skipped — Freshness is handled by re-scanning, and constraint enforcement requires actual tool config files.
Remediation is skipped when --skip-inference is set (no LLM access available).
```shell
# Bootstrap with remediation (default behavior when LLM is available)
repo-harness bootstrap --min-score 60 /path/to/repo

# Skip remediation (deterministic only)
repo-harness bootstrap --skip-inference /path/to/repo
```

## Drift detection

After an initial bootstrap, `refresh` and `check` compare the current repo state against the stored model:
```shell
# Re-scan and update artifacts
repo-harness refresh /path/to/repo

# CI check — exits non-zero on drift
repo-harness check /path/to/repo
```

Detected drift categories: added/removed packages, changed commands, CI changes, toolchain changes.
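Because `check` signals drift purely through its exit status, a CI gate needs nothing more than a conditional around it. A minimal sketch (the `drift_gate` helper name and the message are illustrative, not part of the tool):

```shell
#!/bin/sh
# Hypothetical CI gate around `repo-harness check`. The command exits
# non-zero on drift, so we surface a fix-it hint and fail the job.
drift_gate() {
  repo_dir="${1:-.}"
  if ! repo-harness check "$repo_dir"; then
    echo "harness artifacts are stale; run: repo-harness refresh $repo_dir" >&2
    return 1
  fi
}
```

A pipeline step would then call `drift_gate .` and let the non-zero return status fail the build.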
## Failure learning

When an agent run fails, pipe the failure details to `learn` so the harness can improve over time:

```shell
echo '{"command":"go test ./...", "exit_code":1, "stderr":"missing import", "context":"ran during feature X"}' \
  | repo-harness learn /path/to/repo
```

Failures are appended to .repo-harness/failure_log.jsonl and can inform future doc updates.
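One way to automate this is a small wrapper that runs a command and, on failure, builds the JSON record shown above. The `record_failure` helper below is a sketch, not part of the tool, and its escaping is naive: stderr containing quotes or backslashes would need real JSON escaping (e.g. via jq) before being piped to `repo-harness learn`:

```shell
#!/bin/sh
# Sketch: run a command and, on failure, emit a learn-style JSON record.
# record_failure is a hypothetical helper; escaping here is naive.
record_failure() {
  cmd="$1"; ctx="$2"
  # Capture stderr only: 2>&1 routes stderr into the substitution,
  # then >/dev/null discards the command's stdout.
  err=$(sh -c "$cmd" 2>&1 >/dev/null)
  code=$?
  if [ "$code" -ne 0 ]; then
    printf '{"command":"%s","exit_code":%d,"stderr":"%s","context":"%s"}\n' \
      "$cmd" "$code" "$err" "$ctx"
    # For real use, pipe the record in:  ... | repo-harness learn /path/to/repo
  fi
  return "$code"
}
```

An agent loop could call `record_failure './scripts/agent-test' 'feature X'` after each step and pipe any output to `repo-harness learn`.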
## MCP server

repo-harness can run as an MCP server, allowing Claude Code to call it as a tool during conversations.

```shell
repo-harness mcp
```

This starts an MCP server on stdio, speaking the Model Context Protocol.

Add to your Claude Code settings (.claude/settings.json or project settings):

```json
{
  "mcpServers": {
    "repo-harness": {
      "command": "repo-harness",
      "args": ["mcp"]
    }
  }
}
```

| Tool | Description | Input |
|---|---|---|
| `repo_scan` | Scan a repository and return its structural model | `{path}` |
| `repo_score` | Score harness effectiveness (0-100) | `{path}` |
| `repo_bootstrap` | Full bootstrap pipeline (always non-interactive) | `{path, min_score?, skip_inference?}` |
| `repo_check` | Check for drift between artifacts and repo | `{path}` |
| `repo_remediate` | Fix scoring deficiencies using LLM remediation | `{path}` |
All tools return JSON. Authentication uses environment variables (same as CLI: ANTHROPIC_API_KEY or Google Cloud ADC for Vertex AI).
## Project layout

```
cmd/repo-harness/   CLI entrypoint (cobra) + MCP server
internal/
  model/            Core data types (RepoModel, FeatureStatus, etc.)
  scanner/          Deterministic repo scanner (7 detector modules)
  inference/        LLM-assisted architecture inference (Claude tool_use)
  interview/        Interactive clarifying interview
  generator/        Artifact generation (AGENTS.md, scripts, tracking files)
  validator/        Post-generation validation (commands, refs, secrets, constraints)
  scorer/           Harness effectiveness scoring (7 dimensions, 0-100)
  remediation/      LLM-powered self-healing (fix deficiencies, re-score)
  maintainer/       Drift detection and failure learning
```
## Design principles

- **Deterministic first:** The scanner and generator work without any LLM. Inference is additive.
- **Pointers over prose:** AGENTS.md links to source-of-truth files (CI configs, linter configs) rather than duplicating their content.
- **Executable instructions:** If a command can't be wrapped in a script and validated, it will rot. Wrapper scripts make commands testable.
- **Structured state for continuity:** JSON for machine-readable status, Markdown for human-readable progress.
- **Mechanical guardrails:** Where possible, constraints map to enforceable checks rather than prose-only rules.
- **Don't overwrite:** Existing scoped AGENTS.md files are never overwritten. The tool respects human edits.
## License

MIT