bestwork-agent

Harness engineering for Claude Code. Your agent types one line — the harness catches everything else.

The problem

AI coding agents hallucinate, loop, miss requirements, and ship security flaws. 45% of AI-generated code contains vulnerabilities (Veracode). Vibe-coded apps fail because nobody validated the idea before building.

bestwork-agent adds the quality gates that professional engineering teams use — without changing how you work.

Benchmark: harness ON vs OFF

═══════════════════════════════════
  HARNESS EFFECTIVENESS BENCHMARK
═══════════════════════════════════

  Scenarios:      13
  Accuracy:       100.0%

  Harness ON:
    Catch rate:   100% (10/10)
    False pos:    0

  Harness OFF (vanilla):
    Catch rate:   0% (0/10)

  Categories:
    hallucination    3/4 caught
    platform         4/4 caught
    deprecated       1/1 caught
    security         1/1 caught
═══════════════════════════════════

Run it yourself: npm run benchmark

What the harness does

Gate	When	What it catches
Grounding	PreToolUse (Edit/Write)	Editing files the agent hasn't read
Scope lock	PreToolUse	Edits outside the locked directory
Strict mode	PreToolUse	`rm -rf`, `git push --force`
Type check	PostToolUse (Edit/Write)	TypeScript errors after every change
Review	On demand / PostToolUse	Fake imports, hallucinated methods, platform mismatch, deprecated APIs, type safety bypass
Requirement check	PostToolUse (Edit/Write)	Unmet requirements from clarify/validate sessions
Validate	Before building	Evidence-based go/no-go — is this feature worth building?

All gates run automatically. You just type your prompt.

Install

Option 1: Claude Code Plugin (recommended)

/plugin marketplace add https://github.com/rlaope/bestwork-agent
/plugin install bestwork-agent

Option 2: npm

npm install -g bestwork-agent
bestwork install

How it works

You: "Refactor the auth module to support OAuth2"
                    │
                    ▼
            ┌──────────────┐
            │ Smart Gateway │  classifies intent, allocates agents
            └──────┬───────┘
                   │
         ┌─────────┼──────────┐
         ▼         ▼          ▼
     PreToolUse  Execution  PostToolUse
     ┌─────────┐           ┌──────────┐
     │Grounding│           │Type check│
     │Scope    │           │Review    │
     │Strict   │           │Req check │
     └─────────┘           └──────────┘

The gateway analyzes your prompt and picks the right scale:

Solo — simple fix, rename, format (1 agent)
Pair — two related sub-tasks (2 agents + critic)
Trio — multiple sub-tasks with quality gates (tech + PM + critic per task)
Hierarchy — large scope, architecture decisions (CTO → Lead → Senior → Junior)
Squad — localized feature, fast consensus (flat, parallel)

For non-solo work, the gateway shows you the plan and asks to confirm.

49 Domain Specialists

bestwork agents    # full catalog

25 Tech: backend, frontend, fullstack, infra, database, API, mobile, testing, security, performance, devops, data, ML, CLI, realtime, auth, migration, config, agent-engineer, plugin, accessibility, i18n, graphql, monorepo, writer

10 PM: product, API, platform, data, infra, migration, security, growth, compliance, DX

14 Critic: performance, scalability, security, consistency, reliability, testing, hallucination, DX, type safety, cost, accessibility, devsecops, i18n, agent

Agent prompts live in prompts/ — edit without rebuilding.

22 Skills

Natural language or slash command — the gateway routes automatically.

Skill	What it does
`validate`	Evidence-based feature validation before building
`clarify`	Targeted requirement questions before execution
`review`	Hallucination and platform mismatch scan
`trio`	Parallel execution with quality gates
`plan`	Scope analysis and team recommendation
`delegate`	Autonomous execution without confirmation
`deliver`	Persistent completion — retry until done
`waterfall`	Sequential staged processing with gates
`blitz`	Maximum parallelism burst
`doctor`	Deploy config vs code integrity check
`pipeline-run`	Queue and auto-process multiple GitHub issues
`superthinking`	1000-iteration thought simulation

And 10 more: agents, changelog, docs, health, install, meetings, onboard, sessions, status, update.

vs. other tools

	bestwork-agent	CrewAI	MetaGPT	Vanilla Claude Code
Target	Claude Code users	General Python	General Python	Everyone
Integration	Native hooks (zero config)	Separate runtime	Separate runtime	Built-in
Hallucination catch	100% (10/10 benchmark)	No built-in	No built-in	0%
Overhead	~0 (shell hooks)	3x tokens	2-5x tokens	0
Feature validation	Built-in (validate skill)	None	None	None
Requirement tracking	Auto (clarify → PostToolUse)	Manual	Manual	None

Harness Controls

./scope src/auth/       Lock edits to directory
./unlock                Remove lock
./strict                Block rm -rf, force read-before-edit
./relax                 Disable strict
./tdd add user auth     Test-driven development flow
./review                Hallucination scan
./validate              Is this feature worth building?
./clarify               Requirement deep-check before execution

Observability

bestwork                  # TUI dashboard
bestwork sessions         # Session list
bestwork heatmap          # 365-day activity grid
bestwork loops            # Loop detection
bestwork replay <id>      # Session playback
bestwork effectiveness    # Prompt efficiency trend

Notifications

./discord <webhook_url>
./slack <webhook_url>

Rich notifications per prompt: summary, git diff, review results, session health. Color-coded green/yellow/red.

Security

Everything is local. No data leaves your machine. See SECURITY.md.

Contributing

See CONTRIBUTING.md.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 153 Commits
.analysis		.analysis
.claude-plugin		.claude-plugin
.github		.github
benchmarks		benchmarks
dist		dist
docs		docs
hooks		hooks
mcp		mcp
prompts		prompts
schemas		schemas
scripts		scripts
skills		skills
src		src
templates		templates
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.ja.md		README.ja.md
README.ko.md		README.ko.md
README.md		README.md
SECURITY.md		SECURITY.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

bestwork-agent

The problem

Benchmark: harness ON vs OFF

What the harness does

Install

Option 1: Claude Code Plugin (recommended)

Option 2: npm

How it works

49 Domain Specialists

22 Skills

vs. other tools

Harness Controls

Observability

Notifications

Security

Contributing

License

About

Uh oh!

Releases 4

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

bestwork-agent

The problem

Benchmark: harness ON vs OFF

What the harness does

Install

Option 1: Claude Code Plugin (recommended)

Option 2: npm

How it works

49 Domain Specialists

22 Skills

vs. other tools

Harness Controls

Observability

Notifications

Security

Contributing

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages