axiom /ˈak.si.əm/ — from Greek axioma ("that which is thought worthy"), from axios ("worthy"). A statement accepted as true without proof, serving as a starting point for further reasoning.
In mathematics, axioms are the foundational truths from which all theorems are derived. You don't prove an axiom — you build on it. Everything that follows must be consistent with it, or the system collapses.
That's what this pipeline does. You state the premise — the feature you want. Axiom treats it as a foundational constraint and derives everything from it: the design, the contracts, the implementation, the tests, the review. Every stage must be consistent with the premise, or it doesn't pass the gate. No hand-waving. No "it looks about right." If the proof contradicts the axiom, the proof is wrong.
The most rigorous autonomous development pipeline for Claude Code. Contract-enforced, test-first, self-correcting.
One command. Eight stages. A pull request built from constraints — not vibes.
Opus derives the plan, challenges its own design choices, and reviews the result through three independent specialist agents. Sonnet implements in parallel against machine-enforceable contracts. Tests are generated before implementation. Failures are classified and corrected with surgical precision. The pipeline learns from every run.
```shell
/axiom "add stripe webhook handler"
```

Most AI coding tools generate code and hope it works. Axiom enforces correctness at every boundary.
| Capability | Axiom | Cursor / Copilot / Devin / Others |
|---|---|---|
| Contract enforcement | Per-unit contracts with exact signatures, return shapes, side effects, blast radius restrictions. Machine-enforced, not advisory. | No formal contracts. Plans are natural language guidance. |
| Tests before implementation | ATDD: executable acceptance tests generated from EARS-formatted requirements before agents write code. Agents implement against failing tests. | Tests generated after implementation (biased toward confirming what was built). |
| Failure classification | Failures typed as SYNTAX/LOGIC/ARCH/AMBIGUOUS with per-type correction strategies and budgets. SYNTAX gets cheap self-correction; ARCH triggers contract rewriting. | All failures treated the same: retry with error context. |
| Independent review | Three parallel specialist agents (Security, Code Quality, Acceptance) that cannot see the design intent — prevents anchoring bias. | Single reviewer with full context, or no independent review. |
| Structured deliberation | RALPLAN-DR: mandatory antithesis (steelman the alternative), tradeoff tensions, Architecture Decision Records. Architect and Critic agents challenge every design choice. | Plan-then-execute without structured challenge. |
| Cross-run learning | Project memory records codebase patterns, successful corrections, and reviewer feedback. Run N+1 is smarter than run N. | Sessions are ephemeral, or cloud-persisted without feeding back into future runs. |
| Stage rollback | Git tags at every stage boundary. --rollback-to-stage 2 to try a different decomposition without restarting. | Start from scratch on failure, or hope the retry works. |
| Requirements notation | EARS (Easy Approach to Requirements Syntax) — structured patterns that force completeness and are directly testable. | Free-text requirements prone to ambiguity. |
| Reusable playbooks | .axiom/playbooks/ encode domain knowledge (procedures, constraints, forbidden actions) for repeatable task types. | Start from scratch each time, or rely on prompt engineering. |
| Worktree isolation | Per-feature worktree pools with per-unit sub-worktrees. Atomic slot claiming. Concurrent features without conflicts. | Most tools share a single working directory. |
Prerequisites: Claude Code installed, git, and gh CLI authenticated.
```shell
# Install the skill
mkdir -p ~/.claude/skills/axiom
cp SKILL.md ~/.claude/skills/axiom/SKILL.md

# Or clone and symlink (edits stay in sync)
git clone https://github.com/Guipetris/axiom.git
ln -s "$(pwd)/axiom/SKILL.md" ~/.claude/skills/axiom/SKILL.md
```

Then in any project:

```shell
/axiom "add user authentication with JWT"
```

What happens next:
- Brainstorming explores your idea — clarifying questions, approaches, design spec with automated review
- Opus scores the spec using EARS notation, asks targeted refinement questions, extracts structured design
- Haiku inspects the repo state (clean tree, branch protection, conflicts, worktree claim)
- Opus decomposes into units, authors per-unit contracts, runs RALPLAN-DR deliberation (Architect challenges, Critic validates, ADR recorded)
- Acceptance tests generated from EARS criteria — tests exist before a single line of implementation
- Parallel agents implement each unit against its contract + failing tests, with per-unit model routing (Haiku/Sonnet/Opus by complexity)
- Iterative validation — up to 5 fix-recheck cycles before escalating to the classified correction system
- Three independent reviewers (Security, Code Quality, Acceptance) evaluate without seeing the design conversation
- Haiku commits and opens a PR with the Architecture Decision Record in the body
You interact during the design phase. Everything after that is autonomous.
Axiom is a single Claude Code skill (SKILL.md) that orchestrates an end-to-end feature pipeline. It replaces the manual workflow of: brainstorm → plan → implement → test → review → commit — with a gated, self-correcting autonomous process governed by constraints and contracts.
| Stage | What happens | Model |
|---|---|---|
| Pre-stage | Establish the Premise. Explore context, clarify requirements, propose approaches, write & review a design spec | Opus |
| Stage 0 | Refine the Premise. Score clarity, EARS-format requirements, extract structured design with project memory context | Opus |
| Stage 1 | Inspect the System. Check git state, branch protection, detect conflicts, claim worktree. Optional MCP context engine for large codebases | Haiku |
| Stage 2 | Derive the Axioms. RALPLAN-DR deliberation (Architect + Critic challenge design), decompose into units, author per-unit contracts with path-scoped instructions, generate PRD, record ADR | Opus |
| Stage 2.5 | Generate the Tests. ATDD: create executable acceptance tests from EARS criteria. Tests must fail (RED). Agents implement against them | Sonnet |
| Stage 3 | Execute the Inference. Parallel agents with per-unit model routing, each implementing against one contract + failing tests in an isolated worktree. Inter-agent signal protocol for ambiguities | Haiku/Sonnet/Opus |
| Stage 4 | Verify the Proof. Iterative validation cycling (max 5 fix-recheck rounds), update PRD per-story status, optional browser-based verification via Playwright | Haiku |
| Stage 5 | Test for Contradictions. Three parallel independent specialist reviewers (Security, Code Quality, Acceptance) who cannot see the design intent | Opus x 3 |
| Stage 6 | Commit the Result. PRD completion gate, conventional commit, PR with ADR in body, stage snapshot cleanup | Haiku |
Each stage must pass its gate before the next begins. Failures enter a classified correction chain with per-type budgets and cross-run learning.
```shell
# Describe what you want — full brainstorming + pipeline
/axiom "add stripe webhook handler"

# From a GitHub issue — ticket context seeds brainstorming
/axiom --ticket GH-142

# From an existing design spec — validates and refines before pipeline
/axiom --spec docs/specs/2026-03-15-webhooks-design.md

# With a reusable playbook — domain knowledge seeds the design
/axiom --playbook api-endpoint "add user preferences endpoint"

# Plan only — design + contracts, no implementation
/axiom --plan-only "add billing integration"

# Review only — validate an existing branch against its contracts
/axiom --review-only --branch feature/billing

# Roll back to a stage and re-run with different approach
/axiom --rollback-to-stage 2

# Resume an interrupted run
/axiom --resume ax-a3f2c1d9
```

Use Axiom when:

- You want a complete feature implemented end-to-end from a brief description
- The feature touches multiple files and needs coordinated design
- You want Opus reviewing every architectural decision, not just the final PR
- You want contract-enforced implementation — agents can't deviate from the plan
- You want parallel development with git worktree isolation
- You have a GitHub issue and want it turned into a PR automatically
Skip Axiom when:

- You need a quick single-file fix — just ask Claude directly
- You want to explore options without committing to implementation — use /brainstorming alone
- You only want a plan without execution — ask for a plan
- The task is fully specified and trivial — direct implementation is faster
- Your project has no git repository and you don't want one
Most AI coding workflows generate code and hope for the best. Even sophisticated tools like Cursor, Copilot Agent, and Devin lack formal contracts between their planning and implementation phases. The result: implementation drift, unverified assumptions, and "tests that pass because they test what was built, not what was specified."
Axiom applies five constraints that no other tool enforces together:
Contract enforcement. Each implementation unit gets an Opus-authored contract specifying exact function signatures, return shapes, side effects, and restricted files. Agents implement against these contracts — not a vague description. Violations are caught at the contract gate.
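As a sketch of what such a contract can look like (the field names here are illustrative assumptions, not Axiom's actual schema):

```json
{
  "unit_id": "u1-webhook-route",
  "signature": "handleStripeWebhook(req: Request): Promise<Response>",
  "return_shape": { "status": 200, "body": { "received": true } },
  "side_effects": ["writes to payments table", "emits payment.received event"],
  "allowed_paths": ["src/webhooks/**"],
  "forbidden_paths": ["src/auth/**", "migrations/**"]
}
```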
Tests before code. Acceptance tests are generated from EARS-formatted requirements before implementation begins. Agents implement against failing tests, not just contract text. This eliminates the "tests confirm what was built" bias.
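For illustration, EARS constrains each requirement to a fixed sentence pattern, which makes it directly checkable by a test. A hypothetical requirement for the webhook example:

```text
WHEN a POST to /webhooks/stripe carries an invalid signature,
THE SYSTEM SHALL respond with HTTP 400 and persist no payment record.
```

Each clause maps onto a test: the trigger becomes the test setup, the response becomes the assertion.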
Separation of design and review. The model that designed the architecture (Opus) is not the same agent reviewing its implementation. Three independent specialist reviewers (Security, Quality, Acceptance) evaluate the code without seeing the design conversation — preventing anchoring bias.
Structured deliberation. Every design choice goes through a RALPLAN-DR challenge: an independent Architect steelmans the alternative, a Critic validates principles, and an Architecture Decision Record documents why the chosen approach was selected over alternatives.
Classified self-correction. Failures are typed (SYNTAX, LOGIC, ARCH, AMBIGUOUS) and routed to the appropriate fix strategy with per-unit budgets. The pipeline learns from corrections across runs via project memory.
```
User: "add stripe webhook handler"
  │
  ▼
Brainstorming (Opus)
  │  Explore codebase, clarify requirements, propose approaches
  │  Write design spec → automated review → user approval
  │
  ▼
Stage 0 — Refine the Premise (Opus)
  │  Load project memory (cross-run learnings)
  │  EARS-format requirements, score clarity, extract design
  │  [snapshot]
  │
  ▼
Stage 1 — Inspect the System (Haiku)
  │  Git status, branch protection, conflict scan
  │  Claim worktree slot, optional MCP context engine
  │  [snapshot]
  │
  ▼
Stage 2 — Derive the Axioms (Opus)
  │  RALPLAN-DR deliberation (Architect challenges, Critic validates)
  │  Decompose → per-unit contracts + path-scoped instructions
  │  Generate PRD, record ADR
  │  [snapshot]
  │
  ▼
Stage 2.5 — Generate Tests (Sonnet)
  │  ATDD: create acceptance tests from EARS criteria
  │  Tests MUST fail (RED phase)
  │
  ▼
Stage 3 — Execute the Inference (Haiku/Sonnet/Opus × N)
  │  Per-unit model routing by complexity
  │  Each agent: contract + failing tests + isolated worktree
  │  Inter-agent signal protocol for ambiguities
  │  [snapshot]
  │
  ▼
Stage 4 — Verify the Proof (Haiku)
  │  Iterative validation cycling (max 5 rounds)
  │  Optional browser-based verification (Playwright)
  │  Update PRD per-story status
  │  [snapshot]
  │
  ▼
Stage 5 — Test for Contradictions (Opus × 3)
  │  Three parallel independent reviewers:
  │    Security │ Code Quality │ Acceptance
  │  Reviewers cannot see design intent
  │  [snapshot]
  │
  ▼
Stage 6 — Commit the Result (Haiku)
  │  PRD completion gate, conventional commit
  │  PR with ADR in body, worktree cleanup
  │  Update project memory with learnings
  │
  ▼
Pull request.
```
| Failure Type | Trigger | Resolution |
|---|---|---|
| SYNTAX | Parse/compile error | Sonnet self-corrects (max 2 retries per unit) |
| LOGIC | Test failure | Haiku diagnoses → Sonnet re-implements |
| ARCH | Contract violation | Opus revises contract → Sonnet retries |
| AMBIGUOUS | Unclear failure | Opus arbitrates contract vs implementation |
Per-unit budgets prevent infinite loops. When budgets exhaust, the pipeline produces a clear escalation report.
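As an illustrative sketch (field names assumed, not the real corrections.json schema), per-unit budget tracking might look like:

```json
{
  "unit": "u2-signature-verify",
  "attempts": [
    { "type": "SYNTAX", "used": 1, "budget": 2, "status": "fixed" },
    { "type": "LOGIC",  "used": 3, "budget": 3, "status": "escalated" }
  ]
}
```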
All state lives under .axiom/ in your project root:
```
.axiom/
├── config.json               # Delivery config (PR vs direct merge)
├── project-memory.json       # Cross-run learnings (committed to repo)
├── playbooks/                # Reusable task templates
│   └── api-endpoint.md
├── instructions/             # Path-scoped conventions
│   └── api-conventions.md
├── worktrees/
│   ├── pool.json             # Slot pool for concurrent features
│   └── {slot}/               # Git worktrees, one per active feature
└── state/
    ├── {run-id}/
    │   ├── design.json              # Goals, constraints, EARS criteria
    │   ├── adr.json                 # Architecture Decision Record
    │   ├── interview-spec.md        # Crystallized feature spec
    │   ├── feature-plan.json        # Unit decomposition
    │   ├── contracts/{unit-id}.json # Per-unit contracts with conventions
    │   ├── prd.json                 # PRD with per-story pass/fail tracking
    │   ├── corrections.json         # Retry budget tracking
    │   ├── correction-loop.json     # Session-persistent correction state
    │   └── validation-report.json   # Stage 4 → Stage 5 handoff
    └── {run-id}-snapshots/          # Stage boundary snapshots for rollback
        ├── stage-0/
        ├── stage-2/
        └── ...
```
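To illustrate the kind of cross-run learning project-memory.json can carry (keys here are hypothetical, not the actual schema):

```json
{
  "codebase_patterns": ["API routes live in src/routes, one file per resource"],
  "successful_corrections": [
    { "symptom": "ESM import error in generated tests", "fix": "add vitest path alias" }
  ],
  "reviewer_feedback": ["Security: verify webhook signatures before parsing the body"]
}
```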
No external tools, no MCP servers, no plugin dependencies required. Only git and gh CLI. Optional enhancements: MCP context engine for large codebases, OMC notifications, Playwright for browser verification.
On first run, Axiom asks three setup questions and writes answers to .axiom/config.json:
| Setting | Options | Default |
|---|---|---|
| Delivery mode | PR to target branch, direct merge | PR |
| Target branch | Auto-detected (main/master) | main |
| Checkpoint mode | Fully autonomous, pause between stages | Autonomous |
Config is project-local. .axiom/state/ and .axiom/worktrees/ are added to .gitignore automatically.
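A resulting .axiom/config.json might look like this (key names are illustrative; the actual schema may differ):

```json
{
  "delivery_mode": "pr",
  "target_branch": "main",
  "checkpoint_mode": "autonomous"
}
```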
Enable checkpoint_mode to pause between stages for review:
Stage N completes → summary shown → you choose: proceed / pause / cancel
Useful for inspecting contracts before implementation, or reviewing the proof before the contradiction check.
To cancel a run mid-flight:

```shell
touch .axiom/CANCEL
```

The pipeline checks for this sentinel at each stage boundary, releases the worktree slot, archives the branch, and cleans up.
- Context window pressure. Large codebases can push context limits. Mitigated by optional MCP context engine integration (Augment, Zilliz) and targeted exploration, but very large monorepos may still need scope constraints.
- Test infrastructure shapes verification quality. Stage 2.5 generates acceptance tests from contracts, and Stage 4 can generate baseline tests if none exist, but projects with mature test suites get stronger verification.
- Single-session pipeline. Each run lives in one Claude Code session. If the session crashes, --resume picks up from the last completed stage. Correction loops persist across session boundaries via correction-loop.json.
- No CI integration. The pipeline runs locally. It creates the PR — your CI takes over from there.
- Monolithic architecture. The pipeline is a single SKILL.md. Partial usage is supported via --plan-only and --review-only, but individual stages cannot be mixed with other tools' outputs (yet).
Want to see what the pipeline actually produces? Check out the example artifacts from a Stripe webhook handler run — includes design.json, per-unit contracts, and an explanation of what each stage outputs.
```
axiom/
├── SKILL.md        # The skill definition — this IS the product
├── README.md       # You are here
├── LICENSE         # Apache 2.0
└── docs/
    ├── architecture.md
    ├── faq.md
    └── examples/   # Example pipeline artifacts
```
See CONTRIBUTING.md for guidelines on reporting bugs, suggesting features, and submitting improvements.
Apache 2.0 — see LICENSE.
From premise to pull request.
If this project helped you, consider giving it a star. It helps others find it.

