"To grant an entity absolute autonomy, you must first bind it with absolute constraints."
Ouro Loop is an open-source framework that gives AI coding agents (Claude Code, Cursor, Aider, Codex) a structured autonomous loop with runtime-enforced guardrails. It implements bounded autonomy — the developer defines absolute constraints (DANGER ZONES, NEVER DO rules, IRON LAWS) using the BOUND system, then the agent loops autonomously through Build → Verify → Self-Fix cycles. When verification fails, the agent doesn't ask for help — it consults its remediation playbook, reverts, tries a different approach, and reports what it did. Constraints are enforced at the runtime level through Claude Code Hooks (exit 2 hard-block), not by relying on the agent's "good behavior." Inspired by Karpathy's autoresearch, extended from ML to general software engineering. Zero dependencies, pure Python 3.10+.
| What it does | Let AI agents code overnight without breaking things |
| How it works | Define boundaries (BOUND) → AI loops: Build → Verify → Self-Fix |
| What you get | program.md (method) + framework.py (runtime) + 4 hooks (enforcement) |
| Requirements | Python 3.10+, Git, any AI agent. Zero dependencies. |
This is for you if:
- You want to let an AI agent run autonomously for hours (overnight builds, long refactors)
- Your project has files that must never be touched without review (payments, auth, consensus)
- You've experienced AI agents hallucinating paths, breaking constraints, or getting stuck in loops
- You want auditable, structured autonomous development — not "vibe and pray"
This is NOT for you if:
- You're building a quick prototype or hackathon project (BOUND setup overhead isn't worth it)
- You're writing single-file scripts (the methodology overhead exceeds the benefit)
- You want real-time interactive coding (Ouro Loop is designed for "set it and let it run")
In the era of "vibe coding," unbound AI agents hallucinate file paths, break production constraints, regress architectural patterns, and get stuck in infinite fix-break loops. The current solution — pausing to ask humans — negates the promise of autonomous coding.
Ouro Loop formally introduces The Event Horizon: the developer establishes absolute constraints (Iron Laws, Danger Zones). The AI Agent is granted full autonomy to continuously build, verify, and self-correct within that boundary — never crossing into catastrophic failure. When something breaks inside the boundary, the agent doesn't ask for help — it consults its remediation playbook, reverts, tries a different approach, and reports what it did.
The repo is deliberately kept small. Only three files matter:
program.md— the methodology instructions. Defines the six-stage loop (BOUND, MAP, PLAN, BUILD, VERIFY, LOOP) and the autonomous remediation rules. Point your agent here and let it go. This file is iterated on by the human.framework.py— lightweight runtime for state tracking, verification gates, and logging. The agent uses this CLI to check its own work. This file can be extended by the agent.prepare.py— project scanning and initialization. Creates the.ouro/state directory. Not modified.
graph TD
classDef default fill:#1e1e1e,stroke:#4a4a4a,stroke-width:1px,color:#d4d4d4;
classDef highlight fill:#2d3748,stroke:#63b3ed,stroke-width:2px,color:#fff;
classDef boundary fill:#4a1c40,stroke:#fc8181,stroke-width:2px,color:#fff;
BOUND(("Step 0<br/><b>BOUND</b><br/>(Set Constraints)")):::boundary
MAP["Step 1<br/><b>MAP</b><br/>(Understand)"]
PLAN["Step 2<br/><b>PLAN</b><br/>(Decompose)"]
BUILD["Step 3<br/><b>BUILD</b><br/>(Create)"]
VERIFY{"Step 4<br/><b>VERIFY</b><br/>(Judge)"}
REMEDIATE["<b>REMEDIATE</b><br/>(Autonomous Fix)"]:::highlight
NEXT(("Next<br/>Phase"))
BOUND --> MAP
MAP --> PLAN
PLAN --> BUILD
BUILD --> VERIFY
VERIFY -- "FAIL (inside BOUND)" --> REMEDIATE
REMEDIATE --> BUILD
VERIFY -- "PASS" --> NEXT
NEXT -.-> BUILD
When verification fails, the agent does not ask for human help. It consults modules/remediation.md, decides on a fix, reverts/retries, and loops — so long as it hasn't breached the outer edge of the BOUND. Read The Manifesto for the deep-dive on Precision Autonomy.
Requirements: Python 3.10+, Git, any AI coding agent.
git clone https://github.com/VictorVVedtion/ouro-loop.git ~/.ouro-loopcd /path/to/your/project
python ~/.ouro-loop/prepare.py scan .This shows you what Ouro Loop sees:
============================================================
Ouro Loop — Project Scan
============================================================
Project: my-payment-service
Types: Python
Files: 42 Lines: 3,200
Languages:
Python 35 files ###############
SQL 7 files #######
CLAUDE.md: Not found
BOUND: Not defined
Tests: Found
CI: Found
Recommendations:
1. Define BOUND (DANGER ZONES, NEVER DO, IRON LAWS) before building
2. Create CLAUDE.md with BOUND section
============================================================
python ~/.ouro-loop/prepare.py init . # Creates .ouro/ state directory
python ~/.ouro-loop/prepare.py template claude . # Generates CLAUDE.md templateEdit CLAUDE.md to define your project's actual boundaries. Here's what a real BOUND looks like:
## BOUND
### DANGER ZONES
- `src/payments/calculator.py` — financial calculations, penny-level precision
- `migrations/` — database schema, irreversible in production
### NEVER DO
- Never use float for monetary values — always Decimal
- Never delete or rename migration files
- Never commit without running the test suite
### IRON LAWS
- All monetary values use Decimal with 2-digit precision
- All API responses include request_id field
- Test coverage for payment module never drops below 90%See examples/ for complete BOUND definitions and real session logs showing the methodology in action:
- Blockchain L1 — BOUND for consensus/p2p, plus a real session log where the agent tested 5 hypotheses, autonomously remediated 4 failures, and found a root cause that was architectural (HTTP routing), not code-level.
- Consumer Product — BOUND for audio/sync, plus a real session log where ROOT_CAUSE gate caught a lazy fix and pushed the agent toward a genuinely better solution.
- Financial System — DANGER ZONES around payments, Decimal precision IRON LAWS.
- ML Research — autoresearch-style single-metric optimization reframed as BOUND.
Spin up Claude Code, Cursor, Aider, or any AI coding agent in your project and prompt:
Read program.md from ~/.ouro-loop/ and the CLAUDE.md in this project.
Let's start the Ouro Loop for this task: [describe your task].
The program.md file is essentially a lightweight "skill" — it tells the agent how to work within your boundaries. The agent will read your BOUND, MAP the problem, PLAN phases, BUILD incrementally, VERIFY through multi-layer gates, and LOOP feedback back into BOUND.
framework.py is the state machine and verification gatekeeper. The agent calls these during the loop:
python framework.py scan . # Scan project structure
python framework.py status . # Where are we in the loop?
python framework.py bound-check . # Are DANGER ZONES / IRON LAWS defined?
python framework.py verify . # Run all verification gates
python framework.py log PASS --path . --notes "phase 1 done" # Record result
python framework.py advance . # Move to next phaseVerify runs multi-layer gate checks and shows exactly what's healthy and what's not:
==================================================
Ouro Loop — Verification
==================================================
Layer 1 — Gates:
[+] EXIST CLAUDE.md exists
[+] RELEVANCE 3 files changed
[!] ROOT_CAUSE Hot files: src/payments/calc.py
[+] MOMENTUM 5 recent commits
Layer 2 — Self-Assessment:
[+] bound_compliance BOUND section found
[+] tests_exist Test files found
Overall: PASS
==================================================
When verification fails, the agent autonomously remediates and reports what it did:
[REMEDIATED] gate=ROOT_CAUSE action=revert_and_retry
was: editing src/payments/calc.py for the 4th time (same TypeError)
did: reverted to commit a1b2c3d, re-analyzed from scratch
now: trying middleware pattern instead
bound: no DANGER ZONE touched, no IRON LAW affected
Results are logged to ouro-results.tsv for full auditability:
phase verdict bound_violations notes
1/3 PASS 0 transactions endpoint + tests
2/3 RETRY 0 ROOT_CAUSE warning, fixing
2/3 PASS 0 fixed after retry
3/3 PASS 0 validation complete
Ouro Loop includes 4 hooks that enforce BOUND constraints at the tool level. When installed, the agent physically cannot edit DANGER ZONE files without user approval — no matter what instructions it receives.
# Install hooks in your project
cp ~/.ouro-loop/hooks/settings.json.template .claude/settings.json
# Edit paths in settings.json to point to your ouro-loop installation| Hook | Event | What it does |
|---|---|---|
bound-guard.sh |
PreToolUse:Edit/Write | Parses CLAUDE.md DANGER ZONES, blocks edits to protected files (exit 2) |
root-cause-tracker.sh |
PostToolUse:Edit/Write | Tracks per-file edit count, warns at 3+, strongly warns at 5+ |
drift-detector.sh |
PreToolUse:Edit/Write | Warns when touching 5+ directories (scope drift) |
recall-gate.sh |
PreCompact | Re-injects BOUND into context before compression |
Verified in live Claude Code sessions:
framework.py(DANGER ZONE): BLOCKED with exit 2, agent sees denial reasonhooks/bound-guard.sh(DANGER ZONE): BLOCKED — hook protects itselfCONTRIBUTING.md(safe): passed silently, zero overhead
This is what makes Ouro Loop different from instruction-only approaches: BOUND is enforced by the runtime, not by the agent's good behavior.
These results come from real Ouro Loop sessions on production codebases. Full session logs with methodology observations are available in examples/.
An AI agent was tasked with investigating why precommit latency spiked from 4ms to 200ms under transaction load on a 4-validator PBFT blockchain. Working inside a DANGER ZONE (consensus/):
- 5 hypotheses tested, 4 autonomous remediations — the ROOT_CAUSE gate fired 4 times, each time correctly identifying that the agent was fixing symptoms, not the cause
- After 3 consecutive failed hypotheses, the 3-failure step-back rule kicked in: "stop fixing symptoms, re-examine the architecture" — which led to discovering the real root cause
- Root cause was architectural, not code-level — a single-node HTTP bottleneck was causing consensus-wide delays. The fix was a Caddy reverse proxy, not a code change
- The agent caught its own flawed experiment — when testing an alternative, it ran 4x full stress instead of 1x distributed. It identified the flaw before drawing wrong conclusions
| Metric | Before | After | Delta |
|---|---|---|---|
| Precommit (under load) | 100-200ms | 4ms | -98% |
| Block time (under load) | 111-200ms | 52-57ms | -53% |
| TPS Variance | 40.6% | 1.6% | -96% |
| SysErr rate | 0.00% | 0.00% | = (IRON LAW) |
| Blocks/sec (soak load) | ~8.0 | ~18.5 | +131% |
A simpler session where the ROOT_CAUSE gate prevented a lazy fix: instead of suppressing ESLint errors, the agent was pushed toward genuinely better solutions (replacing <a> with Next.js <Link>, properly handling useEffect state patterns).
See the full session logs: Blockchain L1 | Consumer Product
program.md — methodology instructions (human edits this)
framework.py — runtime CLI: state, verification, logging (agent uses this)
prepare.py — project scanning and initialization (not modified)
hooks/
bound-guard.sh DANGER ZONE enforcement (PreToolUse, exit 2 block)
root-cause-tracker.sh repeated-edit detection (PostToolUse)
drift-detector.sh scope drift warning (PreToolUse)
recall-gate.sh BOUND re-injection on context compaction (PreCompact)
settings.json.template ready-to-use hooks configuration
modules/
bound.md Stage 0: define DANGER ZONES, NEVER DO, IRON LAWS
map.md Stage 1: understand the problem space
plan.md Stage 2: decompose into severity-ordered phases
build.md Stage 3: RED-GREEN-REFACTOR-COMMIT
verify.md Stage 4: three-layer verification gates
loop.md Stage 5: feedback closure
remediation.md autonomous remediation playbook
templates/ starter templates for CLAUDE.md, phase plans, checklists
examples/ real-world BOUND definitions from four project types
- Boundaries before code. Define what must never break before writing any code. A financial system's NEVER DO list prevents more bugs than any test suite.
- Autonomous remediation. Unlike monitoring tools that detect-and-alert, Ouro Loop agents detect-decide-act-report. Like
autoresearchauto-reverts when val_bpb regresses, Ouro Loop auto-remediates when verification fails — inside BOUND, no human permission needed. - Three files that matter.
program.md(methodology),framework.py(runtime),prepare.py(init). Everything else is reference. - Zero dependencies. Pure Python 3.10+. Works with any AI agent, any language, any project type.
| autoresearch | ouro-loop | |
|---|---|---|
| Core idea | Give AI a training loop, let it experiment | Give AI a methodology, let it guard |
| Human programs | program.md (experiment strategy) |
program.md (dev strategy) + CLAUDE.md (boundaries) |
| AI modifies | train.py (model code) |
Target project code + framework.py |
| Fixed constraint | 5-minute training budget | BOUND (DANGER ZONES, NEVER DO, IRON LAWS) |
| Core metric | val_bpb (lower is better) | Multi-layer verification (gates + self-assessment) |
| On failure | Auto-revert, try next experiment | Auto-remediate, try alternative approach |
| Read-only | prepare.py |
prepare.py + modules/ |
Ouro Loop is designed for any scenario where you need an AI agent to work autonomously for extended periods without human babysitting:
- Overnight autonomous development — Define BOUND, start the agent, sleep. Wake up to a log of phases completed and verified, not a broken codebase.
- Long-running refactoring — Let the agent refactor a large codebase in phases, with verification gates ensuring nothing breaks between phases.
- AI-assisted code review and remediation — The agent identifies issues, fixes them, and verifies the fixes — all within defined safety boundaries.
- Continuous integration with AI agents — Plug Ouro Loop into your CI/CD pipeline to let agents handle build failures, test regressions, and dependency updates autonomously.
- Multi-phase feature development — Break complex features into severity-ordered phases. The agent handles CRITICAL changes first, then HIGH, then MEDIUM.
- Production-safe AI coding — For financial systems, blockchain infrastructure, medical software, and any domain where "move fast and break things" is unacceptable.
Ouro Loop is agent-agnostic. It works with any AI coding assistant that can read files and execute terminal commands:
- Claude Code — Anthropic's CLI agent. Native
program.mdskill support. - Cursor — AI-powered IDE. Use
.cursorrulesto reference Ouro Loop modules. - Aider — Terminal-based AI pair programmer.
- Codex CLI — OpenAI's coding agent.
- Windsurf — Codeium's AI IDE.
- Any agent that can read Markdown instructions and run Python scripts.
- karpathy/autoresearch — The inspiration. Autonomous ML experiment loop. Ouro Loop extends this paradigm to general software engineering.
- anthropics/claude-code — The AI agent Ouro Loop was primarily developed with.
- everything-claude-code — Comprehensive Claude Code skills and agents collection.
Q: How is this different from just using .cursorrules or CLAUDE.md?
A: .cursorrules and CLAUDE.md define static instructions that the agent can ignore. Ouro Loop adds a runtime loop — state tracking, multi-layer verification, autonomous remediation, and phase management. The agent doesn't just follow rules; it verifies compliance, detects drift, and self-corrects. Most importantly, BOUND constraints are enforced by runtime hooks (exit 2 hard-block), not by the agent's good behavior.
Q: Can the agent really fix its own mistakes?
A: Yes, within BOUND. When verification fails and the issue is inside the boundary (not a DANGER ZONE), the agent consults modules/remediation.md for a decision playbook: revert, retry with a different approach, or escalate. It reports what it did, not what it's thinking of doing. In a real blockchain session, the agent autonomously remediated 4 failures across 5 hypotheses and found a root cause that was architectural (HTTP routing), not code-level.
Q: How do I add guardrails to Claude Code?
A: Ouro Loop provides 4 Claude Code Hooks that enforce constraints at the tool level. Install them by copying hooks/settings.json.template to .claude/settings.json and editing the paths. The bound-guard.sh hook parses your CLAUDE.md DANGER ZONES and physically blocks edits to protected files. No agent can bypass exit code 2.
Q: How do I let an AI agent code overnight?
A: Define your BOUND (DANGER ZONES, NEVER DO, IRON LAWS) in CLAUDE.md, install the hooks, then tell the agent to read program.md and start the loop. The agent will iterate through Build → Verify → Self-Fix cycles autonomously. When verification fails, it remediates and retries. When it passes, it advances to the next phase. The loop continues until all phases are complete.
Q: What is bounded autonomy in AI coding? A: Bounded autonomy is a paradigm where AI coding agents are granted full autonomous decision-making power within explicitly defined constraints. By defining the 20 things an agent cannot do (the BOUND), you implicitly authorize it to do the 10,000 things required to solve the problem. It's the middle path between human-in-the-loop (constant interruptions) and unbounded agents (unconstrained risk).
Q: How to prevent AI agents from hallucinating file paths?
A: Ouro Loop's EXIST verification gate checks whether referenced files, APIs, and modules actually exist before the agent proceeds. The bound-guard.sh hook also validates file paths against the project structure. If a file doesn't exist, the gate fails and triggers autonomous remediation — the agent corrects its reference instead of proceeding with hallucinated paths.
Q: Can AI agents fix their own bugs autonomously? A: Yes — this is called autonomous remediation. When a verification gate fails, the agent doesn't alert a human. It reads its error logs, consults its remediation playbook, and takes action: revert, retry with a different approach, or escalate only if a DANGER ZONE is involved. The key constraint is BOUND — the agent can self-fix anything inside the boundary.
Q: How to prevent context decay in long AI coding sessions?
A: Ouro Loop addresses context decay through the RECALL verification gate and the recall-gate.sh hook. The gate monitors whether the agent can still recall key constraints. The hook fires before context compression (PreCompact event) and re-injects the BOUND section into the compressed context, preventing constraint amnesia during long sessions.
Q: How long can the agent run autonomously?
A: As long as phases remain. Each phase is independently verifiable, so the agent can run for hours across many phases. The NEVER STOP instruction in program.md keeps the loop going until all phases pass or an EMERGENCY-level issue is hit.
Q: Do I need to install anything?
A: No. Zero dependencies. Pure Python 3.10+ standard library. Clone the repo and point your agent at program.md. Optionally install hooks for runtime enforcement.
Q: When should I NOT use Ouro Loop? A: Don't use Ouro Loop for quick prototypes, hackathon projects, or single-file scripts — the BOUND setup overhead isn't worth it. It's also not designed for real-time interactive coding sessions. Ouro Loop shines when you need to "set it and let it run" — overnight builds, long-running refactors, and multi-phase autonomous development on codebases with critical constraints.
The most valuable contributions are real-world bindings. If you used Ouro Loop to bound an agent in a complex domain, submit a sanitized CLAUDE.md and phase-plan.md to examples/. See CONTRIBUTING.md.
By explicitly defining the 20 things an agent can never do (the BOUND), you implicitly authorize it to autonomously do the 10,000 things required to solve the problem. The constraint space defines the creative space. Read The Manifesto for the full deep-dive.
MIT
