Multi-agent orchestration for Claude Code
Other AI coding tools optimize for speed. BMB optimizes for correctness.
Solo AI coding assistants are fast — but they hallucinate, skip edge cases, and approve their own work. BMB fixes this by running multiple specialized agents that challenge, verify, and compress each other's output.
| Problem | BMB's Solution |
|---|---|
| Self-review bias | Cross-model blind verification — a different model reviews without seeing the original reasoning |
| Design tunnel vision | Council debate with AI challengers arguing alternatives before a single line is written |
| Context explosion | 3-layer compression protocol keeps token budgets tight across long pipelines |
| "Works for me" testing | Divergent framing — verifier receives a deliberately reworded spec to catch assumption leaks |
| Lost knowledge | FTS5 knowledge base + auto-learning promotes recurring lessons automatically |
BMB doesn't replace your judgment — it gives you 5 specialized agents who challenge and verify each other's work before you decide.
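To make the "FTS5 knowledge base" row in the table above concrete, here is a minimal sketch of a full-text lesson index using Python's built-in `sqlite3` module. The table name `lessons`, its columns, and both helper functions are hypothetical; BMB's actual schema and scripts may look quite different. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3


def build_index(lessons):
    """Create an in-memory FTS5 index from (title, body) pairs.

    The `lessons` table and its columns are illustrative assumptions,
    not BMB's real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE lessons USING fts5(title, body)")
    conn.executemany("INSERT INTO lessons (title, body) VALUES (?, ?)", lessons)
    conn.commit()
    return conn


def search(conn, query, limit=5):
    """Return titles of lessons matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT title FROM lessons WHERE lessons MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]
```

Full-text search lets a lesson be found by any word in its title or body, which is likely why an FTS index suits a store of free-form retrospective notes.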
Prerequisites: Claude Code CLI, tmux, python3, sqlite3, git
# 1. Install BMB
curl -fsSL https://raw.githubusercontent.com/project820/be-my-butler/main/install.sh | bash
# 2. Verify installation
bmb doctor
# 3. Run your first pipeline
# Open Claude Code in any project and type:
/BMB
That's it. BMB registers its agents, skills, and scripts into your Claude Code environment. Type `/BMB` in any project to start the full 12-step pipeline.
Every /BMB run walks through these stages. Steps adapt based on the selected recipe — some steps are skipped or shortened for lighter workflows.
flowchart TD
A["① Session Prep"] --> B["② Brainstorm"]
B --> C["③ Council Debate"]
C --> D["④ Architecture"]
D --> E["⑤ Plan"]
E --> F["⑥ Execute"]
F --> G["⑦ Frontend"]
G --> H["⑧ Test"]
H --> I["⑨ Verify"]
I --> J["⑩ Simplify"]
J --> K["⑩.⑤ Analyst"]
K --> L["⑪ Retrospective"]
L --> M["⑫ Cleanup"]
style A fill:#1a1a2e,stroke:#e94560,color:#fff
style B fill:#1a1a2e,stroke:#e94560,color:#fff
style C fill:#16213e,stroke:#0f3460,color:#fff
style D fill:#16213e,stroke:#0f3460,color:#fff
style E fill:#16213e,stroke:#0f3460,color:#fff
style F fill:#0f3460,stroke:#53a8b6,color:#fff
style G fill:#0f3460,stroke:#53a8b6,color:#fff
style H fill:#0f3460,stroke:#53a8b6,color:#fff
style I fill:#533483,stroke:#e94560,color:#fff
style J fill:#533483,stroke:#e94560,color:#fff
style K fill:#1a3a2e,stroke:#22c55e,color:#fff
style L fill:#1a3a2e,stroke:#22c55e,color:#fff
style M fill:#533483,stroke:#e94560,color:#fff
| Step | Agent | What Happens |
|---|---|---|
| 1 | Lead | Session Prep — loads session-prep.md, restores context from prior sessions |
| 2 | Lead | Brainstorm — generates divergent ideas with blind framing |
| 3 | Lead | Council Debate — multi-round structured argument; Lead decides |
| 4 | Architect | Architecture — produces file tree, interface contracts, dependency map |
| 5 | Lead | Plan — converts architecture into ordered execution steps |
| 6 | Lead | Execute — implements changes in an isolated git worktree |
| 7 | Lead | Frontend — UI/UX work (skipped for backend-only recipes) |
| 8 | Tester | Test — writes and runs tests with coverage targets |
| 9 | Verifier | Verify — blind review with divergent spec framing |
| 10 | Lead | Simplify — removes dead code, flattens unnecessary abstractions |
| 10.5 | Analyst | Retrospective Analysis — queries analytics.db, classifies events by Bird's Law severity, identifies promotion candidates from pattern_counts |
| 11 | Lead | Retrospective — bmb_learn calls, analyst report relay, promotion check |
| 12 | Lead | Cleanup — commit, push, session-prep, carry-forward, worktree cleanup |
The Verifier agent reviews code using a deliberately reworded specification. If the verifier finds issues the implementer missed, you know the solution has assumption leaks, not just bugs.
Before any code is written, the Consultant and Lead engage in multi-round structured debate. The Consultant proposes alternatives, plays devil's advocate, and stress-tests assumptions. The Lead makes the final call, but only after hearing the opposition.
Each agent that writes code operates in its own git worktree, so agents can run in parallel without stepping on each other's files. Changes are reviewed and merged only after verification passes.
Lessons flow upward: project-local learnings (per-repo) → global learnings (cross-project) → CLAUDE.md promotion (permanent rules). Recurring mistakes automatically become enforced rules.
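The promotion flow described above can be sketched as a simple counting rule. This is only an illustration: the threshold of three occurrences, the normalization step, and both function names are assumptions, and BMB's real promotion logic (which appears to draw on `pattern_counts` in `analytics.db`) may differ.

```python
from collections import Counter

# Tier names mirror the flow described above; the threshold is an assumption.
TIERS = ["project-local", "global", "CLAUDE.md"]
PROMOTION_THRESHOLD = 3  # hypothetical: occurrences needed before promotion


def next_tier(current: str) -> str:
    """Return the tier one level up, staying put once the top tier is reached."""
    index = TIERS.index(current)
    return TIERS[min(index + 1, len(TIERS) - 1)]


def promotion_candidates(lessons, threshold=PROMOTION_THRESHOLD):
    """Return lessons seen at least `threshold` times, normalized and sorted."""
    counts = Counter(lesson.strip().lower() for lesson in lessons)
    return sorted(text for text, seen in counts.items() if seen >= threshold)
```

A lesson that recurs often enough at one tier becomes a candidate for the next one, so a one-off mistake does not become a permanent rule.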
Long pipelines bleed context. BMB compresses at three layers: intra-step (within each agent), inter-step (handoff summaries), and session-level (
Not every task needs 12 steps. Pick a recipe to skip what you don't need: a bugfix skips brainstorm and council; a research task skips execution entirely.
Every pipeline run emits structured telemetry to
The Architect agent queries live library documentation via Context7 MCP before writing code. This avoids stale API assumptions: code is always written against the current SDK.
| Recipe | Steps Used | Best For |
|---|---|---|
| `feature` | All 12 | New features, large changes |
| `bugfix` | 1 → 5 → 6 → 8 → 9 → 10 → 11 → 12 | Bug investigation and fix |
| `refactor` | 1 → 4 → 5 → 6 → 8 → 9 → 10 → 11 → 12 | Code restructuring |
| `research` | 1 → 2 → 3 → 11 → 12 | Exploration, spikes, design decisions |
| `review` | 1 → 9 → 11 → 12 | Code review only |
| `infra` | 1 → 4 → 5 → 6 → 8 → 9 → 11 → 12 | CI/CD, tooling, config changes |
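As a quick way to read the recipe table, the sketch below encodes each recipe's step list and computes which steps it skips. The step numbers are copied from the table; the names `RECIPES` and `skipped_steps` are illustrative and not part of BMB, and step 10.5 is left out because the table does not list it per recipe.

```python
# Step lists copied from the recipe table above (step 10.5 omitted).
ALL_STEPS = list(range(1, 13))  # steps 1 through 12

RECIPES = {
    "feature": ALL_STEPS,
    "bugfix": [1, 5, 6, 8, 9, 10, 11, 12],
    "refactor": [1, 4, 5, 6, 8, 9, 10, 11, 12],
    "research": [1, 2, 3, 11, 12],
    "review": [1, 9, 11, 12],
    "infra": [1, 4, 5, 6, 8, 9, 11, 12],
}


def skipped_steps(recipe: str) -> list[int]:
    """Return the pipeline steps a recipe leaves out, in ascending order."""
    used = set(RECIPES[recipe])
    return [step for step in ALL_STEPS if step not in used]
```

For example, `skipped_steps("bugfix")` yields steps 2, 3, 4, and 7: brainstorm, council debate, architecture, and frontend.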
| Command | Description |
|---|---|
| `/BMB` | Full 12-step pipeline: select a recipe interactively |
| `/BMB-brainstorm` | Brainstorm + Council only: explore ideas without executing |
| `/BMB-refactoring` | Refactor recipe shortcut: skip brainstorm, go straight to architecture |
| `/BMB-setup` | First-time project setup: generates session-prep.md and config |
| `/BMB-status` | Project/idea dashboard: stale idea nudges, lifecycle overview |
| Agent | Role | Model |
|---|---|---|
| Architect | System design, file tree, contracts. Queries Context7 for live library docs. | Claude |
| Tester | Test writing and execution | Claude |
| Verifier | Blind review with divergent spec framing | Claude |
| Writer | Documentation generation | Claude |
| Analyst | Retrospective analytics: Bird's Law severity classification, pattern_counts promotion candidates | Claude (bypassPermissions, read-only) |
The Lead agent orchestrates all pipeline steps that don't have a dedicated specialist agent.
| Dependency | Required | Notes |
|---|---|---|
| Claude Code CLI | Yes | Core runtime |
| `tmux` | Yes | Agent session management |
| `python3` | Yes | Script tooling |
| `sqlite3` | Yes | FTS5 knowledge base |
| `git` | Yes | Worktree isolation |
Run bmb doctor after installation to verify all dependencies.
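If you want to check the command-line dependencies yourself, a rough equivalent can be sketched with Python's standard `shutil.which`. This is not how `bmb doctor` is implemented (its internals are not documented here); the function name and the injectable `which` parameter are assumptions made for illustration, and the Claude Code CLI is omitted because its binary name is not stated in the table.

```python
import shutil

# Tool names taken from the dependency table above.
REQUIRED_TOOLS = ["tmux", "python3", "sqlite3", "git"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed locally.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` with no arguments inspects the current `PATH`; an empty list means every listed tool was found.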
Explore the full pipeline visually:
Mobile-optimized summary pages (7-card vertical scroll, 4 locales):
| Language | URL |
|---|---|
| English | m.html |
| 한국어 | m.ko.html |
| 日本語 | m.ja.html |
| 繁體中文 | m.zh-TW.html |
~/Projects/bmb/ # Source of truth (GitHub repo)
├── skills/bmb*/ # 5 slash command skills
├── agents/bmb-*.md # 5 agent definitions
├── bmb-system/
│ ├── config/ # defaults.json (v2)
│ ├── scripts/ # bmb-config.sh, bmb-ideas.sh, bmb-analytics.sh, bmb-learn.sh,
│ │ # bmb-external-incidents.sh, knowledge-index.sh, knowledge-search.sh
│ └── plans/ # Version release plans
└── docs/ # Architecture, configuration, troubleshooting
~/.claude/ # Runtime (symlinks to repo)
├── skills/bmb* → repo # Symlinked skills
├── agents/ → repo # Symlinked agents
└── bmb-system/ → repo # Symlinked runtime
.bmb/ # Per-project runtime directory
├── config.json # Project-local config (merged from 3 layers)
├── analytics/
│ └── analytics.db # SQLite: sessions, events, pattern_counts
├── handoffs/
│ └── analyst-report.md # Step 10.5 output
└── sessions/{id}/
  ├── carry-forward.md # Atomic session continuity
  └── plan-review.md # Cross-model plan critique
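The tree above notes that `config.json` is merged from 3 layers. One common way to do such a merge is a recursive dictionary overlay where later layers win, sketched below. The layer identities (shipped defaults, a user-level file, a project-level file) and every key in the usage example are assumptions; the actual merge rules in `bmb-config.sh` may differ.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Overlay `override` on `base`, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_layers(*layers: dict) -> dict:
    """Merge config layers in order; later layers take precedence."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

With hypothetical layers such as `merge_layers(defaults, user_config, project_config)`, a project can override a single nested key without restating the rest of the defaults.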
6-Feature Upgrade — cross-model fix, agent discipline, visual brainstorming, session continuity, parallel sessions, and Monitor watchdog.
| Capability | Description |
|---|---|
| OMX Cross-Model Fix | Replaced raw codex exec with an MCP-disabled invocation. Eliminates the 100% timeout rate caused by MCP server loading. |
| Superpowers Discipline | Verification gates, debugging discipline, TDD checklists, and YAGNI principles embedded directly in agent prompts. All agents upgraded to Opus 4.6 (1M context). |
| Visual Brainstorming | Browser-based visual companion for Step 2 — mockups, architecture diagrams, trade-off matrices via Superpowers server. |
| Session-End Prep | Step 12 auto-generates next-session-plan.md with completed items, follow-ups, and a one-line start prompt. |
| Parallel Sessions | SESSION_MODE enum (standalone/sub/consolidation) for safe concurrent pipelines with track splitting and consolidation prompts. |
| Monitor Watchdog | Haiku Monitor enhanced with pane sweep for orphaned processes and nudge escalation for stalled agents. |
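The three `SESSION_MODE` values named in the table can be modeled as an enum. The sketch below is illustrative only: the class name, the parsing helper, and the choice of `standalone` as the fallback for an unset value are assumptions, not BMB's documented behavior.

```python
from enum import Enum
from typing import Optional


class SessionMode(Enum):
    """The three modes listed for SESSION_MODE."""

    STANDALONE = "standalone"
    SUB = "sub"
    CONSOLIDATION = "consolidation"


def parse_session_mode(
    raw: Optional[str],
    default: SessionMode = SessionMode.STANDALONE,  # assumed fallback
) -> SessionMode:
    """Parse a raw setting, falling back to `default` when it is unset or blank.

    Unknown values raise ValueError rather than being silently accepted.
    """
    if raw is None or not raw.strip():
        return default
    return SessionMode(raw.strip().lower())
```

Rejecting unknown values early keeps a typo in the mode setting from starting a concurrent pipeline in the wrong mode.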
Contributions are welcome. Please read the Contributing Guide before submitting a PR.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Run the test suite (`bmb doctor && /BMB-setup`)
- Commit your changes
- Open a Pull Request
MIT — use it however you want.
Built with obstinate attention to correctness.