Most AI coding tools lose context between sessions. The ones that try to fix this reach for the same solution: embed everything, retrieve what seems relevant.
That works — until it doesn't.
Embedding-based retrieval treats project knowledge as a flat collection of chunks. Every file, every function, every comment gets embedded into the same vector space. When the agent needs context, it searches that space and pulls back the top-N results.
For small projects, this is fine. Precision stays high because the search space is small.
But as a codebase grows — more files, more conventions, more architectural decisions, more history — embedding precision collapses. The agent asks for "how authentication works" and gets fragments of auth middleware, test fixtures, migration scripts, and a README section about OAuth. Some of it is relevant. Most of it is noise. The agent spends tokens sorting through retrieval results instead of doing the work.
This is a fundamental property of flat retrieval, not a tuning problem. Precision degrades as corpus scale increases. More content means more candidates competing for the same top-N slots.
Books have chapters. APIs have namespaces. Codebases have directories. Humans organize knowledge into hierarchies because hierarchies let you ignore most of the information most of the time.
The same principle applies to AI agents. Instead of searching a flat space, give the agent a pre-filtered navigation path. Start with a high-level summary. If the agent needs more detail, follow a link down one level. Each level is a curated summary of the level below — written by a human or by a previous agent run, not retrieved by an embedding model.
This is deterministic, not probabilistic. The agent doesn't hope the right chunk floats to the top of a similarity search. It navigates a known structure.
Beastmode organizes project knowledge into four levels:
L0 — System Manual (BEASTMODE.md)
The sole autoloaded file (~40 lines). Contains hierarchy spec, persona definition,
writing rules, and conventions. Enough for any agent to orient after compression.
Always loaded via CLAUDE.md.
L1 — Domain Summaries (e.g., context/DESIGN.md)
One file per phase. Five L1 files total. Each contains a summary paragraph plus 2-3 sentence
descriptions of each topic below it. Loaded during the prime sub-phase of each
workflow phase (not autoloaded). An agent reading L1 files knows where everything
is without loading everything.
L2 — Detail Files (e.g., context/design/architecture.md)
Full topic detail. Architecture decisions, code conventions, test strategies.
Loaded on demand via convention-based paths. Only loaded when the agent's
current task touches that domain.
L3 — State Artifacts (e.g., state/design/2026-03-05-feature.md)
Raw design documents, implementation plans, validation records. Referenced from L2
"Related Decisions" sections. Rarely loaded in full — agents find them through L2
links when they need provenance.
Beastmode separates knowledge by purpose into two domains, each with its own
directory tree under .beastmode/:
| Domain | Path | Purpose | Example |
|---|---|---|---|
| Context | context/ |
Published knowledge. What the project knows and how it works. | context/design/architecture.md |
| State | state/ |
Checkpoint artifacts. What happened when. | state/design/2026-03-06-feature.md |
Context spans L1 and L2. State lives at L3 only. Universal process rules live in BEASTMODE.md (L0). Every phase has its own subdirectory in the context domain.
Every level follows the same structure: summary + section summaries + convention paths to the next level down. This pattern repeats at every level, which is why we call it fractal. An agent at any level can decide whether to go deeper or stay at the current summary.
Each summary is written — either by the user, by the agent during a retro phase, or during brownfield discovery. These aren't auto-generated embeddings. They're human-quality descriptions that compress the essential information from the level below.
When the retro sub-phase runs at the end of each workflow phase, it reviews L2 files for accuracy, updates L1 summaries to match, and propagates changes upward. The hierarchy stays accurate because maintenance is built into the workflow, not bolted on.
Knowledge flows upward through a strict promotion path. Phases write artifacts to
state/ only — never directly to context/. The retro sub-phase, which
runs at release, is the sole gatekeeper for upward promotion:
| Writer | Allowed Targets | Mechanism |
|---|---|---|
| Phase checkpoints | state/ |
Direct write |
| Retro | L1, L2 | Bottom-up promotion |
| Release | L0 | Release-time L1->L0 rollup |
| Init | L0, L1, L2 | Bootstrap exemption |
This prevents phases from corrupting published knowledge. A design phase can't
accidentally overwrite an architecture decision in context/design/architecture.md
— it writes its design doc to state/design/, and retro decides what gets promoted.
The hierarchy doesn't maintain itself. It stays accurate because maintenance is structural — built into a five-phase workflow that every feature passes through:
design -> plan -> implement -> validate -> release
Each phase follows the same four sub-phases: prime (load context) -> execute (do the work) -> validate (check quality) -> checkpoint (save artifacts, run retro).
The retro sub-phase runs inside every checkpoint. It compares the phase's output against existing L1 and L2 files, proposes updates, and promotes changes upward through the hierarchy. This is how the hierarchy compounds — every phase cycle is an opportunity to refine what the project knows about itself.
Retro is also gated: a configurable HITL (human-in-the-loop) gate system controls
whether context changes require human approval or auto-apply.
Gates are defined in .beastmode/config.yaml.
Token efficiency. Agents load L0 by default (~40 lines). L1 loaded during prime. L2 and L3 loaded on demand. A flat system would load 10-50x more tokens for the same effective context.
Precision. A curated summary written during the last retro is more relevant than the top-5 embedding matches for "how does auth work." Summaries capture intent and relationships. Embeddings capture surface similarity.
Compounding knowledge. Every phase retro updates summaries. Over weeks and months, the hierarchy becomes a progressively richer, more accurate representation of the project. The agent gets smarter about your codebase because the hierarchy improves with every cycle.
Session survival. The hierarchy lives in .beastmode/ and is version-controlled
in git. There's no vector database to maintain, no embeddings to regenerate when code
changes. The context survives sessions, branches, and collaborators because it's just
markdown files in your repo.