Version 1.0 | 2026-03-17
AceClaw actively manages what enters the context window so long-running sessions stay effective. This document covers the full pipeline: system prompt budgeting, request-aware assembly, request-time pruning, context compaction, candidate injection, and observability.
User query
↓
RequestFocus (extract symbols, files, plan signals)
↓
SystemPromptLoader.inspectRequest()
├── MemoryTierLoader.loadAll() [8-tier hierarchy]
├── RuleEngine.loadRules() [path-based rules]
├── CandidatePromptAssembler [promoted candidates]
├── applyRequestFocusPriority() [dynamic priority boost]
├── ContextAssemblyPlan.build() [budget enforcement]
│ └── TierTruncator.applyBudget() [70/20/10 truncation]
└── ContextEstimator.estimateTokens() [per-section cost]
↓
Assembled system prompt
↓
StreamingAgentLoop (per LLM call)
├── MessageCompactor.pruneForRequest() [transient pruning]
├── checkAndCompact() [3-phase compaction]
└── buildRequest() [final LLM request]
Configuration record that caps system prompt size.
| Field | Default | Description |
|---|---|---|
| `maxPerTierChars` | 20,000 | Max characters per individual memory tier |
| `maxTotalChars` | 150,000 | Max total characters across all tiers + base prompt |
For smaller context windows, forContextWindow(contextTokens, maxOutput) scales the budget to ~25% of the effective window (context minus output), clamped between 1K–150K total and 1K–20K per tier.
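The scaling rule can be sketched as follows. This is a minimal illustration, not the actual implementation: the record below is a stand-in, and two details are assumptions the text does not state, namely that tokens convert to characters at 4 chars/token and that the per-tier cap keeps the default 20K:150K ratio to the total.

```java
/** Hypothetical stand-in for the budget record; field names follow the table above. */
record SystemPromptBudget(int maxPerTierChars, int maxTotalChars) {

    static final int DEFAULT_PER_TIER = 20_000;
    static final int DEFAULT_TOTAL = 150_000;
    static final int MIN_CHARS = 1_000;
    static final int CHARS_PER_TOKEN = 4; // assumption: same heuristic as ContextEstimator

    static SystemPromptBudget forContextWindow(int contextTokens, int maxOutput) {
        // Effective window = context minus reserved output tokens.
        long effectiveTokens = Math.max(0L, (long) contextTokens - maxOutput);
        // ~25% of the effective window, expressed in characters.
        long totalChars = effectiveTokens * CHARS_PER_TOKEN / 4;
        int total = (int) Math.max(MIN_CHARS, Math.min(DEFAULT_TOTAL, totalChars));
        // Assumption: per-tier cap scales with the total at the default ratio.
        long perTierChars = (long) total * DEFAULT_PER_TIER / DEFAULT_TOTAL;
        int perTier = (int) Math.max(MIN_CHARS, Math.min(DEFAULT_PER_TIER, perTierChars));
        return new SystemPromptBudget(perTier, total);
    }
}
```

Under these assumptions a 200K-token window hits both upper clamps, while a 32K window with 4K reserved for output yields a 28,000-character total budget.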
Enforces the budget in two passes:
- Pass 1: Truncate individual tiers exceeding `maxPerTierChars`
- Pass 2: If the total exceeds `maxTotalChars`, truncate lowest-priority tiers first
Truncation split: 70% head + 20% tail + 10% marker (<!-- [TRUNCATED] Original: N chars -->). This preserves both core instructions at the start and recent additions at the end.
Protected tiers: Soul (priority 100) and Managed Policy (priority 90) are never truncated.
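The 70/20/10 split can be illustrated with a short sketch. The class and method names are hypothetical, and the sketch assumes the marker fits within its 10% share, which holds for any realistic tier budget.

```java
/** Illustrative head/tail truncation; not the actual TierTruncator code. */
final class TruncationSketch {

    static String truncate(String content, int maxChars) {
        if (content.length() <= maxChars) {
            return content; // within budget, nothing to do
        }
        int headLen = maxChars * 70 / 100; // core instructions at the start
        int tailLen = maxChars * 20 / 100; // recent additions at the end
        // The remaining ~10% is reserved for the marker itself.
        String marker = "\n<!-- [TRUNCATED] Original: " + content.length() + " chars -->\n";
        String head = content.substring(0, headLen);
        String tail = content.substring(content.length() - tailLen);
        return head + marker + tail;
    }
}
```

Integer arithmetic is used for the split so the head and tail lengths are exact rather than subject to floating-point rounding.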
Generic section-based system prompt assembler. Maintains insertion order while allowing priority-driven truncation.
Key methods:
- `addSection(key, content, priority, protectedSection)`: builder pattern
- `build(SystemPromptBudget)` → `Result` containing the final prompt, truncated section keys, and per-section metadata
Result includes per-section details:
| Field | Description |
|---|---|
| `key` | Section identifier (e.g., "memory:soul", "rules:test-conventions") |
| `priority` | Numeric priority (higher = harder to truncate) |
| `originalChars` | Size before budget enforcement |
| `finalChars` | Size after truncation |
| `included` | Whether the section survived budget enforcement |
| `truncated` | Whether the section was truncated |
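To make priority-driven enforcement concrete, here is a simplified sketch. It is not the real `ContextAssemblyPlan`: the `Section` record and `overBudgetKeys` method are hypothetical, and the sketch drops whole sections instead of partially truncating them, which keeps the ordering logic easy to follow.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified model: pick which unprotected sections give way under a total budget. */
final class AssemblySketch {

    record Section(String key, String content, int priority, boolean protectedSection) {}

    /** Returns keys of sections removed to fit maxTotalChars, lowest priority first. */
    static List<String> overBudgetKeys(List<Section> sections, int maxTotalChars) {
        int total = sections.stream().mapToInt(s -> s.content().length()).sum();
        List<Section> candidates = sections.stream()
                .filter(s -> !s.protectedSection())              // protected sections never give way
                .sorted(Comparator.comparingInt(Section::priority))
                .toList();
        List<String> dropped = new ArrayList<>();
        for (Section s : candidates) {
            if (total <= maxTotalChars) {
                break; // budget satisfied
            }
            total -= s.content().length();
            dropped.add(s.key());
        }
        return dropped;
    }
}
```

Insertion order is untouched by this step; only the choice of what to cut depends on priority.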
RequestFocus extracts signals from each user query to dynamically boost relevant context sections.
public record RequestFocus(
String querySummary, // normalized query (max 160 chars)
List<String> activeFilePaths, // files in focus (max 6)
List<String> activeSymbols, // Java symbols from backticks/CamelCase (max 8)
List<String> planSignals // inferred action signals
)
Symbol extraction: Identifies backtick-quoted code (`StreamingAgentLoop`) and CamelCase identifiers. Validates them as Java identifiers, optionally dotted (e.g., `AppService.validate`, max 6 segments).
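A rough sketch of the symbol extraction described above. The regular expressions and the class name are assumptions for illustration; in particular, "CamelCase" is interpreted here as a capitalized word with at least one further uppercase letter, which may differ from the real rule.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative extraction of active symbols from a user query. */
final class SymbolSketch {
    static final int MAX_SYMBOLS = 8;  // cap from the RequestFocus record
    static final int MAX_SEGMENTS = 6; // max dotted segments

    private static final Pattern BACKTICKED = Pattern.compile("`([^`]+)`");
    // Assumed CamelCase shape, e.g. StreamingAgentLoop.
    private static final Pattern CAMEL =
            Pattern.compile("\\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\\b");

    static List<String> extract(String query) {
        Set<String> found = new LinkedHashSet<>(); // keeps first-seen order, no duplicates
        Matcher m = BACKTICKED.matcher(query);
        while (m.find()) {
            addIfValid(found, m.group(1).trim());
        }
        // Scan the rest of the query so backticked spans are not counted twice.
        String rest = BACKTICKED.matcher(query).replaceAll(" ");
        m = CAMEL.matcher(rest);
        while (m.find()) {
            addIfValid(found, m.group());
        }
        return found.stream().limit(MAX_SYMBOLS).toList();
    }

    private static void addIfValid(Set<String> out, String candidate) {
        String[] segments = candidate.split("\\.", -1);
        if (segments.length > MAX_SEGMENTS) return;
        for (String s : segments) {
            if (!isJavaIdentifier(s)) return;
        }
        out.add(candidate);
    }

    private static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) return false;
        }
        return true;
    }
}
```

Note that this check accepts reserved words such as `class`, since `Character.isJavaIdentifierStart` and `isJavaIdentifierPart` only test character classes.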
Plan signals (keyword-inferred):
- "continue current execution": continue, resume, next step
- "planning context": plan, next steps, steps to
- "code change requested": fix, implement, edit, update, refactor
- "verification requested": test, verify, review
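applyRequestFocusPriority() boosts section priorities based on RequestFocus signals, by at most 12 points (MAX_FOCUS_BOOST). The table lists the points per condition; where a listed value exceeds the cap (the +16 for a memory symbol match), the stated cap implies the applied boost is limited to 12: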
| Section type | Boost condition | Points |
|---|---|---|
| Memory | Symbol match in content | +16 |
| Auto-memory | Always (learned signals) | +8 |
| Markdown memory | Active file mentioned | +10 |
| Journal | Continuation signal | +12 |
| Rules | Active file matches glob | +4 |
| Candidates | Symbol match | +6 |
| Git context | Continuation signal | +6 |
This means a section about StreamingAgentLoop gets boosted when the user asks about that class, making it survive truncation even under budget pressure.
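A minimal sketch of a capped boost, assuming `MAX_FOCUS_BOOST` acts as an upper bound on whatever raw points a condition contributes. The method name and its parameters are hypothetical, and matching is reduced to a plain substring check on the section content.

```java
import java.util.List;

/** Illustrative priority boost for a section that mentions an active symbol. */
final class FocusBoostSketch {
    static final int MAX_FOCUS_BOOST = 12;

    static int boostedPriority(int basePriority, String content,
                               List<String> activeSymbols, int rawBoost) {
        // Assumption: a symbol "matches" when it appears verbatim in the content.
        boolean matches = activeSymbols.stream().anyMatch(content::contains);
        int boost = matches ? Math.min(rawBoost, MAX_FOCUS_BOOST) : 0;
        return basePriority + boost;
    }
}
```

With a base priority of 40 and a raw boost of 16, a matching section ends up at 52 under this reading, while a non-matching one stays at 40.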
Lightweight pre-flight pruning before each LLM call. Phase 1 only — no LLM calls, no session history mutation.
public RequestPruneResult pruneForRequest(
List<Message> messages,
String systemPrompt,
List<ToolDefinition> tools
)
Behavior:
- Estimates full context with `ContextEstimator.estimateFullContext()`
- If below the compaction trigger threshold → returns original messages unchanged
- Otherwise, applies Phase 1 pruning:
  - Keeps last N turns intact (protected)
  - Replaces old tool results with 200-char stubs + `[content pruned...]` marker
  - Removes old thinking blocks entirely
- Returns `RequestPruneResult` with pruned messages and token estimates
Key property: The pruned messages are transient — they are used only for the immediate LLM request. Session history (allMessages) is never modified. This is critical because compaction decisions must operate on the full conversation, not a transiently pruned view.
Integration in StreamingAgentLoop:
1. Run pruneForRequest() → get transient pruned messages + token estimate
2. Feed pruned token estimate to checkAndCompact() → compaction only fires if pruning alone is insufficient
3. If compaction ran, use full allMessages (now compacted); otherwise use pruned messages for the LLM call
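The transient nature of Phase 1 pruning can be sketched like this. The `Msg` and `Kind` types are hypothetical simplifications of the real message model, and the sketch protects the last N messages where the real code protects turns.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative Phase 1 pruning that returns a new list and leaves the input untouched. */
final class PruneSketch {
    enum Kind { TEXT, TOOL_RESULT, THINKING }

    record Msg(Kind kind, String content) {}

    static final int STUB_CHARS = 200;
    static final String MARKER = " [content pruned...]";

    static List<Msg> prune(List<Msg> messages, int protectedTail) {
        int cutoff = Math.max(0, messages.size() - protectedTail);
        List<Msg> out = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            Msg m = messages.get(i);
            if (i >= cutoff) {
                out.add(m); // protected tail stays intact
                continue;
            }
            switch (m.kind()) {
                case THINKING -> { /* old thinking blocks are removed entirely */ }
                case TOOL_RESULT -> out.add(m.content().length() > STUB_CHARS
                        ? new Msg(Kind.TOOL_RESULT, m.content().substring(0, STUB_CHARS) + MARKER)
                        : m);
                default -> out.add(m);
            }
        }
        return out; // the caller's list (session history) is never modified
    }
}
```

Because the method builds a fresh list, the session history can still be handed to compaction in full.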
When context pressure reaches 85% of the effective window, MessageCompactor runs a multi-phase compaction:
Extracts key items from conversation before pruning:
- Modified files (from `write_file`, `edit_file` tool uses)
- Bash commands (>10 chars, excluding `cd`/`ls`)
- Errors (from `ToolResult` with `isError=true`)
These flow to AutoMemoryStore via the StreamingAgentHandler.
- Replace old tool results with 200-char stubs
- Clear old thinking blocks
- Protect last 5 turns
Target: 60% of effective window. If Phase 1 achieves this, Phase 2 is skipped.
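The trigger and target thresholds amount to two comparisons, sketched below with hypothetical method names. Whether the real checks use inclusive or exclusive bounds is not stated; this sketch treats "reaches 85%" as inclusive.

```java
/** Illustrative threshold checks for the compaction phases. */
final class CompactionThresholds {
    static final int TRIGGER_PCT = 85; // compaction starts at this share of the effective window
    static final int TARGET_PCT = 60;  // Phase 1 aims to get below this share

    static boolean shouldCompact(long contextTokens, long effectiveWindow) {
        return contextTokens * 100 >= effectiveWindow * TRIGGER_PCT;
    }

    static boolean needsPhase2(long tokensAfterPhase1, long effectiveWindow) {
        // Phase 2 (LLM summarization) runs only if Phase 1 missed the target.
        return tokensAfterPhase1 * 100 > effectiveWindow * TARGET_PCT;
    }
}
```

For a 200K effective window this means compaction starts at 170K tokens and Phase 2 is skipped once pruning brings the context to 120K or less.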
- Structured prompt asks the LLM to summarize the conversation
- Extracts `<summary>` tags from the response
- Replaces old messages with summary + continuation instruction
- `[COMPACTED]` marker prevents re-summarizing summaries
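Summary extraction and the re-summarization guard can be sketched as follows. The helper names, the placement of the marker as a prefix, and the wording of the continuation instruction are all assumptions made for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative handling of the LLM summary response. */
final class SummarySketch {
    // DOTALL lets the summary span multiple lines.
    private static final Pattern SUMMARY =
            Pattern.compile("<summary>(.*?)</summary>", Pattern.DOTALL);
    static final String COMPACTED = "[COMPACTED]";

    static Optional<String> extractSummary(String llmResponse) {
        Matcher m = SUMMARY.matcher(llmResponse);
        return m.find() ? Optional.of(m.group(1).strip()) : Optional.empty();
    }

    /** Assumption: the marker is stored as a prefix of the summary message. */
    static boolean alreadyCompacted(String messageText) {
        return messageText.startsWith(COMPACTED);
    }

    static String toSummaryMessage(String summary) {
        // Hypothetical continuation instruction; the real wording is not given.
        return COMPACTED + " " + summary + "\n\nContinue from where the conversation left off.";
    }
}
```

Returning `Optional.empty()` when no tags are found leaves the fallback decision to the caller.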
ContextEstimator uses a ~4 chars/token heuristic but prefers actual usage.inputTokens from API responses when available. Key methods:
| Method | Purpose |
|---|---|
| `estimateTokens(String)` | Basic string → token estimate |
| `estimateFullContext(prompt, tools, messages)` | Total context cost |
| `checkBudget(...)` | Pre-flight validation with `BudgetCheck` result |
Overhead constants: MESSAGE_OVERHEAD=4, TOOL_USE_OVERHEAD=20, TOOL_RESULT_OVERHEAD=10, TOOL_DEF_OVERHEAD=10.
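As a sketch of how the heuristic and overheads might combine, assuming the overhead constants are counted in tokens and that partial tokens round up (neither is stated above). Messages and tool definitions are simplified to plain strings here.

```java
import java.util.List;

/** Illustrative ~4 chars/token estimator with per-item overheads. */
final class EstimatorSketch {
    static final int CHARS_PER_TOKEN = 4;
    static final int MESSAGE_OVERHEAD = 4;
    static final int TOOL_DEF_OVERHEAD = 10;

    static int estimateTokens(String text) {
        if (text == null || text.isEmpty()) return 0;
        // Assumption: round up so short strings still cost at least one token.
        return (text.length() + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
    }

    static int estimateFullContext(String systemPrompt, List<String> toolDefs, List<String> messages) {
        int total = estimateTokens(systemPrompt);
        for (String tool : toolDefs) {
            total += estimateTokens(tool) + TOOL_DEF_OVERHEAD;
        }
        for (String message : messages) {
            total += estimateTokens(message) + MESSAGE_OVERHEAD;
        }
        return total;
    }
}
```

A heuristic like this is only a fallback; when the API reports `usage.inputTokens`, that measured value takes precedence.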
Manages learning candidates through a state machine lifecycle:
SHADOW → PROMOTED → IN_USE → ARCHIVED
↓
DEMOTED (cooldown) → SHADOW (retry)
| State | Description |
|---|---|
| `SHADOW` | Newly observed, accumulating evidence |
| `PROMOTED` | Passed promotion gates, injected into system prompt |
| `DEMOTED` | Failed or regressed, cooling down before retry |
Key behaviors:
- `upsert(CandidateObservation)`: merges with existing candidates via Jaccard similarity (≥0.50, 30-day window)
- `evaluateAll()`: automatic promotion/demotion based on `CandidateStateMachine` gates
- Score decay: exponential with 30-day half-life and 7-day grace period
- Retention: automatic removal after 90 days of inactivity
- Integrity: HMAC-SHA256 signed; tampered entries skipped on load
Promoted candidates are assembled into the system prompt by CandidatePromptAssembler and receive RequestFocus priority boosts based on symbol/file/plan signal matches.
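The merge threshold and score decay can be written out as a small sketch. It assumes Jaccard similarity is computed over sets of string tokens and that decay starts only after the grace period ends; the text does not confirm either detail, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative similarity and decay math for learning candidates. */
final class CandidateScoringSketch {
    static final double MERGE_THRESHOLD = 0.50;
    static final double HALF_LIFE_DAYS = 30.0;
    static final double GRACE_DAYS = 7.0;

    /** Jaccard similarity: |A ∩ B| / |A ∪ B|. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    static boolean shouldMerge(Set<String> a, Set<String> b) {
        return jaccard(a, b) >= MERGE_THRESHOLD;
    }

    /** Exponential decay: the score halves every 30 days once the grace period is over. */
    static double decayedScore(double score, double daysSinceLastSeen) {
        double decayDays = Math.max(0.0, daysSinceLastSeen - GRACE_DAYS);
        return score * Math.pow(0.5, decayDays / HALF_LIFE_DAYS);
    }
}
```

Under these assumptions a candidate last seen 37 days ago keeps half of its score, and one seen within the last 7 days keeps all of it.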
ContextRpcHelper registers the context.inspect JSON-RPC method. It returns:
{
"sessionId": "...",
"totalChars": 42000,
"estimatedTokens": 10500,
"contextWindowTokens": 200000,
"systemPromptSharePct": 5.25,
"requestFocus": {
"querySummary": "fix the streaming bug",
"activeFilePaths": ["src/agent/StreamingAgentLoop.java"],
"activeSymbols": ["StreamingAgentLoop"],
"planSignals": ["code change requested"]
},
"budget": { "maxPerTierChars": 20000, "maxTotalChars": 150000 },
"truncatedSectionKeys": [],
"sections": [
{
"key": "memory:soul",
"sourceType": "memory",
"scopeType": "always-on",
"inclusionReason": "8-tier memory hierarchy",
"priority": 100,
"protected": true,
"originalChars": 1200,
"finalChars": 1200,
"included": true,
"truncated": false,
"evidence": ["tier=Soul", "priority=100"]
}
]
}
The `/context` command in `TerminalRepl` renders the inspection data:
/context list # Summary: prompt size, budget, pressure, per-section costs
/context detail <key> # Full content of a specific section with metadata
List view includes:
- System prompt size and context window share percentage
- Live context occupation and pressure level (color-coded)
- Compaction statistics (count, phases reached, tokens saved)
- Injection cost breakdown by type (rules, candidates, skills, learned signals, memory, core)
- Per-section cost table
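The context window share shown in the list view is consistent with a simple ratio of the estimated prompt tokens to the window size, judging by the sample `context.inspect` response (10,500 of 200,000 tokens reported as 5.25). A sketch under that assumption, with a hypothetical class name:

```java
/** Illustrative share calculation matching the sample response values. */
final class ShareSketch {
    static double systemPromptSharePct(int estimatedTokens, int contextWindowTokens) {
        return 100.0 * estimatedTokens / contextWindowTokens;
    }
}
```

The same sample also matches the 4 chars/token heuristic, since 42,000 characters divided by 4 gives 10,500 tokens.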
ContextMonitor in the CLI tracks live context metrics across turns:
| Method | Description |
|---|---|
| `recordTurnComplete()` | Updates metrics after each turn |
| `currentContextTokens()` | Live context occupation |
| `pressureLevel()` | Status indicator (low/medium/high/critical) |
| `peakContextTokens()` | Session peak |
| `compactionCount()` | Total compactions triggered |
- Character budgets, not token budgets: Neither Claude Code nor OpenClaw uses precise token accounting for system prompt assembly. Character caps with a ~4 chars/token heuristic are sufficient and avoid API dependencies.
- Transient pruning before persistent compaction: `pruneForRequest()` produces a disposable pruned copy. Session history stays intact for compaction decisions. This prevents double-counting and ensures compaction only fires when truly needed.
- Priority-driven truncation: Human-authored tiers (Soul, Policy) survive budget pressure; agent-generated tiers (Auto-Memory, Journal) are truncated first.
- Request-aware boosting: Static priority ordering is overridden per request based on query signals. A section about `ErrorDetector` gets boosted when the user asks about error detection, even if its base priority is low.
- Observable by default: Every section carries metadata (source type, scope, inclusion reason, evidence) that can be inspected via `/context`. No black-box prompt assembly.