AceClaw Context Engineering

Version 1.0 | 2026-03-17

AceClaw actively manages what enters the context window so long-running sessions stay effective. This document covers the full pipeline: system prompt budgeting, request-aware assembly, request-time pruning, context compaction, candidate injection, and observability.

Architecture Overview

User query
    ↓
RequestFocus (extract symbols, files, plan signals)
    ↓
SystemPromptLoader.inspectRequest()
    ├── MemoryTierLoader.loadAll()          [8-tier hierarchy]
    ├── RuleEngine.loadRules()              [path-based rules]
    ├── CandidatePromptAssembler            [promoted candidates]
    ├── applyRequestFocusPriority()         [dynamic priority boost]
    ├── ContextAssemblyPlan.build()         [budget enforcement]
    │   └── TierTruncator.applyBudget()    [70/20/10 truncation]
    └── ContextEstimator.estimateTokens()   [per-section cost]
    ↓
Assembled system prompt
    ↓
StreamingAgentLoop (per LLM call)
    ├── MessageCompactor.pruneForRequest()  [transient pruning]
    ├── checkAndCompact()                   [3-phase compaction]
    └── buildRequest()                      [final LLM request]

1. System Prompt Budget

SystemPromptBudget

Configuration record that caps system prompt size.

Field	Default	Description
`maxPerTierChars`	20,000	Max characters per individual memory tier
`maxTotalChars`	150,000	Max total characters across all tiers + base prompt

For smaller context windows, forContextWindow(contextTokens, maxOutput) scales the budget to ~25% of the effective window (context minus output), clamped between 1K–150K total and 1K–20K per tier.
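
The scaling rule above can be sketched roughly as follows. This is a minimal illustration, not the actual AceClaw code: the name SystemPromptBudgetSketch, the use of the ~4 chars/token heuristic to convert tokens to characters, and the choice to scale the per-tier cap in the default 20K:150K ratio are all assumptions.

```java
/** Hypothetical sketch of budget scaling; not the real SystemPromptBudget. */
record SystemPromptBudgetSketch(int maxPerTierChars, int maxTotalChars) {

    static final int DEFAULT_PER_TIER_CHARS = 20_000;
    static final int DEFAULT_TOTAL_CHARS = 150_000;
    static final int MIN_CHARS = 1_000;
    static final int CHARS_PER_TOKEN = 4; // assumed conversion heuristic

    static SystemPromptBudgetSketch forContextWindow(int contextTokens, int maxOutput) {
        // Effective window = context minus the tokens reserved for output.
        long effectiveTokens = Math.max(0L, (long) contextTokens - maxOutput);
        // Target ~25% of the effective window, expressed in characters.
        long targetChars = effectiveTokens * CHARS_PER_TOKEN * 25 / 100;
        int total = (int) clamp(targetChars, MIN_CHARS, DEFAULT_TOTAL_CHARS);
        // Assumption: the per-tier cap keeps the default 20K:150K ratio.
        long perTier = (long) total * DEFAULT_PER_TIER_CHARS / DEFAULT_TOTAL_CHARS;
        return new SystemPromptBudgetSketch(
                (int) clamp(perTier, MIN_CHARS, DEFAULT_PER_TIER_CHARS), total);
    }

    private static long clamp(long value, long min, long max) {
        return Math.max(min, Math.min(max, value));
    }
}
```

Under these assumptions, a 200K-token window with 8,192 output tokens saturates at the 20,000 / 150,000 defaults, while an 8K window with 2K output yields 1,000 per tier and 6,000 total.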

TierTruncator

Enforces the budget in two passes:

Pass 1: Truncate individual tiers exceeding maxPerTierChars
Pass 2: If total exceeds maxTotalChars, truncate lowest-priority tiers first

Truncation split: 70% head + 20% tail, with the remaining 10% reserved for the truncation marker. This preserves both core instructions at the start and recent additions at the end.

Protected tiers: Soul (priority 100) and Managed Policy (priority 90) are never truncated.
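
To make the head/tail split concrete, here is a minimal sketch of truncating a single tier to a character cap. The class name TierTruncationSketch and the marker text are placeholders chosen for illustration; the real marker string and method names may differ.

```java
/** Hypothetical sketch of the 70/20/10 split; not the real TierTruncator. */
final class TierTruncationSketch {

    // Placeholder marker text (assumption); it must fit in the 10% share.
    static final String MARKER = "\n[...]\n";

    static String truncate(String content, int maxChars) {
        if (content.length() <= maxChars) {
            return content; // already within budget
        }
        int headLen = maxChars * 70 / 100;
        int tailLen = maxChars * 20 / 100;
        int markerBudget = maxChars - headLen - tailLen;
        // Trim the marker if the remaining share is too small for it.
        String marker = MARKER.length() <= markerBudget
                ? MARKER
                : MARKER.substring(0, markerBudget);
        return content.substring(0, headLen)
                + marker
                + content.substring(content.length() - tailLen);
    }
}
```

Because the head and tail lengths are derived from the cap rather than from the content, the result never exceeds maxChars, and content that already fits is returned unchanged.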

2. Context Assembly

ContextAssemblyPlan

Generic section-based system prompt assembler. Maintains insertion order while allowing priority-driven truncation.

Key methods:

addSection(key, content, priority, protectedSection) — builder pattern
build(SystemPromptBudget) → Result containing the final prompt, truncated section keys, and per-section metadata

Result includes per-section details:

Field	Description
`key`	Section identifier (e.g., "memory:soul", "rules:test-conventions")
`priority`	Numeric priority (higher = harder to truncate)
`originalChars`	Size before budget enforcement
`finalChars`	Size after truncation
`included`	Whether the section survived budget enforcement
`truncated`	Whether the section was truncated
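
The interplay between insertion order and priority-driven truncation can be sketched as below. This is an illustration under stated assumptions, not the real ContextAssemblyPlan: the class AssemblyPlanSketch is hypothetical, build takes a plain character cap instead of a SystemPromptBudget, section keys are assumed unique, and over-budget sections are cut with a simple head cut for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of priority-driven total-budget enforcement. */
final class AssemblyPlanSketch {

    record Section(String key, String content, int priority, boolean protectedSection) {}

    private final List<Section> sections = new ArrayList<>();

    AssemblyPlanSketch addSection(String key, String content, int priority,
                                  boolean protectedSection) {
        sections.add(new Section(key, content, priority, protectedSection));
        return this; // builder style
    }

    /** Returns key -> final content in insertion order. */
    Map<String, String> build(int maxTotalChars) {
        Map<String, String> result = new LinkedHashMap<>();
        int total = 0;
        for (Section s : sections) {
            result.put(s.key(), s.content());
            total += s.content().length();
        }
        // Unprotected sections, lowest priority first, absorb the overflow.
        List<Section> victims = sections.stream()
                .filter(s -> !s.protectedSection())
                .sorted(Comparator.comparingInt(Section::priority))
                .toList();
        for (Section s : victims) {
            if (total <= maxTotalChars) {
                break;
            }
            int overflow = total - maxTotalChars;
            int keep = Math.max(0, s.content().length() - overflow);
            result.put(s.key(), s.content().substring(0, keep)); // order is kept
            total -= s.content().length() - keep;
        }
        return result;
    }
}
```

In this sketch protected sections are never shortened, so the total can still exceed the cap if protected content alone is larger than the budget.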

3. Request-Aware Priority (RequestFocus)

RequestFocus extracts signals from each user query to dynamically boost relevant context sections.

Extraction

import java.util.List;

public record RequestFocus(
    String querySummary,           // normalized query (max 160 chars)
    List<String> activeFilePaths,  // files in focus (max 6)
    List<String> activeSymbols,    // Java symbols from backticks/CamelCase (max 8)
    List<String> planSignals       // inferred action signals
) {}

Symbol extraction: Identifies backtick-quoted code (`StreamingAgentLoop`) and CamelCase identifiers. Validates as Java identifiers, optionally dotted (e.g., AppService.validate, max 6 segments).
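
A minimal sketch of this kind of symbol extraction is shown below. The class name SymbolExtractionSketch, the regular expressions, and the decision to skip backtick spans during the CamelCase pass are assumptions for illustration; only the limits (8 symbols, 6 dotted segments) come from the description above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of symbol extraction from a user query. */
final class SymbolExtractionSketch {

    static final int MAX_SYMBOLS = 8;
    static final int MAX_SEGMENTS = 6;

    private static final Pattern BACKTICK = Pattern.compile("`([^`]+)`");
    // Two or more capitalized humps, e.g. StreamingAgentLoop (assumed pattern).
    private static final Pattern CAMEL_CASE =
            Pattern.compile("\\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\\b");

    static List<String> extractSymbols(String query) {
        Set<String> found = new LinkedHashSet<>();
        Matcher quoted = BACKTICK.matcher(query);
        while (quoted.find()) {
            String candidate = quoted.group(1).trim();
            if (isDottedJavaIdentifier(candidate)) {
                found.add(candidate);
            }
        }
        // Blank out backtick spans so their contents are not matched twice.
        String rest = BACKTICK.matcher(query).replaceAll(" ");
        Matcher camel = CAMEL_CASE.matcher(rest);
        while (camel.find()) {
            found.add(camel.group());
        }
        return found.stream().limit(MAX_SYMBOLS).toList();
    }

    static boolean isDottedJavaIdentifier(String s) {
        String[] parts = s.split("\\.", -1);
        if (parts.length > MAX_SEGMENTS) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || !Character.isJavaIdentifierStart(part.charAt(0))) {
                return false;
            }
            for (int i = 1; i < part.length(); i++) {
                if (!Character.isJavaIdentifierPart(part.charAt(i))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Backtick-quoted text that is not a valid identifier, such as a shell command, is ignored rather than recorded as a symbol.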

Plan signals (keyword-inferred):

"continue current execution" — continue, resume, next step
"planning context" — plan, next steps, steps to
"code change requested" — fix, implement, edit, update, refactor
"verification requested" — test, verify, review

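The keyword-to-signal mapping listed above could be implemented along these lines. The class PlanSignalSketch is hypothetical, and matching on whole words (so that "test" does not fire on "latest") is an assumption; the source only states that signals are keyword-inferred.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of keyword-based plan signal inference. */
final class PlanSignalSketch {

    private static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("continue current execution", List.of("continue", "resume", "next step"));
        KEYWORDS.put("planning context", List.of("plan", "next steps", "steps to"));
        KEYWORDS.put("code change requested",
                List.of("fix", "implement", "edit", "update", "refactor"));
        KEYWORDS.put("verification requested", List.of("test", "verify", "review"));
    }

    static List<String> inferPlanSignals(String query) {
        String normalized = query.toLowerCase(Locale.ROOT);
        List<String> signals = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : KEYWORDS.entrySet()) {
            for (String keyword : entry.getValue()) {
                // Whole-word match (assumption) to avoid substring hits.
                Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b");
                if (p.matcher(normalized).find()) {
                    signals.add(entry.getKey());
                    break; // one hit per signal is enough
                }
            }
        }
        return signals;
    }
}
```

With this sketch, the query "fix the streaming bug" yields the single signal "code change requested".
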
Priority Boosting

applyRequestFocusPriority() boosts section priorities by up to 12 points (MAX_FOCUS_BOOST) based on RequestFocus signals:

Section type	Boost condition	Points
Memory	Symbol match in content	+16
Auto-memory	Always (learned signals)	+8
Markdown memory	Active file mentioned	+10
Journal	Continuation signal	+12
Rules	Active file matches glob	+4
Candidates	Symbol match	+6
Git context	Continuation signal	+6

This means a section about StreamingAgentLoop gets boosted when the user asks about that class, making it survive truncation even under budget pressure.
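
One way to picture a capped boost is the sketch below. It assumes that the points from all matching conditions are summed and that the sum is clamped to MAX_FOCUS_BOOST before being added to the base priority; the source does not state how multiple matches combine, and the helper name boostedPriority is hypothetical.

```java
/** Hypothetical sketch of a capped priority boost. */
final class FocusBoostSketch {

    static final int MAX_FOCUS_BOOST = 12;

    /**
     * Adds the points of every matched boost condition to the base priority,
     * never adding more than MAX_FOCUS_BOOST in total (assumed behavior).
     */
    static int boostedPriority(int basePriority, int... matchedBoostPoints) {
        int sum = 0;
        for (int points : matchedBoostPoints) {
            sum += Math.max(0, points); // ignore negative inputs
        }
        return basePriority + Math.min(MAX_FOCUS_BOOST, sum);
    }
}
```

For example, a candidate section with base priority 40 and a symbol match (+6) would be assembled at priority 46 under this sketch, and no combination of matches could lift it above 52.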

4. Request-Time Pruning

MessageCompactor.pruneForRequest()

Lightweight pre-flight pruning before each LLM call. Phase 1 only — no LLM calls, no session history mutation.

public RequestPruneResult pruneForRequest(
    List<Message> messages,
    String systemPrompt,
    List<ToolDefinition> tools
)

Behavior:

1. Estimates full context with ContextEstimator.estimateFullContext()
2. If below the compaction trigger threshold → returns original messages unchanged
3. Otherwise, applies Phase 1 pruning:
   - Keeps last N turns intact (protected)
   - Replaces old tool results with 200-char stubs + [content pruned...] marker
   - Removes old thinking blocks entirely
4. Returns RequestPruneResult with pruned messages and token estimates

Key property: The pruned messages are transient — they are used only for the immediate LLM request. Session history (allMessages) is never modified. This is critical because compaction decisions must operate on the full conversation, not a transiently pruned view.
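
The transient nature of the pruning can be illustrated with a small sketch. The Message and Kind types here are simplified stand-ins, not AceClaw's real message model, and the sketch protects the last N messages rather than the last N turns to stay short. The point it shows is that pruning builds a new list and leaves the input untouched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of Phase 1 pruning that never mutates its input. */
final class RequestPruneSketch {

    enum Kind { TEXT, TOOL_RESULT, THINKING }

    record Message(Kind kind, String text) {}

    static final int STUB_CHARS = 200;
    static final String MARKER = " [content pruned...]";

    static List<Message> pruneForRequest(List<Message> messages, int protectedTail) {
        int cutoff = Math.max(0, messages.size() - protectedTail);
        List<Message> pruned = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            Message m = messages.get(i);
            if (i >= cutoff || m.kind() == Kind.TEXT) {
                pruned.add(m);          // protected tail and plain text stay intact
            } else if (m.kind() == Kind.TOOL_RESULT) {
                pruned.add(stub(m));    // old tool output becomes a short stub
            }
            // Old THINKING messages are dropped entirely.
        }
        return pruned;
    }

    private static Message stub(Message m) {
        if (m.text().length() <= STUB_CHARS) {
            return m;
        }
        return new Message(m.kind(), m.text().substring(0, STUB_CHARS) + MARKER);
    }
}
```

Since records are immutable and the method only reads from the input list, the caller's history is guaranteed to be unchanged after the call.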

Integration in StreamingAgentLoop:

1. Run pruneForRequest() → get transient pruned messages + token estimate
2. Feed pruned token estimate to checkAndCompact() → compaction only fires if pruning alone is insufficient
3. If compaction ran, use full allMessages (now compacted); otherwise use pruned messages for the LLM call
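
The three steps reduce to a single decision about which message list feeds the LLM call. The sketch below assumes the compaction trigger is the 85% pressure threshold described in the compaction section and that the comparison is made against the pruned token estimate; LoopDecisionSketch and its enum are hypothetical names.

```java
/** Hypothetical sketch of the prune-then-compact decision. */
final class LoopDecisionSketch {

    enum MessageSource { TRANSIENT_PRUNED, COMPACTED_HISTORY }

    static final double COMPACTION_TRIGGER = 0.85; // share of the effective window

    static MessageSource select(int prunedTokenEstimate, int effectiveWindowTokens) {
        // Compaction fires only if pruning alone leaves the request too large.
        boolean stillOverThreshold =
                prunedTokenEstimate >= COMPACTION_TRIGGER * effectiveWindowTokens;
        return stillOverThreshold
                ? MessageSource.COMPACTED_HISTORY
                : MessageSource.TRANSIENT_PRUNED;
    }
}
```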

5. Context Compaction (3-Phase)

When context pressure reaches 85% of the effective window, MessageCompactor runs a multi-phase compaction:

Phase 0: Memory Flush (Heuristic)

Extracts key items from conversation before pruning:

Modified files (from write_file, edit_file tool uses)
Bash commands (>10 chars, excluding cd/ls)
Errors (from ToolResult with isError=true)

These flow to AutoMemoryStore via the StreamingAgentHandler.
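
As an example of one of these heuristics, the Bash command filter might look like the sketch below. The class MemoryFlushSketch is hypothetical, and treating "excluding cd/ls" as a check on the first word of the command is an assumption.

```java
import java.util.List;

/** Hypothetical sketch of the Bash command heuristic in Phase 0. */
final class MemoryFlushSketch {

    static List<String> notableBashCommands(List<String> commands) {
        return commands.stream()
                .map(String::trim)
                .filter(c -> c.length() > 10)   // skip trivial commands
                .filter(c -> !isNavigation(c))  // skip cd / ls
                .toList();
    }

    private static boolean isNavigation(String command) {
        String first = command.split("\\s+", 2)[0];
        return first.equals("cd") || first.equals("ls");
    }
}
```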

Phase 1: Prune (Free)

Replace old tool results with 200-char stubs
Clear old thinking blocks
Protect last 5 turns

Target: 60% of effective window. If Phase 1 achieves this, Phase 2 is skipped.

Phase 2: Summarize (LLM Call)

Structured prompt asks the LLM to summarize the conversation
Extracts <summary> tags from the response
Replaces old messages with summary + continuation instruction
[COMPACTED] marker prevents re-summarizing summaries
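
A minimal sketch of the tag extraction and marker check follows. The class SummarySketch, the regular expression, and the choice to place the marker at the start of the summary message are assumptions; the source only says that summary tags are extracted and that a [COMPACTED] marker prevents re-summarization.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of Phase 2 summary handling. */
final class SummarySketch {

    static final String COMPACTED_MARKER = "[COMPACTED]";

    private static final Pattern SUMMARY =
            Pattern.compile("<summary>(.*?)</summary>", Pattern.DOTALL);

    /** Pulls the text between the first pair of summary tags, if any. */
    static Optional<String> extractSummary(String llmResponse) {
        Matcher m = SUMMARY.matcher(llmResponse);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    /** Prefixes the summary so later compactions can recognize it. */
    static String toSummaryMessage(String summary) {
        return COMPACTED_MARKER + " " + summary;
    }

    static boolean isAlreadyCompacted(String messageText) {
        return messageText.startsWith(COMPACTED_MARKER);
    }
}
```

Returning an empty Optional when no tags are present lets the caller decide how to handle a malformed response instead of silently storing the raw text.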

Token Tracking

ContextEstimator uses a ~4 chars/token heuristic but prefers actual usage.inputTokens from API responses when available. Key methods:

Method	Purpose
`estimateTokens(String)`	Basic string → token estimate
`estimateFullContext(prompt, tools, messages)`	Total context cost
`checkBudget(...)`	Pre-flight validation with `BudgetCheck` result

Overhead constants: MESSAGE_OVERHEAD=4, TOOL_USE_OVERHEAD=20, TOOL_RESULT_OVERHEAD=10, TOOL_DEF_OVERHEAD=10.
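
The character heuristic and overhead constants can be combined as in the sketch below. How the real ContextEstimator applies each constant is not specified here, so the rounding (up) and the choice to add one overhead per message and per tool definition are assumptions; messages and tool definitions are reduced to plain strings for brevity.

```java
import java.util.List;

/** Hypothetical sketch of heuristic token estimation. */
final class ContextEstimatorSketch {

    static final int CHARS_PER_TOKEN = 4;
    static final int MESSAGE_OVERHEAD = 4;
    static final int TOOL_DEF_OVERHEAD = 10;

    static int estimateTokens(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        // Round up so short strings never cost zero tokens.
        return (text.length() + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
    }

    static int estimateFullContext(String systemPrompt,
                                   List<String> toolDefinitions,
                                   List<String> messages) {
        int total = estimateTokens(systemPrompt);
        for (String tool : toolDefinitions) {
            total += estimateTokens(tool) + TOOL_DEF_OVERHEAD;
        }
        for (String message : messages) {
            total += estimateTokens(message) + MESSAGE_OVERHEAD;
        }
        return total;
    }
}
```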

6. Candidate Injection

CandidateStore

Manages learning candidates through a state machine lifecycle:

SHADOW → PROMOTED → IN_USE → ARCHIVED
           ↓
        DEMOTED (cooldown) → SHADOW (retry)

State	Description
`SHADOW`	Newly observed, accumulating evidence
`PROMOTED`	Passed promotion gates, injected into system prompt
`DEMOTED`	Failed or regressed, cooling down before retry

Key behaviors:

upsert(CandidateObservation) — merges with existing candidates via Jaccard similarity (≥0.50, 30-day window)
evaluateAll() — automatic promotion/demotion based on CandidateStateMachine gates
Score decay: exponential with 30-day half-life and 7-day grace period
Retention: automatic removal after 90 days of inactivity
Integrity: HMAC-SHA256 signed; tampered entries skipped on load
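
The merge threshold and score decay can be sketched as follows. Tokenizing on whitespace, applying the half-life only to the time elapsed after the grace period, and the class name CandidateMergeSketch are assumptions; the 30-day merge window is not modeled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of candidate merging and score decay. */
final class CandidateMergeSketch {

    static final double MERGE_THRESHOLD = 0.50;
    static final double HALF_LIFE_DAYS = 30.0;
    static final double GRACE_DAYS = 7.0;

    static Set<String> tokens(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        return new HashSet<>(Arrays.asList(normalized.split("\\s+")));
    }

    /** Jaccard similarity: |A intersect B| / |A union B|. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    static boolean shouldMerge(String left, String right) {
        return jaccard(tokens(left), tokens(right)) >= MERGE_THRESHOLD;
    }

    /** Multiplier applied to a score: 1.0 during grace, then halving every 30 days. */
    static double decayFactor(double ageDays) {
        if (ageDays <= GRACE_DAYS) {
            return 1.0;
        }
        return Math.pow(0.5, (ageDays - GRACE_DAYS) / HALF_LIFE_DAYS);
    }
}
```

Under these assumptions, "run gradle test" and "gradle test failed" share two of four distinct tokens, which gives a similarity of exactly 0.50 and is just enough to merge.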

Promoted candidates are assembled into the system prompt by CandidatePromptAssembler and receive RequestFocus priority boosts based on symbol/file/plan signal matches.

7. Observability

context.inspect RPC

ContextRpcHelper registers the context.inspect JSON-RPC method. It returns:

{
  "sessionId": "...",
  "totalChars": 42000,
  "estimatedTokens": 10500,
  "contextWindowTokens": 200000,
  "systemPromptSharePct": 5.25,
  "requestFocus": {
    "querySummary": "fix the streaming bug",
    "activeFilePaths": ["src/agent/StreamingAgentLoop.java"],
    "activeSymbols": ["StreamingAgentLoop"],
    "planSignals": ["code change requested"]
  },
  "budget": { "maxPerTierChars": 20000, "maxTotalChars": 150000 },
  "truncatedSectionKeys": [],
  "sections": [
    {
      "key": "memory:soul",
      "sourceType": "memory",
      "scopeType": "always-on",
      "inclusionReason": "8-tier memory hierarchy",
      "priority": 100,
      "protected": true,
      "originalChars": 1200,
      "finalChars": 1200,
      "included": true,
      "truncated": false,
      "evidence": ["tier=Soul", "priority=100"]
    }
  ]
}
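
The numeric fields in this sample are consistent with the ~4 chars/token heuristic: 42,000 characters map to 10,500 estimated tokens, which is 5.25% of a 200,000-token window. The sketch below shows that arithmetic; the helper names are hypothetical, and the real fields may be derived differently.

```java
/** Hypothetical sketch of how the sample's numeric fields relate. */
final class InspectMathSketch {

    static int estimatedTokens(int totalChars) {
        return (totalChars + 3) / 4; // ~4 chars/token, rounded up
    }

    static double systemPromptSharePct(int estimatedTokens, int contextWindowTokens) {
        return 100.0 * estimatedTokens / contextWindowTokens;
    }
}
```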

/context CLI Command

The /context command in TerminalRepl renders the inspection data:

/context list          # Summary: prompt size, budget, pressure, per-section costs
/context detail <key>  # Full content of a specific section with metadata

List view includes:

System prompt size and context window share percentage
Live context occupation and pressure level (color-coded)
Compaction statistics (count, phases reached, tokens saved)
Injection cost breakdown by type (rules, candidates, skills, learned signals, memory, core)
Per-section cost table

Context Monitor

ContextMonitor in the CLI tracks live context metrics across turns:

Method	Description
`recordTurnComplete()`	Updates metrics after each turn
`currentContextTokens()`	Live context occupation
`pressureLevel()`	Status indicator (low/medium/high/critical)
`peakContextTokens()`	Session peak
`compactionCount()`	Total compactions triggered
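
A minimal sketch of the bookkeeping behind these accessors is shown below. The parameters of recordTurnComplete are an assumption (the table lists the method without its signature), and pressureLevel() is left out because its thresholds are not documented here.

```java
/** Hypothetical sketch of per-turn context metrics tracking. */
final class ContextMonitorSketch {

    private int currentTokens;
    private int peakTokens;
    private int compactions;

    /** Assumed signature: the latest context size and whether compaction ran. */
    void recordTurnComplete(int contextTokens, boolean compactionRan) {
        currentTokens = contextTokens;
        peakTokens = Math.max(peakTokens, contextTokens);
        if (compactionRan) {
            compactions++;
        }
    }

    int currentContextTokens() {
        return currentTokens;
    }

    int peakContextTokens() {
        return peakTokens;
    }

    int compactionCount() {
        return compactions;
    }
}
```

The peak is tracked separately from the current value because compaction can shrink the live context while the session peak should keep its highest reading.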

Key Design Decisions

Character budgets, not token budgets — Neither Claude Code nor OpenClaw uses precise token accounting for system prompt assembly. Character caps with ~4 chars/token heuristic are sufficient and avoid API dependencies.
Transient pruning before persistent compaction — pruneForRequest() produces a disposable pruned copy. Session history stays intact for compaction decisions. This prevents double-counting and ensures compaction only fires when truly needed.
Priority-driven truncation — Human-authored tiers (Soul, Policy) survive budget pressure; agent-generated tiers (Auto-Memory, Journal) are truncated first.
Request-aware boosting — Static priority ordering is overridden per-request based on query signals. A section about ErrorDetector gets boosted when the user asks about error detection, even if its base priority is low.
Observable by default — Every section carries metadata (source type, scope, inclusion reason, evidence) that can be inspected via /context. No black-box prompt assembly.
