
feat(agent): Token-based context budgeting and smart tool result truncation #249

@mmogr

Description

Summary

The agentic loop's context management uses character-based budgeting (MAX_CONTEXT_CHARS = 180_000 in agentLoop.ts) for pruning conversation history. Characters are a rough proxy for tokens and don't account for tokenizer-specific encoding. Additionally, the main agentic loop lacks the sophisticated context compression available in the deep research loop (roundSummaries, multi-round pruning).

Current Implementation

Character-based budget (agentLoop.ts):

export const MAX_CONTEXT_CHARS = 180_000;
export const KEEP_LAST_TOOL_MESSAGES = 10;
export const TOOL_RESULT_SNIPPET_CHARS = 4_000;

function totalChars(messages: ChatMessage[]): number {
  return messages.reduce((acc, m) => acc + (m.content?.length ?? 0), 0);
}

export function pruneForBudget(messages: ChatMessage[]): ChatMessage[] {
  if (totalChars(messages) <= MAX_CONTEXT_CHARS) return messages;
  // ... drop old tool messages, then drop all but last 12 turns
}

Tool result truncation:

export function summarizeToolResult(_name: string, res: ToolResult): string {
  if (!res.success) {
    return `ERROR: ${res.error}`.slice(0, TOOL_RESULT_SNIPPET_CHARS);
  }
  const raw = stableStringify(res.data);
  return raw.slice(0, TOOL_RESULT_SNIPPET_CHARS); // ← naive truncation
}

Problems

  1. Chars ≠ tokens — 180K chars might be 45K tokens or 90K tokens depending on content (code vs prose vs JSON)
  2. Naive truncation — slice(0, 4000) can cut JSON mid-object, producing invalid data that confuses the model
  3. No summarization — the deep research loop summarizes completed rounds, but the main agent loop doesn't summarize completed tool interactions
  4. Fixed budget — doesn't adapt to the actual model's context window size

Proposed Solution

Phase 1: Token-approximate budgeting

Replace character budget with a simple token approximation:

function estimateTokens(text: string): number {
  // GPT-style: ~4 chars per token for English, ~3 for code/JSON
  // This is intentionally conservative (overestimates)
  return Math.ceil(text.length / 3);
}

export const MAX_CONTEXT_TOKENS = 32_000; // Default; should be configurable per model

Make the budget configurable based on the active model's known context window.
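One way to sketch that configurability (the model names, window sizes, and headroom fraction below are illustrative assumptions, not values from the codebase):

```typescript
// Hypothetical per-model registry; real entries would come from config.
const KNOWN_CONTEXT_WINDOWS: Record<string, number> = {
  "llama-3.1-8b": 128_000,
  "qwen2.5-7b": 32_768,
};

const DEFAULT_CONTEXT_TOKENS = 32_000;
const BUDGET_FRACTION = 0.75; // reserve headroom for the model's reply

function resolveTokenBudget(modelId: string): number {
  const window = KNOWN_CONTEXT_WINDOWS[modelId] ?? DEFAULT_CONTEXT_TOKENS;
  return Math.floor(window * BUDGET_FRACTION);
}
```

Budgeting a fraction of the window rather than the whole window leaves room for the system prompt and the model's own output, so pruning never has to race the hard limit.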

Phase 2: Smart tool result truncation

Replace naive slice() with structure-preserving truncation:

function truncateToolResult(data: unknown, maxChars: number): string {
  const raw = stableStringify(data);
  if (raw.length <= maxChars) return raw;
  
  // For arrays: keep first and last elements, indicate truncation
  if (Array.isArray(data) && data.length > 2) {
    const first = stableStringify(data[0]);
    const last = stableStringify(data[data.length - 1]);
    return `[${first}, ... (${data.length - 2} items omitted), ${last}]`;
  }
  
  // For objects: keep keys, truncate long values
  // For strings: truncate with "..." indicator
  return raw.slice(0, maxChars - 20) + '... (truncated)';
}
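To make the array branch concrete, here is a self-contained illustration of the behavior described above, with JSON.stringify standing in for the project's stableStringify:

```typescript
// Array-branch sketch: keep the first and last elements as valid JSON
// fragments and state how many items were omitted, instead of a raw cut.
function truncateArray(data: unknown[], maxChars: number): string {
  const raw = JSON.stringify(data);
  if (raw.length <= maxChars) return raw;
  if (data.length > 2) {
    const first = JSON.stringify(data[0]);
    const last = JSON.stringify(data[data.length - 1]);
    return `[${first}, ... (${data.length - 2} items omitted), ${last}]`;
  }
  return raw.slice(0, maxChars - 20) + "... (truncated)";
}

const rows = Array.from({ length: 100 }, (_, i) => ({ id: i }));
truncateArray(rows, 200);
// → '[{"id":0}, ... (98 items omitted), {"id":99}]'
```

The model can still see the shape of an element and the total count, which is usually what it needs to decide whether to page through the rest.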

Phase 3: Lift round-summary compression from deep research

Port the createRoundSummary pattern from deep research into the main agent loop:

  • After every N tool iterations (e.g., 5), summarize completed tool interactions into a compressed working memory entry
  • Replace individual tool result messages with the summary
  • This keeps the context lean during long agent sessions
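A hedged sketch of that compression step; the ChatMessage shape and the injected summarize callback are assumptions for illustration, not the actual deep research API:

```typescript
interface ChatMessage { role: string; content: string; }

const SUMMARIZE_EVERY_N = 5; // tool iterations between compressions (assumed default)

// Collapse all but the most recent tool result into one working-memory entry.
function compressToolHistory(
  messages: ChatMessage[],
  summarize: (toolMsgs: ChatMessage[]) => string,
): ChatMessage[] {
  const toolMsgs = messages.filter((m) => m.role === "tool");
  if (toolMsgs.length < SUMMARIZE_EVERY_N) return messages;
  const toCompress = toolMsgs.slice(0, -1);
  const summary: ChatMessage = {
    role: "assistant",
    content: `[working memory] ${summarize(toCompress)}`,
  };
  const firstIdx = messages.indexOf(toCompress[0]);
  const kept = messages.filter((m) => !toCompress.includes(m));
  kept.splice(firstIdx, 0, summary); // summary takes the slot of the oldest tool message
  return kept;
}
```

In practice the summarize callback would be an LLM call (as in the deep research loop); the structural point is that N-1 verbose tool messages are replaced by a single compact entry while the latest result stays verbatim.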

Files to Modify

  • src/hooks/useGglibRuntime/agentLoop.ts — Replace MAX_CONTEXT_CHARS with a token-based budget; improve summarizeToolResult; add an estimateTokens() utility; port round-summary compression from deep research
  • src/hooks/useGglibRuntime/runAgenticLoop.ts — Use a configurable budget based on the model's context window
  • src/config/ — Add agent loop configuration (max tokens, summarization interval)

Acceptance Criteria

  • Context budget uses token approximation instead of raw character count
  • Budget is configurable and adapts to model's context window when known
  • Tool result truncation preserves JSON structure (no mid-object cuts)
  • Array results show first/last elements with count of omitted items
  • Periodic summarization compresses old tool interactions
  • Long agent sessions (20+ iterations) don't overflow context window
  • estimateTokens() has unit tests with known inputs
  • No regression for short conversations that fit within budget
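A framework-agnostic starting point for the estimateTokens() unit tests mentioned above (plain assertions; swap in the project's test runner as needed — expected values follow from the ceil(length / 3) rule in Phase 1):

```typescript
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3);
}

// Known-input checks per the acceptance criterion.
console.assert(estimateTokens("") === 0);
console.assert(estimateTokens("abc") === 1);
console.assert(estimateTokens("abcd") === 2); // rounds up
console.assert(estimateTokens("a".repeat(180_000)) === 60_000);
```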
