Status: Open

Labels: `component: frontend` (React/TypeScript UI) · `llm: inference` (Model loading/inference) · `priority: low` (Nice to have) · `size: m` (4-8 hours, half to full day) · `type: feature` (New functionality or enhancement)
Description
Parent: #249 — Token-based context budgeting
Goal
Replace the character-based context budgeting (`MAX_CONTEXT_CHARS = 180_000`) with token-aware budgeting that uses the model's actual context window size.
Background
Currently, `agentLoop.ts` uses `pruneForBudget()` with a fixed character limit:

```ts
const MAX_CONTEXT_CHARS = 180_000; // ~45K tokens at 4 chars/token
```

This is problematic because:
- Different models have different context windows (4K, 8K, 32K, 128K)
- 4 chars/token is a rough heuristic; actual ratio varies by language and content
- No way to reserve space for the model's response
- The context window size is available from the model metadata (GGUF) but not used
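The mismatch is easy to see with a quick comparison. In the snippet below (illustrative only; `charEstimate` and `wordEstimate` are hypothetical names, with `wordEstimate` mirroring the heuristic proposed in step 1), the two estimates agree on English prose but diverge sharply on dense JSON:

```ts
// Illustration: fixed chars/4 heuristic vs. a boundary-aware heuristic,
// applied to two content types. Neither is real tokenizer output.
function charEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

function wordEstimate(text: string): number {
  const words = text.split(/\s+/).length;
  const specialTokens = (text.match(/[{}[\](),;:'"<>]/g) || []).length;
  return Math.ceil(words * 1.3 + specialTokens * 0.5);
}

const english = 'The quick brown fox jumps over the lazy dog';
const json = '{"a":{"b":[1,2,3],"c":"d"}}';

console.log(charEstimate(english), wordEstimate(english)); // 11 vs 12 — close
console.log(charEstimate(json), wordEstimate(json));       // 7 vs 12 — far apart
```

The character heuristic undercounts structure-heavy content, which is exactly what tool-call transcripts are full of.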
Implementation
1. Create `src/utils/tokenEstimator.ts`

```ts
/**
 * Approximate token count using multiple heuristics.
 * More accurate than a simple chars/4 estimate for mixed content.
 */
export function estimateTokens(text: string): number {
  // Heuristic: count word boundaries + special tokens
  const words = text.split(/\s+/).length;
  const specialTokens = (text.match(/[{}[\](),;:'"<>]/g) || []).length;
  // Rough GPT-style tokenization: ~1.3 tokens per word + special chars
  return Math.ceil(words * 1.3 + specialTokens * 0.5);
}

/**
 * Estimate tokens for a message array, adding per-message overhead.
 */
export function estimateMessagesTokens(messages: any[]): number {
  const PER_MESSAGE_OVERHEAD = 4; // role, name, etc.
  return messages.reduce((sum, msg) => {
    const content = typeof msg.content === 'string' ? msg.content : JSON.stringify(msg.content);
    return sum + estimateTokens(content) + PER_MESSAGE_OVERHEAD;
  }, 0);
}
```

2. Get model context window size
Pull the actual context size from the server status API (already available from GGUF metadata):
```ts
interface ContextBudget {
  totalTokens: number;         // Model's context window (e.g., 8192)
  reserveForResponse: number;  // Tokens reserved for model output (e.g., 2048)
  reserveForTools: number;     // Space for tool definitions (estimated)
  availableForHistory: number; // totalTokens minus the reserves
}

export function createContextBudget(modelContextSize: number): ContextBudget {
  const reserveForResponse = Math.min(modelContextSize * 0.25, 4096);
  const reserveForTools = 500; // Estimated based on tool count
  return {
    totalTokens: modelContextSize,
    reserveForResponse,
    reserveForTools,
    availableForHistory: modelContextSize - reserveForResponse - reserveForTools,
  };
}
```

3. Update `pruneForBudget()` in `agentLoop.ts`
Replace character-based pruning with token-aware pruning:
```ts
export function pruneForBudget(
  messages: any[],
  budget: ContextBudget,
): any[] {
  let totalTokens = estimateMessagesTokens(messages);
  if (totalTokens <= budget.availableForHistory) return messages;
  // Drop old tool result messages first (keep system + last N turns)
  // ... same logic as before, but using token estimates instead of char counts
}
```

Files to Create/Modify
| File | Action |
|---|---|
| `src/utils/tokenEstimator.ts` | Create: `estimateTokens()`, `estimateMessagesTokens()` |
| `src/hooks/useGglibRuntime/agentLoop.ts` | Replace `MAX_CONTEXT_CHARS` with token-based `ContextBudget` |
| `src/hooks/useGglibRuntime/agentLoop.ts` | Update `pruneForBudget()` to use token estimates |
| `src/hooks/useGglibRuntime/runAgenticLoop.ts` | Pass model context size to budget calculation |
Acceptance Criteria
- [ ] `estimateTokens()` function with reasonable accuracy (within ~20% of actual for English text)
- [ ] `ContextBudget` struct that uses the model's actual context window size
- [ ] Response space reserved to prevent context window overflow
- [ ] `pruneForBudget()` uses token estimates instead of character counts
- [ ] Fallback to `MAX_CONTEXT_CHARS` when model context size is unknown
- [ ] Unit tests for token estimation with various content types (English, code, JSON, unicode)
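The unit-test criterion could start as a table-driven sanity check like the following (plain assertions rather than the project's actual test runner; the estimator body is copied from step 1, and the ranges are deliberately loose because it is a heuristic, not a real tokenizer):

```ts
// Table-driven sanity test for estimateTokens() across content types.
function estimateTokens(text: string): number {
  const words = text.split(/\s+/).length;
  const specialTokens = (text.match(/[{}[\](),;:'"<>]/g) || []).length;
  return Math.ceil(words * 1.3 + specialTokens * 0.5);
}

const cases: Array<{ name: string; text: string; min: number; max: number }> = [
  { name: 'english', text: 'Hello world, this is a test sentence.', min: 5, max: 15 },
  { name: 'code',    text: 'const x = { a: [1, 2] };',              min: 8, max: 25 },
  { name: 'json',    text: '{"key": "value", "n": 42}',             min: 5, max: 20 },
  { name: 'unicode', text: 'héllo wörld 日本語テキスト',             min: 3, max: 20 },
];

for (const c of cases) {
  const n = estimateTokens(c.text);
  console.assert(n >= c.min && n <= c.max, `${c.name}: got ${n}`);
}
```

Once a real tokenizer is available for comparison, the same table can be reused to check the ~20% accuracy target.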