sub(#249) Phase 1: Token estimation utility and context budget calculation #271

@mmogr

Description

Parent: #249 — Token-based context budgeting

Goal

Replace the character-based context budgeting (MAX_CONTEXT_CHARS = 180_000) with token-aware budgeting that uses the model's actual context window size.

Background

Currently, agentLoop.ts uses pruneForBudget() with a fixed character limit:

```typescript
const MAX_CONTEXT_CHARS = 180_000; // ~45K tokens at 4 chars/token
```

This is problematic because:

  • Different models have different context windows (4K, 8K, 32K, 128K)
  • 4 chars/token is a rough heuristic; actual ratio varies by language and content
  • No way to reserve space for the model's response
  • The context window size is available from the model metadata (GGUF) but not used

Implementation

1. Create src/utils/tokenEstimator.ts

```typescript
/**
 * Approximate token count using multiple heuristics.
 * More accurate than simple char/4 for mixed content.
 */
export function estimateTokens(text: string): number {
  // Heuristic: count word boundaries + special tokens.
  // filter(Boolean) drops the empty strings that split() yields for
  // leading/trailing whitespace (and for empty input).
  const words = text.split(/\s+/).filter(Boolean).length;
  const specialTokens = (text.match(/[{}[\](),;:'"<>]/g) || []).length;

  // Rough GPT-style tokenization: ~1.3 tokens per word + special chars
  return Math.ceil(words * 1.3 + specialTokens * 0.5);
}

/**
 * Estimate tokens for a message array using per-message overhead.
 */
export function estimateMessagesTokens(messages: any[]): number {
  const PER_MESSAGE_OVERHEAD = 4; // role, name, etc.
  return messages.reduce((sum, msg) => {
    // Non-string content (tool results, arrays) is stringified;
    // missing content counts as empty rather than crashing.
    const content =
      typeof msg.content === 'string' ? msg.content : JSON.stringify(msg.content ?? '');
    return sum + estimateTokens(content) + PER_MESSAGE_OVERHEAD;
  }, 0);
}
```
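To make the arithmetic concrete, here is a small self-contained sketch (the estimator is restated inline so the snippet runs on its own; the message contents are made up for illustration):

```typescript
function estimateTokens(text: string): number {
  const words = text.split(/\s+/).filter(Boolean).length;
  const specialTokens = (text.match(/[{}[\](),;:'"<>]/g) || []).length;
  return Math.ceil(words * 1.3 + specialTokens * 0.5);
}

function estimateMessagesTokens(messages: { role: string; content: string }[]): number {
  const PER_MESSAGE_OVERHEAD = 4;
  return messages.reduce(
    (sum, msg) => sum + estimateTokens(msg.content) + PER_MESSAGE_OVERHEAD,
    0,
  );
}

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },          // 5 words -> 7 tokens, +4 overhead
  { role: 'user', content: 'Summarize the file src/index.ts for me.' }, // 6 words -> 8 tokens, +4 overhead
];

console.log(estimateMessagesTokens(messages)); // 23
```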

2. Get model context window size

Pull the actual context size from the server status API (already available from GGUF metadata):

```typescript
interface ContextBudget {
  totalTokens: number;         // Model's context window (e.g., 8192)
  reserveForResponse: number;  // Tokens reserved for model output (e.g., 2048)
  reserveForTools: number;     // Space for tool definitions (estimated)
  availableForHistory: number; // totalTokens - reserves
}

export function createContextBudget(modelContextSize: number): ContextBudget {
  const reserveForResponse = Math.min(modelContextSize * 0.25, 4096);
  const reserveForTools = 500; // Estimated based on tool count
  return {
    totalTokens: modelContextSize,
    reserveForResponse,
    reserveForTools,
    availableForHistory: modelContextSize - reserveForResponse - reserveForTools,
  };
}
```
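Worked numbers for two representative context sizes, showing both sides of the 4096-token cap (the function is restated inline so the sketch runs standalone):

```typescript
function createContextBudget(modelContextSize: number) {
  const reserveForResponse = Math.min(modelContextSize * 0.25, 4096);
  const reserveForTools = 500;
  return {
    totalTokens: modelContextSize,
    reserveForResponse,
    reserveForTools,
    availableForHistory: modelContextSize - reserveForResponse - reserveForTools,
  };
}

// 8K model: 25% of 8192 = 2048, under the 4096 cap
const small = createContextBudget(8192);
// -> reserveForResponse: 2048, availableForHistory: 8192 - 2048 - 500 = 5644

// 128K model: 25% would be 32768, so the 4096 cap kicks in
const large = createContextBudget(131072);
// -> reserveForResponse: 4096, availableForHistory: 131072 - 4096 - 500 = 126476
```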

3. Update pruneForBudget() in agentLoop.ts

Replace character-based pruning with token-aware pruning:

```typescript
export function pruneForBudget(
  messages: any[],
  budget: ContextBudget,
): any[] {
  const totalTokens = estimateMessagesTokens(messages);

  if (totalTokens <= budget.availableForHistory) return messages;

  // Drop old tool result messages first (keep system + last N turns)
  // ... same logic but using token estimates instead of char counts
}
```
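The elided body might look like the following sketch. It assumes messages[0] is the system prompt and simply drops the oldest non-system message until the history fits; the real implementation would likely be smarter (e.g., keeping tool-call/tool-result pairs together). The estimator is restated inline so the sketch runs standalone:

```typescript
interface ContextBudget {
  totalTokens: number;
  reserveForResponse: number;
  reserveForTools: number;
  availableForHistory: number;
}

type Msg = { role: string; content: string };

function estimateTokens(text: string): number {
  const words = text.split(/\s+/).filter(Boolean).length;
  const specialTokens = (text.match(/[{}[\](),;:'"<>]/g) || []).length;
  return Math.ceil(words * 1.3 + specialTokens * 0.5);
}

function estimateMessagesTokens(messages: Msg[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content) + 4, 0);
}

function pruneForBudget(messages: Msg[], budget: ContextBudget): Msg[] {
  const pruned = [...messages];
  // Drop the oldest non-system message (index 1) until the history fits.
  while (
    pruned.length > 1 &&
    estimateMessagesTokens(pruned) > budget.availableForHistory
  ) {
    pruned.splice(1, 1);
  }
  return pruned;
}
```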

Files to Create/Modify

| File | Action |
| --- | --- |
| src/utils/tokenEstimator.ts | Create estimateTokens(), estimateMessagesTokens() |
| src/hooks/useGglibRuntime/agentLoop.ts | Replace MAX_CONTEXT_CHARS with token-based ContextBudget |
| src/hooks/useGglibRuntime/agentLoop.ts | Update pruneForBudget() to use token estimates |
| src/hooks/useGglibRuntime/runAgenticLoop.ts | Pass model context size to budget calculation |

Acceptance Criteria

  • estimateTokens() function with reasonable accuracy (within ~20% of actual for English text)
  • ContextBudget struct that uses model's actual context window size
  • Response space reserved to prevent context window overflow
  • pruneForBudget() uses token estimates instead of character counts
  • Fallback to MAX_CONTEXT_CHARS when model context size is unknown
  • Unit tests for token estimation with various content types (English, code, JSON, unicode)
