Problem
Every user prompt in the main conversation thread runs on the same model (e.g., claude-opus-4-6/max) regardless of complexity. A simple "fix this typo" burns the same tokens as "architect a distributed system." This wastes significant cost on trivial interactions.
Current State
OhMyOpenAgent already has the infrastructure for model routing:
- Category → model mapping exists (`quick` → Haiku, `deep` → Opus, `unspecified-low` → Sonnet) but is only used for subagent delegation via `delegate_task`, not the main thread.
- The `ultrawork` keyword allows user-initiated escalation but requires manual intervention.
- Model resolution pipeline (`model-resolution-pipeline.ts`) has a clear priority chain that could accommodate a new `provenance: "auto-classified"` step.
- `small_model` exists in OpenCode core but is only wired to title generation.
Proposed Solution
Add an optional complexity classifier that runs before the main model, categorizes the prompt, and selects the model via the existing category → model mapping.
Architecture
User prompt
│
▼
┌──────────────────────┐
│ Complexity Classifier│ ← Lightweight (regex/heuristic OR small_model call)
│ Returns: category │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Model Resolution │ ← Existing pipeline, new provenance: "auto-classified"
│ Pipeline │ Sits between "override" and "category-default"
└──────────┬───────────┘
│
▼
Selected model
Classifier Options (in order of preference)
Option A: Heuristic (zero-cost, instant)
- Prompt < 50 tokens + no code blocks + no "architect/design/debug/refactor" keywords → `quick` (Haiku)
- Prompt mentions files but is simple ("rename X to Y", "add import") → `unspecified-low` (Sonnet)
- Everything else → configured default (Opus)
- User can always override with the `ultrawork` keyword
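A minimal sketch of the Option A heuristic, assuming a rough ~4-characters-per-token estimate; the function and constant names are illustrative, not existing project code:

```typescript
// Hypothetical Option A sketch — not the project's actual API.
type Category = "quick" | "unspecified-low" | "unspecified-high";

const COMPLEX_KEYWORDS = /\b(architect|design|debug|refactor)\b/i;
const SIMPLE_EDIT = /\b(rename|add import|fix typo)\b/i;

// Crude token estimate: ~4 characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function classifyHeuristic(prompt: string): Category {
  const hasCodeBlock = /`{3}/.test(prompt); // triple-backtick fence present?
  if (
    estimateTokens(prompt) < 50 &&
    !hasCodeBlock &&
    !COMPLEX_KEYWORDS.test(prompt)
  ) {
    return "quick"; // → Haiku
  }
  if (SIMPLE_EDIT.test(prompt)) {
    return "unspecified-low"; // → Sonnet
  }
  return "unspecified-high"; // → configured default (Opus)
}
```

Zero latency and zero tokens, at the cost of some misclassification at the margins.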
Option B: small_model classifier (low-cost, ~100 tokens)
- Send the prompt to `small_model` with a fixed system prompt: "Classify this developer request as: quick, low, high, ultrabrain. Return one word."
- Map the result to category → model
- Adds ~0.1s latency + ~100 input tokens per message
Option C: Hybrid
- Heuristic first (catches obvious simple/complex cases)
- Falls back to `small_model` for ambiguous prompts
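The hybrid glue itself is tiny. A self-contained sketch with both classifiers injected (all names hypothetical), where the heuristic returns `null` when it is unsure:

```typescript
// Hypothetical Option C glue — both classifiers are injected so the
// sketch stays self-contained.
type Category = "quick" | "low" | "high";

async function classifyHybrid(
  prompt: string,
  heuristic: (p: string) => Category | null, // null = ambiguous
  smallModel: (p: string) => Promise<Category>,
): Promise<Category> {
  // Heuristic catches the obvious cases for free; only ambiguous
  // prompts pay the ~100-token small_model cost.
  return heuristic(prompt) ?? (await smallModel(prompt));
}
```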
Config Schema Addition
```json
{
  "auto_routing": {
    "enabled": false,
    "strategy": "heuristic",
    "default_category": "unspecified-high",
    "escalation_keywords": ["ultrawork", "ulw", "think hard", "deep dive"],
    "simple_keywords": ["fix typo", "rename", "add import", "what does", "explain"],
    "token_threshold_simple": 50,
    "token_threshold_complex": 500
  }
}
```

Integration Point
In `model-resolution-pipeline.ts`, insert the new step between step 2 (config override) and step 3 (category-default):
1. UI-selected model → provenance: "override"
2. Config model field → provenance: "override"
2.5 NEW: Auto-classifier → provenance: "auto-classified" ← INSERT HERE
3. Category default → provenance: "category-default"
4. User fallback_models → provenance: "provider-fallback"
5. Hardcoded fallback chain → provenance: "provider-fallback"
6. System default → provenance: "system-default"
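One possible shape for step 2.5, assuming the pipeline can be modeled as a chain of steps where `null` means "defer to the next step"; the types and names here are illustrative, not the actual `model-resolution-pipeline.ts` API:

```typescript
// Hypothetical pipeline sketch — the real step/provenance types may differ.
type Provenance =
  | "override"
  | "auto-classified"
  | "category-default"
  | "provider-fallback"
  | "system-default";

interface Resolution {
  model: string;
  provenance: Provenance;
}

type Step = (prompt: string) => Resolution | null;

// Walk the chain; the first step that returns a resolution wins.
function resolveModel(prompt: string, steps: Step[]): Resolution {
  for (const step of steps) {
    const r = step(prompt);
    if (r) return r;
  }
  return { model: "system-default-model", provenance: "system-default" };
}

// Step 2.5: wraps a classifier and the existing category → model map,
// and short-circuits to null when auto_routing is disabled.
const autoClassifyStep =
  (
    enabled: boolean,
    classify: (p: string) => string,
    categoryToModel: Record<string, string>,
  ): Step =>
  (prompt) => {
    if (!enabled) return null;
    const model = categoryToModel[classify(prompt)];
    return model ? { model, provenance: "auto-classified" } : null;
  };
```

Because the step returns `null` when disabled or unmapped, the existing category-default and fallback behavior is untouched.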
User Override
- The `ultrawork`/`ulw` keyword already exists and should take priority over auto-classification
- User can also set `"auto_routing": { "enabled": false }` to disable entirely
- Per-agent override: some agents (oracle, prometheus) should always use their configured model regardless of classification
Expected Impact
Assuming typical usage distribution:
- ~40% of prompts are trivial (questions, typos, simple edits) → routed to Haiku (20x cheaper than Opus)
- ~30% are moderate (multi-file changes, feature work) → routed to Sonnet (5x cheaper)
- ~30% are complex (architecture, debugging, planning) → stays on Opus
Relative to all-Opus, that distribution gives a weighted cost of roughly 0.4 × (1/20) + 0.3 × (1/5) + 0.3 × 1 ≈ 0.38, i.e. an estimated 50-60% token cost reduction, with no perceived quality loss on simple tasks.
References
- `src/shared/model-requirements.ts` — existing category → model mapping
- `src/shared/model-resolution-pipeline.ts` — resolution priority chain
- `src/plugin/ultrawork-model-override.ts` — keyword-triggered escalation pattern
- `src/config/schema/runtime-fallback.ts` — existing runtime model switching