Skip to content

Feature Request: Automatic Model Routing by Prompt Complexity #2412

@YosefHayim

Description

@YosefHayim

Problem

Every user prompt in the main conversation thread runs on the same model (e.g., claude-opus-4-6/max) regardless of complexity. A simple "fix this typo" burns the same tokens as "architect a distributed system." This wastes significant cost on trivial interactions.

Current State

OhMyOpenAgent already has the infrastructure for model routing:

  • Category → model mapping exists (quick→Haiku, deep→Opus, unspecified-low→Sonnet) but is only used for subagent delegation via delegate_task, not the main thread.
  • ultrawork keyword allows user-initiated escalation but requires manual intervention.
  • Model resolution pipeline (model-resolution-pipeline.ts) has a clear priority chain that could accommodate a new provenance: "auto-classified" step.
  • small_model exists in OpenCode core but is only wired to title generation.

Proposed Solution

Add an optional complexity classifier that runs before the main model, categorizes the prompt, and selects the model via the existing category → model mapping.

Architecture

User prompt
    │
    ▼
┌──────────────────────┐
│  Complexity Classifier│  ← Lightweight (regex/heuristic OR small_model call)
│  Returns: category    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Model Resolution    │  ← Existing pipeline, new provenance: "auto-classified"
│  Pipeline            │     Sits between "override" and "category-default"
└──────────┬───────────┘
           │
           ▼
    Selected model

Classifier Options (in order of preference)

Option A: Heuristic (zero-cost, instant)

  • Prompt < 50 tokens + no code blocks + no "architect/design/debug/refactor" keywords → quick (Haiku)
  • Prompt mentions files but is simple ("rename X to Y", "add import") → unspecified-low (Sonnet)
  • Everything else → configured default (Opus)
  • User can always override with ultrawork keyword

Option B: small_model classifier (low-cost, ~100 tokens)

  • Send the prompt to small_model with a fixed system prompt: "Classify this developer request as: quick, low, high, ultrabrain. Return one word."
  • Map result to category → model
  • Adds ~0.1s latency + ~100 input tokens per message

Option C: Hybrid

  • Heuristic first (catches obvious simple/complex cases)
  • Falls back to small_model for ambiguous prompts

Config Schema Addition

{
  "auto_routing": {
    "enabled": false,
    "strategy": "heuristic",
    "default_category": "unspecified-high",
    "escalation_keywords": ["ultrawork", "ulw", "think hard", "deep dive"],
    "simple_keywords": ["fix typo", "rename", "add import", "what does", "explain"],
    "token_threshold_simple": 50,
    "token_threshold_complex": 500
  }
}

Integration Point

In model-resolution-pipeline.ts, insert between step 1 (UI-selected) and step 3 (category-default):

1. UI-selected model → provenance: "override"
2. Config model field → provenance: "override"
2.5 NEW: Auto-classifier → provenance: "auto-classified"  ← INSERT HERE
3. Category default → provenance: "category-default"
4. User fallback_models → provenance: "provider-fallback"
5. Hardcoded fallback chain → provenance: "provider-fallback"
6. System default → provenance: "system-default"

User Override

  • ultrawork/ulw keyword already exists and should take priority over auto-classification
  • User can also set "auto_routing": { "enabled": false } to disable entirely
  • Per-agent override: some agents (oracle, prometheus) should always use their configured model regardless of classification

Expected Impact

Assuming typical usage distribution:

  • ~40% of prompts are trivial (questions, typos, simple edits) → routed to Haiku (20x cheaper than Opus)
  • ~30% are moderate (multi-file changes, feature work) → routed to Sonnet (5x cheaper)
  • ~30% are complex (architecture, debugging, planning) → stays on Opus

Estimated 50-60% token cost reduction with no perceived quality loss on simple tasks.

References

  • src/shared/model-requirements.ts — existing category → model mapping
  • src/shared/model-resolution-pipeline.ts — resolution priority chain
  • src/plugin/ultrawork-model-override.ts — keyword-triggered escalation pattern
  • src/config/schema/runtime-fallback.ts — existing runtime model switching

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions