# OpenCode Model Fallback Plugin — Planning Prompt

I need you to help me design and build an OpenCode plugin that provides **automatic model fallback on rate limits, quota exhaustion, and provider errors** — with per-agent ordered fallback chains.

---

## Context & Problem

OpenCode (https://opencode.ai) is an open-source AI coding agent (terminal + desktop). It supports multiple providers (OpenAI, Anthropic, Google, Groq, OpenRouter, etc.) and has a plugin system built on event hooks.

**The core problem:** When I'm deep in a coding session and my primary model (e.g., `openai/gpt-5.3-codex`) hits a rate limit or quota ceiling, OpenCode enters a retry loop — sometimes waiting hours. There's no automatic failover to a different model. I have to manually switch models or wait it out. This kills flow.

**What exists today (and why it's not enough):**

1. **`opencode-rate-limit-fallback`** (github.com/liamvinberg/opencode-rate-limit-fallback) — A simple plugin that listens for `session.status` events, detects rate-limit patterns in retry messages, aborts the retry, reverts the session to the last user message via undo, and replays with a single fallback model. Limitation: only supports ONE fallback model, no ordered chain, no per-agent config.

2. **`opencode-rate-limit` npm package** (v1.4.0) — More robust: priority-based model pool, circuit breakers, jitter, health tracking, a `/rate-limit-status` command. But it's a heavyweight solution with its own config system that doesn't align with OpenCode's native agent config patterns.

3. **Native support (Issue #7602)** — There's an open feature request by `thdxr` (core maintainer) proposing first-class fallback with config like `agents.build.model.fallback: ["claude-sonnet", "gpt-4o-mini"]`. Still in discussion — not implemented.

**My goal:** Build a plugin that fills this gap NOW, but designs its config schema to align with the proposed native API (issue #7602) so migration is trivial when/if native support ships.
---

## My Setup

- **Agent config location:** `~/.config/opencode/agents/` (markdown agent files) and `~/.config/opencode/opencode.json` (JSON agent config)
- **I define agents in both formats** — some as `.md` files in the agents directory, some in the JSON config
- **Current agents include:** build, plan, and several custom subagents (coder, reviewer, etc.)
- **Providers I use:** OpenAI, Anthropic, Google — I want fallback chains that can cross provider boundaries
---

## Desired Plugin Behavior

### Config Schema

The plugin should read fallback configuration from a dedicated config file AND/OR respect inline agent config. Priority order for config resolution:

**Option A — Dedicated plugin config file** (e.g., `~/.config/opencode/model-fallback.json`):

```json
{
  "enabled": true,
  "defaults": {
    "fallbackOn": ["rate_limit", "quota_exceeded", "5xx", "timeout", "overloaded"],
    "cooldownMs": 300000,
    "retryOriginalAfterMs": 900000,
    "maxFallbackDepth": 3
  },
  "agents": {
    "build": {
      "fallbackModels": [
        "anthropic/claude-sonnet-4-20250514",
        "google/gemini-3-pro",
        "openai/gpt-4o"
      ]
    },
    "coder": {
      "fallbackModels": [
        "anthropic/claude-sonnet-4-20250514",
        "deepseek/deepseek-r1"
      ]
    },
    "plan": {
      "fallbackModels": [
        "anthropic/claude-haiku-4-20250514",
        "google/gemini-3-flash"
      ]
    },
    "*": {
      "fallbackModels": [
        "anthropic/claude-sonnet-4-20250514",
        "google/gemini-3-flash"
      ]
    }
  },
  "patterns": [
    "rate limit",
    "usage limit",
    "too many requests",
    "quota exceeded",
    "overloaded",
    "capacity exceeded",
    "credits exhausted",
    "billing limit",
    "429"
  ],
  "logging": true,
  "logPath": "~/.local/share/opencode/logs/model-fallback.log"
}
```

Key design points:
- `"*"` wildcard agent provides a default fallback chain for any agent not explicitly configured
- `fallbackModels` is an ordered array — tried sequentially, first healthy model wins
- `fallbackOn` defines which error categories trigger fallback
- `cooldownMs` — how long a model stays marked "unhealthy" after a rate limit hit
- `retryOriginalAfterMs` — when to attempt returning to the original/preferred model
- `maxFallbackDepth` — safety valve to prevent infinite fallback cascading
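To make the schema concrete, here is a hedged sketch of it as TypeScript types, plus the wildcard lookup rule. The names simply mirror the JSON example above; none of this is a published OpenCode API, and the final shape should come out of the design phase:

```typescript
// Sketch of the config schema above as TypeScript types.
// Names mirror the JSON example; this is not an official OpenCode API.

type FallbackTrigger =
  | "rate_limit" | "quota_exceeded" | "5xx" | "timeout" | "overloaded";

interface AgentFallbackConfig {
  fallbackModels: string[]; // ordered chain, e.g. "anthropic/claude-sonnet-4-20250514"
}

interface FallbackDefaults {
  fallbackOn: FallbackTrigger[];
  cooldownMs: number;
  retryOriginalAfterMs: number;
  maxFallbackDepth: number;
}

interface FallbackConfig {
  enabled: boolean;
  defaults: FallbackDefaults;
  agents: Record<string, AgentFallbackConfig>; // "*" key is the wildcard entry
  patterns: string[];
  logging: boolean;
  logPath: string;
}

// Resolve an agent's chain, falling back to the "*" wildcard entry.
function chainFor(config: FallbackConfig, agent: string): string[] {
  const entry = config.agents[agent] ?? config.agents["*"];
  return entry?.fallbackModels ?? [];
}
```

Typing the config up front also gives the graceful-degradation requirement something to validate against when the file is loaded.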

### Core Logic Flow

1. **Detection:** Listen to `session.status` events. Match retry messages against configured patterns (case-insensitive). Also listen for `session.error` events for 5xx / timeout scenarios.

2. **Model Health State Machine:**
   - Each model has a state: `healthy` | `rate_limited` | `cooldown`
   - On rate limit detection → mark current model as `rate_limited`, record timestamp
   - After `cooldownMs` → transition to `cooldown` (eligible for retry but not preferred)
   - After `retryOriginalAfterMs` → transition back to `healthy`
   - State is tracked in-memory (resets on plugin restart — this is fine)

3. **Fallback Resolution:**
   - Identify which agent is active in the current session
   - Look up that agent's `fallbackModels` array (or fall back to `"*"` wildcard)
   - Walk the array in order, skip any model currently in `rate_limited` state
   - Models in `cooldown` state are eligible but deprioritized (only used if all preferred models are rate-limited)
   - If ALL models (including original) are rate-limited, log a warning and let OpenCode's native retry proceed

4. **Replay Mechanism:**
   - Abort the current retry loop
   - Retrieve the last user message from the session
   - Revert the session to before that message (undo the failed attempt)
   - Re-send the original user message with the selected fallback model
   - Log the transition: `[FALLBACK] build: openai/gpt-5.3-codex → anthropic/claude-sonnet-4 (rate_limit)`

5. **Recovery:**
   - Periodically check if original model's cooldown has expired
   - On next new user message (not replay), prefer the original model if it's back to `healthy`
   - Don't switch back mid-conversation — only on new user-initiated messages
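The detection, health, and resolution steps above can be sketched as pure in-memory logic (a design sketch only, not tied to the OpenCode SDK; `now` is passed in explicitly so the time-based transitions are testable):

```typescript
// In-memory model health tracking and fallback resolution, per the flow above.
// Design sketch: no OpenCode SDK dependency, time injected for testability.

type ModelState = "healthy" | "rate_limited" | "cooldown";

interface HealthEntry { state: ModelState; since: number }

class ModelHealth {
  private entries = new Map<string, HealthEntry>();

  constructor(
    private cooldownMs: number,
    private retryOriginalAfterMs: number,
  ) {}

  markRateLimited(model: string, now: number): void {
    this.entries.set(model, { state: "rate_limited", since: now });
  }

  // Lazily advance the state machine based on elapsed time.
  stateOf(model: string, now: number): ModelState {
    const e = this.entries.get(model);
    if (!e || e.state === "healthy") return "healthy";
    const elapsed = now - e.since;
    if (elapsed >= this.retryOriginalAfterMs) return "healthy";
    if (elapsed >= this.cooldownMs) return "cooldown";
    return "rate_limited";
  }
}

// Case-insensitive pattern detection over a retry/error message.
function matchesPattern(message: string, patterns: string[]): boolean {
  const lower = message.toLowerCase();
  return patterns.some((p) => lower.includes(p.toLowerCase()));
}

// Walk the chain in order: prefer healthy models, then cooldown models;
// skip rate_limited entirely. Returns undefined if nothing is usable.
function pickFallback(
  chain: string[],
  health: ModelHealth,
  now: number,
): string | undefined {
  const healthy = chain.find((m) => health.stateOf(m, now) === "healthy");
  if (healthy) return healthy;
  return chain.find((m) => health.stateOf(m, now) === "cooldown");
}
```

Keeping this layer free of SDK types means the state machine and resolution order can be unit-tested long before the fragile undo/replay wiring exists.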

### Plugin Events to Hook

Based on OpenCode's plugin system:
- `session.status` — primary detection point for rate limit retry messages
- `session.error` — catch 5xx, timeout, provider down errors
- `session.idle` — good place to check/log model health state
- `session.created` — initialize per-session state tracking
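A structural sketch of the hook wiring follows. The event names are the four listed above; the `Hooks` and `SessionEvent` types are local stand-ins, since the exact `@opencode-ai/plugin` signatures are one of the things the feasibility analysis must verify:

```typescript
// Local stand-in types — the real @opencode-ai/plugin signatures must be
// verified during the feasibility pass; only the event names come from this plan.

interface SessionEvent { sessionID: string; message?: string }

type Hooks = Partial<Record<
  "session.status" | "session.error" | "session.idle" | "session.created",
  (event: SessionEvent) => void
>>;

function createFallbackHooks(onDetect: (e: SessionEvent) => void): Hooks {
  const perSession = new Map<string, { fallbackDepth: number }>();
  return {
    "session.created": (e) => {
      // Initialize per-session state tracking (maxFallbackDepth counter lives here).
      perSession.set(e.sessionID, { fallbackDepth: 0 });
    },
    "session.status": (e) => {
      // Primary detection point: match retry messages against patterns.
      if (e.message) onDetect(e);
    },
    "session.error": (e) => {
      // Catch 5xx / timeout / provider-down errors.
      onDetect(e);
    },
    "session.idle": () => {
      // Good place to check and log model health state.
    },
  };
}
```

Routing everything through a single `onDetect` callback keeps the SDK-facing surface thin, so swapping in the real plugin types later touches only this file.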

### Commands

Register a custom slash command (if OpenCode supports it via plugins — investigate):
- `/fallback-status` — show current model health states, which agents are on fallback, cooldown timers remaining

### Logging

Structured log entries:
```
[2026-03-17T14:23:01Z] [DETECT] session=abc123 agent=build model=openai/gpt-5.3-codex trigger="rate limit" message="Rate limited. Quick retry in 1s..."
[2026-03-17T14:23:01Z] [FALLBACK] session=abc123 agent=build from=openai/gpt-5.3-codex to=anthropic/claude-sonnet-4-20250514 reason=rate_limit
[2026-03-17T14:23:45Z] [HEALTH] openai/gpt-5.3-codex: rate_limited (cooldown in 4m15s) | anthropic/claude-sonnet-4: healthy | google/gemini-3-pro: healthy
[2026-03-17T14:28:01Z] [RECOVER] openai/gpt-5.3-codex: rate_limited → cooldown
```
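A small formatter can pin this shape down early so every module logs identically. This sketch derives its rules from the sample lines above (ISO timestamp trimmed to whole seconds, `key=value` fields, values containing spaces quoted); the tag set and field names are otherwise up for grabs:

```typescript
// Format one structured log entry in the shape shown in the samples above.

type LogTag = "DETECT" | "FALLBACK" | "HEALTH" | "RECOVER";

function formatLogEntry(
  tag: LogTag,
  fields: Record<string, string>,
  when: Date = new Date(),
): string {
  // toISOString() yields millisecond precision; trim to seconds to match samples.
  const ts = when.toISOString().replace(/\.\d{3}Z$/, "Z");
  const body = Object.entries(fields)
    .map(([k, v]) => (v.includes(" ") ? `${k}="${v}"` : `${k}=${v}`))
    .join(" ");
  return `[${ts}] [${tag}] ${body}`;
}
```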

---

## Technical Constraints

- Plugin must be TypeScript, using `@opencode-ai/plugin` types
- Must work as both a local plugin (`.opencode/plugins/`) and publishable to npm
- Should gracefully degrade — if config is missing or malformed, log a warning and do nothing (don't crash OpenCode)
- The undo/replay approach is acknowledged as fragile (the oh-my-opencode maintainer called it "a house of cards") — we need robust error handling around it
- Config file locations to check (in order): `.opencode/model-fallback.json`, `~/.config/opencode/model-fallback.json`
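The graceful-degradation constraint can be sketched as a loader that tries each location in order and returns `null` (after warning) on missing or malformed config, so the plugin disables itself instead of crashing OpenCode. The function name and the injectable `warn` callback are placeholders, not a settled API:

```typescript
import { existsSync, readFileSync } from "node:fs";

// Try each config location in order. On malformed JSON (or no file found),
// warn and return null so the plugin silently disables itself.
function loadFallbackConfig(
  paths: string[],
  warn: (msg: string) => void = console.warn,
): unknown | null {
  for (const p of paths) {
    if (!existsSync(p)) continue;
    try {
      return JSON.parse(readFileSync(p, "utf8"));
    } catch (err) {
      warn(`model-fallback: malformed config at ${p}: ${String(err)}`);
      return null; // do nothing rather than crash OpenCode
    }
  }
  warn("model-fallback: no config file found; plugin disabled");
  return null;
}
```

Returning `null` rather than throwing keeps the failure mode contained: every downstream hook checks for a loaded config once at startup and becomes a no-op if it is absent.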

---

## Deliverables (Planning Phase)

Right now I need you to:

1. **Analyze feasibility** — Read through OpenCode's plugin SDK, event system, and session management. Identify any gaps or limitations that would block this design.

2. **Propose the architecture** — File structure, module breakdown, state management approach, config loading strategy.

3. **Identify risks** — What parts of the undo/replay mechanism are most fragile? What edge cases could cause data loss or conversation corruption? How do we handle concurrent subagent sessions?

4. **Design the config schema** — Finalize it. Consider backward compatibility with the simpler `opencode-rate-limit-fallback` config format.

5. **Draft a phased implementation plan:**
   - Phase 1: Config loading + pattern detection + logging (no replay yet — just detect and log)
   - Phase 2: Single-model fallback with undo/replay
   - Phase 3: Ordered fallback chains with health state machine
   - Phase 4: Recovery logic + `/fallback-status` command
   - Phase 5: npm packaging + documentation

6. **Output the plan as a structured document** I can reference throughout implementation.

Do NOT write implementation code yet. This is planning mode. Think critically, poke holes, and give me a plan I can trust.