Commit 680389c — docs: add planning prompt

# OpenCode Model Fallback Plugin — Planning Prompt

I need you to help me design and build an OpenCode plugin that provides **automatic model fallback on rate limits, quota exhaustion, and provider errors** — with per-agent ordered fallback chains.

---

## Context & Problem

OpenCode (https://opencode.ai) is an open-source AI coding agent (terminal + desktop). It supports multiple providers (OpenAI, Anthropic, Google, Groq, OpenRouter, etc.) and has a plugin system built on event hooks.

**The core problem:** When I'm deep in a coding session and my primary model (e.g., `openai/gpt-5.3-codex`) hits a rate limit or quota ceiling, OpenCode enters a retry loop — sometimes waiting hours. There's no automatic failover to a different model. I have to manually switch models or wait it out. This kills flow.

**What exists today (and why it's not enough):**

1. **`opencode-rate-limit-fallback`** (github.com/liamvinberg/opencode-rate-limit-fallback) — A simple plugin that listens for `session.status` events, detects rate-limit patterns in retry messages, aborts the retry, reverts the session to the last user message via undo, and replays with a single fallback model. Limitation: it only supports ONE fallback model — no ordered chain, no per-agent config.

2. **`opencode-rate-limit` npm package** (v1.4.0) — More robust: priority-based model pool, circuit breakers, jitter, health tracking, a `/rate-limit-status` command. But it's a heavyweight solution with its own config system that doesn't align with OpenCode's native agent config patterns.

3. **Native support (Issue #7602)** — An open feature request by `thdxr` (a core maintainer) proposing first-class fallback with config like `agents.build.model.fallback: ["claude-sonnet", "gpt-4o-mini"]`. Still in discussion — not implemented.

**My goal:** Build a plugin that fills this gap NOW, but design its config schema to align with the proposed native API (issue #7602) so migration is trivial when/if native support ships.

---

## My Setup

- **Agent config location:** `~/.config/opencode/agents/` (markdown agent files) and `~/.config/opencode/opencode.json` (JSON agent config)
- **I define agents in both formats** — some as `.md` files in the agents directory, some in the JSON config
- **Current agents include:** build, plan, and several custom subagents (coder, reviewer, etc.)
- **Providers I use:** OpenAI, Anthropic, Google — I want fallback chains that can cross provider boundaries

---

## Desired Plugin Behavior

### Config Schema

The plugin should read fallback configuration from a dedicated config file AND/OR respect inline agent config. Priority order for config resolution:

**Option A — Dedicated plugin config file** (e.g., `~/.config/opencode/model-fallback.json`):
```json
{
  "enabled": true,
  "defaults": {
    "fallbackOn": ["rate_limit", "quota_exceeded", "5xx", "timeout", "overloaded"],
    "cooldownMs": 300000,
    "retryOriginalAfterMs": 900000,
    "maxFallbackDepth": 3
  },
  "agents": {
    "build": {
      "fallbackModels": [
        "anthropic/claude-sonnet-4-20250514",
        "google/gemini-3-pro",
        "openai/gpt-4o"
      ]
    },
    "coder": {
      "fallbackModels": [
        "anthropic/claude-sonnet-4-20250514",
        "deepseek/deepseek-r1"
      ]
    },
    "plan": {
      "fallbackModels": [
        "anthropic/claude-haiku-4-20250514",
        "google/gemini-3-flash"
      ]
    },
    "*": {
      "fallbackModels": [
        "anthropic/claude-sonnet-4-20250514",
        "google/gemini-3-flash"
      ]
    }
  },
  "patterns": [
    "rate limit",
    "usage limit",
    "too many requests",
    "quota exceeded",
    "overloaded",
    "capacity exceeded",
    "credits exhausted",
    "billing limit",
    "429"
  ],
  "logging": true,
  "logPath": "~/.local/share/opencode/logs/model-fallback.log"
}
```
Key design points:

- `"*"` wildcard agent provides a default fallback chain for any agent not explicitly configured
- `fallbackModels` is an ordered array — tried sequentially, first healthy model wins
- `fallbackOn` defines which error categories trigger fallback
- `cooldownMs` — how long a model stays marked "unhealthy" after a rate-limit hit
- `retryOriginalAfterMs` — when to attempt returning to the original/preferred model
- `maxFallbackDepth` — safety valve to prevent infinite fallback cascading
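To make the wildcard semantics concrete, the lookup could behave like the sketch below (the `FallbackConfig` interface and `resolveChain` helper are placeholder names for illustration, not part of any existing schema or the OpenCode API):

```typescript
// Sketch: resolve an agent's fallback chain, honoring the "*" wildcard.
interface AgentFallback {
  fallbackModels: string[];
}

interface FallbackConfig {
  agents: Record<string, AgentFallback>;
}

function resolveChain(config: FallbackConfig, agent: string): string[] {
  // An explicit agent entry wins; "*" is the catch-all; no entry → empty chain.
  return (
    config.agents[agent]?.fallbackModels ??
    config.agents["*"]?.fallbackModels ??
    []
  );
}
```

With the example config above, `resolveChain(cfg, "build")` would return build's explicit chain, while an unconfigured agent like `reviewer` would get the `"*"` chain.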
### Core Logic Flow

1. **Detection:** Listen to `session.status` events. Match retry messages against the configured patterns (case-insensitive). Also listen for `session.error` events for 5xx / timeout scenarios.
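The matching step is simple substring comparison; a minimal sketch (the `matchesTrigger` helper is a hypothetical name, and the pattern list is trimmed from the config above):

```typescript
// Sketch: case-insensitive pattern matching against a retry/status message.
const patterns = ["rate limit", "too many requests", "quota exceeded", "429"];

function matchesTrigger(message: string, patterns: string[]): string | null {
  const lower = message.toLowerCase();
  // Return the first matching pattern so it can be logged as the trigger.
  for (const p of patterns) {
    if (lower.includes(p.toLowerCase())) return p;
  }
  return null;
}
```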
2. **Model Health State Machine:**
   - Each model has a state: `healthy` | `rate_limited` | `cooldown`
   - On rate-limit detection → mark the current model as `rate_limited`, record the timestamp
   - After `cooldownMs` → transition to `cooldown` (eligible for retry but not preferred)
   - After `retryOriginalAfterMs` → transition back to `healthy`
   - State is tracked in-memory (resets on plugin restart — this is fine)
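The transitions above are purely time-driven, so they can be derived from a single timestamp rather than stored explicitly. A sketch, assuming `cooldownMs < retryOriginalAfterMs` as in the example config (`ModelHealth` and `currentState` are hypothetical names):

```typescript
// Sketch: derive a model's health state from the last rate-limit timestamp.
type HealthState = "healthy" | "rate_limited" | "cooldown";

interface ModelHealth {
  state: HealthState;
  limitedAt: number; // epoch ms of the last rate-limit hit
}

function currentState(
  h: ModelHealth,
  now: number,
  cooldownMs: number,
  retryOriginalAfterMs: number
): HealthState {
  if (h.state === "healthy") return "healthy";
  const elapsed = now - h.limitedAt;
  if (elapsed >= retryOriginalAfterMs) return "healthy"; // fully recovered
  if (elapsed >= cooldownMs) return "cooldown"; // eligible, not preferred
  return "rate_limited";
}
```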
3. **Fallback Resolution:**
   - Identify which agent is active in the current session
   - Look up that agent's `fallbackModels` array (or fall back to the `"*"` wildcard)
   - Walk the array in order, skipping any model currently in the `rate_limited` state
   - Models in the `cooldown` state are eligible but deprioritized (used only if all preferred models are rate-limited)
   - If ALL models (including the original) are rate-limited, log a warning and let OpenCode's native retry proceed
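The ordered walk with cooldown deprioritization could look like this sketch (`pickFallback` is a hypothetical helper; a `null` result means every model is rate-limited and native retry should take over):

```typescript
// Sketch: first healthy model wins; cooldown models are a last resort;
// rate_limited models are skipped entirely.
type HealthState = "healthy" | "rate_limited" | "cooldown";

function pickFallback(
  chain: string[],
  health: (model: string) => HealthState
): string | null {
  let cooled: string | null = null;
  for (const model of chain) {
    const s = health(model);
    if (s === "healthy") return model; // first healthy model wins
    if (s === "cooldown" && cooled === null) cooled = model; // best cooldown candidate
  }
  return cooled; // null → everything rate-limited
}
```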
4. **Replay Mechanism:**
   - Abort the current retry loop
   - Retrieve the last user message from the session
   - Revert the session to before that message (undo the failed attempt)
   - Re-send the original user message with the selected fallback model
   - Log the transition: `[FALLBACK] build: openai/gpt-5.3-codex → anthropic/claude-sonnet-4 (rate_limit)`
5. **Recovery:**
   - Periodically check whether the original model's cooldown has expired
   - On the next new user message (not a replay), prefer the original model if it's back to `healthy`
   - Never switch back mid-conversation — only on new user-initiated messages
### Plugin Events to Hook

Based on OpenCode's plugin system:

- `session.status` — primary detection point for rate-limit retry messages
- `session.error` — catch 5xx, timeout, and provider-down errors
- `session.idle` — a good place to check/log model health state
- `session.created` — initialize per-session state tracking
### Commands

Register a custom slash command (if OpenCode supports it via plugins — investigate):

- `/fallback-status` — show current model health states, which agents are on fallback, and remaining cooldown timers
### Logging

Structured log entries:

```
[2026-03-17T14:23:01Z] [DETECT] session=abc123 agent=build model=openai/gpt-5.3-codex trigger="rate limit" message="Rate limited. Quick retry in 1s..."
[2026-03-17T14:23:01Z] [FALLBACK] session=abc123 agent=build from=openai/gpt-5.3-codex to=anthropic/claude-sonnet-4-20250514 reason=rate_limit
[2026-03-17T14:23:45Z] [HEALTH] openai/gpt-5.3-codex: rate_limited (cooldown in 4m15s) | anthropic/claude-sonnet-4: healthy | google/gemini-3-pro: healthy
[2026-03-17T14:28:01Z] [RECOVER] openai/gpt-5.3-codex: rate_limited → cooldown
```
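A formatter for the `[FALLBACK]` line might be sketched like this (the `formatFallbackLine` helper is a hypothetical name; only the key=value layout matters):

```typescript
// Sketch: emit one structured [FALLBACK] log line in the format above.
function formatFallbackLine(
  session: string,
  agent: string,
  from: string,
  to: string,
  reason: string
): string {
  const ts = new Date().toISOString();
  return `[${ts}] [FALLBACK] session=${session} agent=${agent} from=${from} to=${to} reason=${reason}`;
}
```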
---

## Technical Constraints

- Plugin must be TypeScript, using `@opencode-ai/plugin` types
- Must work as a local plugin (`.opencode/plugins/`) and be publishable to npm
- Should degrade gracefully — if config is missing or malformed, log a warning and do nothing (don't crash OpenCode)
- The undo/replay approach is acknowledged as fragile (the oh-my-opencode maintainer called it "a house of cards") — we need robust error handling around it
- Config file locations to check (in order): `.opencode/model-fallback.json`, `~/.config/opencode/model-fallback.json`
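The graceful-degradation constraint can be sketched as a fail-safe loader: walk the config locations in order, swallow any read/parse error, and fall back to a disabled config rather than throwing (`loadConfig` is a hypothetical name, not OpenCode API):

```typescript
// Sketch: fail-safe config loading — a missing or malformed file must
// disable the plugin, never crash OpenCode.
import { readFileSync } from "node:fs";

function loadConfig(paths: string[]): Record<string, unknown> {
  for (const p of paths) {
    try {
      const parsed = JSON.parse(readFileSync(p, "utf8"));
      if (parsed && typeof parsed === "object") {
        return parsed as Record<string, unknown>;
      }
    } catch {
      // Missing file or malformed JSON — try the next location, never throw.
    }
  }
  console.warn("[model-fallback] no usable config found; plugin disabled");
  return { enabled: false };
}
```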
---

## Deliverables (Planning Phase)

Right now I need you to:

1. **Analyze feasibility** — Read through OpenCode's plugin SDK, event system, and session management. Identify any gaps or limitations that would block this design.

2. **Propose the architecture** — File structure, module breakdown, state management approach, config loading strategy.

3. **Identify risks** — What parts of the undo/replay mechanism are most fragile? What edge cases could cause data loss or conversation corruption? How do we handle concurrent subagent sessions?

4. **Design the config schema** — Finalize it. Consider backward compatibility with the simpler `opencode-rate-limit-fallback` config format.

5. **Draft a phased implementation plan:**
   - Phase 1: Config loading + pattern detection + logging (no replay yet — just detect and log)
   - Phase 2: Single-model fallback with undo/replay
   - Phase 3: Ordered fallback chains with the health state machine
   - Phase 4: Recovery logic + the `/fallback-status` command
   - Phase 5: npm packaging + documentation

6. **Output the plan as a structured document** I can reference throughout implementation.

Do NOT write implementation code yet. This is planning mode. Think critically, poke holes, and give me a plan I can trust.