feat: eager fallback to backup model on rate-limit errors #1413

Open
usvimal wants to merge 1 commit into NousResearch:main from usvimal:fix/eager-429-fallback


usvimal commented Mar 15, 2026

Summary

When a fallback model is configured via fallback_model in config.yaml, the agent now switches to it immediately upon detecting rate-limit conditions instead of exhausting all retries with exponential backoff.

Problem: When the primary provider is rate-limited (429, quota exhaustion, billing cycle limit), the current retry loop burns through 3 attempts with extended backoff (5s → 10s → 20s) before trying the fallback. The primary provider is unlikely to recover within this window, so the delay is wasted.
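To make the cost concrete, here is a small sketch of the doubling schedule quoted above. The function name and parameters are illustrative only, not taken from the codebase, and whether a sleep follows the final attempt is not stated in this PR, so the total is an upper bound.

```python
def backoff_delays(base_seconds: float = 5.0, attempts: int = 3) -> list[float]:
    """Return a doubling delay schedule: base, 2*base, 4*base, ...

    Hypothetical helper; mirrors the 5s -> 10s -> 20s figures in the description.
    """
    return [base_seconds * (2 ** i) for i in range(attempts)]


delays = backoff_delays()
total_wait = sum(delays)  # upper bound on time spent sleeping before fallback
print(delays, total_wait)
```

Under these assumptions the agent can spend up to 35 seconds sleeping before the fallback model is ever tried.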

Fix: Two eager-fallback checks are added:

  1. Invalid/empty API responses (common rate-limit symptom where the provider returns a response with no choices or missing content) — fallback is attempted immediately after the first failure.

  2. Exception-based rate limits (HTTP 429, quota exhaustion, usage limit, billing errors) — a new is_rate_limited check runs before the existing 413/4xx handlers and switches immediately.

Both paths are guarded by _fallback_activated to preserve the existing one-shot semantics.
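The two checks and the one-shot guard could look roughly like the sketch below. This is not the code in the diff: only `is_rate_limited`, `fallback_model`, and `_fallback_activated` are names from this PR, while the marker strings, `is_invalid_response`, `FallbackState`, and `try_eager_fallback` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed substrings; the real check may match different provider messages.
RATE_LIMIT_MARKERS = ("rate limit", "quota", "usage limit", "billing")


def is_rate_limited(status_code: Optional[int], message: str) -> bool:
    """Path 2: classify an exception as a rate-limit condition."""
    if status_code == 429:
        return True
    text = message.lower()
    return any(marker in text for marker in RATE_LIMIT_MARKERS)


def is_invalid_response(response: Optional[dict]) -> bool:
    """Path 1: a response with no choices or missing content.

    Simplified: a real check would also need to accept tool-call
    responses, which can legitimately carry no text content.
    """
    if not response:
        return True
    choices = response.get("choices") or []
    if not choices:
        return True
    message = choices[0].get("message") or {}
    return not message.get("content")


@dataclass
class FallbackState:
    fallback_model: Optional[str] = None
    active_model: str = "primary-model"  # placeholder model name
    _fallback_activated: bool = False

    def try_eager_fallback(self) -> bool:
        """Switch to the fallback at most once; return True if switched."""
        if self._fallback_activated or not self.fallback_model:
            return False
        self.active_model = self.fallback_model
        self._fallback_activated = True
        return True
```

In a retry loop, either check returning true would call `try_eager_fallback()` and, on success, retry against the new model without sleeping. A second call returns `False`, so the existing retry/abort behavior takes over once the fallback has been used.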

Test plan

  • Configure a primary model with a rate-limited/exhausted API key and a working fallback model
  • Send a message — verify the agent switches to fallback on the first failure instead of retrying 3 times
  • Verify non-rate-limit errors (auth failures, invalid model, etc.) still follow existing retry/abort behavior
  • Verify fallback only activates once per session (one-shot guard)

When a fallback model is configured, switch to it immediately upon
detecting rate-limit conditions instead of exhausting retries with
exponential backoff. The primary provider is unlikely to recover
within the retry window when rate-limited or quota-exhausted.

Two paths are covered:

1. Invalid/empty API responses (common rate-limit symptom where the
   provider returns a response with no choices or missing content) --
   fallback is attempted right after incrementing retry_count.

2. Exception-based rate limits (HTTP 429, quota exhaustion, usage
   limit errors) -- a new is_rate_limited check runs before the
   existing 413/4xx handlers and switches immediately.

Both paths are guarded by _fallback_activated to ensure one-shot
semantics (no repeated switching).