feat: eager fallback to backup model on rate-limit errors #1413
Open
usvimal wants to merge 1 commit into NousResearch:main
When a fallback model is configured, switch to it immediately upon detecting rate-limit conditions instead of exhausting retries with exponential backoff. The primary provider is unlikely to recover within the retry window when rate-limited or quota-exhausted.

Two paths are covered:

1. Invalid/empty API responses (common rate-limit symptom where the provider returns a response with no choices or missing content) -- fallback is attempted right after incrementing retry_count.
2. Exception-based rate limits (HTTP 429, quota exhaustion, usage limit errors) -- a new is_rate_limited check runs before the existing 413/4xx handlers and switches immediately.

Both paths are guarded by _fallback_activated to ensure one-shot semantics (no repeated switching).
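The two paths and the one-shot guard can be sketched roughly as below. This is a minimal illustration, not the code in this PR: only `is_rate_limited`, `_fallback_activated`, `retry_count`, and `fallback_model` are names from the description. `ProviderError`, `Agent`, `call_model`, `_try_activate_fallback`, and the keyword list are hypothetical stand-ins, and the backoff sleep is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed heuristic: substrings that suggest a rate-limit or quota error.
RATE_LIMIT_MARKERS = ("rate limit", "quota", "usage limit", "billing")


class ProviderError(Exception):
    """Hypothetical provider error that carries an HTTP status code."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


def is_rate_limited(exc: Exception) -> bool:
    """Return True for HTTP 429 or a message that looks like a quota error."""
    status = getattr(exc, "status_code", None)
    text = str(exc).lower()
    return status == 429 or any(marker in text for marker in RATE_LIMIT_MARKERS)


@dataclass
class Agent:
    call_model: Callable[[str, str], Optional[str]]  # (model, prompt) -> reply
    model: str
    fallback_model: Optional[str] = None
    max_retries: int = 3
    _fallback_activated: bool = False

    def _try_activate_fallback(self) -> bool:
        """Switch to the fallback model at most once; report whether we switched."""
        if self.fallback_model is None or self._fallback_activated:
            return False
        self._fallback_activated = True
        self.model = self.fallback_model
        return True

    def run(self, prompt: str) -> str:
        retry_count = 0
        while retry_count < self.max_retries:
            try:
                reply = self.call_model(self.model, prompt)
            except Exception as exc:
                # Path 2: exception-based rate limit, checked before other handlers.
                if is_rate_limited(exc) and self._try_activate_fallback():
                    continue  # retry on the backup model without backing off
                retry_count += 1  # a real loop would sleep with backoff here
                continue
            if not reply:
                # Path 1: empty response; count the failure, then try the fallback.
                retry_count += 1
                self._try_activate_fallback()
                continue
            return reply
        raise RuntimeError("retries exhausted")
```

Because `_try_activate_fallback` returns False once the flag is set, a rate-limited backup model falls through to the ordinary retry path instead of triggering another switch.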
Summary
When a fallback model is configured via `fallback_model` in config.yaml, the agent now switches to it immediately upon detecting rate-limit conditions instead of exhausting all retries with exponential backoff.

Problem: When the primary provider is rate-limited (429, quota exhaustion, billing cycle limit), the current retry loop burns through 3 attempts with extended backoff (5s → 10s → 20s) before trying the fallback. The primary provider is unlikely to recover within this window, so the delay is wasted.
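To put a number on the wasted delay, the quoted schedule can be reproduced with a short calculation. This is a sketch that assumes a 5-second base doubling on each attempt and that every wait is slept in full before the fallback is tried. The helper name `backoff_schedule` is hypothetical, and request latency is not counted.

```python
from typing import List


def backoff_schedule(base_seconds: float = 5.0, factor: float = 2.0,
                     attempts: int = 3) -> List[float]:
    """Return the wait before each retry under simple exponential backoff."""
    return [base_seconds * factor ** i for i in range(attempts)]


delays = backoff_schedule()   # [5.0, 10.0, 20.0]
total_wait = sum(delays)      # 35.0 seconds spent waiting on the primary
```

Under these assumptions, the old path can spend up to about 35 seconds waiting on the primary before the fallback is tried.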
Fix: Two eager-fallback checks are added:
1. Invalid/empty API responses (common rate-limit symptom where the provider returns a response with no choices or missing content): fallback is attempted immediately after the first failure.
2. Exception-based rate limits (HTTP 429, quota exhaustion, usage limit, billing errors): a new `is_rate_limited` check runs before the existing 413/4xx handlers and switches immediately.

Both paths are guarded by `_fallback_activated` to preserve the existing one-shot semantics.

Test plan