feat: eager fallback to backup model on rate-limit errors by usvimal · Pull Request #1413 · NousResearch/hermes-agent

usvimal · 2026-03-15T08:46:29Z

Summary

When a fallback model is configured via fallback_model in config.yaml, the agent now switches to it immediately upon detecting rate-limit conditions instead of exhausting all retries with exponential backoff.

Problem: When the primary provider is rate-limited (429, quota exhaustion, billing cycle limit), the current retry loop burns through 3 attempts with extended backoff (5s → 10s → 20s) before trying the fallback. The primary provider is unlikely to recover within this window, so the delay is wasted.
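To make the cost concrete, here is a minimal sketch of the backoff arithmetic. It assumes a 5-second base delay that doubles on each attempt, which matches the 5s → 10s → 20s figures quoted above. The constant names are illustrative and are not taken from the hermes-agent code.

```python
# Assumed schedule: 5 s base delay, doubled on each of 3 attempts.
BASE_DELAY_SECONDS = 5   # hypothetical name
MAX_RETRIES = 3          # hypothetical name

delays = [BASE_DELAY_SECONDS * 2**attempt for attempt in range(MAX_RETRIES)]
total_wait = sum(delays)

print(delays, total_wait)  # prints: [5, 10, 20] 35
```

Under these assumptions, up to 35 seconds of waiting can accumulate before the fallback is even considered.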

Fix: Two eager-fallback checks are added:

1. Invalid/empty API responses (a common rate-limit symptom where the provider returns a response with no choices or missing content): fallback is attempted immediately after the first failure.
2. Exception-based rate limits (HTTP 429, quota exhaustion, usage limit, billing errors): a new is_rate_limited check runs before the existing 413/4xx handlers and switches immediately.
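The two detection paths could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR. Only the name is_rate_limited comes from the description; its signature, the marker strings, the is_empty_response helper, and the dict-shaped response are all hypothetical.

```python
from typing import Optional

# Hypothetical substrings that suggest a rate-limit or quota problem.
RATE_LIMIT_MARKERS = ("rate limit", "quota", "usage limit", "billing")


def is_rate_limited(status_code: Optional[int], message: str) -> bool:
    """Return True for HTTP 429 or an error message that mentions a limit.

    The signature is an assumption; the PR only names the check.
    """
    if status_code == 429:
        return True
    lowered = message.lower()
    return any(marker in lowered for marker in RATE_LIMIT_MARKERS)


def is_empty_response(response: Optional[dict]) -> bool:
    """Return True when a response has no choices or no message content.

    Assumes an OpenAI-style dict payload; a real client may return objects.
    """
    if not response:
        return True
    choices = response.get("choices") or []
    if not choices:
        return True
    message = choices[0].get("message") or {}
    return not message.get("content")
```

Substring matching on error text is brittle, so a real implementation would more likely key on the provider's structured error codes where they are available.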

Both paths are guarded by _fallback_activated to preserve the existing one-shot semantics.
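The one-shot guard can be sketched as follows. Only the _fallback_activated flag and the fallback_model setting are named in the description. The FallbackState class and the try_activate_fallback method are hypothetical stand-ins for whatever the agent actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackState:
    """Hypothetical holder for the active model and the one-shot flag."""

    model: str
    fallback_model: Optional[str] = None
    _fallback_activated: bool = False

    def try_activate_fallback(self) -> bool:
        """Switch to the fallback model at most once.

        Returns True only on the call that performs the switch.
        """
        if self._fallback_activated or not self.fallback_model:
            return False
        self.model = self.fallback_model
        self._fallback_activated = True
        return True


state = FallbackState(model="primary-model", fallback_model="backup-model")
first = state.try_activate_fallback()   # switches to the fallback model
second = state.try_activate_fallback()  # no-op: the guard is already set
```

Because the flag is checked before anything else, both eager paths can call the same method without risking repeated switching.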

Test plan

1. Configure a primary model with a rate-limited/exhausted API key and a working fallback model.
2. Send a message and verify the agent switches to fallback on the first failure instead of retrying 3 times.
3. Verify non-rate-limit errors (auth failures, invalid model, etc.) still follow existing retry/abort behavior.
4. Verify fallback only activates once per session (one-shot guard).

When a fallback model is configured, switch to it immediately upon detecting rate-limit conditions instead of exhausting retries with exponential backoff. The primary provider is unlikely to recover within the retry window when rate-limited or quota-exhausted.

Two paths are covered:

1. Invalid/empty API responses (common rate-limit symptom where the provider returns a response with no choices or missing content): fallback is attempted right after incrementing retry_count.
2. Exception-based rate limits (HTTP 429, quota exhaustion, usage limit errors): a new is_rate_limited check runs before the existing 413/4xx handlers and switches immediately.

Both paths are guarded by _fallback_activated to ensure one-shot semantics (no repeated switching).

usvimal wants to merge 1 commit into NousResearch:main from usvimal:fix/eager-429-fallback
