cheapestinference/openclaw-plugin-ratelimit-retry

ratelimit-retry

An OpenClaw plugin that automatically retries agent conversations killed by provider rate limits.

Problem

When your LLM provider returns a rate-limit or budget-cap error (HTTP 429), every running agent task dies mid-conversation, and nothing resumes it. If you close the dashboard, those conversations are gone. You have to manually find and re-trigger each one after the budget resets.

Solution

This plugin hooks into OpenClaw's agent_end event, detects retriable errors (429s, rate limits, budget exhaustion), and parks the failed session in a persistent queue on disk. A background service waits for the provider's budget window to reset, then sends chat.send to the original session -- resuming the conversation with its full transcript context, as if the user had typed a message.

Installation

openclaw plugins install @cheapestinference/openclaw-ratelimit-retry

Or copy manually to your extensions directory:

cp -r openclaw-plugin-ratelimit-retry ~/.openclaw/extensions/ratelimit-retry

Enable it in OpenClaw config:

openclaw config set plugins.ratelimit-retry.budgetWindowHours 5
openclaw config set plugins.ratelimit-retry.maxRetryAttempts 3

No npm install needed. The plugin has zero runtime dependencies.

Complete example

# ~/.openclaw/config.yaml
plugins:
  ratelimit-retry:
    budgetWindowHours: 5
    maxRetryAttempts: 3
    checkIntervalMinutes: 5
    retryMessage: "Continue where you left off. The previous attempt failed due to a rate limit that has now reset."

How It Works

Agent run fails (429)
  |
  v
agent_end hook fires
  |-- Non-retriable error? --> ignore
  |-- Retriable error?     --> queue to disk
                                 |
                                 v
                  Background timer (every 5 min)
                    |
                    |-- Budget window not reset? --> wait
                    |-- Budget window reset?     --> chat.send to session
                                                       |
                                                       |--> Ack received: wait for result
                                                       |     |--> agent_end success: remove from queue
                                                       |     |--> agent_end 429: re-queued automatically
                                                       |--> Send failed: wait for next window

The retry uses chat.send with the original sessionKey, which means the gateway loads the complete JSONL transcript and the agent resumes with full context. This is equivalent to the user typing a message in the chat.

The model is fire-and-forget with re-detection: chat.send returns an immediate ack ({ ok, runId, status: "started" }), not the final result. If the retried run fails again with a 429, the agent_end hook fires again and the session is re-queued with an incremented attempt counter. This loop continues until the retry succeeds or maxRetryAttempts is reached.
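
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual source: `RetryQueue`, `Outcome`, `ChatSend`, and `nextWindow` are hypothetical names, and the `send` function stands in for the gateway's chat.send call. Two behaviours are assumptions made for the sketch: a queued entry is dropped when its run ends with a non-retriable error, and after a send the entry is held until the next window so the timer does not re-send it while the run is in progress.

```typescript
type Outcome = "success" | "retriable" | "non-retriable";

interface QueueEntry {
  sessionKey: string;
  attempts: number;   // number of times this session has been re-queued
  retryAfter: number; // epoch milliseconds
}

// Shape of the immediate ack described above: { ok, runId, status: "started" }.
interface Ack { ok: boolean; runId?: string; status?: string }
type ChatSend = (sessionKey: string, message: string) => Promise<Ack>;

class RetryQueue {
  readonly entries = new Map<string, QueueEntry>();
  private retryInProgress = false;
  private send: ChatSend;
  private nextWindow: (now: number) => number;
  private maxRetryAttempts: number;
  private retryMessage: string;

  constructor(
    send: ChatSend,
    nextWindow: (now: number) => number,
    maxRetryAttempts = 3,
    retryMessage = "Continue where you left off.",
  ) {
    this.send = send;
    this.nextWindow = nextWindow;
    this.maxRetryAttempts = maxRetryAttempts;
    this.retryMessage = retryMessage;
  }

  // Called for every agent_end event (hypothetical signature).
  onAgentEnd(sessionKey: string, outcome: Outcome, now: number): void {
    if (outcome !== "retriable") {
      // Success, or (assumption) a failure that retrying cannot fix.
      this.entries.delete(sessionKey);
      return;
    }
    const existing = this.entries.get(sessionKey);
    const attempts = existing ? existing.attempts + 1 : 0;
    if (attempts >= this.maxRetryAttempts) {
      this.entries.delete(sessionKey);
      console.warn(`ratelimit-retry: giving up on ${sessionKey}`);
      return;
    }
    this.entries.set(sessionKey, { sessionKey, attempts, retryAfter: this.nextWindow(now) });
  }

  // Called by the background timer; returns how many retries were acknowledged.
  async tick(now: number): Promise<number> {
    if (this.retryInProgress) return 0; // guard against overlapping batches
    this.retryInProgress = true;
    let sent = 0;
    try {
      for (const entry of this.entries.values()) {
        if (entry.retryAfter > now) continue;
        // Hold the entry until the next window, whether or not the send succeeds.
        entry.retryAfter = this.nextWindow(now);
        try {
          const ack = await this.send(entry.sessionKey, this.retryMessage);
          if (ack.ok) sent++; // the final result arrives later via agent_end
        } catch (err) {
          console.warn(`ratelimit-retry: chat.send failed for ${sessionKey(entry)}`, err);
        }
      }
    } finally {
      this.retryInProgress = false;
    }
    return sent;
  }
}

// Small helper so log lines read the key from one place.
function sessionKey(entry: QueueEntry): string {
  return entry.sessionKey;
}
```

With maxRetryAttempts set to 3, this sketch sends at most three retries for a session before removing it from the queue.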

Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| budgetWindowHours | number | 5 | Budget reset window in hours, aligned to UTC clock boundaries |
| maxRetryAttempts | number | 3 | Max retries per session before abandoning |
| checkIntervalMinutes | number | 5 | How often the background service checks for pending retries |
| retryMessage | string | "Continue where you left off..." | Message sent to the session to resume the conversation |
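
As an illustration of how these options and defaults fit together, here is a small sketch that merges a partial config with the documented defaults. `RetryConfig`, `DEFAULTS`, and `resolveConfig` are hypothetical names, and rejecting non-positive numbers with an error is an assumption; the plugin may validate differently or not at all.

```typescript
interface RetryConfig {
  budgetWindowHours: number;
  maxRetryAttempts: number;
  checkIntervalMinutes: number;
  retryMessage: string;
}

const DEFAULTS: RetryConfig = {
  budgetWindowHours: 5,
  maxRetryAttempts: 3,
  checkIntervalMinutes: 5,
  // The documented default is truncated in the options list; it is kept truncated here.
  retryMessage: "Continue where you left off...",
};

// Merge user-supplied options over the defaults and sanity-check the result.
function resolveConfig(raw: Partial<RetryConfig> = {}): RetryConfig {
  const merged: RetryConfig = { ...DEFAULTS, ...raw };
  const numericKeys = ["budgetWindowHours", "maxRetryAttempts", "checkIntervalMinutes"] as const;
  for (const key of numericKeys) {
    const value = merged[key];
    // Assumption: every numeric option must be a finite number greater than zero.
    if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
      throw new RangeError(`${key} must be a positive number, got ${String(value)}`);
    }
  }
  // chat.send requires a non-empty message, so an empty retryMessage is rejected.
  if (typeof merged.retryMessage !== "string" || merged.retryMessage.trim() === "") {
    throw new RangeError("retryMessage must be a non-empty string");
  }
  return merged;
}
```

Calling `resolveConfig({ maxRetryAttempts: 1 })` keeps the other three options at their defaults.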

How the Retry Timing Works

Many LLM providers (including LiteLLM) reset budget counters on fixed UTC-aligned windows. With a 5-hour window, the boundaries are:

00:00  05:00  10:00  15:00  20:00  (next day) 00:00
  |------|------|------|------|------|

When an error is queued, the plugin calculates the next boundary after the current time and adds a 1-minute margin (retries at HH:01:00 instead of HH:00:00) to avoid racing the provider's reset.

When 24 is not evenly divisible by budgetWindowHours, the math still works. Boundaries are counted from 00:00 UTC of the current day, so with a 7-hour window they fall at 00:00, 07:00, 14:00, and 21:00; the next one would be hour 28, which overflows to 04:00 the next day. The plugin handles this day overflow correctly.
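
The boundary arithmetic can be sketched as below. This is an illustrative reconstruction from the description above, not the plugin's code; `nextRetryTime` is a hypothetical name. It counts windows from 00:00 UTC of the current day, takes the first boundary strictly after the current time, and adds the 1-minute margin.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const MARGIN_MS = 60 * 1000; // retry at HH:01:00 rather than HH:00:00

function nextRetryTime(now: Date, budgetWindowHours: number): Date {
  if (!(budgetWindowHours > 0)) {
    throw new RangeError("budgetWindowHours must be greater than zero");
  }
  // Midnight UTC of the current day, in epoch milliseconds.
  const dayStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const windowMs = budgetWindowHours * HOUR_MS;
  const elapsed = now.getTime() - dayStart;
  // Index of the first boundary strictly after `now`.
  const nextIndex = Math.floor(elapsed / windowMs) + 1;
  // Plain millisecond addition, so a boundary past hour 24 rolls into the next day.
  return new Date(dayStart + nextIndex * windowMs + MARGIN_MS);
}

// 5-hour window at 12:30 UTC: next boundary is 15:00, retry at 15:01.
console.log(nextRetryTime(new Date("2024-01-01T12:30:00Z"), 5).toISOString());
// → 2024-01-01T15:01:00.000Z

// 7-hour window at 22:00 UTC: hour 28 overflows to 04:00 the next day.
console.log(nextRetryTime(new Date("2024-01-01T22:00:00Z"), 7).toISOString());
// → 2024-01-02T04:01:00.000Z
```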

Error Classification

Non-retriable patterns are checked first. If an error matches a non-retriable pattern, it is never retried, even if it also matches a retriable pattern.

Retriable (queued for retry)

| Pattern | Catches |
| --- | --- |
| 429 | "Error code: 429 - ..." |
| rate limit, rate_limit | "RateLimitError: ..." |
| too many requests | HTTP 429 reason phrases |
| budget | "Budget exceeded for ..." |
| quota exceeded | Provider quota messages |
| resource exhausted | gRPC-style exhaustion errors |
| tokens per minute, tpm | TPM limit messages |

Non-retriable (ignored)

| Pattern | Reason |
| --- | --- |
| 401, 402, 403, 404 | HTTP client errors -- won't succeed on retry |
| invalid api key, unauthorized | Auth errors -- fix your credentials |
| invalid request, malformed | Bad request format -- won't succeed on retry |
| model not found | Model doesn't exist |
| context length, prompt too large | Context overflow -- message is too long |
| insufficient credits | Billing issue -- requires user action |
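
The ordering rule (non-retriable first) can be expressed as a short classifier. The sketch below is an approximation built from the two pattern lists above; `isRetriable` and the exact regular expressions are assumptions, including the optional separator in the rate-limit pattern so that "RateLimitError" matches, and treating errors that match neither list as not retriable.

```typescript
// Checked first: a match here always wins.
const NON_RETRIABLE: RegExp[] = [
  /\b40[1-4]\b/,            // 401, 402, 403, 404
  /invalid api key/i,
  /unauthorized/i,
  /invalid request/i,
  /malformed/i,
  /model not found/i,
  /context length/i,
  /prompt too large/i,
  /insufficient credits/i,
];

// Checked only when no non-retriable pattern matched.
const RETRIABLE: RegExp[] = [
  /\b429\b/,
  /rate[ _]?limit/i,        // "rate limit", "rate_limit", "RateLimitError"
  /too many requests/i,
  /budget/i,
  /quota exceeded/i,
  /resource exhausted/i,
  /tokens per minute/i,
  /\btpm\b/i,
];

function isRetriable(message: string): boolean {
  if (NON_RETRIABLE.some((pattern) => pattern.test(message))) {
    return false;
  }
  // Assumption: an error matching neither list is ignored rather than retried.
  return RETRIABLE.some((pattern) => pattern.test(message));
}

console.log(isRetriable("Error code: 429 - Too Many Requests")); // → true
console.log(isRetriable("403 Forbidden: rate limit policy"));    // → false
```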

Edge Cases

  • Server restarts: the queue is persisted to {stateDir}/ratelimit-retry/queue.json and reloaded on startup.
  • Same session errors multiple times: deduplicated by sessionKey. The existing entry is updated with incremented attempts and a recalculated retryAfter.
  • Retry fails with 429 again: agent_end fires again, re-queuing with incremented attempts. Natural loop until success or maxRetryAttempts.
  • Gateway unreachable during retry: connection error is caught, entry's retryAfter is pushed to the next budget window to avoid hammering a down gateway every tick.
  • Max attempts exceeded: entry is removed from queue and a warning is logged.
  • Sub-agent sessions: handled identically -- sessionKey format agent:X:subagent:Y works the same way.
  • Timer fires during active retry: a retryInProgress guard prevents overlapping batches.
  • Queue file corrupted: JSON parse errors are caught; service starts with an empty queue and logs a warning.
  • Queue overflow: capped at 100 entries. Oldest entries are evicted when full.
  • Atomic writes: queue is written to a uniquely-named .tmp file first, then renamed, to prevent corruption on crashes or concurrent writes.
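
Several of these cases (deduplication, the 100-entry cap, atomic writes, and recovery from a corrupted file) can be illustrated together. The following is a sketch under assumptions, not the plugin's implementation: `QueueEntry`, `upsertEntry`, `saveQueue`, and `loadQueue` are hypothetical names, the entry fields are invented for the example, and "oldest" is taken to mean the earliest `queuedAt`.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

interface QueueEntry {
  sessionKey: string;
  attempts: number;
  retryAfter: number; // epoch milliseconds
  queuedAt: number;   // epoch milliseconds
}

const MAX_QUEUE_SIZE = 100;

// Deduplicate by sessionKey; evict the oldest entries when the cap is exceeded.
function upsertEntry(queue: QueueEntry[], sessionKey: string, retryAfter: number, now: number): QueueEntry[] {
  if (queue.some((e) => e.sessionKey === sessionKey)) {
    return queue.map((e) =>
      e.sessionKey === sessionKey ? { ...e, attempts: e.attempts + 1, retryAfter } : e,
    );
  }
  const next = [...queue, { sessionKey, attempts: 0, retryAfter, queuedAt: now }];
  next.sort((a, b) => a.queuedAt - b.queuedAt);
  return next.slice(-MAX_QUEUE_SIZE);
}

// Write to a uniquely named .tmp file, then rename over the real file.
function saveQueue(path: string, queue: QueueEntry[]): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmpPath = `${path}.${randomUUID()}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(queue, null, 2), "utf8");
  renameSync(tmpPath, path);
}

// A missing or corrupted file yields an empty queue and a warning.
function loadQueue(path: string): QueueEntry[] {
  try {
    const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
    return Array.isArray(parsed) ? (parsed as QueueEntry[]) : [];
  } catch (err) {
    console.warn(`ratelimit-retry: could not load ${path}, starting with an empty queue`, err);
    return [];
  }
}

// Usage: round-trip one entry through a temporary directory.
const demoPath = join(mkdtempSync(join(tmpdir(), "ratelimit-retry-")), "queue.json");
saveQueue(demoPath, upsertEntry([], "agent:X:subagent:Y", Date.now() + 60_000, Date.now()));
console.log(loadQueue(demoPath));
```

The rename step is what makes the write atomic on the same filesystem: a reader sees either the old file or the complete new one, never a half-written queue.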

Limitations

  • Fire-and-forget window: after chat.send returns its ack, there is a brief period where the retried run is in progress. If it fails with 429 again immediately, there is a small window before the agent_end hook fires and re-queues it. This is by design -- the re-detection loop handles it.
  • chat.send requires a non-empty message: the retry always sends the configured retryMessage. It cannot send an empty message to silently resume.
  • No partial-run recovery: the plugin resumes the conversation from the last completed turn. It does not replay partial streaming output that was interrupted.
  • Single-instance only: the queue is a local JSON file with no locking. Running multiple OpenClaw instances sharing the same ~/.openclaw/ directory is not supported.
  • No backpressure on the provider: the plugin retries all ready sessions in sequence. If you have many queued sessions, they all fire at the start of the next window.

License

MIT

Contributing

Contributions are welcome. Please open an issue first to discuss what you would like to change.

git clone https://github.com/cheapestinference/openclaw-plugin-ratelimit-retry
cd openclaw-plugin-ratelimit-retry
# No build step. OpenClaw loads .ts files directly via Jiti.
