Undocumented "Queue is overloaded. Please back off." error from sendBatch() #6053

@DivMode

Description

Queue.sendBatch() intermittently throws an undocumented error:

Error: Queue sendBatch failed: Queue is overloaded. Please back off.

This error is not documented anywhere — not in the Queues limits, error handling docs, or changelog. It is distinct from the documented Too Many Requests rate limit error.

Reproduction

  • Context: sendBatch() called inside a Cloudflare Workflow step.do() callback
  • Batch size: 100 messages (~14 msg/s throughput, well under the 5,000 msg/s documented limit)
  • Failure duration: 57 seconds (request hangs, then fails)
  • Retry behavior: Succeeds in <1s on immediate retry — clearly transient
  • Frequency: Observed on multiple occasions across different workflow instances

Evidence

From wrangler workflows instances describe:

┌───────────────────────┬───────────────────────┬────────────┬──────────┬──────────────────────────────────────────────────────────────────────┐
│ Start                 │ End                   │ Duration   │ State    │ Error                                                                │
├───────────────────────┼───────────────────────┼────────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│ 2/9/2026, 10:20:57 AM │ 2/9/2026, 10:21:54 AM │ 57 seconds │ ❌ Error │ Error: Queue sendBatch failed: Queue is overloaded. Please back off. │
├───────────────────────┼───────────────────────┼────────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│ 2/9/2026, 10:22:04 AM │ 2/9/2026, 10:22:05 AM │ 1 second   │ ✅ Success│                                                                      │
└───────────────────────┴───────────────────────┴────────────┴──────────┴──────────────────────────────────────────────────────────────────────┘

Root Cause Analysis

From the workerd source (src/workerd/api/queue.c++):

JSG_REQUIRE(response.statusCode == 200, Error,
    kj::str("Queue sendBatch failed: ", response.statusText));

The error is the literal HTTP statusText from the internal queue backend. Based on the Queues v2 architecture blog post, the backend uses Storage Shard Durable Objects. The 57-second hang + overload message is consistent with the DO's internal request queue exceeding capacity — likely because the randomly-assigned shard was hot (other tenants' traffic or autoscaling lag).
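Since the only signal is the message text, a caller that wants to back off has to match on that string. Below is a minimal sketch of doing so with capped exponential backoff and jitter. The names `BatchSender`, `isQueueOverloaded`, `backoffDelayMs`, and `sendBatchWithBackoff` are hypothetical helpers for illustration, not part of the Queues API, and the delay values are arbitrary. It also assumes a failed sendBatch() enqueued nothing, which is not confirmed anywhere in the docs.

```typescript
// Minimal local stand-in for the Queue binding; the real type lives in
// @cloudflare/workers-types. Only the part used here is modelled.
interface BatchSender<T> {
  sendBatch(messages: Iterable<{ body: T }>): Promise<void>;
}

// Assumption: the message wording stays stable. No error code is documented,
// so string matching is the only available check.
function isQueueOverloaded(err: unknown): boolean {
  return err instanceof Error && /Queue is overloaded/.test(err.message);
}

// Capped exponential backoff: 250 ms, 500 ms, 1 s, ... up to capMs.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 10000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Retries only the overload error; anything else is rethrown immediately.
async function sendBatchWithBackoff<T>(
  queue: BatchSender<T>,
  messages: { body: T }[],
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await queue.sendBatch(messages);
      return;
    } catch (err) {
      if (!isQueueOverloaded(err) || attempt + 1 >= maxAttempts) throw err;
      // Full jitter: wait a random time between 0 and the capped delay.
      await sleep(Math.random() * backoffDelayMs(attempt));
    }
  }
}
```

This does not shorten the 57-second hang itself. It only keeps the retry inside the callback, so earlier sendBatch() calls in the same step are not replayed.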

Impact

When this error occurs inside a Workflow step.do(), the Workflows engine retries the entire callback. If the callback contained sendBatch() calls that already succeeded before the failure, all messages are re-sent with different message IDs — creating invisible duplicates that cannot be deduplicated at the message level.

In our case, this caused 840 duplicate queue messages per incident.
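One way to limit the blast radius, sketched here under assumptions rather than as a confirmed recommendation, is to give each batch its own step.do() so a retry replays only the failed batch, and to carry a producer-chosen key in the message body so consumers can drop repeats. `StepLike`, `BatchSender`, `Envelope`, `dedupeKey`, `toEnvelopes`, and `sendInSeparateSteps` are hypothetical names. The sketch assumes completed steps are not re-executed when a later step fails, and that the consumer keeps its own record of seen keys.

```typescript
// Local stand-ins for the Workflows step object and the Queue binding.
interface StepLike {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
}
interface BatchSender<T> {
  sendBatch(messages: Iterable<{ body: T }>): Promise<void>;
}

// Hypothetical envelope: the key is derived from data the producer controls,
// so a re-sent message carries the same key even though its message ID differs.
interface Envelope<T> {
  dedupeKey: string;
  payload: T;
}

function toEnvelopes<T>(runId: string, payloads: T[]): Envelope<T>[] {
  return payloads.map((payload, i) => ({ dedupeKey: `${runId}:${i}`, payload }));
}

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function sendInSeparateSteps<T>(
  step: StepLike,
  queue: BatchSender<Envelope<T>>,
  runId: string,
  payloads: T[],
  batchSize = 100,
): Promise<void> {
  const batches = chunk(toEnvelopes(runId, payloads), batchSize);
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i];
    // One step per batch: a failure in batch i retries only batch i.
    await step.do(`send-batch-${i}`, async () => {
      await queue.sendBatch(batch.map((body) => ({ body })));
    });
  }
}
```

A retried batch can still produce duplicates if the failed call partially succeeded, which is why the key travels in the body instead of relying on message IDs.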

Questions

  1. What causes this error? Is it DO shard overload as described above?
  2. Why does it trigger at ~14 msg/s when the documented limit is 5,000 msg/s?
  3. Can this be documented alongside the existing Too Many Requests error?
  4. Is there a recommended retry strategy beyond what the error message suggests?

Related
