fix: RateLimiter burst race, Retry-After headers, deep crawl dispatcher (#1095) by ntohidi · Pull Request #1835 · unclecode/crawl4ai

ntohidi · 2026-03-16T03:20:15Z

Summary

Fixes [Bug]: RateLimiter provides ineffective protection against failures #1095
This PR contains three fixes for RateLimiter ineffectiveness under concurrent load and on rate-limited sites:

1. Burst race (concurrent tasks bypass rate limiting)

wait_if_needed() had no synchronization: concurrent tasks all read last_request_time at the same instant, computed wait_time ≈ 0, and fired together. Added a per-domain asyncio.Lock so that tasks targeting the same domain serialize and each waits its proper turn.

Before: 9/10 requests fire at +1.7s simultaneously (0ms gaps)
After: Requests spaced 1.2-1.8s apart across 13.7s total
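
For reviewers who want the locking pattern in isolation, here is a minimal sketch. It is not the code in crawl4ai/async_dispatcher.py: SketchRateLimiter, its fixed delay, and demo() are hypothetical names made up for illustration. Only the idea of one asyncio.Lock per domain around the read-wait-write sequence comes from this PR.

```python
import asyncio
import time
from collections import defaultdict
from urllib.parse import urlparse


class SketchRateLimiter:
    """Illustrative stand-in (hypothetical), not the crawl4ai RateLimiter."""

    def __init__(self, delay: float = 0.05) -> None:
        self.delay = delay  # assumed fixed spacing between requests, in seconds
        self.last_request_time: dict[str, float] = {}
        # One lock per domain, created lazily on first access.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def wait_if_needed(self, url: str) -> None:
        domain = urlparse(url).netloc
        # Holding the lock across read, sleep, and write means the next task
        # for this domain sees the updated timestamp instead of a stale one.
        async with self._locks[domain]:
            last = self.last_request_time.get(domain)
            if last is not None:
                wait_time = self.delay - (time.monotonic() - last)
                if wait_time > 0:
                    await asyncio.sleep(wait_time)
            self.last_request_time[domain] = time.monotonic()


async def demo(n: int = 5, delay: float = 0.05) -> list[float]:
    """Fire n concurrent tasks at one domain and return their start times."""
    limiter = SketchRateLimiter(delay)
    stamps: list[float] = []

    async def task() -> None:
        await limiter.wait_if_needed("https://example.com/page")
        stamps.append(time.monotonic())

    await asyncio.gather(*(task() for _ in range(n)))
    return sorted(stamps)


if __name__ == "__main__":
    times = asyncio.run(demo())
    print([round(b - a, 3) for a, b in zip(times, times[1:])])
```

Without the `async with` line, every task in demo() would read the same (missing) timestamp and start together, which is the burst described above. Tasks for different domains use different locks and do not block each other.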

2. Retry-After header support

update_delay() only accepted (url, status_code) — server rate-limit headers were completely ignored. Added optional response_headers param with parsing for Retry-After (both delay-seconds and HTTP-date formats). Both dispatcher call sites now pass result.response_headers.

Before: 429 with Retry-After: 5 → blind exponential backoff (1.9s)
After: 429 with Retry-After: 5 → delay set to 5.0s as server instructed
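
The two Retry-After formats can be handled roughly as in the sketch below. This is an assumption about the approach, not a copy of _parse_retry_after(): the function name parse_retry_after, its `now` parameter, and the choice to return None for unparseable values are all hypothetical. It relies only on the standard library's email.utils.parsedate_to_datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_retry_after(
    headers: Optional[Mapping[str, str]],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or invalid.

    Hypothetical helper for illustration; `now` exists only to make the
    HTTP-date branch testable.
    """
    if not headers:
        return None
    # Header names are case-insensitive, so compare in lowercase.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return None
    value = value.strip()

    # Format 1: delay-seconds, a non-negative integer such as "5".
    if value.isascii() and value.isdigit():
        return float(value)

    # Format 2: HTTP-date, such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable: caller falls back to its own backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is required.
    return max(0.0, (when - now).total_seconds())
```

Returning None for a missing or malformed header lets the caller keep its existing exponential backoff as the fallback. Whether the real update_delay() also caps the server-supplied delay is not stated in this PR.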

3. Deep crawl dispatcher configurability

BFS, DFS, and BestFirst strategies hardcoded arun_many() calls without passing a dispatcher. Added dispatcher param to all three, forwarded to every arun_many() call.
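
The passthrough itself is small. The sketch below shows the shape of it using stand-in classes so it runs without crawl4ai installed. StubCrawler, SketchLevelStrategy, and the fake "/child" link expansion are hypothetical; the real strategies and the real arun_many() signature may differ in their other parameters.

```python
import asyncio
from typing import Any, Optional


class StubCrawler:
    """Hypothetical stand-in for a crawler; records what arun_many() receives."""

    def __init__(self) -> None:
        self.seen_dispatchers: list[Any] = []

    async def arun_many(
        self, urls: list[str], config: Any = None, dispatcher: Any = None
    ) -> list[str]:
        self.seen_dispatchers.append(dispatcher)
        return [f"fetched {u}" for u in urls]


class SketchLevelStrategy:
    """Hypothetical level-by-level strategy that forwards its dispatcher."""

    def __init__(self, max_depth: int = 2, dispatcher: Optional[Any] = None) -> None:
        self.max_depth = max_depth
        self.dispatcher = dispatcher  # None keeps the crawler's default behaviour

    async def run(self, start_url: str, crawler: StubCrawler) -> list[str]:
        results: list[str] = []
        level = [start_url]
        for _ in range(self.max_depth):
            # The fix: every batch call passes the configured dispatcher along.
            batch = await crawler.arun_many(level, dispatcher=self.dispatcher)
            results.extend(batch)
            level = [f"{u}/child" for u in level]  # fake link discovery
        return results
```

Because the dispatcher defaults to None, callers that never set it should see no change in behaviour. Callers that do set it get the same rate limiting on every level of the deep crawl.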

Changes

crawl4ai/async_dispatcher.py: Per-domain lock in wait_if_needed(), response_headers param + _parse_retry_after() in update_delay(), both call sites updated
crawl4ai/deep_crawling/bfs_strategy.py: Added dispatcher param, forwarded to arun_many()
crawl4ai/deep_crawling/dfs_strategy.py: Forwarded self.dispatcher to arun_many()
crawl4ai/deep_crawling/bff_strategy.py: Added dispatcher param, forwarded to arun_many()

Test plan

Reproduction script verifies all three fixes (burst serialization, Retry-After parsing, dispatcher passthrough)
Deep crawl with rate-limited site (e.g. gamesjobslive.niceboard.co) to verify end-to-end

…er (#1095)

Three fixes for RateLimiter ineffectiveness:

1. Burst race: Added per-domain asyncio.Lock in wait_if_needed() so concurrent tasks serialize properly. Previously all tasks read last_request_time simultaneously and fired together.
2. Retry-After headers: Added optional response_headers param to update_delay() with parsing for Retry-After (seconds and HTTP-date). Both dispatcher call sites now pass result.response_headers.
3. Deep crawl dispatcher: Added dispatcher param to BFS, DFS, and BestFirst strategies, forwarded to all arun_many() calls so users can configure rate limiting for deep crawls.

ntohidi wants to merge 1 commit into develop from fix/rate-limiter-burst-and-headers-1095
