Handle Playwright route already handled race condition error #332

kenzaelk98 · 2026-02-10T10:50:32Z

Fix Playwright Route Handling Race Condition During Concurrent Scraping

Problem

When scraping large websites with --scope hostname (which crawls all pages within a domain), the scraper intermittently crashed with the following error:

route.fulfill: Route is already handled!
at /app/dist/index.js:5520:26

The crash occurred at random points during concurrent page processing, which points to a race condition.

Root Cause

The Playwright route handlers in HtmlPlaywrightMiddleware.ts did not handle the case where multiple concurrent requests attempted to handle the same route. With hostname scope, hundreds of pages are processed in parallel, so route handling conflicts become likely.

Solution

Wrapped all Playwright route operations (route.abort(), route.fulfill(), route.continue()) in try-catch blocks to gracefully handle already-handled routes. This prevents the crash and allows the scraper to continue processing other pages.
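A minimal sketch of this kind of guard follows. It is not the code from this PR: `RouteLike`, `isAlreadyHandledError`, and `safeRouteAction` are hypothetical names, and `RouteLike` is a stand-in for the few methods of Playwright's `Route` used here so that the snippet is self-contained. Unlike a bare try-catch, this variant swallows only errors whose message contains "Route is already handled" and rethrows everything else, which is an assumption about what is desirable rather than a description of the change.

```typescript
// Stand-in for the subset of Playwright's Route used in this sketch (assumption).
interface RouteLike {
  abort(errorCode?: string): Promise<void>;
  continue(): Promise<void>;
  fulfill(response: { status?: number; body?: string }): Promise<void>;
}

// Hypothetical helper: true when the error looks like the
// "Route is already handled!" error quoted in the Problem section.
function isAlreadyHandledError(error: unknown): error is Error {
  return error instanceof Error && error.message.includes("Route is already handled");
}

// Hypothetical helper: runs one route action and reports whether it took effect.
// Already-handled errors are logged and swallowed; any other error is rethrown.
async function safeRouteAction(
  action: () => Promise<void>,
  log: (message: string) => void = () => {},
): Promise<boolean> {
  try {
    await action();
    return true;
  } catch (error) {
    if (isAlreadyHandledError(error)) {
      log(`Route already handled, skipping: ${error.message}`);
      return false;
    }
    throw error;
  }
}

// Example use inside a route handler (the route object comes from page.route()).
async function abortQuietly(route: RouteLike): Promise<boolean> {
  return safeRouteAction(() => route.abort());
}
```

Matching on the message text keeps unrelated failures visible, at the cost of depending on the exact wording of the error.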

Changes

Added error handling around all route operations in both setupCachingRouteInterception() and the main process() method
Prefixed unused error variables with underscore to satisfy linting rules
Added debug logging for route handling conflicts

Testing

I tested this fix by scraping a large website (1000+ pages) with hostname scope:

Before fix:

❌ Crashed with "Route is already handled!" error during early stages of scraping
The crash point varied between runs, but every run failed before completion

After fix:

✅ Successfully scraped hundreds of pages without any route handling errors
Only normal operational warnings (timeouts, network errors)
Scraping continues smoothly with concurrent page processing

Impact

This fix enables reliable large-scale scraping with hostname scope, which is essential for comprehensive documentation indexing across entire domains.

Copilot

Pull request overview

Fixes intermittent Playwright crashes during high-concurrency scraping by making route interception more tolerant of “Route is already handled!” errors.

Changes:

Adds guards/logging intended to detect already-handled routes before processing.
Wraps route.abort() / route.fulfill() (and some abort fallbacks) in try/catch to prevent scraper crashes.
Adds debug logging when a route action fails due to an assumed already-handled race.

Comments suppressed due to low confidence (1)

src/scraper/middleware/HtmlPlaywrightMiddleware.ts:871

This routing logic is now duplicated in two places (setupCachingRouteInterception and the inline handler in process), and the new race-condition handling needs to stay consistent between them. Consider extracting a shared helper (e.g., a private method that handles a route given context/headers) so future changes to caching/route error handling aren’t missed in one of the copies.

        // For all other requests, use the standard caching logic
        // We need to manually handle the interception since we can't delegate to another route
        const reqOrigin = (() => {
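One possible shape for the shared helper suggested in this comment is sketched below. It is an illustration only: `handleRoute`, `RouteOutcome`, the `cache` map, and the `blockedTypes` set are hypothetical, the `RouteLike` and `RequestLike` interfaces are stand-ins for the corresponding parts of Playwright's `Route` and `Request`, and the actual caching and header logic in HtmlPlaywrightMiddleware.ts is not reproduced here. The point is that both setupCachingRouteInterception and the inline handler in process could call one function, so the error handling lives in a single place.

```typescript
// Stand-ins for the subset of Playwright's Request and Route used here (assumption).
interface RequestLike {
  url(): string;
  resourceType(): string;
}
interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
  fulfill(response: { status: number; body: string }): Promise<void>;
}

type RouteOutcome = "aborted" | "fulfilled" | "continued" | "skipped";

// Hypothetical shared handler: decides what to do with a route and performs
// exactly one route action, treating a failure of that action as "skipped".
async function handleRoute(
  route: RouteLike,
  cache: ReadonlyMap<string, string>,      // hypothetical URL -> body cache
  blockedTypes: ReadonlySet<string>,       // hypothetical resource types to abort
  log: (message: string) => void = () => {},
): Promise<RouteOutcome> {
  const request = route.request();
  try {
    if (blockedTypes.has(request.resourceType())) {
      await route.abort();
      return "aborted";
    }
    const cached = cache.get(request.url());
    if (cached !== undefined) {
      await route.fulfill({ status: 200, body: cached });
      return "fulfilled";
    }
    await route.continue();
    return "continued";
  } catch (error) {
    // Assumed to be the already-handled race; logged at debug level and ignored.
    log(`Route action failed for ${request.url()}: ${String(error)}`);
    return "skipped";
  }
}
```

Both call sites would then register something like `(route) => handleRoute(route, cache, blockedTypes, log)`, which keeps the two interception paths from drifting apart.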


Copilot · 2026-02-10T21:26:02Z