Context: The Playwright engine intermittently reports that the browser is closed unexpectedly. This triggers the Circuit Breaker prematurely, disrupting the scraping flow even when the network is stable.
Goal: Implement logic to distinguish between a Process Crash and a Timeout. The Circuit Breaker should primarily be "tripped" on Timeouts or network-related failures, indicating that the target might be throttling us or the network is unstable.
Steps to Investigate & Implement:
- Error Classification: Update the engine to parse Playwright errors and identify TimeoutError specifically.
- Conditional Tripping: Modify the Circuit Breaker logic to increment the failure count only on timeouts and network errors, not on browser process crashes.
- Process Recovery: Implement a silent restart/retry for pure browser crashes that aren't related to timeouts.
- Resource Monitoring: Check if high-volume scraping is causing memory leaks that lead to these crashes.
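
A minimal sketch of how the classification, conditional tripping, and crash recovery steps could fit together. It is written in Python purely for illustration and is not the engine's actual code. The names `FailureKind`, `classify`, `CircuitBreaker`, and `run_with_recovery` are hypothetical. The message fragments used to detect crashes and network errors are assumptions and should be checked against the Playwright version in use. To stay free of dependencies, the sketch matches on the exception class name instead of importing Playwright.

```python
from enum import Enum


class FailureKind(Enum):
    TIMEOUT = "timeout"
    NETWORK = "network"
    CRASH = "crash"
    OTHER = "other"


# Assumed message fragments (lowercase); verify them against real error output.
CRASH_MARKERS = ("has been closed", "target closed", "page crashed")
NETWORK_MARKERS = ("net::err_",)


def classify(exc: BaseException) -> FailureKind:
    """Map an exception to a failure kind using its class name and message."""
    name = type(exc).__name__
    message = str(exc).lower()
    # Check the timeout class first so a timeout is never mistaken for a crash.
    if name == "TimeoutError":
        return FailureKind.TIMEOUT
    if name == "TargetClosedError" or any(m in message for m in CRASH_MARKERS):
        return FailureKind.CRASH
    if any(m in message for m in NETWORK_MARKERS):
        return FailureKind.NETWORK
    return FailureKind.OTHER


class CircuitBreaker:
    """Counts only timeout and network failures toward tripping."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self, kind: FailureKind) -> None:
        if kind in (FailureKind.TIMEOUT, FailureKind.NETWORK):
            self.failures += 1


def run_with_recovery(task, restart_browser, breaker, max_restarts: int = 2):
    """Run task(); on a pure browser crash, restart and retry silently."""
    restarts = 0
    while True:
        if breaker.is_open:
            raise RuntimeError("circuit breaker is open")
        try:
            result = task()
        except Exception as exc:
            kind = classify(exc)
            breaker.record_failure(kind)  # no-op for crashes
            if kind is FailureKind.CRASH and restarts < max_restarts:
                restarts += 1
                restart_browser()  # hypothetical hook that relaunches the browser
                continue
            raise
        breaker.record_success()
        return result
```

In this sketch a crash never touches the failure count, so repeated crashes alone cannot open the breaker; they are bounded by `max_restarts` instead. Whether unclassified errors (`OTHER`) should count toward tripping is left open here and would need a decision.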