
Fix Firecrawl fallback: limit to bot-block responses, use 422 status #59

Merged
windoze95 merged 2 commits into main from fix/firecrawl-preview-fallback
Mar 8, 2026

Conversation

@windoze95
Owner

Summary

  • Limit the Firecrawl fallback to actual bot-block status codes (402/403/503) and Cloudflare challenge pages; previously, every non-200 response triggered Firecrawl
  • Return a fetch_failed error for other non-200 statuses (500, 502, 429, 301, etc.) instead of a misleading site_blocked
  • Change the HTTP status for site_blocked and fetch_failed errors from 502 (BadGateway) to 422 (UnprocessableEntity); this avoids collisions with ALB-generated 502s and fixes Dio JSON parsing on the client
  • Log the actual Firecrawl error when the fallback fails (it was previously swallowed silently)
  • Add an isBotBlockStatus helper with table-driven tests
  • Add TestExtractFromURL_500_NoFirecrawl to verify that non-bot-block errors skip Firecrawl
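The routing rule in the summary can be sketched as a small decision function. This is a minimal sketch, not the PR's code: the fallbackAction type, the decide function, and the isChallenge flag are hypothetical, and isChallenge stands in for whatever Cloudflare challenge-page detection the real extractor performs.

```go
package main

import (
	"fmt"
	"net/http"
)

// fallbackAction is a hypothetical name for what the extractor does next.
type fallbackAction string

const (
	actionParse     fallbackAction = "parse"        // 200 OK: extract directly
	actionFirecrawl fallbackAction = "firecrawl"    // bot-block: retry through Firecrawl
	actionFailed    fallbackAction = "fetch_failed" // any other non-200 status
)

// isBotBlockStatus reports whether a status is one of the bot-block codes
// named in the PR summary (402, 403, 503).
func isBotBlockStatus(code int) bool {
	switch code {
	case http.StatusPaymentRequired, http.StatusForbidden, http.StatusServiceUnavailable:
		return true
	}
	return false
}

// decide picks the action for an upstream response. isChallenge is an assumed
// input meaning "the body looked like a Cloudflare challenge page".
func decide(status int, isChallenge bool) fallbackAction {
	if isChallenge || isBotBlockStatus(status) {
		return actionFirecrawl
	}
	if status == http.StatusOK {
		return actionParse
	}
	return actionFailed
}

func main() {
	for _, s := range []int{200, 403, 500, 429} {
		fmt.Println(s, decide(s, false))
	}
}
```

Checking the challenge flag first means a challenge page served with a 200 still goes to Firecrawl, while a plain 500 or 429 now ends in fetch_failed instead of an unnecessary Firecrawl call.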

Test plan

  • go test ./... -count=1 — all tests pass
  • Existing TestExtractFromURL_403_* tests still route through Firecrawl
  • New TestExtractFromURL_500_NoFirecrawl returns fetch_failed, not site_blocked
  • New TestIsBotBlockStatus covers 200, 301, 400, 402, 403, 404, 429, 500, 502, 503
  • TestPreviewFromURL_Handler_SiteBlocked updated to expect 422
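The table-driven style used for TestIsBotBlockStatus can be sketched with the exact codes listed in the test plan. This version runs as a plain program so it is self-contained; the isBotBlockStatus body is an assumption based on the summary's 402/403/503 list, and in the repository the same table would more likely drive t.Run subtests in a _test.go file.

```go
package main

import "fmt"

// isBotBlockStatus is assumed to match the codes named in the PR summary.
func isBotBlockStatus(code int) bool {
	return code == 402 || code == 403 || code == 503
}

// botBlockCases holds the status codes from the test plan and the expected result.
var botBlockCases = []struct {
	code int
	want bool
}{
	{200, false}, {301, false}, {400, false}, {402, true}, {403, true},
	{404, false}, {429, false}, {500, false}, {502, false}, {503, true},
}

// failingCases returns every code whose result differs from the table.
func failingCases() []int {
	var failed []int
	for _, tc := range botBlockCases {
		if got := isBotBlockStatus(tc.code); got != tc.want {
			failed = append(failed, tc.code)
		}
	}
	return failed
}

func main() {
	if failed := failingCases(); len(failed) > 0 {
		fmt.Println("mismatched codes:", failed)
		return
	}
	fmt.Println("all", len(botBlockCases), "cases pass")
}
```

Keeping the cases in one slice makes it cheap to add a code later (for example, if another status turns out to signal bot blocking) without writing a new test function.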

🤖 Generated with Claude Code

- Change site_blocked and fetch_failed HTTP status from 502 (BadGateway)
  to 422 (UnprocessableEntity) so client JSON parsing works correctly
  and errors don't collide with ALB-generated 502s
- Log the actual Firecrawl error when fallback fails (was silently swallowed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4feb1a6b5a


fetch_failed represents transient upstream failures (500, 429, etc.)
that may resolve on retry, so 502 (BadGateway) is semantically correct.
site_blocked is a permanent condition, so 422 is appropriate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
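The status mapping after this follow-up commit can be sketched as a lookup. The function name and string codes are illustrative, not taken from the PR; the default branch returning 500 is an assumption for unknown codes.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusForError maps preview error codes to HTTP statuses as described in
// the final commit message. Names here are hypothetical.
func statusForError(code string) int {
	switch code {
	case "site_blocked":
		// Treated as permanent per the commit message.
		return http.StatusUnprocessableEntity // 422
	case "fetch_failed":
		// Transient upstream failure (500, 429, ...); a retry may succeed.
		return http.StatusBadGateway // 502
	default:
		// Assumed fallback for codes this sketch does not know about.
		return http.StatusInternalServerError
	}
}

func main() {
	for _, c := range []string{"site_blocked", "fetch_failed"} {
		s := statusForError(c)
		fmt.Println(c, s, http.StatusText(s))
	}
}
```

The first commit mapped both codes to 422; this follow-up keeps 422 only for site_blocked and returns fetch_failed to 502, trading the ALB-collision concern for a status that signals "retry may help".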
@windoze95 windoze95 merged commit b6f243f into main Mar 8, 2026
1 check passed
@windoze95 windoze95 deleted the fix/firecrawl-preview-fallback branch March 8, 2026 18:46