
Fix recipe preview failures with Firecrawl fallback #58

Merged: windoze95 merged 2 commits into main from fix/firecrawl-preview-fallback on Mar 8, 2026

windoze95 (Owner) commented on Mar 8, 2026

Summary

  • Add Firecrawl API as fallback scraper when direct HTTP fetch is blocked by Cloudflare (402/403/503 or JS challenge pages)
  • Add User-Agent header to outbound recipe fetch requests
  • Return structured ExtractionError responses with machine-readable error codes (site_blocked, not_found, fetch_failed) instead of generic 500s
  • Add FIRECRAWL_API_KEY env var (optional) and firecrawl_json_ld / firecrawl_haiku extraction method constants
  • Add test seams (HTTPFetchOverride, FirecrawlFetchOverride) for fully offline testing

Test plan

  • All existing Go tests pass (144 → 154 tests)
  • 8 new service tests: direct success, AI fallback, 403+Firecrawl success/fail, no key, 404, Cloudflare challenge, isCloudflareChallenge
  • 2 new handler tests: PreviewFromURL returns 502+site_blocked, 404+not_found
  • Set FIRECRAWL_API_KEY in ECS task definition
  • Manual test: preview allrecipes.com URL → extracts via Firecrawl or shows "blocks automated access"
  • Manual test: preview budgetbytes.com URL → works via direct fetch as before

🤖 Generated with Claude Code

…rors

Many recipe sites (allrecipes.com, etc.) use Cloudflare bot protection that
blocks direct HTTP fetch, returning 402/403 or a JS challenge page. This adds
a Firecrawl API fallback for blocked sites, User-Agent headers on requests,
Cloudflare challenge detection, and structured ExtractionError responses so
the frontend can show actionable guidance instead of generic 500 errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1072daf1cf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Previously all non-200 statuses triggered the Firecrawl fallback, causing
misleading site_blocked errors for ordinary server failures. Now only
bot-block codes and Cloudflare challenges route to Firecrawl; other
non-200 responses return fetch_failed instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
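After this second commit, the direct fetch result splits into four outcomes. The following minimal sketch of that routing decision uses the bot-block codes from the summary (402/403/503) and the 404 → not_found case from the tests. The fetchAction type and the classify name are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// fetchAction is a hypothetical enum for what happens after the direct fetch.
type fetchAction int

const (
	useDirect    fetchAction = iota // parse the directly fetched HTML
	useFirecrawl                    // bot block or challenge: retry via Firecrawl
	failNotFound                    // return a not_found ExtractionError
	failFetch                       // return a fetch_failed ExtractionError
)

func (a fetchAction) String() string {
	return [...]string{"direct", "firecrawl", "not_found", "fetch_failed"}[a]
}

// botBlockStatus lists the bot-block codes named in the PR summary.
var botBlockStatus = map[int]bool{
	http.StatusPaymentRequired:    true, // 402
	http.StatusForbidden:          true, // 403
	http.StatusServiceUnavailable: true, // 503
}

// classify decides the next step from the direct fetch's status code and
// whether the body looked like a Cloudflare challenge.
func classify(status int, challenge bool) fetchAction {
	switch {
	case challenge || botBlockStatus[status]:
		return useFirecrawl
	case status == http.StatusOK:
		return useDirect
	case status == http.StatusNotFound:
		return failNotFound
	default:
		// Ordinary server errors (500, 502, ...) no longer route to Firecrawl.
		return failFetch
	}
}

func main() {
	for _, s := range []int{200, 403, 404, 500} {
		fmt.Println(s, classify(s, false))
	}
	fmt.Println(200, classify(200, true))
}
```

In this sketch the caller would still check FIRECRAWL_API_KEY before acting on useFirecrawl. Returning site_blocked when the key is unset is an assumption consistent with the "no key" test case, not something the PR spells out.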
windoze95 merged commit aee1803 into main on Mar 8, 2026
1 check passed
windoze95 deleted the fix/firecrawl-preview-fallback branch on Mar 8, 2026 at 17:23