Web scraping + AI visual intelligence that just works -- anti-bot era edition.
SuperScrape uses Camoufox (C++ anti-detection Firefox) to scrape sites that block Playwright, Selenium, and curl. Then it analyzes product images with GPT Vision to generate competitive intelligence reports.
```bash
# Scrape Amazon product images + run AI analysis
superscrape amazon visual "portable blender" --top 10
```

- Anti-bot scraping -- Camoufox bypasses Cloudflare, DataDome, and other bot detection
- Amazon -- Product pages, search results, image extraction with hi-res upgrade
- Instagram -- Public profiles, recent posts, follower counts
- Reddit -- Subreddit posts with sorting and filtering
- eBay, Walmart, Etsy, Shopee -- Additional e-commerce platforms
- Visual Intelligence -- GPT Vision analyzes product images (type, angle, background, text, people)
- Reports -- Markdown + JSON reports with category-level insights and recommendations
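
To make the Visual Intelligence and Reports features more concrete, here is a minimal sketch of how per-image attributes could be rolled up into category-level counts. The record fields (`image_type`, `background`, `has_people`) and the `summarize` helper are hypothetical illustrations, not SuperScrape's actual schema or API.

```python
from collections import Counter

# Hypothetical per-image analysis records; the field names are assumptions
# chosen to mirror the attributes listed above, not SuperScrape's real schema.
analyses = [
    {"image_type": "lifestyle", "background": "kitchen", "has_people": True},
    {"image_type": "studio", "background": "white", "has_people": False},
    {"image_type": "studio", "background": "white", "has_people": False},
]


def summarize(records):
    """Count how often each attribute value appears across a category."""
    total = len(records)
    return {
        "total_images": total,
        "image_types": dict(Counter(r["image_type"] for r in records)),
        "backgrounds": dict(Counter(r["background"] for r in records)),
        # Fraction of images that show people; 0.0 for an empty category.
        "share_with_people": (
            sum(r["has_people"] for r in records) / total if total else 0.0
        ),
    }


summary = summarize(analyses)
print(summary["image_types"])
```

Counts like these are the kind of input a category-level recommendation could be built on, for example noticing that most competitors use plain studio shots.
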
Requirements:

- Python 3.10+
- An OpenAI API key (for Visual Intelligence features)
```bash
pip install superscrape

# Download the Camoufox browser binary
python -m camoufox fetch

# Verify that Camoufox imports correctly
python -c "from camoufox.sync_api import Camoufox; print('ready')"
```

Or install from source:
```bash
git clone https://github.com/PHY041/superscrape.git
cd superscrape
pip install -e ".[dev]"
```

Quick start:

```bash
# 1. Set your OpenAI API key (needed for visual analysis)
export OPENAI_API_KEY="sk-..."

# 2. Scrape a single Amazon product
superscrape amazon product B0CX23V2ZK

# 3. Search Amazon
superscrape amazon search "wireless earbuds" --pages 2

# 4. Run full visual intelligence pipeline
superscrape amazon visual "boys dress shirt" --top 10 --output-dir ./reports

# 5. Scrape Instagram
superscrape instagram natgeo

# 6. Scrape Reddit
superscrape reddit SideProject --sort hot --limit 50
```

Command reference:

```
superscrape
  amazon
    product <ASIN>        Scrape a single product
    search <KEYWORD>      Search results with pagination
    visual <KEYWORD>      Full visual intelligence pipeline
  instagram <USERNAME>    Public profile + recent posts
  reddit <SUBREDDIT>      Posts with sorting (hot/new/top)
```

| Command | Flag | Description |
|---|---|---|
| `amazon product` | `--images-only` | Only output image URLs |
| `amazon search` | `--pages N` | Number of search pages |
| `amazon visual` | `--top N` | Number of products to analyze |
| `amazon visual` | `--no-cache` | Bypass cached results |
| `amazon visual` | `--output-dir DIR` | Output directory |
| `reddit` | `--sort hot\|new\|top` | Sort order |
| `reddit` | `--limit N` | Max posts to fetch |
| All commands | `--output json\|table` | Output format |

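The flags above can also be driven from a script. The following is a minimal sketch that builds a `superscrape amazon search` command from the documented flags and runs it with `subprocess`. The helper names `build_command` and `run_search` are hypothetical, and the sketch assumes that `--output json` prints a single JSON document to stdout, which this README does not confirm.

```python
import json
import shlex
import subprocess


def build_command(keyword, pages=1, output="json"):
    """Build the argv list for `superscrape amazon search`."""
    return [
        "superscrape", "amazon", "search", keyword,
        "--pages", str(pages),
        "--output", output,
    ]


def run_search(keyword, pages=1):
    """Run the CLI and parse stdout.

    Assumption: with `--output json` the CLI writes one JSON document
    to stdout and nothing else.
    """
    result = subprocess.run(
        build_command(keyword, pages),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


# Show the command that would be executed, without running it.
print(shlex.join(build_command("wireless earbuds", pages=2)))
```

Passing the arguments as a list avoids shell quoting problems with keywords that contain spaces.
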
```python
from superscrape.sites.amazon import Amazon
from superscrape.analyzers.vision import batch_analyze_first_images
from superscrape.reporters.visual_report import aggregate_report, render_markdown

# Scrape
products = Amazon.search_images("portable blender", top_n=10)

# Analyze with GPT Vision
analyses = batch_analyze_first_images(products)

# Generate report
report = aggregate_report("portable blender", products, analyses)
markdown = render_markdown(report)
```

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | For visual analysis | OpenAI API key for GPT Vision |
| `BYTEPLUSES_API_KEY` | Optional | BytePlus API key for lifestyle image generation |

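Since visual analysis depends on `OPENAI_API_KEY`, a script may want to check for it before starting a long scrape. This is a small sketch of such a check; `missing_variables` is a hypothetical helper, not part of SuperScrape.

```python
import os


def missing_variables(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Only OPENAI_API_KEY is listed as needed for visual analysis.
missing = missing_variables(["OPENAI_API_KEY"])
if missing:
    print("Set these variables before running visual analysis:", ", ".join(missing))
```
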
SuperScrape includes an optional FastAPI server with real-time job tracking:
```bash
# Install API dependencies
pip install "superscrape[api]"

# Start the server
uvicorn api.main:app --host 0.0.0.0 --port 8001

# Or use Docker
docker compose up --build
```

API endpoints:

- `POST /jobs` -- Submit a scraping + analysis job
- `GET /jobs/{id}` -- Job status
- `GET /jobs/{id}/stream` -- SSE real-time progress
- `GET /reports` -- List generated reports
- `GET /health` -- Health check

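A client for these endpoints could look roughly like the sketch below, which uses only the standard library. The request payload, the JSON response shape, and the event contents are assumptions, since this README does not document them; only the paths and port come from the text above. `parse_sse` handles the standard `data:` lines of the server-sent events format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # matches the uvicorn --port shown above


def parse_sse(lines):
    """Yield the data payload of each server-sent event from text lines."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format strips one optional leading space.
            data.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data:
            # A blank line terminates the current event.
            yield "\n".join(data)
            data = []


def submit_job(payload):
    """POST a job. The payload fields and JSON response are assumptions."""
    request = urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def stream_progress(job_id):
    """Print each progress event from GET /jobs/{id}/stream as it arrives."""
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}/stream") as response:
        for event in parse_sse(line.decode("utf-8") for line in response):
            print(event)
```

Keeping the parser separate from the network calls makes it easy to test against canned event text without a running server.
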
```
     CLI / API Request
             |
             v
+---------------------------+
|      Scraping Layer       |
|  sites/amazon.py          |
|  sites/instagram.py       |
|  sites/reddit.py          |
+------------+--------------+
             |
             v
     Camoufox Browser
    (C++ anti-detection)
             |
             v
+---------------------------+
|        AI Analysis        |
|  analyzers/vision.py      |
|  (OpenAI GPT Vision)      |
+------------+--------------+
             |
             v
+---------------------------+
|          Reports          |
|  reporters/visual_report  |
|  Markdown + JSON + HTML   |
+---------------------------+
```

Tested with Camoufox against major platforms:
| Platform | Status | Notes |
|---|---|---|
| Amazon | Pass | Search, product pages, images |
| Instagram | Pass | Public profiles, no login required |
| | Pass | Playwright+stealth gets blocked, Camoufox passes |
| eBay | Pass | Product listings, prices |
| Walmart | Pass | Product pages |
| Etsy | Pass | Listings, prices |
| Cloudflare Challenge | Pass | Generic CF challenge page |
See CONTRIBUTING.md for development setup and guidelines.
MIT License -- see LICENSE for details.
Powered by CanMarket.