SuperScrape

Web scraping + AI visual intelligence that just works -- anti-bot era edition.

SuperScrape uses Camoufox, a Firefox build with anti-detection patched in at the C++ level, to scrape sites that block Playwright, Selenium, and curl. It then analyzes product images with GPT Vision to generate competitive intelligence reports.

# Scrape Amazon product images + run AI analysis
superscrape amazon visual "portable blender" --top 10

Features

Anti-bot scraping -- Camoufox bypasses Cloudflare, DataDome, and other bot-detection systems
Amazon -- Product pages, search results, image extraction with hi-res upgrade
Instagram -- Public profiles, recent posts, follower counts
Reddit -- Subreddit posts with sorting and filtering
eBay, Walmart, Etsy, Shopee -- Additional e-commerce platforms
Visual Intelligence -- GPT Vision analyzes product images (type, angle, background, text, people)
Reports -- Markdown + JSON reports with category-level insights and recommendations

Prerequisites

Python 3.10+
An OpenAI API key (for Visual Intelligence features)

Installation

pip install superscrape

# Download the Camoufox browser binary
python -m camoufox fetch

# Verify that Camoufox imports correctly
python -c "from camoufox.sync_api import Camoufox; print('ready')"

Or install from source:

git clone https://github.com/PHY041/superscrape.git
cd superscrape
pip install -e ".[dev]"

Quick Start

# 1. Set your OpenAI API key (needed for visual analysis)
export OPENAI_API_KEY="sk-..."

# 2. Scrape a single Amazon product
superscrape amazon product B0CX23V2ZK

# 3. Search Amazon
superscrape amazon search "wireless earbuds" --pages 2

# 4. Run full visual intelligence pipeline
superscrape amazon visual "boys dress shirt" --top 10 --output-dir ./reports

# 5. Scrape Instagram
superscrape instagram natgeo

# 6. Scrape Reddit
superscrape reddit SideProject --sort hot --limit 50

CLI Reference

superscrape
  amazon
    product <ASIN>              Scrape a single product
    search <KEYWORD>            Search results with pagination
    visual <KEYWORD>            Full visual intelligence pipeline
  instagram <USERNAME>          Public profile + recent posts
  reddit <SUBREDDIT>            Posts with sorting (hot/new/top)

Options

| Command | Flag | Description |
|---|---|---|
| `amazon product` | `--images-only` | Only output image URLs |
| `amazon search` | `--pages N` | Number of search pages |
| `amazon visual` | `--top N` | Number of products to analyze |
| `amazon visual` | `--no-cache` | Bypass cached results |
| `amazon visual` | `--output-dir DIR` | Output directory |
| `reddit` | `--sort hot\|new\|top` | Sort order |
| `reddit` | `--limit N` | Max posts to fetch |
| All commands | `--output json\|table` | Output format |
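
The flags above can also be assembled programmatically, for example when driving the CLI from a batch script. The following is a minimal sketch, assuming the `superscrape` executable is on `PATH`; the helper name `build_visual_command` is hypothetical and not part of SuperScrape.

```python
import shlex
from typing import List, Optional


def build_visual_command(
    keyword: str,
    top: int = 10,
    output_dir: Optional[str] = None,
    no_cache: bool = False,
) -> List[str]:
    """Assemble argv for `superscrape amazon visual` (hypothetical helper)."""
    if top < 1:
        raise ValueError("top must be a positive integer")
    # Keep the keyword as a single argv element so spaces need no manual quoting.
    cmd = ["superscrape", "amazon", "visual", keyword, "--top", str(top)]
    if no_cache:
        cmd.append("--no-cache")
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


cmd = build_visual_command("portable blender", top=5, output_dir="./reports")
# Print a shell-safe rendering of the command for logging or copy-paste.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with multi-word keywords.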

Python API

from superscrape.sites.amazon import Amazon
from superscrape.analyzers.vision import batch_analyze_first_images
from superscrape.reporters.visual_report import aggregate_report, render_markdown

# Scrape
products = Amazon.search_images("portable blender", top_n=10)

# Analyze with GPT Vision
analyses = batch_analyze_first_images(products)

# Generate report
report = aggregate_report("portable blender", products, analyses)
markdown = render_markdown(report)
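
The snippet above leaves the rendered report in memory. Below is a minimal sketch of writing it to disk, assuming `render_markdown` returns a string and the aggregated report is a JSON-serializable dict (neither is confirmed here). `save_report` is a hypothetical helper, and the demo uses stand-in data rather than real scrape output.

```python
import json
import tempfile
from pathlib import Path


def save_report(markdown: str, report: dict, out_dir: str, slug: str) -> tuple[Path, Path]:
    """Write the Markdown and JSON forms of a report side by side (hypothetical helper)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    md_path = target / f"{slug}.md"
    json_path = target / f"{slug}.json"
    md_path.write_text(markdown, encoding="utf-8")
    # ensure_ascii=False keeps non-ASCII product titles readable in the JSON file.
    json_path.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
    return md_path, json_path


# Demo with placeholder content written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    md_path, json_path = save_report(
        "# Demo report\n", {"keyword": "portable blender"}, tmp, "portable-blender"
    )
    print(md_path.name, json_path.name)
```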

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | For visual analysis | OpenAI API key for GPT Vision |
| `BYTEPLUSES_API_KEY` | Optional | BytePlus API key for lifestyle image generation |
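
Because visual analysis depends on `OPENAI_API_KEY`, a script can check for it before starting a long scrape. The following is a minimal sketch; `require_env` is a hypothetical helper, not part of SuperScrape, and whether SuperScrape validates the key up front itself is not stated here.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a required environment variable or fail with a clear message."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running visual analysis")
    return value


# Demo against an explicit mapping so the example does not depend on the real environment.
api_key = require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-test"})
print(f"key loaded ({len(api_key)} characters)")
```

In a real script, call `require_env("OPENAI_API_KEY")` with no second argument so the value is read from the process environment.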

API Server (Optional)

SuperScrape includes an optional FastAPI server with real-time job tracking:

# Install API dependencies
pip install "superscrape[api]"

# Start the server
uvicorn api.main:app --host 0.0.0.0 --port 8001

# Or use Docker
docker compose up --build

API endpoints:

POST /jobs -- Submit a scraping + analysis job
GET /jobs/{id} -- Job status
GET /jobs/{id}/stream -- SSE real-time progress
GET /reports -- List generated reports
GET /health -- Health check
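
The `/jobs/{id}/stream` endpoint is described as SSE, so a client needs to split the response into events. Below is a minimal sketch of a parser for the standard Server-Sent Events line format. The event names and payloads in the demo are illustrative only, since the actual event schema is not documented here, and `parse_sse` is a hypothetical helper.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event = "message"          # default event type per the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current event; skip events without data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue           # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value or "message"
        elif field == "data":
            data.append(value)


# Illustrative stream; real payloads from the server may look different.
sample = [
    "event: progress\n",
    'data: {"step": "scrape"}\n',
    "\n",
    ": keep-alive\n",
    "data: done\n",
    "\n",
]
for evt in parse_sse(sample):
    print(evt["event"], evt["data"])
```

With a streaming HTTP client, the decoded response lines from `GET /jobs/{id}/stream` can be fed directly into `parse_sse`; for example, `requests.get(url, stream=True).iter_lines(decode_unicode=True)` yields lines in a compatible form.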

Architecture

CLI / API Request
    |
    v
+---------------------------+
|  Scraping Layer            |
|  sites/amazon.py           |
|  sites/instagram.py        |
|  sites/reddit.py           |
+------------+--------------+
             |
             v
      Camoufox Browser
      (C++ anti-detection)
             |
             v
+---------------------------+
|  AI Analysis               |
|  analyzers/vision.py       |
|  (OpenAI GPT Vision)       |
+------------+--------------+
             |
             v
+---------------------------+
|  Reports                   |
|  reporters/visual_report   |
|  Markdown + JSON + HTML    |
+---------------------------+

Anti-Bot Test Results

Tested with Camoufox against major platforms:

| Platform | Status | Notes |
|---|---|---|
| Amazon | Pass | Search, product pages, images |
| Instagram | Pass | Public profiles, no login required |
| Reddit | Pass | Playwright+stealth gets blocked, Camoufox passes |
| eBay | Pass | Product listings, prices |
| Walmart | Pass | Product pages |
| Etsy | Pass | Listings, prices |
| Cloudflare Challenge | Pass | Generic CF challenge page |

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License -- see LICENSE for details.

Powered by CanMarket.

