RajpurohitHitesh/AmazonScrapperAPI

Amazon Scraper API - Multi-Country Product Scraping

Python Flask License Playwright

A powerful REST API service for scraping product data from 15+ Amazon marketplaces worldwide

Features • Installation • Quick Start • Usage • Contributing


🎯 Overview

REST API service for scraping Amazon product data across 15 countries. Built with Flask, Playwright, and BeautifulSoup for reliable data extraction.

Perfect for:

  • E-commerce price monitoring
  • Product research & analytics
  • Inventory management systems
  • Market research applications

✨ Features

  • ✅ 15+ Amazon Marketplaces - Support for US, UK, India, Japan, and more
  • ✅ Auto Country Detection - Automatically detects country from URL
  • ✅ 12 Essential Fields - Clean, structured product data
  • ✅ API Authentication - Secure API key-based access
  • ✅ Anti-Detection - Playwright with stealth scripts + device profiles
  • ✅ Rate Limiting - Per API key/IP throttling
  • ✅ Metrics - Prometheus-ready /metrics endpoint
  • ✅ Swagger UI - Interactive API docs at /docs
  • ✅ Caching - Short TTL cache by ASIN + country
  • ✅ Easy Deployment - One-command setup for VPS
  • ✅ CORS Support - Ready for web applications
  • ✅ Production Ready - Systemd service, logging, error handling

📁 Project Structure

AmazonScrapperPython/
├── api_server.py           # Flask API server
├── api_config.py           # Country configurations
├── .env.example            # Example environment file
├── .gitignore              # Git ignore rules
├── requirements.txt        # Python dependencies
├── setup.py                # Package installation
├── LICENSE                 # MIT License
├── INSTALL.txt             # Detailed installation guide
├── CONTRIBUTING.md         # Contribution guidelines
├── README.md               # This file
├── start.bat               # Quick start (Windows)
├── start.sh                # Quick start (Linux/Mac)
└── scrapers/
    ├── __init__.py
    ├── base_scraper.py     # Base scraper class
    ├── india_scraper.py    # Amazon India
    ├── usa_scraper.py      # Amazon USA
    └── uk_scraper.py       # Amazon UK

🚀 Installation

Method 1: Quick Start (Recommended)

Windows:

start.bat

Linux/Mac:

chmod +x start.sh
./start.sh

Script automatically:

  • ✅ Checks Python installation
  • ✅ Installs dependencies
  • ✅ Creates .env file
  • ✅ Starts the server

Method 2: Manual Installation

Prerequisites

  • Python 3.7 or higher
  • Internet connection (for Playwright browser downloads)

Quick Setup

1. Clone the repository:

git clone https://github.com/RajpurohitHitesh/AmazonScrapperPython.git
cd AmazonScrapperPython

2. Install dependencies:

pip install -r requirements.txt
python -m playwright install chromium

3. Configure environment:

# Windows
copy .env.example .env

# Linux/Mac
cp .env.example .env

4. Edit .env and set your API key:

API_KEY=your_secure_api_key_here

💡 Tip: Generate a secure API key with:

python -c "import secrets; print(secrets.token_urlsafe(32))"

5. Run the server:

python api_server.py

Server will start at: http://127.0.0.1:5000

✅ Installation complete! You can now use the API.

Method 3: Docker (Recommended for server/deployment)

Build and run:

docker build -t amazon-scraper-api .
docker run -p 5000:5000 --env-file .env amazon-scraper-api

Or with docker-compose:

docker-compose up --build

Configure your domain in .env:

API_DOMAIN=https://api.yourdomain.com
ALLOWED_ORIGINS=https://yourdomain.com,https://app.yourdomain.com

📡 API Usage

Base URL:

  • Local Development: http://127.0.0.1:5000
  • Production: https://your-domain.com (configure in .env)

Health Check

Check if API is running:

# Local
curl http://127.0.0.1:5000/api/health

# Production
curl https://your-domain.com/api/health

Readiness (for load balancers/containers)

# Local
curl http://127.0.0.1:5000/api/ready

# Production
curl https://your-domain.com/api/ready
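
The readiness check can also be polled from a deployment script before traffic is routed to the service. The following is a minimal sketch using only the standard library. The helper names `is_ready` and `wait_until_ready` are illustrative, and the sketch assumes that /api/ready answers with HTTP 200 once the service is ready, which this README does not state explicitly.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url, timeout=5.0):
    """Return True if GET {base_url}/api/ready answers with HTTP 200 (assumed)."""
    url = base_url.rstrip("/") + "/api/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a 4xx/5xx status.
        return False


def wait_until_ready(base_url, attempts=10, delay=2.0, probe=is_ready, sleep=time.sleep):
    """Poll the readiness endpoint until it succeeds or attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next probe
    return False


if __name__ == "__main__":
    print(wait_until_ready("http://127.0.0.1:5000"))
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.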

Swagger UI

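Open interactive docs at /docs (for a local server: http://127.0.0.1:5000/docs).

Metrics

Prometheus endpoint: /metrics (for a local server: http://127.0.0.1:5000/metrics).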

Response includes queue depth and cache size.
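
If queue depth and cache size are exposed as plain, unlabelled samples in the Prometheus text format, they can be read with a few lines of Python. This is only a sketch: the metric names `scrape_queue_depth` and `cache_size` in the sample text are hypothetical, so check the actual /metrics output for the real names.

```python
def parse_simple_metrics(text):
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue  # labelled or malformed samples are out of scope here
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue  # value was not numeric
    return samples


# Hypothetical sample; the real metric names may differ.
SAMPLE = """\
# HELP scrape_queue_depth Hypothetical metric name
# TYPE scrape_queue_depth gauge
scrape_queue_depth 2
# TYPE cache_size gauge
cache_size 17
"""

if __name__ == "__main__":
    print(parse_simple_metrics(SAMPLE))
```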

Scrape Product

Endpoint: POST /api/scrape

Headers:

X-API-Key: your_api_key_here
Content-Type: application/json

Request Body:

{
  "url": "https://www.amazon.in/dp/B0FMDNZ61S",
  "headless": true,
  "proxy": "http://user:pass@host:port"
}

Example with cURL:

# Local
curl -X POST http://127.0.0.1:5000/api/scrape \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.amazon.in/dp/B0FMDNZ61S"}'

# Production
curl -X POST https://your-domain.com/api/scrape \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.amazon.in/dp/B0FMDNZ61S"}'

Example with Python:

import requests

# Change base_url based on your setup
base_url = "http://127.0.0.1:5000"  # Local
# base_url = "https://your-domain.com"  # Production

url = f"{base_url}/api/scrape"
headers = {
    "X-API-Key": "your_api_key_here",
    "Content-Type": "application/json"
}
data = {
    "url": "https://www.amazon.in/dp/B0FMDNZ61S"
}

response = requests.post(url, json=data, headers=headers)
print(response.json())

Success Response:

{
  "success": true,
  "cached": false,
  "country": "India",
  "country_code": "IN",
  "detected_country": "IN",
  "data": {
    "asin": "B0FMDNZ61S",
    "merchant": "Amazon India",
    "name": "Product Name",
    "category": "Electronics",
    "subcategory": "Smartphones",
    "brand": "Samsung",
    "current_price": 1299.00,
    "original_price": 1999.00,
    "currency": "₹",
    "currency_code": "INR",
    "stock_status": "in_stock",
    "image_path": "https://m.media-amazon.com/images/I/...",
    "images": ["https://m.media-amazon.com/images/I/..."],
    "rating": 4.2,
    "review_count": 1850,
    "bullet_points": ["..."],
    "variations": ["Color", "Size"],
    "delivery_eta": "Tomorrow",
    "seller": {
      "name": "Amazon",
      "fulfilled_by_amazon": true
    },
    "offers_count": 5,
    "buy_box_winner": "Amazon",
    "seller_type": "amazon",
    "description": "...",
    "specifications": {
      "Brand": "Samsung"
    }
  }
}

Error Response:

{
  "success": false,
  "error": "Invalid URL",
  "message": "Please provide a valid Amazon product URL"
}
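
Client code can branch on the `success` flag shown in the two responses above. The following sketch turns an error payload into an exception. `ScrapeError` and `unwrap_scrape_response` are illustrative names, not part of this project.

```python
class ScrapeError(Exception):
    """Raised when the API reports success == false (illustrative helper)."""

    def __init__(self, error, message):
        super().__init__(f"{error}: {message}")
        self.error = error
        self.message = message


def unwrap_scrape_response(payload):
    """Return the product data on success, raise ScrapeError otherwise."""
    if payload.get("success"):
        return payload.get("data", {})
    raise ScrapeError(
        payload.get("error", "Unknown error"),
        payload.get("message", ""),
    )


if __name__ == "__main__":
    try:
        unwrap_scrape_response({
            "success": False,
            "error": "Invalid URL",
            "message": "Please provide a valid Amazon product URL",
        })
    except ScrapeError as exc:
        print(exc)
```

With the Python example above, this would be called as `unwrap_scrape_response(response.json())`.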

📦 Response Fields

API returns these fields (when available):

| Field | Type | Description |
|---|---|---|
| asin | string | Amazon Standard Identification Number |
| merchant | string | Country-specific Amazon (e.g., Amazon India, Amazon USA) |
| name | string | Product title |
| category | string | Main category |
| subcategory | string | Subcategory |
| brand | string | Brand name |
| current_price | float | Current price (numeric) |
| original_price | float | Original/MRP price (numeric) |
| currency | string | Currency symbol |
| currency_code | string | Currency code |
| stock_status | string | in_stock or out_of_stock |
| image_path | string | Main product image URL |
| images | array | Additional image URLs |
| rating | float | Average rating (0-5) |
| review_count | int | Number of reviews |
| bullet_points | array | Key feature bullets |
| variations | array | Variation labels (e.g., size/color) |
| delivery_eta | string | Delivery estimate (if shown) |
| seller | object | Seller info (name, FBA) |
| offers_count | int | Offers count (if shown) |
| buy_box_winner | string | Buy box seller (if shown) |
| seller_type | string | amazon or marketplace |
| description | string | Description text |
| specifications | object | Key-value specs |
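
Because fields are returned only when available, client code should tolerate missing values. As a small illustration for price monitoring, the sketch below derives a discount percentage from `current_price` and `original_price`. The helper name `discount_percent` is hypothetical, and treating a missing key or a null value as "no discount known" is an assumption.

```python
def discount_percent(data):
    """Return the discount in percent, or None if it cannot be computed."""
    current = data.get("current_price")
    original = data.get("original_price")
    if current is None or original is None or original <= 0:
        return None  # field missing or unusable
    return round((original - current) / original * 100, 1)


if __name__ == "__main__":
    # Prices taken from the success response example above.
    print(discount_percent({"current_price": 1299.00, "original_price": 1999.00}))
```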

🌍 Supported Countries (15 Amazon Marketplaces)

| Country | Domain | Currency |
|---|---|---|
| 🇺🇸 United States | amazon.com | USD |
| 🇨🇦 Canada | amazon.ca | CAD |
| 🇲🇽 Mexico | amazon.com.mx | MXN |
| 🇧🇷 Brazil | amazon.com.br | BRL |
| 🇬🇧 United Kingdom | amazon.co.uk | GBP |
| 🇩🇪 Germany | amazon.de | EUR |
| 🇫🇷 France | amazon.fr | EUR |
| 🇮🇹 Italy | amazon.it | EUR |
| 🇪🇸 Spain | amazon.es | EUR |
| 🇳🇱 Netherlands | amazon.nl | EUR |
| 🇦🇪 UAE | amazon.ae | AED |
| 🇮🇳 India | amazon.in | INR |
| 🇯🇵 Japan | amazon.co.jp | JPY |
| 🇦🇺 Australia | amazon.com.au | AUD |
| 🇸🇬 Singapore | amazon.sg | SGD |

🔧 Configuration

Environment Variables (.env)

# API Server
API_HOST=0.0.0.0          # 0.0.0.0 for public, 127.0.0.1 for local
API_PORT=5000             # Server port
API_KEY=your_key_here     # Authentication key
API_KEYS=key1,key2        # Optional rotated keys
ENABLE_JWT=False          # Optional JWT auth
JWT_SECRET=your_secret

# Application
DEBUG_MODE=True           # Enable debug logging
HEADLESS_MODE=True        # Run browser without GUI
SCRAPE_TIMEOUT_SECONDS=30 # Scrape timeout
SCRAPE_MAX_RETRIES=2      # Retry attempts
MAX_CONCURRENCY=3         # Max concurrent scrapes
PROXY_URLS=               # Optional proxy list (comma-separated)

# CORS
ALLOWED_ORIGINS=http://localhost:8000,https://yourdomain.com

# Rate limiting
RATE_LIMIT_PER_MINUTE_KEY=60
RATE_LIMIT_PER_MINUTE_IP=120

# Cache
CACHE_TTL_SECONDS=300
CACHE_MAX_ITEMS=1000

# Readiness checks
READY_CHECK_ASIN=
READY_CHECK_COUNTRY=US
READY_CHECK_INTERVAL_SECONDS=900
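
To show how these variables might be consumed, here is a minimal sketch that reads a subset of them into a typed settings object. It is not the project's actual loader: the `Settings` class, the helper names, and the fallback values (which mostly mirror the sample file above) are assumptions.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy spellings such as True, true, 1, yes."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def _as_list(value):
    """Split a comma-separated value, dropping empty entries."""
    return tuple(item.strip() for item in value.split(",") if item.strip())


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    headless: bool
    scrape_timeout_seconds: int
    max_retries: int
    max_concurrency: int
    proxy_urls: tuple
    allowed_origins: tuple
    cache_ttl_seconds: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("API_HOST", "127.0.0.1"),  # assumed fallback
        port=int(env.get("API_PORT", "5000")),
        debug=_as_bool(env.get("DEBUG_MODE", "False")),  # assumed fallback
        headless=_as_bool(env.get("HEADLESS_MODE", "True")),
        scrape_timeout_seconds=int(env.get("SCRAPE_TIMEOUT_SECONDS", "30")),
        max_retries=int(env.get("SCRAPE_MAX_RETRIES", "2")),
        max_concurrency=int(env.get("MAX_CONCURRENCY", "3")),
        proxy_urls=_as_list(env.get("PROXY_URLS", "")),
        allowed_origins=_as_list(env.get("ALLOWED_ORIGINS", "")),
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "300")),
    )


if __name__ == "__main__":
    print(load_settings({"API_PORT": "5001", "DEBUG_MODE": "true"}))
```

Passing a plain dict instead of `os.environ` keeps the parsing rules easy to test.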

🔐 Authentication

All API requests require authentication via API key:

Method 1: Header (Recommended)

X-API-Key: your_api_key_here

Method 2: Query Parameter

?api_key=your_api_key_here

Optional JWT:

Authorization: Bearer <jwt>
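
The key check described above (header first, then query parameter, validated against API_KEY and the optional API_KEYS list) could look roughly like the sketch below. It is framework-independent and not the project's actual implementation: the function names are hypothetical, headers are treated as a plain case-sensitive dict, and JWT handling is left out.

```python
import hmac


def allowed_keys(env):
    """Collect API_KEY plus any comma-separated API_KEYS entries."""
    keys = [env.get("API_KEY", "")]
    keys += env.get("API_KEYS", "").split(",")
    return [k.strip() for k in keys if k.strip()]


def extract_api_key(headers, query):
    """Prefer the X-API-Key header, fall back to the api_key query parameter."""
    return headers.get("X-API-Key") or query.get("api_key")


def is_authorized(headers, query, env):
    """Return True if the supplied key matches one of the configured keys."""
    supplied = extract_api_key(headers, query)
    if not supplied:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return any(
        hmac.compare_digest(supplied.encode(), key.encode())
        for key in allowed_keys(env)
    )


if __name__ == "__main__":
    env = {"API_KEY": "primary-key", "API_KEYS": "old-key, new-key"}
    print(is_authorized({"X-API-Key": "new-key"}, {}, env))
```

The header is the safer of the two methods in practice, since query strings tend to end up in access logs and browser history.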

🏗️ Architecture

Base Scraper Class

All country scrapers inherit from BaseAmazonScraper:

  • Playwright browser contexts with stealth scripts
  • ASIN extraction from URLs
  • Common scraping methods
  • Error handling
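
As an illustration of the ASIN extraction step, the sketch below pulls a 10-character ASIN out of the common /dp/ and /gp/product/ URL forms. It is an assumption about how such a helper might work, not the code in base_scraper.py, and `extract_asin` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# ASINs are 10 uppercase alphanumeric characters; only the two most common
# URL forms are handled here.
_ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url):
    """Return the ASIN found in the URL path, or None if there is none."""
    path = urlparse(url).path  # query string and fragment are ignored
    match = _ASIN_RE.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_asin("https://www.amazon.in/dp/B0FMDNZ61S"))
```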

Country-Specific Scrapers

Each country has its own scraper module:

  • india_scraper.py - Amazon India
  • usa_scraper.py - Amazon USA
  • uk_scraper.py - Amazon UK
  • More countries coming soon...

Automatic Country Detection

API automatically detects country from product URL:

amazon.in → India Scraper
amazon.com → USA Scraper
amazon.co.uk → UK Scraper
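
Domain-based detection of this kind can be sketched as a lookup on the URL's hostname. The mapping below is built from the Supported Countries table. The two-letter codes (for example GB for the United Kingdom) and the function name `detect_country` are assumptions, and the project's own lookup in api_config.py may differ.

```python
from urllib.parse import urlparse

# Domain -> country code; codes are assumed ISO-style values.
DOMAIN_TO_COUNTRY = {
    "amazon.com": "US",
    "amazon.ca": "CA",
    "amazon.com.mx": "MX",
    "amazon.com.br": "BR",
    "amazon.co.uk": "GB",
    "amazon.de": "DE",
    "amazon.fr": "FR",
    "amazon.it": "IT",
    "amazon.es": "ES",
    "amazon.nl": "NL",
    "amazon.ae": "AE",
    "amazon.in": "IN",
    "amazon.co.jp": "JP",
    "amazon.com.au": "AU",
    "amazon.sg": "SG",
}


def detect_country(url):
    """Return the country code for an Amazon product URL, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain, code in DOMAIN_TO_COUNTRY.items():
        # Match the bare domain or any subdomain such as www.amazon.in.
        if host == domain or host.endswith("." + domain):
            return code
    return None


if __name__ == "__main__":
    print(detect_country("https://www.amazon.in/dp/B0FMDNZ61S"))
```

Matching on the parsed hostname rather than on the raw URL string avoids false positives such as an Amazon domain appearing in the path of an unrelated site.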

🔗 Integration Examples

Laravel (PHP)

use App\Services\AmazonScraperService;

$scraper = new AmazonScraperService();
$result = $scraper->scrapeProduct('https://www.amazon.in/dp/B0FMDNZ61S');

if ($result['success']) {
    $data = $result['data'];
    // Use data...
}

Configuration (config/services.php)

'amazon_scraper' => [
    'url' => env('AMAZON_SCRAPER_URL', 'http://127.0.0.1:5000'),  // Local or production URL
    'api_key' => env('AMAZON_SCRAPER_API_KEY'),
    'timeout' => env('AMAZON_SCRAPER_TIMEOUT', 60),
],

Environment (.env)

# Local Development
AMAZON_SCRAPER_URL=http://127.0.0.1:5000

# Production
# AMAZON_SCRAPER_URL=https://your-domain.com

AMAZON_SCRAPER_API_KEY=your_api_key_here
AMAZON_SCRAPER_TIMEOUT=60

🖥️ VPS Deployment (Always Running)

For production deployment on VPS with systemd service (24/7 operation):

See INSTALL.txt for complete guide including:

  • Ubuntu/Debian setup
  • Systemd service configuration
  • Nginx reverse proxy
  • SSL certificate setup
  • Firewall configuration
  • Always-running configuration
  • Monitoring and logs

Quick command to make it always run:

sudo systemctl enable amazon-scraper-api
sudo systemctl start amazon-scraper-api

🛠️ Development

Adding New Country Scraper

  1. Add country in api_config.py:
AMAZON_COUNTRIES = {
    'DE': {
        'name': 'Germany',
        'domain': 'amazon.de',
        'currency': 'EUR',
        'currency_code': 'EUR'
    }
}
  2. (Optional) Add a custom scraper if needed and register it in get_scraper_for_country.
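
The two steps above might fit together as in the following sketch. Everything here is a stand-in written for illustration: the real BaseAmazonScraper constructor, the signature of get_scraper_for_country, the `CUSTOM_SCRAPERS` registry, and the `parse_price` method are all assumptions, so adapt the names to the actual code in scrapers/base_scraper.py and api_config.py.

```python
AMAZON_COUNTRIES = {
    "DE": {
        "name": "Germany",
        "domain": "amazon.de",
        "currency": "EUR",
        "currency_code": "EUR",
    },
}


class BaseAmazonScraper:
    """Stand-in for the project's base class; the real interface may differ."""

    def __init__(self, country_code, config):
        self.country_code = country_code
        self.config = config

    def parse_price(self, text):
        # Default: dot as decimal separator, comma as thousands separator.
        digits = "".join(ch for ch in text if ch.isdigit() or ch in ",.")
        return float(digits.replace(",", ""))


class GermanyScraper(BaseAmazonScraper):
    """Hypothetical custom scraper handling the German price format."""

    def parse_price(self, text):
        # German format: dot as thousands separator, comma as decimal separator.
        digits = "".join(ch for ch in text if ch.isdigit() or ch in ",.")
        return float(digits.replace(".", "").replace(",", "."))


# Hypothetical registry: countries without an entry use the base scraper.
CUSTOM_SCRAPERS = {"DE": GermanyScraper}


def get_scraper_for_country(country_code):
    """Return a scraper instance for a configured country code."""
    config = AMAZON_COUNTRIES[country_code]
    scraper_cls = CUSTOM_SCRAPERS.get(country_code, BaseAmazonScraper)
    return scraper_cls(country_code, config)


if __name__ == "__main__":
    scraper = get_scraper_for_country("DE")
    print(type(scraper).__name__, scraper.parse_price("1.299,00 €"))
```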

📊 Logging

Development

  • JSON logs to console with DEBUG_MODE=True
  • Request IDs in logs

Production

  • File log: api.log
  • Systemd journal: sudo journalctl -u amazon-scraper-api

🐛 Troubleshooting

Playwright Browser Issues

# Reinstall browser binaries
python -m playwright install chromium

Port Already in Use

# Change API_PORT in .env
API_PORT=5001

CORS Errors

# Add your domain to ALLOWED_ORIGINS
ALLOWED_ORIGINS=https://yourdomain.com,http://localhost:8000

Service Not Starting (VPS)

# Check status
sudo systemctl status amazon-scraper-api

# View logs
sudo journalctl -u amazon-scraper-api -f

🤝 Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Ways to contribute:

  • 🐛 Report bugs
  • 💡 Suggest features
  • 🌍 Add new country scrapers
  • 📝 Improve documentation
  • ⚡ Optimize performance

📄 License

This project is licensed under the MIT License - see LICENSE file for details.

Disclaimer: This software is for educational purposes only. Users are responsible for complying with Amazon's Terms of Service.

🙏 Support

If you find this project helpful:

  • ⭐ Star the repository
  • 🐛 Report issues
  • 🔀 Submit pull requests
  • 📢 Share with others

📞 Contact


Made with ❤️ for the developer community
