A powerful REST API service for scraping product data from 15+ Amazon marketplaces worldwide
REST API service for scraping Amazon product data across 15 countries. Built with Flask, Playwright, and BeautifulSoup for reliable data extraction.
Perfect for:
- E-commerce price monitoring
- Product research & analytics
- Inventory management systems
- Market research applications
- 15+ Amazon Marketplaces - Support for US, UK, India, Japan, and more
- Auto Country Detection - Automatically detects country from URL
- 12 Essential Fields - Clean, structured product data
- API Authentication - Secure API key-based access
- Anti-Detection - Playwright with stealth scripts + device profiles
- Rate Limiting - Per API key/IP throttling
- Metrics - Prometheus-ready /metrics endpoint
- Swagger UI - Interactive API docs at /docs
- Caching - Short TTL cache by ASIN + country
- Easy Deployment - One-command setup for VPS
- CORS Support - Ready for web applications
- Production Ready - Systemd service, logging, error handling
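To make the caching feature above concrete, here is a minimal sketch of a short-TTL cache keyed by ASIN and country. It is not the project's actual implementation: the class name `TTLCache`, its methods, and the injectable `clock` are assumptions made for illustration. The defaults mirror the `CACHE_TTL_SECONDS` and `CACHE_MAX_ITEMS` settings from the configuration section.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Tiny TTL cache keyed by (ASIN, country code). Illustrative only."""

    def __init__(self, ttl_seconds=300, max_items=1000, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_items = max_items
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, asin, country):
        key = (asin, country)
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, asin, country, value):
        key = (asin, country)
        self._items.pop(key, None)  # re-inserting moves the key to the end
        self._items[key] = (self.clock() + self.ttl_seconds, value)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the oldest entry
```

Keying on both values matters because the same ASIN can carry a different price and currency in each marketplace.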
AmazonScrapperPython/
├── api_server.py          # Flask API server
├── api_config.py          # Country configurations
├── .env.example           # Example environment file
├── .gitignore             # Git ignore rules
├── requirements.txt       # Python dependencies
├── setup.py               # Package installation
├── LICENSE                # MIT License
├── INSTALL.txt            # Detailed installation guide
├── CONTRIBUTING.md        # Contribution guidelines
├── README.md              # This file
├── start.bat              # Quick start (Windows)
├── start.sh               # Quick start (Linux/Mac)
└── scrapers/
    ├── __init__.py
    ├── base_scraper.py    # Base scraper class
    ├── india_scraper.py   # Amazon India
    ├── usa_scraper.py     # Amazon USA
    └── uk_scraper.py      # Amazon UK
Windows:
start.bat
Linux/Mac:
chmod +x start.sh
./start.sh
The script automatically:
- Checks Python installation
- Installs dependencies
- Creates .env file
- Starts the server
- Python 3.7 or higher
- Internet connection (for Playwright browser downloads)
1. Clone the repository:
git clone https://github.com/RajpurohitHitesh/AmazonScrapperPython.git
cd AmazonScrapperPython
2. Install dependencies:
pip install -r requirements.txt
python -m playwright install chromium
3. Configure environment:
# Windows
copy .env.example .env
# Linux/Mac
cp .env.example .env
4. Edit .env and set your API key:
API_KEY=your_secure_api_key_here
Tip: Generate a secure API key with:
python -c "import secrets; print(secrets.token_urlsafe(32))"
5. Run the server:
python api_server.py
The server will start at: http://127.0.0.1:5000
Installation complete! You can now use the API.
Build and run:
docker build -t amazon-scraper-api .
docker run -p 5000:5000 --env-file .env amazon-scraper-api
Or with docker-compose:
docker-compose up --build
Configure your domain in .env:
API_DOMAIN=https://api.yourdomain.com
ALLOWED_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
Base URL:
- Local Development: http://127.0.0.1:5000
- Production: https://your-domain.com (configure in .env)
Check if the API is running:
# Local
curl http://127.0.0.1:5000/api/health
# Production
curl https://your-domain.com/api/health
Check readiness:
# Local
curl http://127.0.0.1:5000/api/ready
# Production
curl https://your-domain.com/api/ready
Open interactive docs at:
- Local: http://127.0.0.1:5000/docs
- Production: https://your-domain.com/docs
Prometheus endpoint:
- Local: http://127.0.0.1:5000/metrics
- Production: https://your-domain.com/metrics
Response includes queue depth and cache size.
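If you want to read those values from a script rather than from Prometheus itself, the text exposition format is simple enough to parse by hand. The sketch below handles only plain `name value` samples (no labels or timestamps), and the metric names in the usage example are hypothetical, since the exact names exposed by `/metrics` are not listed here.

```python
def parse_metrics(text):
    """Parse unlabelled Prometheus text-format samples into {name: float}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2:
            continue  # not a "name value" sample
        try:
            values[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore samples whose value is not numeric
    return values


# Hypothetical metric names, used only to show the shape of the result.
sample = """# HELP scrape_queue_depth Pending scrapes
# TYPE scrape_queue_depth gauge
scrape_queue_depth 2
cache_size 17
"""
print(parse_metrics(sample))
```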
Endpoint: POST /api/scrape
Headers:
X-API-Key: your_api_key_here
Content-Type: application/json
Request Body:
{
  "url": "https://www.amazon.in/dp/B0FMDNZ61S",
  "headless": true,
  "proxy": "http://user:pass@host:port"
}
Example with cURL:
# Local
curl -X POST http://127.0.0.1:5000/api/scrape \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.amazon.in/dp/B0FMDNZ61S"}'
# Production
curl -X POST https://your-domain.com/api/scrape \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.amazon.in/dp/B0FMDNZ61S"}'
Example with Python:
import requests

# Change base_url based on your setup
base_url = "http://127.0.0.1:5000"  # Local
# base_url = "https://your-domain.com"  # Production
url = f"{base_url}/api/scrape"
headers = {
    "X-API-Key": "your_api_key_here",
    "Content-Type": "application/json"
}
data = {
    "url": "https://www.amazon.in/dp/B0FMDNZ61S"
}
# A timeout keeps the client from hanging if a scrape stalls
response = requests.post(url, json=data, headers=headers, timeout=60)
print(response.json())
Success Response:
{
  "success": true,
  "cached": false,
  "country": "India",
  "country_code": "IN",
  "detected_country": "IN",
  "data": {
    "asin": "B0FMDNZ61S",
    "merchant": "Amazon India",
    "name": "Product Name",
    "category": "Electronics",
    "subcategory": "Smartphones",
    "brand": "Samsung",
    "current_price": 1299.00,
    "original_price": 1999.00,
    "currency": "₹",
    "currency_code": "INR",
    "stock_status": "in_stock",
    "image_path": "https://m.media-amazon.com/images/I/...",
    "images": ["https://m.media-amazon.com/images/I/..."],
    "rating": 4.2,
    "review_count": 1850,
    "bullet_points": ["..."],
    "variations": ["Color", "Size"],
    "delivery_eta": "Tomorrow",
    "seller": {
      "name": "Amazon",
      "fulfilled_by_amazon": true
    },
    "offers_count": 5,
    "buy_box_winner": "Amazon",
    "seller_type": "amazon",
    "description": "...",
    "specifications": {
      "Brand": "Samsung"
    }
  }
}
Error Response:
{
  "success": false,
  "error": "Invalid URL",
  "message": "Please provide a valid Amazon product URL"
}
The API returns these fields (when available):
| Field | Type | Description |
|---|---|---|
| asin | string | Amazon Standard Identification Number |
| merchant | string | Country-specific Amazon (e.g., Amazon India, Amazon USA) |
| name | string | Product title |
| category | string | Main category |
| subcategory | string | Subcategory |
| brand | string | Brand name |
| current_price | float | Current price (numeric) |
| original_price | float | Original/MRP price (numeric) |
| currency | string | Currency symbol |
| currency_code | string | Currency code |
| stock_status | string | in_stock or out_of_stock |
| image_path | string | Main product image URL |
| images | array | Additional image URLs |
| rating | float | Average rating (0-5) |
| review_count | int | Number of reviews |
| bullet_points | array | Key feature bullets |
| variations | array | Variation labels (e.g., size/color) |
| delivery_eta | string | Delivery estimate (if shown) |
| seller | object | Seller info (name, FBA) |
| offers_count | int | Offers count (if shown) |
| buy_box_winner | string | Buy box seller (if shown) |
| seller_type | string | amazon or marketplace |
| description | string | Description text |
| specifications | object | Key-value specs |
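Because fields are only returned when available, client code should treat most of them as optional. The following sketch shows one way to turn a decoded `/api/scrape` response into a small typed summary and derive a discount from `current_price` and `original_price`. `ProductSummary` and `summarize` are hypothetical names for this example and are not part of the API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductSummary:
    asin: str
    name: str
    current_price: Optional[float]
    original_price: Optional[float]
    currency_code: Optional[str]
    in_stock: bool

    @property
    def discount_percent(self) -> Optional[float]:
        """Discount relative to original_price, or None if either price is missing."""
        if not self.current_price or not self.original_price:
            return None
        return round(100 * (1 - self.current_price / self.original_price), 1)


def summarize(payload: dict) -> ProductSummary:
    """Build a ProductSummary from a decoded /api/scrape response body."""
    if not payload.get("success"):
        # Error bodies carry "error" and "message" instead of "data".
        raise ValueError(f"{payload.get('error')}: {payload.get('message')}")
    data = payload.get("data") or {}
    return ProductSummary(
        asin=data.get("asin", ""),
        name=data.get("name", ""),
        current_price=data.get("current_price"),
        original_price=data.get("original_price"),
        currency_code=data.get("currency_code"),
        in_stock=data.get("stock_status") == "in_stock",
    )
```

With the sample success response shown earlier, `summarize(response.json()).discount_percent` works out to 35.0.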
| Country | Domain | Currency |
|---|---|---|
| 🇺🇸 United States | amazon.com | USD |
| 🇨🇦 Canada | amazon.ca | CAD |
| 🇲🇽 Mexico | amazon.com.mx | MXN |
| 🇧🇷 Brazil | amazon.com.br | BRL |
| 🇬🇧 United Kingdom | amazon.co.uk | GBP |
| 🇩🇪 Germany | amazon.de | EUR |
| 🇫🇷 France | amazon.fr | EUR |
| 🇮🇹 Italy | amazon.it | EUR |
| 🇪🇸 Spain | amazon.es | EUR |
| 🇳🇱 Netherlands | amazon.nl | EUR |
| 🇦🇪 UAE | amazon.ae | AED |
| 🇮🇳 India | amazon.in | INR |
| 🇯🇵 Japan | amazon.co.jp | JPY |
| 🇦🇺 Australia | amazon.com.au | AUD |
| 🇸🇬 Singapore | amazon.sg | SGD |
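As an illustration of how a URL could be mapped to one of these marketplaces, the sketch below matches the URL's hostname against the domains in the table. It is not the code in `api_config.py`. The `MARKETPLACES` dict and `detect_marketplace` function are assumptions, and the country codes are ISO 3166-1 alpha-2 values that may differ from the ones the project uses internally (for example for the UK).

```python
from urllib.parse import urlparse

# Domain -> (country code, currency code), copied from the table above.
MARKETPLACES = {
    "amazon.com": ("US", "USD"),
    "amazon.ca": ("CA", "CAD"),
    "amazon.com.mx": ("MX", "MXN"),
    "amazon.com.br": ("BR", "BRL"),
    "amazon.co.uk": ("GB", "GBP"),
    "amazon.de": ("DE", "EUR"),
    "amazon.fr": ("FR", "EUR"),
    "amazon.it": ("IT", "EUR"),
    "amazon.es": ("ES", "EUR"),
    "amazon.nl": ("NL", "EUR"),
    "amazon.ae": ("AE", "AED"),
    "amazon.in": ("IN", "INR"),
    "amazon.co.jp": ("JP", "JPY"),
    "amazon.com.au": ("AU", "AUD"),
    "amazon.sg": ("SG", "SGD"),
}


def detect_marketplace(url):
    """Return (country_code, currency_code) for an Amazon URL, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain, info in MARKETPLACES.items():
        # Accept the bare domain or any subdomain such as "www.".
        # Matching on a dot boundary keeps amazon.com apart from amazon.com.mx
        # and rejects look-alike hosts such as notamazon.com.
        if host == domain or host.endswith("." + domain):
            return info
    return None
```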
# API Server
API_HOST=0.0.0.0 # 0.0.0.0 for public, 127.0.0.1 for local
API_PORT=5000 # Server port
API_KEY=your_key_here # Authentication key
API_KEYS=key1,key2 # Optional rotated keys
ENABLE_JWT=False # Optional JWT auth
JWT_SECRET=your_secret
# Application
DEBUG_MODE=True # Enable debug logging
HEADLESS_MODE=True # Run browser without GUI
SCRAPE_TIMEOUT_SECONDS=30 # Scrape timeout
SCRAPE_MAX_RETRIES=2 # Retry attempts
MAX_CONCURRENCY=3 # Max concurrent scrapes
PROXY_URLS= # Optional proxy list (comma-separated)
# CORS
ALLOWED_ORIGINS=http://localhost:8000,https://yourdomain.com
# Rate limiting
RATE_LIMIT_PER_MINUTE_KEY=60
RATE_LIMIT_PER_MINUTE_IP=120
# Cache
CACHE_TTL_SECONDS=300
CACHE_MAX_ITEMS=1000
# Readiness checks
READY_CHECK_ASIN=
READY_CHECK_COUNTRY=US
READY_CHECK_INTERVAL_SECONDS=900
All API requests require authentication via API key:
Method 1: Header (Recommended)
X-API-Key: your_api_key_here
Method 2: Query Parameter
?api_key=your_api_key_here
Optional JWT:
Authorization: Bearer <jwt>
All country scrapers inherit from BaseAmazonScraper, which provides:
- Playwright browser contexts with stealth scripts
- ASIN extraction from URLs
- Common scraping methods
- Error handling
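ASIN extraction is the easiest of these responsibilities to show in isolation. The sketch below is not taken from `base_scraper.py`. It assumes product URLs carry the 10-character ASIN after `/dp/` or `/gp/product/`, and the name `extract_asin` is chosen only for this example.

```python
import re

# An ASIN is 10 characters of uppercase letters and digits. This pattern
# assumes it follows /dp/ or /gp/product/ and is terminated by "/", "?",
# "#", or the end of the URL.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(url):
    """Return the ASIN found in an Amazon product URL, or None."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None
```

URLs that do not follow either shape, such as search result pages, yield `None`, so callers can reject them before starting a browser.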
Each country has its own scraper module:
- india_scraper.py - Amazon India
- usa_scraper.py - Amazon USA
- uk_scraper.py - Amazon UK
- More countries coming soon...
API automatically detects country from product URL:
amazon.in → India Scraper
amazon.com → USA Scraper
amazon.co.uk → UK Scraper
Usage from a PHP application (the snippets follow Laravel conventions):
use App\Services\AmazonScraperService;
$scraper = new AmazonScraperService();
$result = $scraper->scrapeProduct('https://www.amazon.in/dp/B0FMDNZ61S');
if ($result['success']) {
    $data = $result['data'];
    // Use data...
}
Configuration entry:
'amazon_scraper' => [
    'url' => env('AMAZON_SCRAPER_URL', 'http://127.0.0.1:5000'), // Local or production URL
    'api_key' => env('AMAZON_SCRAPER_API_KEY'),
    'timeout' => env('AMAZON_SCRAPER_TIMEOUT', 60),
],
Environment variables:
# Local Development
AMAZON_SCRAPER_URL=http://127.0.0.1:5000
# Production
# AMAZON_SCRAPER_URL=https://your-domain.com
AMAZON_SCRAPER_API_KEY=your_api_key_here
AMAZON_SCRAPER_TIMEOUT=60
For production deployment on a VPS with a systemd service (24/7 operation):
See INSTALL.txt for complete guide including:
- Ubuntu/Debian setup
- Systemd service configuration
- Nginx reverse proxy
- SSL certificate setup
- Firewall configuration
- Always-running configuration
- Monitoring and logs
Quick command to make it always run:
sudo systemctl enable amazon-scraper-api
sudo systemctl start amazon-scraper-api
- Add country in api_config.py:
AMAZON_COUNTRIES = {
    'DE': {
        'name': 'Germany',
        'domain': 'amazon.de',
        'currency': 'EUR',
        'currency_code': 'EUR'
    }
}
- (Optional) Add a custom scraper if needed and register it in get_scraper_for_country.
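The registration step might look roughly like the sketch below. This is an assumption about the shape of `get_scraper_for_country`, not its real signature. `BaseAmazonScraper` here is a stand-in for the class in `scrapers/base_scraper.py`, while `GermanyScraper` and `CUSTOM_SCRAPERS` are hypothetical. The fallback to the base class reflects the fact that a custom scraper is described as optional.

```python
class BaseAmazonScraper:
    """Stand-in for the project's base class in scrapers/base_scraper.py."""

    def __init__(self, country_code):
        self.country_code = country_code


class GermanyScraper(BaseAmazonScraper):
    """Hypothetical custom scraper for amazon.de page quirks."""


# Hypothetical registry: country code -> custom scraper class.
CUSTOM_SCRAPERS = {
    "DE": GermanyScraper,
}


def get_scraper_for_country(country_code):
    """Return a custom scraper if one is registered, else the base scraper."""
    code = country_code.upper()
    scraper_cls = CUSTOM_SCRAPERS.get(code, BaseAmazonScraper)
    return scraper_cls(code)
```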
- JSON logs to console with DEBUG_MODE=True
- Request IDs in logs
- File log: api.log
- Systemd journal: sudo journalctl -u amazon-scraper-api
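For readers who want to reproduce JSON logs with request IDs in their own tooling, here is a minimal sketch built on the standard `logging` module. It does not mirror the server's actual log schema: the field names, the `JsonFormatter` class, and the `build_logger` helper are assumptions made for illustration.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is attached per call via `extra=`; None otherwise.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name="amazon_scraper_api", debug=True):
    """Create a console logger that emits JSON lines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("scrape started", extra={"request_id": uuid.uuid4().hex})
```

Passing the request ID through `extra=` keeps it out of the message text, so log lines can be filtered by ID with ordinary JSON tooling.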
# Reinstall browser binaries
python -m playwright install chromium
# Change API_PORT in .env
API_PORT=5001
# Add your domain to ALLOWED_ORIGINS
ALLOWED_ORIGINS=https://yourdomain.com,http://localhost:8000
# Check status
sudo systemctl status amazon-scraper-api
# View logs
sudo journalctl -u amazon-scraper-api -f
Contributions are welcome! See CONTRIBUTING.md for guidelines.
Ways to contribute:
- Report bugs
- Suggest features
- Add new country scrapers
- Improve documentation
- Optimize performance
This project is licensed under the MIT License - see LICENSE file for details.
Disclaimer: This software is for educational purposes only. Users are responsible for complying with Amazon's Terms of Service.
If you find this project helpful:
- Star the repository
- Report issues
- Submit pull requests
- Share with others
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: your.email@example.com
Made with ❤️ for the developer community