A comprehensive Python-based web scraper designed to extract shop information from Etsy, including sales data, contact information, and revenue analytics through the EverBee API integration.
- 🌐 Multi-Platform Contact Scraping: Extracts contact information from Instagram, Facebook, Pinterest, Twitter/X, and general websites
- 📊 EverBee API Integration: Retrieves average product prices and monthly revenue data
- 🎯 Intelligent Filtering: Filter shops based on revenue thresholds and contact information availability
- 🔄 Resume Capability: Automatically resumes scraping from where it left off using backup files
- 🛡️ Rate Limiting Protection: Built-in retry logic with exponential backoff to handle API rate limits
- 📝 Comprehensive Logging: Detailed logging system for monitoring scraping progress and debugging
- 🐍 Python 3.7+
- 🌐 Chrome browser installed
- 🔑 Valid EverBee account and login credentials
- 📁 `category_m.json` file containing Etsy category URLs
- Clone the repository:

      git clone https://github.com/yourusername/etsy-scraper.git
      cd etsy-scraper

- Install required dependencies:

      pip install -r requirements.txt

Required packages:
- requests
- undetected-chromedriver
- selenium
- beautifulsoup4
- cloudscraper
- 🔐 EverBee Authentication:
  - The script automatically opens EverBee in Chrome
  - Log in manually when prompted
  - The script extracts and stores your authentication token
- 📂 Category Data:
  - Ensure `category_m.json` exists in the project directory
  - This file should contain structured category data with URLs to scrape
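The exact layout of `category_m.json` is not documented here, so the following is only a sketch of one way to read it: it walks whatever nested JSON it finds and collects every string that looks like a URL. The helper names `collect_urls` and `load_category_urls` are hypothetical and are not taken from the script.

```python
import json
from pathlib import Path


def collect_urls(node):
    """Recursively gather every http(s) string found in nested JSON data."""
    urls = []
    if isinstance(node, str):
        if node.startswith(("http://", "https://")):
            urls.append(node)
    elif isinstance(node, dict):
        for value in node.values():
            urls.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_urls(item))
    return urls


def load_category_urls(path="category_m.json"):
    """Load the category file and return its URLs, or raise a clear error."""
    file_path = Path(path)
    if not file_path.exists():
        raise FileNotFoundError(f"{path} not found in the project directory")
    with file_path.open(encoding="utf-8") as handle:
        return collect_urls(json.load(handle))
```

Because the walk does not depend on specific key names, it should keep working if the category structure is nested differently than assumed.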
Run the script and choose your extraction mode:

    python main.py

- 📧 Extract if email: Only saves shops that have email addresses
- 📱 Extract if phone number: Only saves shops that have phone numbers
- 📧📱 Extract if email AND phone number: Only saves shops that have both
- 📧📱 Extract if email OR phone number: Saves shops that have either contact method
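The four extraction modes above amount to a small decision rule. The sketch below shows one way to express it; the function name `should_save`, the mode strings, and the `email` / `phone` dictionary keys are assumptions made for illustration and may not match the script's internals.

```python
def should_save(shop, mode):
    """Return True if the shop's contact data satisfies the chosen mode."""
    has_email = bool(shop.get("email"))
    has_phone = bool(shop.get("phone"))
    rules = {
        "email": has_email,
        "phone": has_phone,
        "email_and_phone": has_email and has_phone,
        "email_or_phone": has_email or has_phone,
    }
    if mode not in rules:
        raise ValueError(f"Unknown extraction mode: {mode!r}")
    return rules[mode]
```

For example, a shop with only an email address passes the `email` and `email_or_phone` modes but not `phone` or `email_and_phone`.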
- 📊 `shop_data.csv`: Main output file containing all scraped shop data
- 📋 `scraper.log`: Detailed logging information
- ❌ `unprocessed_shops.txt`: List of shops that couldn't be processed due to errors
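Since `shop_data.csv` records every saved shop, a resume step could read it back and skip URLs that are already present. This is a minimal sketch under two assumptions: that the CSV header for the URL column is literally `Shop URL`, and that this file is what the script treats as its backup. Both function names are hypothetical.

```python
import csv
from pathlib import Path


def load_processed_urls(csv_path="shop_data.csv", url_column="Shop URL"):
    """Return the set of shop URLs already written to the output file."""
    path = Path(csv_path)
    if not path.exists():
        return set()  # First run: nothing has been processed yet
    with path.open(newline="", encoding="utf-8") as handle:
        return {
            row[url_column]
            for row in csv.DictReader(handle)
            if row.get(url_column)
        }


def pending_shops(all_urls, processed):
    """Keep the original order while dropping shops that are already done."""
    return [url for url in all_urls if url not in processed]
```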
| Column | Description |
|---|---|
| 🏪 Shop Name | Etsy shop name |
| 🔗 Shop URL | Direct link to the Etsy shop |
| 📊 Sales Number | Total number of sales |
| 💰 Average Product Price | Average price of products (from EverBee) |
| 💵 Monthly Revenue | Estimated monthly revenue (from EverBee) |
| 📱 Phone | Extracted phone numbers |
| 📧 Email | Extracted email addresses |
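To make the column layout concrete, here is a sketch of writing rows in that order with the standard `csv` module. The header strings (without emoji, and including `Email` for the last column) are assumptions about the real file, and `write_rows` is a hypothetical helper.

```python
import csv
import io

# Assumed header names, in the same order as the table above
COLUMNS = [
    "Shop Name",
    "Shop URL",
    "Sales Number",
    "Average Product Price",
    "Monthly Revenue",
    "Phone",
    "Email",
]


def write_rows(handle, rows, write_header=True):
    """Write shop dictionaries as CSV; missing fields become empty cells."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, extrasaction="ignore")
    if write_header:
        writer.writeheader()
    for row in rows:
        writer.writerow(row)


# In-memory usage example; pass an open file handle to write to disk instead
buffer = io.StringIO()
write_rows(buffer, [{"Shop Name": "ExampleShop", "Monthly Revenue": 7500}])
```

When appending to an existing file during a resumed run, passing `write_header=False` avoids a duplicate header row.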
The script automatically filters out shops with monthly revenue less than $5,000. You can modify this threshold in the `main()` function:

    if monthly_revenue < 5000:  # Change this value
        continue

Default settings include:
- ⏰ Initial delay: 30 seconds for API requests
- 🎲 Random delays: 1-3 seconds between shop requests
- 📦 Batch processing: 5 shops before taking a break
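The pacing settings above could be combined roughly as follows. This is a sketch, not the script's actual loop: the length of the break after each batch is not stated here, so `batch_break=60.0` is a placeholder assumption, and `pause_between_shops` is a hypothetical name.

```python
import random
import time


def pause_between_shops(index, batch_size=5, batch_break=60.0,
                        sleep=time.sleep, rng=random.uniform):
    """Sleep 1-3 seconds after a shop, plus a longer break after each batch.

    `index` is the zero-based position of the shop that was just processed.
    `sleep` and `rng` are injectable so the pacing can be tested without waiting.
    """
    delay = rng(1.0, 3.0)
    if (index + 1) % batch_size == 0:
        delay += batch_break  # Assumed break length; tune to your needs
    sleep(delay)
    return delay
```

If rate limits persist, raising `batch_break` or widening the random range are the obvious knobs to try first.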
The scraper attempts to extract contact information from:
- 📱 Shop's social media profiles (Instagram, Facebook, Pinterest, Twitter/X)
- 🌐 Shop's external website links
- 📄 About/Contact pages
- 📝 Social media bio sections
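Once the text of a bio or contact page has been fetched, pulling out emails and phone numbers is typically a pattern-matching step. The patterns below are deliberately simple assumptions for illustration and are not the script's own; real-world phone formats vary widely, so expect both false positives and misses.

```python
import re

# Simplified patterns (assumptions, not the script's actual expressions)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text):
    """Return de-duplicated emails and phone-like strings found in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted({
        match.strip()
        for match in PHONE_RE.findall(text)
        if sum(ch.isdigit() for ch in match) >= 8  # Skip short digit runs
    })
    return {"emails": emails, "phones": phones}
```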
- ⏳ Rate Limiting: Automatic retry with exponential backoff
- 🌐 Network Errors: Request retry logic with configurable attempts
- ❌ Invalid Data: Graceful error handling with logging
- 🔄 Resume Capability: Automatically skips already processed shops
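Retry with exponential backoff, as listed above, generally means doubling the wait after each failed attempt. The sketch below uses the 30-second initial delay mentioned in the default settings; the attempt count, the multiplier, and the name `call_with_backoff` are assumptions.

```python
import time


def call_with_backoff(func, max_attempts=5, initial_delay=30.0, factor=2.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), waiting longer after each failure, then give up."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            sleep(delay)
            delay *= factor  # 30s, 60s, 120s, ... with the defaults
```

Narrowing `retry_on` to the specific exceptions raised for rate limits or network failures keeps unrelated bugs from being retried silently.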
The script creates detailed logs including:
- 📈 Processing progress
- 🔄 API response status
- ❌ Error messages and stack traces
- ⏰ Rate limiting notifications
- 📊 Shop processing statistics
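A logger that writes entries like these to `scraper.log` can be set up with the standard `logging` module. This is a sketch only: the logger name, the format string, and the `build_logger` helper are assumptions rather than the script's actual configuration.

```python
import logging


def build_logger(name="etsy_scraper", handler=None, log_path="scraper.log"):
    """Return a logger that writes timestamped lines to a file by default."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # Avoid attaching duplicate handlers on re-use
        if handler is None:
            handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `logger.exception("...")` inside an `except` block records the stack trace along with the message.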
Before using this tool, make sure your use complies with:
- 📋 Etsy's Terms of Service
- 🔒 Applicable data protection laws (GDPR, CCPA, etc.)
- 🌐 Website scraping best practices
- ⏱️ Rate limiting and respectful scraping practices
- 🌐 Chrome Driver Issues:
  - Ensure Chrome browser is installed and up to date
  - The script uses undetected-chromedriver, which should handle driver management
- 🔑 EverBee Authentication:
  - Make sure you're logged into EverBee in the browser
  - Token extraction happens automatically after login
- ⏰ Rate Limiting:
  - The script includes built-in rate limiting
  - If you encounter persistent rate limits, consider increasing delay times
- 📁 Missing `category_m.json`:
  - Ensure the category file exists and contains valid JSON structure
  - Check that URLs in the category file are accessible
- 🍴 Fork the repository
- 🌟 Create a feature branch (`git checkout -b feature/amazing-feature`)
- 💾 Commit your changes (`git commit -m 'Add some amazing feature'`)
- 📤 Push to the branch (`git push origin feature/amazing-feature`)
- 🔄 Open a Pull Request
This tool is provided as-is for educational purposes. Users are responsible for ensuring their use complies with all applicable laws and website terms of service. The authors are not responsible for any misuse of this tool.
If you encounter issues or have questions:
- 🔍 Check the existing issues on GitHub
- 📋 Review the log files for error details
- 🆕 Create a new issue with detailed information about the problem