A CLI tool for collecting Magic: The Gathering card prices. Features eBay API integration, advanced web scraping with CAPTCHA solving and proxy rotation, and a beautiful interactive menu interface.
- Interactive Menu: Easy-to-use menu interface
- Playwright Integration: Browser automation for eBay!
- Adaptive Analyzer: Auto-discovers page structure & selectors
- Flexible Search: Search by card name, set, type, or custom query
- API Integration: eBay Finding API support (official)
- Scrapy Framework: Production-grade spider with advanced features
- CAPTCHA Solving: Dual-mode (2Captcha API + local ML with TensorFlow)
- Proxy Rotation: Built-in middleware for rotating IP addresses
- Database Storage: SQLite integration with SQLAlchemy ORM
- Data Visualization: Beautiful formatted tables with filtering
- Beautiful CLI: Colorful ASCII art with styled menus
- Debug Mode: Auto-screenshots and visible browser option
- Configurable: Easy configuration for CAPTCHA, proxies, and APIs
- dbt Analytics: Transform and analyze scraped data with SQL models
- Data Quality Tests: Automated testing for data integrity
- Interactive Analytics: View price trends, statistics, and insights
Transform your scraped data into actionable insights with built-in dbt (data build tool) support!
- Price Statistics: Min/max/average prices, median, spreads by card
- Price Trends: Track price changes over time
- Top Cards: Ranked by price, volume, and "hotness" score
- Data Quality Tests: Ensure your data is clean and accurate
- Interactive Documentation: Auto-generated docs with data lineage
# Install dbt
./setup_dbt.sh
# Or manually
pip install dbt-core dbt-sqlite
# Run dbt models (from main menu option 7)
python mtgscraper.py
→ Select option 7 (Run dbt Models)
# View analytics (from main menu option 9)
→ Select option 9 (View Analytics Results)

| Model | Description |
|---|---|
| card_price_stats | Price statistics by card (min, max, avg, median, spread) |
| price_trends | Daily price changes and trends over time |
| top_cards | Top cards ranked by various metrics |
| dim_cards | Card dimension table with latest info |
| fct_card_prices | All price observations (fact table) |
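Once dbt has built these models, the resulting tables live in the same SQLite database and can be queried directly. A minimal sketch using the stdlib sqlite3 module; the column names (card_name, min_price, max_price, avg_price) are assumptions about what the card_price_stats model exposes, so check the dbt model SQL for the real names:

```python
# Sketch: query a dbt-built analytics table with stdlib sqlite3.
# Column names are assumptions about the card_price_stats model.
import sqlite3

def top_expensive_cards(db_path="mtg_cards.db", limit=5):
    """Return the cards with the highest average price."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT card_name, min_price, max_price, avg_price "
            "FROM card_price_stats ORDER BY avg_price DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    return rows
```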
See DBT_README.md for detailed documentation.
- Browser Automation with real Chrome
- Adaptive structure analyzer (auto-discovers selectors)
- Bypasses robots.txt (acts like a real user)
- Handles JavaScript automatically
- Stealth mode - harder to detect
- Can run in visible mode to watch it work
- Auto-screenshots for debugging
- Multiple search types: Card name, set, type, custom
- Slower but more effective than Scrapy
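The Playwright flow above can be sketched in a few lines. This is a minimal illustration, not the project's actual scraper: `build_search_url` is a hypothetical helper, and the `.s-item__title` selector is one commonly seen on eBay result pages that the adaptive analyzer would normally discover for you. It assumes `pip install playwright` and `playwright install chromium` have been run:

```python
# Minimal Playwright sketch (not the project's real scraper).
# build_search_url is a hypothetical helper; the CSS selector is an
# assumption about eBay's markup that the adaptive analyzer normally finds.
from urllib.parse import urlencode

def build_search_url(query, page=1):
    """Build an eBay search URL for an MTG card query."""
    params = {"_nkw": f"mtg {query}", "_pgn": page}
    return "https://www.ebay.com/sch/i.html?" + urlencode(params)

def scrape_titles(query):
    """Open the search page in headless Chromium and collect listing titles."""
    from playwright.sync_api import sync_playwright  # imported lazily
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(build_search_url(query))
        titles = page.locator(".s-item__title").all_inner_texts()
        browser.close()
    return titles
```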
- eBay Browse API (RESTful)
- OAuth 2.0 with Client ID + Client Secret
- Free from eBay Developers Program
- No blocking, clean JSON responses
- Modern RESTful API with better rate limits
- Multiple search types: Card name, set, type, custom
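The OAuth 2.0 handshake behind the Browse API is a standard client-credentials grant. A sketch using only the stdlib; the endpoint URLs and scope follow eBay's published developer docs, but verify them at developer.ebay.com before relying on this:

```python
# Sketch of the eBay OAuth client-credentials flow used by the Browse API.
# Endpoint URLs and scope follow eBay's published docs; verify before use.
import base64
import json
import urllib.request

TOKEN_URL = "https://api.ebay.com/identity/v1/oauth2/token"
SEARCH_URL = "https://api.ebay.com/buy/browse/v1/item_summary/search"

def basic_auth_header(client_id, client_secret):
    """eBay expects Basic auth with base64(client_id:client_secret)."""
    raw = f"{client_id}:{client_secret}".encode()
    return "Basic " + base64.b64encode(raw).decode()

def fetch_token(client_id, client_secret):
    """Request an application access token (client_credentials grant)."""
    body = ("grant_type=client_credentials"
            "&scope=https%3A%2F%2Fapi.ebay.com%2Foauth%2Fapi_scope")
    req = urllib.request.Request(
        TOKEN_URL,
        data=body.encode(),
        headers={
            "Authorization": basic_auth_header(client_id, client_secret),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

The returned token then goes into an `Authorization: Bearer <token>` header on Browse API search requests.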
- Traditional HTTP Scraping
- Very fast (no browser overhead)
- Production features: CAPTCHA solving, proxies, AutoThrottle
Recommendation: Use Playwright (option 1) for actual scraping, or API (option 2) for this demo.
- TCGPlayer API
- CardKingdom integration
- StarCityGames
- Card Market (Europe)
- Python 3.8 or higher
- pip package manager
- macOS, Linux, or Windows
- Clone the repository:
git clone <your-repo-url>
cd scraperTool

- Quick Setup (macOS/Linux):

./setup.sh

Manual Setup:
# Create virtual environment
python3 -m venv venv
# Activate on macOS/Linux
source venv/bin/activate
# Activate on Windows
# venv\Scripts\activate
# Install core dependencies (fast)
pip install -r requirements-minimal.txt
# OR install full features (includes TensorFlow - large download)
pip install -r requirements.txt
# Install dbt core
pip install dbt-core dbt-sqlite

Note: TensorFlow is ~500MB. Use requirements-minimal.txt if you don't need local ML CAPTCHA solving.
- (Optional) If using Scrapy-Splash for JavaScript rendering:
docker run -p 8050:8050 scrapinghub/splash

Note: Always activate the virtual environment (source venv/bin/activate) before running the scraper!
1. eBay Browse API (Recommended for production):
# Register at https://developer.ebay.com/
# Get your OAuth credentials (Client ID + Client Secret)
export EBAY_CLIENT_ID="your-app-id-here"
export EBAY_CLIENT_SECRET="your-cert-id-here"

2. CAPTCHA Solver (Optional - Choose one):
Option A - 2Captcha (Paid, Reliable):
# Install: pip install 2captcha-python
# Register at https://2captcha.com/ (~$3 per 1000 solves)
export CAPTCHA_API_KEY="your-2captcha-api-key"

Option B - Local ML Solver (Free, Experimental):
# Install: pip install tensorflow opencv-python pillow (large download ~500MB)
# Enable local ML solver
export USE_LOCAL_CAPTCHA=true

Tip: Both are optional! The scraper works without CAPTCHA solving.
3. Rotating Proxies (Optional - for large-scale scraping):
# Create proxies.txt with format: http://user:pass@host:port
export PROXY_LIST="proxies.txt"

Make persistent:
echo 'export EBAY_CLIENT_ID="your-client-id"' >> ~/.zshrc
echo 'export EBAY_CLIENT_SECRET="your-client-secret"' >> ~/.zshrc
echo 'export CAPTCHA_API_KEY="your-key"' >> ~/.zshrc

Use option 6 in the menu to check configuration status!
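The proxy list configured above can be rotated with a few lines of stdlib Python. This is a minimal sketch of the idea behind the rotation middleware, not the project's actual implementation:

```python
# Minimal sketch of proxy rotation: read proxies.txt (one
# http://user:pass@host:port per line) and cycle through the entries.
from itertools import cycle

def load_proxies(path="proxies.txt"):
    """Read proxy URLs, skipping blank lines and comments."""
    with open(path) as f:
        return [ln.strip() for ln in f if ln.strip() and not ln.startswith("#")]

def proxy_rotator(proxies):
    """Return a callable that yields the next proxy on each call."""
    pool = cycle(proxies)
    return lambda: next(pool)
```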
Simply run the script to launch the beautiful interactive menu:
python mtgscraper.py

The menu provides:
SCRAPING METHODS:
- 1. Playwright Scraper - Browser automation (With adaptive structure analyzer)
- 2. eBay API Search - Official API
- 3. Scrapy Spider - Fast HTTP scraper (blocked by robots.txt)
VIEW & ANALYZE:
- 4. View Results - Browse collected data in formatted tables
- 5. View Card Details - See detailed information for specific cards
- 6. Database Statistics - View stats about your collection
DBT ANALYTICS:
- 7. Run dbt Models - Transform data into analytics tables
- 8. Run dbt Tests - Check data quality
- 9. View Analytics Results - See price trends and statistics
- 10. Generate dbt Docs - Create interactive documentation
EXPORT & AUTOMATE:
- 11. Export to CSV - Save results to CSV file
- 12. Upload to S3 - Upload data to AWS S3
- 13. Schedule Cron Job - Automate scraping
- 14. Remove Cron Jobs - Remove scheduled tasks
SETTINGS:
- 15. Configure Settings - Set up CAPTCHA, proxies, and API keys
- 16. Clear Database - Remove all data (with confirmation)
- 0. Exit - Quit the program
You can also use the tool directly from command line for scripting:
# Quick scrape without menu
python mtgscraper.py --card "Black Lotus" --pages 5
# Or with short flags
python mtgscraper.py -c "Lightning Bolt" -p 3

- Get your OAuth credentials:

# Register at developer.ebay.com
# Get Client ID (App ID) and Client Secret (Cert ID)
export EBAY_CLIENT_ID="your-app-id"
export EBAY_CLIENT_SECRET="your-cert-id"
- Run and search:

python mtgscraper.py
# Select option 2 (eBay API Search)
# Enter card name
# OAuth token will be obtained automatically
# View results with option 4
- Install Playwright:

pip install playwright
playwright install chromium
- Run and scrape:

python mtgscraper.py
# Select option 1 - Playwright (default!)
# Choose search type (card name, set, type, custom)
# Enter your search query
# Choose headless or visible mode
# Watch the adaptive analyzer discover page structure!
# Get actual eBay results!
Search Examples:
- By card: "Black Lotus", "Lightning Bolt"
- By set: "Alpha", "Beta", "Modern Masters"
- By type: "creature", "planeswalker", "instant"
- Custom: "foil rare mythic"
- Get OAuth credentials and run:

export EBAY_CLIENT_ID="your-app-id"
export EBAY_CLIENT_SECRET="your-cert-id"
python mtgscraper.py
# Select option 2 - eBay Browse API
# OAuth authentication happens automatically
# Choose search type
# Get official data
Demonstrates ethical scraping with Scrapy, though it is blocked by robots.txt:
# Select option 3
# Gets 0 results (respects robots.txt)
# Demonstrates production Scrapy architecture

python mtgscraper.py
# Select option 1 (API simulated mode available)
# Or select option 2 (basic scraping without CAPTCHA solver)

| Option | Short | Description | Default |
|---|---|---|---|
| --menu | - | Use interactive menu | True |
| --card | -c | Card name for direct scraping | - |
| --pages | -p | Number of pages to scrape | 3 |
Examples:
# Interactive menu (default)
python mtgscraper.py
# Direct scrape
python mtgscraper.py -c "Force of Will" -p 5
# Skip menu
python mtgscraper.py --no-menu -c "Lightning Bolt"

scraperTool/
├── mtgscraper/
│ ├── __init__.py # Package initialization
│ ├── settings.py # Scrapy settings (AutoThrottle, robots.txt)
│ ├── items.py # Data models for scraped items
│ ├── pipelines.py # Database pipeline with SQLAlchemy
│ ├── middlewares.py # CAPTCHA solver & proxy rotation
│ └── spiders/
│ ├── __init__.py # Spiders package initialization
│ └── ebay_spider.py # eBay scraping spider
├── mtgscraper.py # Main CLI entry point (interactive menu)
├── scrapy.cfg # Scrapy project configuration
├── requirements.txt # Python dependencies
├── setup.sh # Quick setup script
└── README.md # This file
- Playwright: Modern browser automation framework (headless Chrome/Firefox)
- Scrapy: High-performance web scraping framework
- SQLAlchemy: SQL toolkit and ORM for database operations
- Click: Command-line interface creation framework
- Colorama: Cross-platform colored terminal text
- Tabulate: Pretty-print tabular data
- BeautifulSoup4: HTML parsing library
- 2Captcha-Python: Commercial CAPTCHA solving API client
- TensorFlow: ML framework for local CAPTCHA solving (optional)
- OpenCV: Computer vision library for image processing (optional)
- Scrapy-Splash: JavaScript rendering support (optional)
The SQLite database (mtg_cards.db) contains a single table with the following structure:
| Column | Type | Description |
|---|---|---|
| id | Integer | Primary key |
| card_name | String | Name of the card |
| set_name | String | Set/edition name |
| price | String | Listed price |
| condition | String | Card condition |
| seller | String | Seller information |
| url | String | Link to listing |
| source | String | Source site (e.g., "eBay") |
| timestamp | String | Scrape timestamp |
| shipping | String | Shipping information |
| buy_it_now | Boolean | Buy It Now listing flag |
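The schema above can be reproduced with the stdlib sqlite3 module for quick experiments. Note this is an illustration only: the project itself manages the table through the SQLAlchemy ORM, and the table name `cards` here is an assumption:

```python
# Illustrative table matching the documented schema, using stdlib sqlite3.
# The project itself uses SQLAlchemy; the table name "cards" is an assumption.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS cards (
    id INTEGER PRIMARY KEY,
    card_name TEXT,
    set_name TEXT,
    price TEXT,
    condition TEXT,
    seller TEXT,
    url TEXT,
    source TEXT,
    timestamp TEXT,
    shipping TEXT,
    buy_it_now BOOLEAN
)
"""

def init_db(path="mtg_cards.db"):
    """Create the cards table if missing and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn
```

Storing `price` as a string mirrors the table above; the dbt models are where prices get cast to numbers for analytics.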
- User Agent: Modern Chrome browser user agent (looks like real user)
- Download Delay: 2 seconds between requests (randomized)
- AutoThrottle: Automatically adjusts speed based on server load
- Concurrent Requests: Limited to 2 per domain (respectful)
- Robots.txt: Enabled - respects website policies
- Cookies: Enabled for session management
- Retry Logic: Smart retry with backoff for temporary failures
Why These Settings Matter:
- Mimics human browsing behavior
- Reduces server load
- Less likely to be blocked
These settings can be customized in mtgscraper/settings.py.
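The behaviors listed above map onto standard Scrapy settings. A hedged excerpt of what mtgscraper/settings.py likely contains; the setting names are Scrapy's own, but the exact values here are illustrative:

```python
# mtgscraper/settings.py (illustrative excerpt) - standard Scrapy settings
# matching the behavior described above; values shown are examples.
ROBOTSTXT_OBEY = True                 # respect robots.txt
DOWNLOAD_DELAY = 2                    # base delay between requests (seconds)
RANDOMIZE_DOWNLOAD_DELAY = True       # jitter the delay (0.5x-1.5x)
CONCURRENT_REQUESTS_PER_DOMAIN = 2    # be gentle to each host
COOKIES_ENABLED = True                # keep session cookies
AUTOTHROTTLE_ENABLED = True           # adapt speed to server load
AUTOTHROTTLE_START_DELAY = 2
AUTOTHROTTLE_MAX_DELAY = 30
RETRY_ENABLED = True
RETRY_TIMES = 3                       # retry transient failures
```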
To add a new scraping source:
- Create a new spider in mtgscraper/spiders/:
# mtgscraper/spiders/tcgplayer_spider.py
import scrapy
from mtgscraper.items import MtgCardItem

class TcgplayerMtgSpider(scrapy.Spider):
    name = 'tcgplayer'
    allowed_domains = ['tcgplayer.com']

    def start_requests(self):
        # Your scraping logic here
        pass

    def parse(self, response):
        # Your parsing logic here
        pass

- Update the CLI in mtgscraper.py to include the new source:
@click.option('--source', '-s', default='ebay',
type=click.Choice(['ebay', 'tcgplayer']),
help='Source to scrape from')

- Add source-specific configuration to mtgscraper/settings.py if needed.
To modify the output format, edit the view command in mtgscraper.py. The tabulate library supports multiple table formats:
print(tabulate(rows, headers=headers, tablefmt='fancy_grid'))
# Other formats: 'grid', 'simple', 'plain', 'html', 'latex', etc.

- Respect robots.txt: Always obey website rules (ROBOTSTXT_OBEY = True)
- Use realistic User-Agents: Identify as a real browser, not as a bot
- Control crawl rate: AutoThrottle adjusts speed based on server capacity
- Limit concurrency: Only 2 requests per domain at a time
- Randomized delays: Mimic human browsing patterns
This scraper includes production-ready features:
- CAPTCHA Solving: 2Captcha integration (configure with API key)
- Proxy Rotation: Built-in middleware for rotating proxies
- AutoThrottle: Adapts crawling speed to server load
- Smart Retries: Exponential backoff for failed requests
- API Support: Official eBay API integration
- Configuration Manager: Easy setup through menu option 6
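The "smart retries" idea reduces to a small pattern: each failed attempt waits twice as long as the previous one, up to a cap. A minimal, dependency-free sketch of that pattern (not the project's actual retry middleware, which Scrapy provides):

```python
# Sketch of retry with exponential backoff: wait base_delay * 2**n
# after the n-th failure, capped at max_delay. The sleep function is
# injectable so the logic can be tested without real waiting.
import time

def retry_with_backoff(fn, attempts=4, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call fn(); on exception, back off exponentially and retry."""
    for n in range(attempts):
        try:
            return fn()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts; surface the error
            sleep(min(base_delay * (2 ** n), max_delay))
```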
- Distributed crawling with Scrapy Cloud
- More CAPTCHA types (hCaptcha, FunCaptcha)
- User-Agent rotation middleware
- Redis-based job queue
- Monitoring dashboard
CAPTCHA Challenges
- Option 1 (Recommended): Use 2Captcha API (~$3 per 1000 solves)
export CAPTCHA_API_KEY="your-key"
- Option 2 (Free): Enable local ML solver (experimental)
export USE_LOCAL_CAPTCHA=true

- Uses TensorFlow + OpenCV
- The middleware automatically tries both methods with fallback
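The fallback behavior can be sketched as a simple chain: try each configured solver in order and take the first usable answer. The solver callables here are injected stand-ins, so this illustrates only the fallback logic, not the 2Captcha or TensorFlow APIs:

```python
# Sketch of dual-mode CAPTCHA solving with fallback. Solvers are injected
# callables (e.g. a 2Captcha client wrapper, then a local ML model);
# neither real API is modeled here.
def solve_captcha(image_bytes, solvers):
    """Try each (name, solver) pair in order; return the first answer."""
    for name, solver in solvers:
        try:
            answer = solver(image_bytes)
        except Exception:
            continue  # solver unavailable or errored; fall through
        if answer:
            return name, answer
    return None, None
```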
Import Error: No module named 'mtgscraper'
- Make sure you're running the script from the project root directory
- Ensure the virtual environment is activated:
source venv/bin/activate
Database Not Found
- Run a scrape first using option 1 in the menu
- The database is created automatically on first scrape
No Results Found
- Try a more generic card name (e.g., "Bolt" instead of "Lightning Bolt Alpha Edition")
- Increase the number of pages to scrape
- Check if eBay is blocking (see above)
Scrapy Deprecation Warnings
- Warnings about process_start_requests() and Splash middleware are normal
- These come from the scrapy-splash package's compatibility with Scrapy 2.13+
- The code works correctly despite the warnings
- Can be safely ignored - they're about future Scrapy versions
Twisted Reactor Issues
- Fixed by using subprocess to run Scrapy
- This allows multiple scrapes in one session
- The interactive menu now works smoothly for repeated scrapes
- TCGPlayer spider
- CardKingdom spider
- StarCityGames spider
- Card Market (Europe)
- Export to CSV/JSON
- Price history tracking over time
- Price alerts via email/SMS
- Card comparison tool
- Market analysis and statistics
- Web dashboard interface (Flask/Django)
- Rotating proxy middleware
- User-Agent rotation middleware
- CAPTCHA solving integration
- Distributed crawling with Scrapy Cloud
- Redis-based task queue
- Monitoring and alerting (Prometheus/Grafana)