Simple ComBase data scraper with English interface.
- Install dependencies: `pip install -r config/requirements.txt`
- Run the scraper:
  - Single Thread (Simple): `python simple_scraper.py`
  - Parallel (10 Threads - Faster): `python parallel_scraper.py`
- Press `Ctrl+C` to stop safely
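
One common way to make Ctrl+C stop a script safely is to catch `KeyboardInterrupt` and keep whatever has been collected so far. The sketch below illustrates that pattern only; `run_until_interrupted` and `process_page` are hypothetical names, not functions taken from the scraper scripts.

```python
def run_until_interrupted(pages, process_page):
    """Process pages in order; on Ctrl+C, stop and return what is done so far."""
    done = []
    try:
        for page in pages:
            done.append(process_page(page))
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread.
        print(f"Interrupted after {len(done)} pages; keeping collected results.")
    return done
```

The caller can then save the returned results before exiting, so an interrupted run does not lose completed work.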
- Parallel Processing: 10 threads, for up to roughly 10x faster scraping than the single-threaded script
- Search Delay: 2-minute wait after search before scraping starts
- Deduplication: Removes duplicate food parts from organism names
- Thread-Safe: Real-time progress tracking across all threads
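
As a rough illustration of the parallel and thread-safe features above, here is a minimal sketch using the standard library's `ThreadPoolExecutor` and a lock-protected counter. It is not the code in `parallel_scraper.py`: `ProgressTracker`, `scrape_page`, and `scrape_all` are hypothetical names, and `scrape_page` returns placeholder records instead of fetching anything.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 10  # thread count stated in the feature list


class ProgressTracker:
    """Counts scraped records across threads; the lock keeps updates atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def add(self, n):
        with self._lock:
            self._count += n
            return self._count

    @property
    def count(self):
        with self._lock:
            return self._count


def scrape_page(page, tracker):
    # Hypothetical stand-in for a real page fetch: returns placeholder records.
    records = [{"page": page, "row": i} for i in range(5)]
    tracker.add(len(records))
    return records


def scrape_all(pages, num_threads=NUM_THREADS):
    """Scrape pages concurrently and return (all records, total count)."""
    tracker = ProgressTracker()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves the input order of pages in its results.
        results = list(pool.map(lambda p: scrape_page(p, tracker), pages))
    flat = [record for page_records in results for record in page_records]
    return flat, tracker.count
```

Because every update goes through the lock, the running total stays consistent no matter how the threads interleave.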
- Data saved to the `data/` directory
- Each file contains 1,000 records
- Complete organism names with ID, name, and food description
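
The output layout described above could be produced along these lines. This is a sketch under assumptions: the JSON format, the `records_0001.json` naming scheme, and the `save_in_chunks` helper are all hypothetical, since the actual file format is not stated here.

```python
import json
from pathlib import Path

RECORDS_PER_FILE = 1000  # file size stated in the output description


def save_in_chunks(records, out_dir="data", chunk_size=RECORDS_PER_FILE):
    """Write records to out_dir in files of at most chunk_size records each."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        # Hypothetical naming scheme: records_0001.json, records_0002.json, ...
        file_path = out_path / f"records_{start // chunk_size + 1:04d}.json"
        with open(file_path, "w", encoding="utf-8") as f:
            json.dump(chunk, f, ensure_ascii=False, indent=2)
        written.append(file_path)
    return written
```

With 2,500 records this writes three files: two with 1,000 records and a final one with the remaining 500.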