A robust and scalable toolkit for extracting data from the Bedetheque.com portal. Designed to work as a flexible research wrapper for comic book metadata.
- Comprehensive Scraping: Detailed extraction of Authors, Series, Albums, and Magazines (Revues).
- Relational Mapping: Automatically maps connections between authors and their works.
- Async Architecture: Built on top of httpx and SQLAlchemy Async for high-performance data handling.
- Throttling: Integrated rate limiting to respect server boundaries.
- Clean Models: Rich domain models for easy integration into your own applications.
- Python 3.11+
- Async-compatible database (e.g., PostgreSQL or SQLite)
pip install -r requirements.txt

(Ensure sqlalchemy, asyncpg, httpx, beautifulsoup4, and python-dotenv are installed.)
Create a .env file in the root directory (see .env.example):
DATABASE_URL=postgresql+asyncpg://user:password@host:5432/postgres
REQUESTS_PER_SECOND=2

Check out the scripts/examples directory for ready-to-use research scripts:
- Search Series: python scripts/examples/example_serie.py
- Research Authors: python scripts/examples/example_auteur.py
- Browse Magazines: python scripts/examples/example_revue.py
- bedetheque/: Core library (Models, Parsers, Scrapers, Repositories).
- scripts/: Operational and utility scripts.
- scripts/examples/: Reference implementations for researching each entity.
- HTTP Client: HTTPX
- HTML Parsing: BeautifulSoup4
- Data Layer: SQLAlchemy 2.0 (Async)
Developed with ❤️ for the comic book collector community.