vsoeiro/bedetheque

Bedetheque Scraper

A robust and scalable toolkit for extracting data from the Bedetheque.com portal. It is designed to work as a flexible wrapper for researching comic book metadata.

✨ Features

  • Comprehensive Scraping: Detailed extraction of Authors, Series, Albums, and Magazines (Revues).
  • Relational Mapping: Automatically maps connections between authors and their works.
  • Async Architecture: Built on top of httpx and SQLAlchemy Async for high-performance data handling.
  • Throttling: Integrated rate limiting to respect server boundaries.
  • Clean Models: Rich domain models for easy integration into your own applications.
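
The throttling and async features above can be pictured with a small sketch. It is not the library's actual implementation: `RateLimiter` and `fetch_all` are hypothetical names, and the limiter simply spaces calls evenly so that no more than `rate` of them start per second. It uses only the standard library so it runs on its own.

```python
import asyncio
import time


class RateLimiter:
    """Hypothetical limiter: spaces acquisitions at least 1/rate seconds apart."""

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time at which the next call may start

    async def acquire(self) -> None:
        # The lock serializes waiters so each one gets its own time slot.
        async with self._lock:
            now = time.monotonic()
            wait = self._next_slot - now
            if wait > 0:
                await asyncio.sleep(wait)
                now = time.monotonic()
            self._next_slot = max(now, self._next_slot) + self._interval


async def fetch_all(urls, fetch, rate: float = 2.0):
    """Run `fetch(url)` concurrently for every URL, throttled to `rate` starts per second."""
    limiter = RateLimiter(rate)

    async def one(url):
        await limiter.acquire()
        return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))
```

In a real scraper, `fetch` would typically be a coroutine that wraps `httpx.AsyncClient.get`, and `rate` would come from the `REQUESTS_PER_SECOND` setting.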

🚀 Getting Started

1. Prerequisites

  • Python 3.11+
  • Async-compatible database (e.g., PostgreSQL or SQLite)

2. Installation

pip install -r requirements.txt

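(Ensure sqlalchemy, asyncpg, httpx, beautifulsoup4, and python-dotenv are installed. asyncpg is the PostgreSQL driver; if you use SQLite instead, you will likely need a separate async driver such as aiosqlite.)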

3. Configuration

Create a .env file in the root directory (see .env.example):

DATABASE_URL=postgresql+asyncpg://user:password@host:5432/postgres
REQUESTS_PER_SECOND=2
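
As a rough illustration of how these two values might be consumed, the sketch below reads them from an environment-style mapping and validates them. `Settings` and `load_settings` are hypothetical names, not part of the project, and the fallback of 2 requests per second only mirrors the example value above. With python-dotenv installed, calling `load_dotenv()` first would copy the `.env` entries into `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two values shown in the .env example."""
    database_url: str
    requests_per_second: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style key/value pairs."""
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL is not set") from None

    # Assumed default: fall back to 2 requests per second when the key is absent.
    rate = float(env.get("REQUESTS_PER_SECOND", "2"))
    if rate <= 0:
        raise ValueError("REQUESTS_PER_SECOND must be positive")

    return Settings(database_url=database_url, requests_per_second=rate)
```

The resulting `database_url` is the kind of string that SQLAlchemy's `create_async_engine` accepts, and `requests_per_second` would feed whatever rate limiter the scraper uses.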

4. Basic Usage

Check out the scripts/examples directory for ready-to-use research scripts:

  • Search Series: python scripts/examples/example_serie.py
  • Research Authors: python scripts/examples/example_auteur.py
  • Browse Magazines: python scripts/examples/example_revue.py

📁 Project Structure

  • bedetheque/: Core library (Models, Parsers, Scrapers, Repositories).
  • scripts/: Operational and utility scripts.
  • scripts/examples/: Reference implementations for researching each entity type.
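
To give a feel for how models and repositories could fit together, here is a minimal in-memory sketch of the author-to-album mapping. Every name in it (`Author`, `Album`, `InMemoryRepository`, `albums_for`) is an assumption made for illustration; the real classes in bedetheque/ are backed by SQLAlchemy and may look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    id: int
    name: str


@dataclass(frozen=True)
class Album:
    id: int
    title: str
    author_ids: tuple[int, ...]  # an album may have several authors


class InMemoryRepository:
    """Hypothetical stand-in for a database-backed repository."""

    def __init__(self) -> None:
        self._authors: dict[int, Author] = {}
        self._albums: dict[int, Album] = {}
        self._albums_by_author: dict[int, list[int]] = defaultdict(list)

    def add_author(self, author: Author) -> None:
        self._authors[author.id] = author

    def add_album(self, album: Album) -> None:
        if album.id in self._albums:
            return  # already stored; avoid duplicate index entries
        self._albums[album.id] = album
        # Index the album under each of its authors.
        for author_id in album.author_ids:
            self._albums_by_author[author_id].append(album.id)

    def albums_for(self, author_id: int) -> list[Album]:
        """Return the albums linked to an author, in insertion order."""
        return [self._albums[i] for i in self._albums_by_author.get(author_id, [])]
```

A parser would produce `Author` and `Album` instances from scraped pages, and the repository would persist them and answer relational queries such as "all albums by this author".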

🛠️ Built With


Developed with ❤️ for the comic book collector community.
