Casino Intelligence Collector

An end-to-end system for automatically gathering structured data about online casinos. This tool collects comprehensive information including licensing, RTP (Return to Player), fairness certifications, game providers, security measures, withdrawal methods, and reviews.

🌐 Live Control Dashboard

Try the interactive web dashboard: https://jmenichole.github.io/casino-scraper-agent/

The control dashboard provides a modern web interface to:

View casino data with interactive charts and statistics
Browse and search collected casinos
Upload and analyze your own data files
View detailed information for each casino
Toggle between light and dark themes

See the Dashboard Documentation for more details.

Features

Comprehensive Data Collection: Gathers 7+ categories of casino information
- Licensing information (authority, jurisdiction, verification)
- RTP (Return to Player) percentages
- Fairness certifications (eCOGRA, iTech Labs, etc.)
- Game providers (NetEnt, Microgaming, etc.)
- Security measures (SSL, encryption, 2FA)
- Withdrawal methods and policies
- Reviews and ratings
Multiple Output Formats:
- JSON (detailed structured data)
- CSV (flattened for analysis)
- Text summary reports
Configurable & Extensible:
- JSON-based configuration
- Environment variable support
- Customizable scraping delays
- Extensible scraper architecture
Respectful Scraping:
- Configurable delays between requests
- User-agent rotation
- Error handling and retry logic

Installation

Prerequisites

Python 3.8 or higher
pip package manager

Setup

Clone the repository:

git clone https://github.com/jmenichole/casino-scraper-agent.git
cd casino-scraper-agent

Install dependencies:

pip install -r requirements.txt

Quick Start

Scrape a Single Casino

python -m casino_collector.cli --url https://example-casino.com

Scrape Multiple Casinos from a File

Create a text file with casino URLs (one per line, or mixed with other text):

https://casino1.com
https://casino2.com
https://casino3.com

Run the scraper:

python -m casino_collector.cli --file casinos.txt

Note: The file reader now intelligently extracts URLs from mixed content, so you can use files with descriptions, comments, and other text mixed in with the URLs.

Copy-Paste a List of Casinos (NEW!)

You can now directly paste a list of casino URLs without creating a file:

python -m casino_collector.cli --list "https://casino1.com
https://casino2.com
https://casino3.com"

This method automatically extracts URLs from the text you paste, so you can copy content like:

My favorite casinos:
https://casino1.com - Great bonuses
https://casino2.com
Check out https://casino3.com too!

And the tool will extract all valid URLs automatically!

Use a Configuration File

Copy the example config:

cp config.example.json config.json

Edit config.json with your settings and target URLs
Run with the config:

python -m casino_collector.cli --config config.json

Usage

Command Line Options

usage: python -m casino_collector.cli [-h] (-u URL | -f FILE | -l LIST | --config CONFIG)
                                       [-o OUTPUT] [--no-json] [--no-csv]
                                       [--no-summary] [--delay-min DELAY_MIN]
                                       [--delay-max DELAY_MAX]
                                       [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                                       [--log-file LOG_FILE] [--quiet] [--version]

Options:
  -h, --help            Show this help message and exit
  -u URL, --url URL     Single casino URL to scrape
  -f FILE, --file FILE  File containing list of casino URLs (one per line, or mixed with text)
  -l LIST, --list LIST  Multi-line list of casino URLs or text containing URLs (paste directly)
  --config CONFIG       Path to configuration file (JSON)
  -o OUTPUT, --output OUTPUT
                        Output directory for results (default: output)
  --no-json             Skip JSON output
  --no-csv              Skip CSV output
  --no-summary          Skip summary report generation
  --delay-min DELAY_MIN
                        Minimum delay between requests in seconds (default: 1.0)
  --delay-max DELAY_MAX
                        Maximum delay between requests in seconds (default: 3.0)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Logging level (default: INFO)
  --log-file LOG_FILE   Path to log file
  --quiet               Suppress console output (only log to file)
  --version             Show version number

Examples

Scrape with custom output directory:

python -m casino_collector.cli --url https://example-casino.com --output ./data

Copy-paste a list of casinos directly:

python -m casino_collector.cli --list "https://casino1.com
https://casino2.com
https://casino3.com"

Copy-paste with mixed content (extracts URLs automatically):

python -m casino_collector.cli --list "Affiliate links:
https://www.casino1.com/?ref=xyz - Great bonuses
https://www.casino2.com
Check out https://www.casino3.com for rewards"

Scrape with custom delays (be more respectful):

python -m casino_collector.cli --file casinos.txt --delay-min 2 --delay-max 5

Generate only JSON output:

python -m casino_collector.cli --url https://example-casino.com --no-csv --no-summary

Debug mode with detailed logging:

python -m casino_collector.cli --url https://example-casino.com --log-level DEBUG

Web Dashboard

The Casino Intelligence Collector includes a modern web-based control dashboard for viewing and analyzing collected data.

Features

Interactive Dashboard: Real-time statistics and charts
Casino Browser: Search, filter, and sort casinos
Detailed Views: Click any casino for comprehensive information
Data Upload: Load your own JSON files
Dark Mode: Toggle between light and dark themes
Responsive Design: Works on desktop, tablet, and mobile

Accessing the Dashboard

Visit the live dashboard: https://jmenichole.github.io/casino-scraper-agent/

Or run it locally:

cd docs
python -m http.server 8000
# Visit http://localhost:8000

Using the Dashboard

Load Data: Upload a JSON file from the output/ directory or use sample data
Explore: View statistics, charts, and browse all casinos
Details: Click any casino card to see comprehensive information
Theme: Toggle between light and dark modes

See the Dashboard Documentation for complete details.

Output Files

The collector generates three types of output files in the specified output directory:

1. JSON Output (`casino_data_YYYYMMDD_HHMMSS.json`)

Complete structured data in JSON format with all collected information:

[
  {
    "name": "Example Casino",
    "url": "https://example-casino.com",
    "description": "Casino description...",
    "licenses": [
      {
        "authority": "Malta Gaming Authority",
        "license_number": "MGA/B2C/123/2020",
        "jurisdiction": "Malta",
        "verified": false
      }
    ],
    "rtp_info": [...],
    "fairness": [...],
    "providers": [...],
    "security": {...},
    "withdrawal_methods": [...],
    "reviews": [...],
    "collection_date": "2024-01-01T12:00:00",
    "data_completeness_score": 85.7
  }
]

2. CSV Output (`casino_data_YYYYMMDD_HHMMSS.csv`)

Flattened data suitable for spreadsheet analysis with key metrics and aggregated information.

3. Summary Report (`casino_summary_YYYYMMDD_HHMMSS.txt`)

Human-readable text report with:

Overall statistics
Data completeness metrics
Individual casino summaries

Data Model

CasinoData

The main data structure containing:

name: Casino name
url: Casino website URL
description: Brief description
licenses: List of licensing information
- authority: Licensing authority name
- license_number: License number (if available)
- jurisdiction: Country/region
- verified: Verification status
rtp_info: List of RTP percentages
- game_name: Name of the game
- rtp_percentage: RTP percentage (0-100)
- game_category: Category (slots, table games, etc.)
- provider: Game provider
fairness: List of fairness certifications
- testing_agency: Name of testing agency
- certification: Certification details
- certified: Certification status
providers: List of game providers
- name: Provider name
- games_count: Number of games
- popular_games: List of popular titles
security: Security information
- ssl_certificate: SSL status
- encryption_type: Type of encryption
- two_factor_auth: 2FA availability
- responsible_gambling_tools: List of RG tools
- data_protection_compliance: Compliance (GDPR, etc.)
withdrawal_methods: List of withdrawal options
- method: Payment method name
- min_amount: Minimum withdrawal
- max_amount: Maximum withdrawal
- processing_time: Processing duration
- fees: Fee information
reviews: List of reviews
- source: Review platform
- rating: Rating (0-5)
- review_count: Number of reviews
- positive_aspects: List of pros
- negative_aspects: List of cons
collection_date: When data was collected
data_completeness_score: Completeness percentage (0-100)

Configuration

Configuration File Format

Create a config.json file:

{
  "scraper": {
    "delay_min": 1.0,
    "delay_max": 3.0,
    "timeout": 30,
    "max_retries": 3
  },
  "storage": {
    "output_dir": "output",
    "auto_save_json": true,
    "auto_save_csv": true,
    "generate_summary": true
  },
  "logging": {
    "level": "INFO",
    "log_file": "logs/casino_collector.log"
  },
  "target_urls": [
    "https://casino1.com",
    "https://casino2.com"
  ]
}

Environment Variables

You can also configure via environment variables:

SCRAPER_DELAY_MIN: Minimum delay between requests
SCRAPER_DELAY_MAX: Maximum delay between requests
OUTPUT_DIR: Output directory path
LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Architecture

The system is organized into modular components:

casino_collector/
├── __init__.py           # Package initialization
├── __main__.py           # Module entry point
├── cli.py                # Command-line interface
├── config.py             # Configuration management
├── models/               # Data models
│   └── __init__.py       # Pydantic models for casino data
├── scrapers/             # Scraping modules
│   ├── __init__.py
│   ├── base.py           # Base scraper class
│   └── generic_scraper.py # Generic casino scraper
├── storage/              # Data storage and export
│   └── __init__.py       # Storage handlers
└── utils/                # Utility functions
    └── __init__.py       # Helper functions

Extending the System

Adding a Custom Scraper

Create a new scraper by extending the BaseScraper class:

from casino_collector.scrapers.base import BaseScraper
from casino_collector.models import CasinoData

class CustomCasinoScraper(BaseScraper):
    def scrape(self, url: str, **kwargs) -> CasinoData:
        # Your custom scraping logic
        casino_data = CasinoData(name="Casino Name", url=url)
        # ... collect data ...
        return casino_data

Adding New Data Categories

Extend the data models in casino_collector/models/__init__.py:

class NewCategory(BaseModel):
    field1: str
    field2: int

class CasinoData(BaseModel):
    # ... existing fields ...
    new_category: Optional[NewCategory] = None

Best Practices

Respect Robots.txt: Always check a website's robots.txt before scraping
Use Appropriate Delays: Default 1-3 seconds between requests; increase if needed
Monitor Resources: Watch memory usage when scraping many sites
Validate Data: Review collected data for accuracy
Legal Compliance: Ensure you have permission to scrape target websites
Rate Limiting: Don't overwhelm target servers with requests

Limitations

This is a generic scraper that works best with well-structured websites
Some data may not be available on all casino websites
Accuracy depends on the structure and availability of information on target sites
Dynamic content (JavaScript-rendered) may not be captured by the basic scraper
Some casinos may block automated scraping attempts

Troubleshooting

No data collected

Check if the URL is accessible
Verify the website structure hasn't changed
Try increasing the timeout value
Check logs for specific error messages

Low completeness scores

Some data categories may not be available on the website
The scraper may need customization for specific sites
Consider creating site-specific scrapers for better accuracy

Request errors

Website may be blocking automated requests
Check your internet connection
Verify the URL is correct and accessible
Try increasing delays between requests

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

This project is provided as-is for educational and research purposes.

Disclaimer

This tool is for research and educational purposes only. Users are responsible for:

Complying with target websites' terms of service
Respecting robots.txt files
Following applicable laws and regulations
Obtaining necessary permissions before scraping

The authors are not responsible for misuse of this tool.

Support

For issues, questions, or suggestions, please open an issue on GitHub.

Changelog

Version 1.0.0

Initial release
Generic casino scraper implementation
JSON, CSV, and text report outputs
Command-line interface
Configuration file support
Comprehensive data model with 7+ categories

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.github/workflows		.github/workflows
casino_collector		casino_collector
docs		docs
.gitignore		.gitignore
PROJECT_SUMMARY.md		PROJECT_SUMMARY.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
casinos.example.txt		casinos.example.txt
casinoslist.md		casinoslist.md
config.example.json		config.example.json
demo.py		demo.py
example.py		example.py
example_paste.py		example_paste.py
requirements.txt		requirements.txt
setup.py		setup.py
test_system.py		test_system.py

Folders and files

Latest commit

History

Repository files navigation

Casino Intelligence Collector

🌐 Live Control Dashboard

Features

Installation

Prerequisites

Setup

Quick Start

Scrape a Single Casino

Scrape Multiple Casinos from a File

Copy-Paste a List of Casinos (NEW!)

Use a Configuration File

Usage

Command Line Options

Examples

Web Dashboard

Features

Accessing the Dashboard

Using the Dashboard

Output Files

1. JSON Output (casino_data_YYYYMMDD_HHMMSS.json)

2. CSV Output (casino_data_YYYYMMDD_HHMMSS.csv)

3. Summary Report (casino_summary_YYYYMMDD_HHMMSS.txt)

Data Model

CasinoData

Configuration

Configuration File Format

Environment Variables

Architecture

Extending the System

Adding a Custom Scraper

Adding New Data Categories

Best Practices

Limitations

Troubleshooting

No data collected

Low completeness scores

Request errors

Contributing

License

Disclaimer

Support

Changelog

Version 1.0.0

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

1. JSON Output (`casino_data_YYYYMMDD_HHMMSS.json`)

2. CSV Output (`casino_data_YYYYMMDD_HHMMSS.csv`)

3. Summary Report (`casino_summary_YYYYMMDD_HHMMSS.txt`)

Packages