A bleeding-edge Rust sports calendar scraper with multi-stage parsing, three-tier validation, and colored CLI output. It currently extracts Arsenal FC fixtures (a Springboks rugby scraper is planned) and attaches data quality analysis, so the results can be used to organize watch parties with friends.
Help Ollie organize friend groups around sports events by providing reliable fixture data with quality warnings for calendar integration.
- Graceful Degradation: Exact parsing → Weekday tolerance → Year assumptions
- Rich Metadata: Structured ParseMetadata with weekday mismatch detection
- Time Independence: Deterministic tests using mocked dates (works in 2027!)
- Three-Tier System: Warning → Error → Critical severity levels
- Date Range Filtering: Current year to 2 years future only
- Calendar-Ready Output: Rich descriptions with data quality notes
- Backend: Axum API server (ready for implementation)
- Frontend: Leptos WebAssembly (ready for implementation)
- Scraping: chromiumoxide (headless browser) + scraper for dynamic content
- CLI: Beautiful colored output showing 42 real Arsenal fixtures
- Error Handling: Comprehensive anyhow integration
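
The graceful degradation chain listed above (exact parsing, then weekday tolerance, then year assumptions) can be sketched roughly as follows. This is a dependency-free illustration, not the code in parsing.rs: the input format ("Sat 16 Aug 2025"), the `ParsingStrategy` variant names, and `parse_fixture_date` are all assumptions made for the example, and the real project relies on chrono rather than a hand-rolled weekday calculation.

```rust
/// Which stage of the fallback chain produced the date.
/// Variant names are hypothetical; the source only names the enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ParsingStrategy {
    Exact,
    WeekdayTolerant,
    YearAssumed,
}

#[derive(Debug, PartialEq)]
pub struct ParsedDate {
    pub year: i32,
    pub month: u32,
    pub day: u32,
    pub strategy: ParsingStrategy,
}

const WEEKDAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Day of week via Sakamoto's algorithm: 0 = Sunday ... 6 = Saturday.
fn weekday_index(year: i32, month: u32, day: u32) -> u32 {
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    let y = if month < 3 { year - 1 } else { year };
    (y + y / 4 - y / 100 + y / 400 + OFFSETS[(month - 1) as usize] + day as i32).rem_euclid(7)
        as u32
}

/// Parses "Sat 16 Aug 2025" or "Sat 16 Aug".
/// `assumed_year` is injected by the caller instead of read from the clock,
/// which keeps the function deterministic under test.
pub fn parse_fixture_date(input: &str, assumed_year: i32) -> Option<ParsedDate> {
    let parts: Vec<&str> = input.split_whitespace().collect();
    let (weekday, day, month, year) = match parts.as_slice() {
        [w, d, m, y] => (*w, *d, *m, Some(*y)),
        [w, d, m] => (*w, *d, *m, None),
        _ => return None,
    };
    let stated_weekday = WEEKDAYS.iter().position(|name| *name == weekday)? as u32;
    let month = MONTHS.iter().position(|name| *name == month)? as u32 + 1;
    let day = day.parse::<u32>().ok().filter(|d| (1..=31).contains(d))?;

    let (year, strategy) = match year {
        // Stage 3: no year in the source text, so fall back to the caller's assumption.
        None => (assumed_year, ParsingStrategy::YearAssumed),
        Some(raw) => {
            let y = raw.parse::<i32>().ok()?;
            if weekday_index(y, month, day) == stated_weekday {
                // Stage 1: the stated weekday agrees with the calendar.
                (y, ParsingStrategy::Exact)
            } else {
                // Stage 2: keep the date but record that the weekday disagreed.
                (y, ParsingStrategy::WeekdayTolerant)
            }
        }
    };
    Some(ParsedDate { year, month, day, strategy })
}
```

Recording the strategy on the result, rather than failing on the first mismatch, is what would let a later validation step turn a weekday disagreement into a warning instead of discarding the fixture.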
# Show available commands
cargo run --bin calpal -- --help
# Scrape Arsenal fixtures with detailed output
cargo run --bin calpal -- scrape --team arsenal --verbose
# View supported teams
cargo run --bin calpal -- teams
# Run comprehensive test suite
cargo test --package fixture-scraper

calpal/
├── CLAUDE.md                  # AI agent context and architecture docs
├── README.md                  # This file
├── Cargo.toml                 # Workspace with bleeding-edge dependencies
├── fixture-scraper/           # Core parsing & validation (sophisticated!)
│   └── src/
│       ├── lib.rs             # Domain models and core traits
│       ├── parsing.rs         # Multi-stage parsing with metadata
│       ├── validation.rs      # Three-tier validation system
│       └── arsenal.rs         # Arsenal scraper using shared utilities
├── cli/                       # Beautiful command-line interface
├── api/                       # Axum REST API (ready for implementation)
├── frontend/                  # Leptos WASM app (ready for implementation)
└── .github/workflows/         # Automated scraping (planned)

All team scrapers implement this async trait with sophisticated validation:
#[async_trait::async_trait]
pub trait FixtureScraper {
    async fn scrape(&self) -> Result<Vec<ValidatedFixture>, ScrapeError>;
    fn team_name(&self) -> &str;
    fn source_url(&self) -> &str;
}

Bleeding-edge domain model with structured parsing metadata:
pub struct Fixture {
    pub team: String,
    pub opponent: String,
    pub datetime: DateTime<Utc>, // Always UTC internally
    pub venue: String,
    pub competition: String,
    pub parse_metadata: ParseMetadata, // Rich structured metadata
}

pub struct ParseMetadata {
    pub original_source: String,
    pub weekday_mismatch: Option<WeekdayMismatch>,
    pub timezone_assumptions: String,
    pub parsing_strategy: ParsingStrategy,
}

pub enum FixtureValidation {
    Valid,                         // Ready for calendar
    ValidWithWarnings(Vec<Issue>), // Usable with notes
    Invalid(Vec<Issue>),           // Critical problems
    Historical(DateTime<Utc>),     // Past fixtures (filtered)
}

- Rust 1.88+ (latest stable)
- cargo-leptos for frontend development
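
As a rough illustration of how the `FixtureValidation` variants shown earlier might be assigned, the sketch below combines the three severity levels with the "current year to 2 years future" date range. It is an assumption-laden stand-in, not the contents of validation.rs: `Severity`, the fields of `Issue`, and `classify` are hypothetical, the rule that any Error or Critical issue makes a fixture `Invalid` is a guess, and `Historical` carries a plain year here instead of `DateTime<Utc>` so the example needs no external crates.

```rust
/// Three severity tiers (names taken from the feature list; the type itself is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Warning,
    Error,
    Critical,
}

/// Hypothetical shape for a validation issue.
#[derive(Debug, Clone, PartialEq)]
pub struct Issue {
    pub severity: Severity,
    pub message: String,
}

#[derive(Debug, PartialEq)]
pub enum FixtureValidation {
    Valid,
    ValidWithWarnings(Vec<Issue>),
    Invalid(Vec<Issue>),
    /// Simplification: a year instead of the DateTime<Utc> used in the real enum.
    Historical(i32),
}

/// Classifies a fixture from its year and the issues collected while parsing.
/// `current_year` is passed in so tests do not depend on the system clock.
pub fn classify(fixture_year: i32, current_year: i32, mut issues: Vec<Issue>) -> FixtureValidation {
    // Past fixtures are filtered rather than reported as errors.
    if fixture_year < current_year {
        return FixtureValidation::Historical(fixture_year);
    }
    // Anything beyond two years ahead falls outside the accepted date range.
    if fixture_year > current_year + 2 {
        issues.push(Issue {
            severity: Severity::Critical,
            message: format!("year {fixture_year} is more than 2 years ahead"),
        });
    }
    if issues.is_empty() {
        FixtureValidation::Valid
    } else if issues.iter().all(|issue| issue.severity == Severity::Warning) {
        FixtureValidation::ValidWithWarnings(issues)
    } else {
        FixtureValidation::Invalid(issues)
    }
}
```

Under these assumptions a weekday mismatch recorded as a `Warning` still yields a usable fixture, while a single `Error` or `Critical` issue is enough to keep it out of the calendar.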
# Development
cargo watch -x "test --lib"
cargo leptos watch # Frontend development
cargo run --bin calpal # CLI tool
cargo fmt # Format code to Rust standards
cargo clippy # Rust best practices linting
# Testing
cargo test
cargo test --package fixture-scraper
cargo test --test '*' # Integration test targets only
# Building
cargo build --release
cargo leptos build # Production WASM build

BREAKTHROUGH ACHIEVED: Real Arsenal Data Working!
- 42 Real Arsenal Fixtures - Successfully scraping live data with proper venues
- Headless Browser Integration - chromiumoxide handling dynamic JavaScript content
- Real Venue Data - Emirates Stadium, Old Trafford, Anfield, international venues
- Complete Season Coverage - Friendlies, Premier League, cup competitions through May 2026
- Production-Ready Pipeline - Scraping → Parsing → Validation → Beautiful CLI display

PRODUCTION-READY EXCELLENCE ACHIEVED
- 37/37 Tests Passing - Comprehensive coverage including integration and browser tests
- Zero Clippy Warnings - Clean, idiomatic Rust code throughout codebase
- Zero Compiler Warnings - No dead code, unused imports, or lint issues
- Arsenal scraper producing 42 fixtures - From teaser elements to real accordion data
- Multi-stage parsing system - Exact → Weekday tolerance → Year assumptions
- Rich validation system - Three-tier Warning/Error/Critical classification
- Beautiful CLI interface - Professional colored output showing real fixture data
- Time-independent tests - Deterministic behavior using mocked dates
- Structured metadata - ParseMetadata with headless browser support
- Production build verified - Release compilation successful

Next Phase: Expansion
- Implement Springboks scraper using proven headless browser approach
- Add calendar export functionality (ICS generation)
- GitHub Actions for automated scraping
- Nested CLAUDE.md documentation for growing codebase
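
Calendar export is still a planned item, so the following is only a sketch of what ICS generation could look like, not an existing part of the project. `CalendarFixture`, `escape_text`, `to_ics`, the `PRODID` value, and the sample UID are all hypothetical. The property names and TEXT escaping follow RFC 5545; folding of lines longer than 75 octets is deliberately left out.

```rust
/// Minimal data needed for one calendar event (hypothetical shape).
pub struct CalendarFixture<'a> {
    pub uid: &'a str,
    pub summary: &'a str,
    pub venue: &'a str,
    /// UTC start time already formatted as YYYYMMDDTHHMMSSZ.
    pub start_utc: &'a str,
    /// Optional data quality note, e.g. a weekday mismatch warning.
    pub quality_note: Option<&'a str>,
}

/// Escapes a TEXT value: backslash, semicolon, comma, and newline.
fn escape_text(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            ';' => out.push_str("\\;"),
            ',' => out.push_str("\\,"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Renders fixtures as a VCALENDAR document with CRLF line endings.
/// `dtstamp_utc` is supplied by the caller so output is reproducible in tests.
pub fn to_ics(fixtures: &[CalendarFixture<'_>], dtstamp_utc: &str) -> String {
    let mut lines = vec![
        "BEGIN:VCALENDAR".to_string(),
        "VERSION:2.0".to_string(),
        "PRODID:-//calpal//fixture-scraper//EN".to_string(),
    ];
    for fixture in fixtures {
        lines.push("BEGIN:VEVENT".to_string());
        lines.push(format!("UID:{}", fixture.uid));
        lines.push(format!("DTSTAMP:{}", dtstamp_utc));
        lines.push(format!("DTSTART:{}", fixture.start_utc));
        lines.push(format!("SUMMARY:{}", escape_text(fixture.summary)));
        lines.push(format!("LOCATION:{}", escape_text(fixture.venue)));
        // Data quality notes travel with the event as its description.
        if let Some(note) = fixture.quality_note {
            lines.push(format!("DESCRIPTION:{}", escape_text(note)));
        }
        lines.push("END:VEVENT".to_string());
    }
    lines.push("END:VCALENDAR".to_string());
    lines.join("\r\n") + "\r\n"
}
```

Putting the quality note in `DESCRIPTION` is one way to keep parsing warnings visible inside the calendar entry itself.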

Future Interface
- Axum API server with ValidatedFixture endpoints
- Leptos frontend with data quality indicators
- Rich calendar integration showing parsing metadata
- OAuth Google Calendar integration with quality warnings
- Arsenal: https://www.arsenal.com/fixtures (clean HTML with times/venues)
- Springboks:
- Internal Storage: Always UTC using chrono::DateTime<Utc>
- Source Parsing: Handle SAST (UTC+2) from .co.za sites, GMT from UK sites
- User Display: Convert to London time (GMT/BST) for UK users
- Libraries: chrono and chrono-tz for robust timezone handling
- Daylight Saving: Proper GMT ↔ BST transitions
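
To make the offsets in the list above concrete, here is a dependency-free sketch of the arithmetic involved. In the project itself chrono-tz would handle this; the functions below (`is_bst`, `shift_hours`, `sast_to_utc`, `utc_to_london`) are hypothetical and work in whole hours only. The BST rule encoded here is the standard one: clocks run one hour ahead of UTC from 01:00 UTC on the last Sunday of March until 01:00 UTC on the last Sunday of October.

```rust
/// Day of week via Sakamoto's algorithm: 0 = Sunday ... 6 = Saturday.
fn weekday_index(year: i32, month: u32, day: u32) -> u32 {
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    let y = if month < 3 { year - 1 } else { year };
    (y + y / 4 - y / 100 + y / 400 + OFFSETS[(month - 1) as usize] + day as i32).rem_euclid(7)
        as u32
}

/// Day of month of the last Sunday in a 31-day month (March or October).
fn last_sunday(year: i32, month: u32) -> u32 {
    31 - weekday_index(year, month, 31)
}

/// True when the UK observes BST at the given UTC date and hour.
pub fn is_bst(year: i32, month: u32, day: u32, hour_utc: u32) -> bool {
    match month {
        4..=9 => true,
        3 => {
            let switch = last_sunday(year, 3);
            day > switch || (day == switch && hour_utc >= 1)
        }
        10 => {
            let switch = last_sunday(year, 10);
            day < switch || (day == switch && hour_utc < 1)
        }
        _ => false,
    }
}

/// Applies a whole-hour offset and returns (day delta, hour of day).
pub fn shift_hours(hour: u32, offset_hours: i32) -> (i32, u32) {
    let total = hour as i32 + offset_hours;
    (total.div_euclid(24), total.rem_euclid(24) as u32)
}

/// SAST is UTC+2 all year, so source times move back two hours for storage.
pub fn sast_to_utc(hour_sast: u32) -> (i32, u32) {
    shift_hours(hour_sast, -2)
}

/// Converts a stored UTC hour to London local time for display.
pub fn utc_to_london(year: i32, month: u32, day: u32, hour_utc: u32) -> (i32, u32) {
    let offset = if is_bst(year, month, day, hour_utc) { 1 } else { 0 };
    shift_hours(hour_utc, offset)
}
```

The day delta in the return value matters for late kickoffs: a 01:00 SAST start is 23:00 UTC on the previous day, which is why storing UTC and converting only at display time avoids off-by-one-day calendar entries.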
- Test-Driven Development: Tests define the specification
- AI Guardrails: Tests prevent context drift and maintain code quality
- Direct & Achievable: Focus on essential functionality
- Never Change Tests to Fit Code: Fix the implementation, not the tests
- 4-space indentation (Rust standard via cargo fmt)
- Explicit error handling (no unwrap() in production code)
- Descriptive variable names
- Comprehensive documentation with examples
- Clever & Teachable: Elegant solutions that demonstrate advanced Rust patterns while remaining understandable
- Zero-cost Abstractions: Leverage Rust's type system for compile-time guarantees
- Functional Composition: Use iterator chains, combinators, and type-driven design
- NeoVim-friendly terminal workflows
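
Two of the style points above, explicit error handling without `unwrap()` and functional composition with iterator chains, can be shown together in a small sketch. The kickoff-time parser and the `KickoffError` type are hypothetical examples, not code from the repository, and they use a hand-written std error type where the project itself uses anyhow.

```rust
use std::fmt;

/// Hypothetical error type for parsing "HH:MM" kickoff times.
#[derive(Debug, PartialEq)]
pub enum KickoffError {
    MissingSeparator(String),
    InvalidNumber(String),
    OutOfRange { hour: u32, minute: u32 },
}

impl fmt::Display for KickoffError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingSeparator(raw) => write!(f, "no ':' in kickoff time {raw:?}"),
            Self::InvalidNumber(part) => write!(f, "not a number: {part:?}"),
            Self::OutOfRange { hour, minute } => write!(f, "time out of range: {hour}:{minute}"),
        }
    }
}

impl std::error::Error for KickoffError {}

/// Parses "HH:MM" into (hour, minute); every failure becomes a typed error.
pub fn parse_kickoff(raw: &str) -> Result<(u32, u32), KickoffError> {
    let (hour_text, minute_text) = raw
        .trim()
        .split_once(':')
        .ok_or_else(|| KickoffError::MissingSeparator(raw.to_string()))?;
    let hour: u32 = hour_text
        .parse()
        .map_err(|_| KickoffError::InvalidNumber(hour_text.to_string()))?;
    let minute: u32 = minute_text
        .parse()
        .map_err(|_| KickoffError::InvalidNumber(minute_text.to_string()))?;
    if hour > 23 || minute > 59 {
        return Err(KickoffError::OutOfRange { hour, minute });
    }
    Ok((hour, minute))
}

/// Parses every entry, stopping at the first error.
/// Collecting into Result<Vec<_>, _> short-circuits without any explicit loop.
pub fn parse_all(raw: &[&str]) -> Result<Vec<(u32, u32)>, KickoffError> {
    raw.iter().map(|entry| parse_kickoff(entry)).collect()
}
```

The `?` operator and `collect` into `Result` keep the happy path linear while still surfacing the exact reason a value was rejected.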
This is a learning project that showcases sophisticated Rust patterns through clear, well-tested implementations. Code should be both clever and teachable - demonstrating advanced techniques while maintaining readability and comprehensive documentation.
- anyhow 1.0.98
- chrono 0.4.41 + chrono-tz 0.10.4
- serde 1.0.219 + serde_json 1.0.141
- tokio 1.46.1
- reqwest 0.12.22
- scraper 0.23.1
- chromiumoxide 0.7.0 + futures 0.3 (headless browser for dynamic content)
- clap 4.5.41 + colored 3.0.0
- axum 0.8.4 + tower 0.5.2
- leptos 0.8.5 + wasm-bindgen 0.2.100
- async-trait 0.1.88
- mockall 0.13.1 (testing)
Built with Rust 🦀 - Teaching advanced patterns through elegant implementation