A comprehensive, cloud-agnostic data analytics platform for Formula 1 racing data. This system extracts real-time and historical F1 data from the OpenF1 API, processes it through multiple analytical layers, and prepares features for AI/ML applications.
This platform demonstrates my approach to solving complex data engineering challenges through principled architecture and iterative delivery. Having transitioned into leadership roles, I continue to leverage hands-on technical experience to guide teams through architectural decisions, technical debt management, and scaling challenges.
Leadership Through Technical Expertise:
- Solution Architecture: Transform ambiguous requirements into clear technical specifications
- Risk Mitigation: Build systems that gracefully handle failure and change
- Technical Mentorship: Create learning opportunities through well-structured, documented code
- Cloud-Agnostic Architecture: Deploy on AWS, Azure, GCP, or run locally
- Real-time Data Ingestion: Live F1 data from OpenF1 API
- Multi-Layer Analytics: Raw data → Analytics → AI-ready features
- Comprehensive Testing: Unit tests, integration tests, cloud mocking
- Scalable Design: Configurable for small datasets to production workloads
- AI/ML Ready: Feature engineering for driver performance, team analytics, and race predictions
┌─────────────────────────────────────────────────────────────┐
│                F1 Data Platform Architecture                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────┐     │
│  │   OpenF1    │    │   Extract    │    │  Raw Data   │     │
│  │     API     │───▶│  Transform   │───▶│    Layer    │     │
│  │             │    │     Load     │    │             │     │
│  └─────────────┘    └──────────────┘    └─────────────┘     │
│                                                │             │
│                                                ▼             │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────┐     │
│  │  Analytics  │    │  Data Trans- │    │  Analytics  │     │
│  │    Layer    │◀───│   formation  │◀───│    Layer    │     │
│  │             │    │   Pipeline   │    │             │     │
│  └─────────────┘    └──────────────┘    └─────────────┘     │
│                                                │             │
│                                                ▼             │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────┐     │
│  │    AI/ML    │    │   Feature    │    │ AI Features │     │
│  │ Applications│◀───│  Engineering │◀───│    Layer    │     │
│  │             │    │              │    │             │     │
│  └─────────────┘    └──────────────┘    └─────────────┘     │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                   Cloud Abstraction Layer                   │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────────┐  │
│  │   AWS   │   │  Azure  │   │   GCP   │   │    Local    │  │
│  │ S3/RDS  │   │Blob/SQL │   │Storage/ │   │ File/SQLite │  │
│  │         │   │         │   │BigQuery │   │             │  │
│  └─────────┘   └─────────┘   └─────────┘   └─────────────┘  │
└─────────────────────────────────────────────────────────────┘
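The cloud abstraction layer is what makes the provider swap possible: every backend fulfils the same small storage and database contracts. A minimal sketch of the idea; method names here are illustrative, the real interfaces live in `cloud_swap/interfaces/`:

from abc import ABC, abstractmethod
from pathlib import Path

class StorageProvider(ABC):
    """Contract every storage backend (S3, Blob, GCS, local files) fulfils."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, key: str) -> bytes: ...

class LocalStorageProvider(StorageProvider):
    """Local-filesystem backend used when no cloud is configured."""

    def __init__(self, base_path: str = "./data") -> None:
        self.base_path = Path(base_path)

    def save(self, key: str, data: bytes) -> None:
        path = self.base_path / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, key: str) -> bytes:
        return (self.base_path / key).read_bytes()

Because callers only ever see the abstract contract, switching from local files to S3 or Blob storage is a configuration change, not a code change.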
Prerequisites:
- Python 3.9+
- Git
- Clone the repository:
  git clone <repository-url>
  cd showcase-f1-pipeline
- Install dependencies:
  pip install -r requirements.txt
- Run tests:
  python run_tests.py --type unit
- Basic usage:
from f1_data_platform import F1DataPlatform

# Initialize platform
platform = F1DataPlatform(environment="local")

# Extract data for 2023 season
stats = platform.extract_year_data(2023)
print(f"Extracted {stats['total_records']} records")
The raw data layer stores unprocessed data directly from the OpenF1 API (a fetch sketch follows the list):
- Meetings: Grand Prix events and scheduling
- Sessions: Practice, qualifying, sprint, race sessions
- Drivers: Driver information and team assignments
- Car Data: Telemetry, positions, speed traces
- Weather: Track conditions and weather data
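Because the raw layer mirrors the OpenF1 API one-to-one, you can preview exactly what lands in it by querying the API directly. A minimal sketch using `requests` (field names per the OpenF1 docs):

import requests

# Fetch all 2023 sessions straight from the OpenF1 API
resp = requests.get("https://api.openf1.org/v1/sessions", params={"year": 2023}, timeout=30)
resp.raise_for_status()
sessions = resp.json()

print(f"{len(sessions)} sessions in 2023")
print(sessions[0])  # flat JSON records, e.g. session_key, session_name, date_start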
The analytics layer holds processed and aggregated data for analysis (an aggregation sketch follows the list):
- Race Results: Final positions, points, lap times
- Driver Performance: Session statistics, consistency metrics
- Team Analytics: Constructor standings, strategy analysis
- Lap Analysis: Sector times, tire strategies, pit stops
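As an illustration of the kind of aggregation this layer performs, a per-driver consistency metric can be derived from raw laps with a simple group-by. Column names mirror the OpenF1 `laps` endpoint; the data here is made up:

import pandas as pd

laps = pd.DataFrame({
    "driver_number": [1, 1, 1, 44, 44, 44],
    "lap_duration": [92.3, 91.8, 92.1, 92.9, 93.4, 92.7],  # seconds
})

# Best lap, average pace, and consistency (standard deviation) per driver
consistency = (
    laps.groupby("driver_number")["lap_duration"]
    .agg(best_lap="min", avg_lap="mean", lap_std="std")
    .reset_index()
)
print(consistency)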
# Basic configuration
F1_DATA_PLATFORM_ENV=local # local, development, production
F1_DATA_PLATFORM_LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
# Cloud provider selection
F1_CLOUD_PROVIDER=local # local, aws, azure, gcp
# AWS Configuration (if using AWS)
AWS_REGION=us-east-1
AWS_S3_BUCKET=your-f1-bucket
AWS_RDS_ENDPOINT=your-rds-endpoint
# Azure Configuration (if using Azure)
AZURE_STORAGE_ACCOUNT=yourstorageaccount
AZURE_STORAGE_CONTAINER=f1-data
AZURE_SQL_SERVER=your-sql-server
# GCP Configuration (if using GCP)
GCP_PROJECT_ID=your-project-id
GCP_BUCKET_NAME=your-f1-bucket
GCP_DATASET_ID=f1_dataset

Create config.yaml for advanced configuration:
environment: local
log_level: INFO

storage:
  provider: local
  local_path: ./data
  # aws:
  #   bucket_name: f1-data-bucket
  #   region: us-east-1

database:
  provider: local
  db_path: ./f1_data.db
  # aws_rds:
  #   endpoint: your-endpoint.rds.amazonaws.com
  #   database: f1_analytics
  #   username: f1_user

api:
  base_url: https://api.openf1.org/v1
  rate_limit: 100
  retry_attempts: 3
  timeout: 30

processing:
  batch_size: 1000
  parallel_workers: 4
  memory_limit_mb: 2048

Run the full test suite:
python run_tests.py

# Unit tests only
python run_tests.py --type unit
# Integration tests only
python run_tests.py --type integration
# Fast tests (excluding slow integration tests)
python run_tests.py --type fast
# Cloud-specific tests
python run_tests.py --type aws
python run_tests.py --type azure
python run_tests.py --type local

# Run tests with coverage report
python run_tests.py --verbose
# View HTML coverage report
open htmlcov/index.html # macOS/Linux
start htmlcov/index.html # Windows
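The `--type` buckets above presumably map onto pytest markers (a `pytest.ini` ships under `tests/`). A sketch of how such a split is usually declared; the marker names are assumptions, check `pytest.ini` for the real ones:

# tests/integration/test_pipeline_integration.py (illustrative)
import pytest

@pytest.mark.integration
@pytest.mark.slow
def test_full_pipeline_roundtrip():
    ...

# Running only the fast subset would then be:  pytest -m "not slow"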
For lower-level control, the platform's components can be used directly. Custom data extraction:

from f1_data_platform.config.settings import Settings
from f1_data_platform.cloud_swap import CloudProviderFactory
from f1_data_platform.extractors import DataExtractor
# Setup
settings = Settings(environment="local")
cloud_provider = CloudProviderFactory.create("local", settings.get_cloud_provider_config())
extractor = DataExtractor(settings, cloud_provider)
# Extract data for 2023 season
stats = extractor.extract_year_data(2023, save_raw=True, save_to_db=True)
print(f"Processed {stats['endpoints_processed']} endpoints")
print(f"Total records: {stats['total_records']}")from f1_data_platform.transformers import DataTransformer
# Initialize transformer
transformer = DataTransformer(settings, cloud_provider)
# Create analytics tables
transformer.setup_analytics_tables()
# Transform specific session data
session_key = 9158 # Example session
analytics_data = transformer.process_session_analytics(session_key)

Generating AI-ready features:

from f1_data_platform.transformers import AIPreparationTransformer
# Initialize AI transformer
ai_transformer = AIPreparationTransformer(settings, cloud_provider)
# Setup AI feature tables
ai_transformer.setup_ai_tables()
# Generate driver performance features
driver_features = ai_transformer.create_driver_performance_features(2023)
print(f"Generated features for {len(driver_features)} drivers")# Switch between cloud providers without code changes
aws_settings = Settings(
    environment="production",
    storage=StorageConfig(
        provider="aws",
        bucket_name="production-f1-data",
        region="us-east-1",
    ),
    database=DatabaseConfig(
        provider="aws_rds",
        endpoint="prod.rds.amazonaws.com",
        database="f1_analytics",
    ),
)
aws_provider = CloudProviderFactory.create("aws", aws_settings.get_cloud_provider_config())

# Check system health
health_status = extractor.health_check()
print(f"API Status: {health_status['api_accessible']}")
print(f"Storage Status: {health_status['storage_accessible']}")
# Check cloud provider health
cloud_health = cloud_provider.health_check()
print(f"Storage Health: {cloud_health['storage']}")
print(f"Database Health: {cloud_health['database']}")f1_data_platform/
├── __init__.py # Package initialization
├── config/
│ ├── __init__.py
│ └── settings.py # Configuration management
├── cloud_swap/ # Cloud abstraction layer
│ ├── __init__.py
│ ├── interfaces/ # Abstract base classes
│ ├── providers/ # Cloud-specific implementations
│ └── factory.py # Provider factory
├── extractors/ # Data extraction
│ ├── __init__.py
│ ├── openf1_client.py # OpenF1 API client
│ └── data_extractor.py # Main extraction logic
├── models/ # Data models and schemas
│ ├── __init__.py
│ ├── raw_data.py # Raw data models
│ ├── analytics.py # Analytics models
│ ├── ai_features.py # AI feature models
│ └── schemas.py # Schema management
├── transformers/ # Data transformation
│ ├── __init__.py
│ ├── data_transformer.py # Analytics transformation
│ └── ai_transformer.py # AI feature engineering
├── storage/ # Storage utilities
│ ├── __init__.py
│ └── utils.py # Storage helper functions
└── utils/ # General utilities
    ├── __init__.py
    ├── logging.py # Logging configuration
    └── helpers.py # Helper functions
tests/
├── __init__.py
├── conftest.py # Test configuration
├── pytest.ini # Pytest configuration
├── unit/ # Unit tests
│ ├── test_openf1_client.py
│ └── test_cloud_swap.py
└── integration/ # Integration tests
    └── test_pipeline_integration.py
docs/ # Documentation
├── architecture.md # Architecture details
├── api_reference.md # API documentation
└── deployment.md # Deployment guides
- Extraction: OpenF1 API → Raw Data Storage
- Analytics Processing: Raw Data → Analytics Tables
- Feature Engineering: Analytics Data → AI Features
- AI/ML Consumption: Features → Models → Insights
Each step is independently executable and resumable, enabling flexible processing workflows.
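A sketch of chaining the steps end to end, reusing the classes from the usage examples above (with `settings` and `cloud_provider` constructed as shown earlier); the orchestration function itself is illustrative, not part of the package:

def run_pipeline(year: int) -> None:
    extractor = DataExtractor(settings, cloud_provider)
    transformer = DataTransformer(settings, cloud_provider)
    ai = AIPreparationTransformer(settings, cloud_provider)

    # 1. Extraction: OpenF1 API -> raw data layer
    extractor.extract_year_data(year, save_raw=True, save_to_db=True)

    # 2. Analytics processing: raw data -> analytics tables
    transformer.setup_analytics_tables()

    # 3. Feature engineering: analytics -> AI feature tables
    ai.setup_ai_tables()
    ai.create_driver_performance_features(year)

Because each stage writes its results to storage before the next begins, a failed run can be restarted at the stage that broke rather than from scratch.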
To add a new cloud provider (a registration sketch follows the list):
- Implement the `StorageProvider` and `DatabaseProvider` interfaces
- Create the provider-specific implementation in `cloud_swap/providers/`
- Register it in `CloudProviderFactory`
- Add the configuration schema in `settings.py`
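A sketch of the registration step; the `register` classmethod and the simplified factory are hypothetical, see `cloud_swap/factory.py` for the real mechanism:

class CloudProviderFactory:
    """Simplified factory: maps a provider name to its implementation."""

    _providers: dict = {}

    @classmethod
    def register(cls, name: str, provider_cls: type) -> None:
        cls._providers[name] = provider_cls

    @classmethod
    def create(cls, name: str, config: dict):
        return cls._providers[name](**config)

# Registering a new backend then becomes one line:
# CloudProviderFactory.register("mycloud", MyCloudProvider)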
To add a new data source:
- Create an extractor class implementing the data source interface
- Add data models in `models/`
- Implement transformation logic in `transformers/`
- Add comprehensive tests
To add custom analytics (a sketch follows the list):
- Extend `DataTransformer` with custom analytics methods
- Add new analytics models in `models/analytics.py`
- Create database tables via `SchemaManager`
- Implement validation and testing
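For instance, a custom metric can build on the existing lap-time helper. A sketch; the subclass, the metric, and the column names are invented for illustration:

import pandas as pd
from f1_data_platform.transformers import DataTransformer

class CustomAnalyticsTransformer(DataTransformer):
    """Hypothetical extension adding a bespoke analytics method."""

    def lap_time_spread(self, session_key: int) -> pd.Series:
        laps = self.calculate_lap_times(session_key)  # existing base-class helper
        # Spread between each driver's best and worst lap
        # (column names assumed from the analytics schema)
        grouped = laps.groupby("driver_number")["lap_time"]
        return grouped.max() - grouped.min()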
DataExtractor: the main class for extracting F1 data from the OpenF1 API.

class DataExtractor:
    def extract_year_data(self, year: int, save_raw: bool = True, save_to_db: bool = True) -> Dict
    def extract_session_data(self, session_key: int) -> Dict
    def health_check(self) -> Dict

DataTransformer: processes raw data into an analytics-ready format.
class DataTransformer:
    def setup_analytics_tables(self) -> None
    def process_session_analytics(self, session_key: int) -> pd.DataFrame
    def calculate_lap_times(self, session_key: int) -> pd.DataFrame

AIPreparationTransformer: generates AI/ML-ready features.
class AIPreparationTransformer:
    def setup_ai_tables(self) -> None
    def create_driver_performance_features(self, year: int) -> pd.DataFrame
    def create_team_strategy_features(self, year: int) -> pd.DataFrame

Local deployment:

# Set environment
export F1_DATA_PLATFORM_ENV=local
export F1_CLOUD_PROVIDER=local
# Run extraction
python -m f1_data_platform.extractors.data_extractor --year 2023

AWS deployment:

# Configure AWS credentials
aws configure
# Set environment variables
export F1_DATA_PLATFORM_ENV=production
export F1_CLOUD_PROVIDER=aws
export AWS_S3_BUCKET=production-f1-data
export AWS_RDS_ENDPOINT=prod.rds.amazonaws.com
# Deploy
python deploy_aws.py

Docker deployment:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY f1_data_platform/ ./f1_data_platform/
COPY config.yaml .
CMD ["python", "-m", "f1_data_platform.extractors.data_extractor", "--year", "2023"]- Fork the repository
To contribute:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
# Clone and setup
git clone <repository-url>
cd showcase-f1-pipeline
pip install -r requirements-dev.txt
# Run tests
python run_tests.py --type unit
# Run linting
flake8 f1_data_platform/
black f1_data_platform/
mypy f1_data_platform/

This project is licensed under the MIT License - see the LICENSE file for details.
- OpenF1 API: Providing comprehensive F1 data
- FastF1: Inspiration for F1 data analysis
- Formula 1: For the amazing sport that makes this data possible
- Contributors: Everyone who helps improve this project
- Issues: GitHub Issues
- Documentation: Full Documentation
- API Reference: API Docs
The raw data tables are direct mappings from OpenF1 API endpoints:
- `meetings`: Grand Prix information
- `sessions`: Practice, qualifying, race sessions
- `drivers`: Driver information per session
- `laps`: Detailed lap information
- `car_data`: Telemetry data
- `position`: Position changes throughout sessions
- `pit`: Pit stop information
- `weather`: Weather conditions
Business-focused aggregated tables:
- `grand_prix_results`: Race results with championship implications
- `grand_prix_performance`: Performance metrics and comparisons
- `driver_championships`: Championship standings over time
- `team_performance`: Constructor championship data
ML-ready datasets (a loading sketch follows the list):
- `driver_performance_features`: Feature-engineered driver data
- `race_context_features`: Contextual race information
- `telemetry_aggregates`: Aggregated telemetry for modeling
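With the default local backend these tables live in SQLite (`./f1_data.db` in the sample config), so pulling a feature set into pandas for modeling is direct. A sketch; the database path and table name follow the config and the list above:

import sqlite3
import pandas as pd

conn = sqlite3.connect("./f1_data.db")
features = pd.read_sql_query("SELECT * FROM driver_performance_features", conn)
conn.close()

print(features.shape)  # rows = driver observations, columns = engineered features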
Run the complete test suite:
pytest

Run tests with coverage:
pytest --cov=f1_data_platform

Run only unit tests (no cloud dependencies):
pytest tests/unit/

Development setup:
- Install development dependencies: `pip install -r requirements-dev.txt`
- Install pre-commit hooks: `pre-commit install`
- Run code formatting: `black f1_data_platform/` and `flake8 f1_data_platform/`

Extending with a new cloud provider:
- Implement the provider interface in `cloud_swap/providers/`
- Add the configuration schema in `config/schemas/`
- Update the provider factory in `cloud_swap/factory.py`
- Add comprehensive tests
The pipeline includes built-in monitoring; a metrics sketch follows the list:
- Metrics: Prometheus metrics for data volumes, processing times
- Logging: Structured logging with correlation IDs
- Health Checks: API and database connectivity monitoring
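A minimal sketch of how such metrics are typically exposed with `prometheus_client`; the metric names and port are examples, not the project's actual instrumentation:

from prometheus_client import Counter, Histogram, start_http_server

RECORDS_EXTRACTED = Counter(
    "f1_records_extracted_total", "Records pulled from the OpenF1 API", ["endpoint"]
)
STAGE_SECONDS = Histogram(
    "f1_stage_duration_seconds", "Wall-clock time per pipeline stage", ["stage"]
)

start_http_server(9000)  # serve /metrics for Prometheus to scrape
RECORDS_EXTRACTED.labels(endpoint="laps").inc(1000)
with STAGE_SECONDS.labels(stage="transform").time():
    pass  # run the transformation step here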
Built with ❤️ for Formula 1 fans and data enthusiasts!