This machine learning project predicts overtakes in Formula 1 races by analyzing telemetry and event data. Using an XGBoost classifier, we can predict whether an overtake will occur based on various race conditions and car performance metrics.
Formula 1 overtaking is influenced by numerous factors including car performance, track characteristics, tire conditions, and driver skill. This model aims to quantify these relationships and provide predictive insights that could be valuable for race strategy planning.
- Data Preprocessing: Clean and normalize raw F1 telemetry data
- Feature Engineering: Extract and transform relevant features for optimal model performance
- Model Training: Train and optimize an XGBoost classifier for overtake prediction
- Performance Evaluation: Comprehensive metrics including accuracy, precision, recall, and F1-score
- Prediction System: Make predictions on new race scenarios
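
The four evaluation metrics listed above can be computed directly from binary predictions. The following is a minimal sketch, assuming labels are encoded as 1 (overtake) and 0 (no overtake); the function name `classification_metrics` and the sample labels are illustrative and are not taken from `eval.py`.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for illustration only (1 = overtake, 0 = no overtake).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Precision and recall are reported alongside accuracy because accuracy alone can look high when one class dominates the data.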
Download the model weights and datasets from: Google Drive - F1 Overtake Prediction Files
- Python 3.8 or higher
- pip package manager
1. Clone this repository:

       git clone git@github.com:aashreyj/f1-overtaking-prediction.git
       cd f1-overtaking-prediction

2. Install dependencies:

       pip install -r requirements.txt

3. Download the model weights and datasets from: Google Drive - F1 Overtake Prediction Files

4. Start Jupyter Lab or Notebook:

       jupyter lab

   or

       jupyter notebook

5. Open experiments.ipynb to explore the data and experiment with different approaches.

Process the raw data using the preprocessing scripts:

    python preprocessing/process_dataset_overtakes.py
    python preprocessing/process_dataset_not_overtakes.py

Train the model using:

    python train.py

Evaluate the model's performance:

    python eval.py

Make predictions on new data:

    python infer.py

You can make predictions in the following way:

- Using the infer.py script:

      python infer.py --input your_input_data.csv --output predictions.csv

The project uses two main processed datasets:

- processed_data_overtakes.csv: Contains data points where overtakes occurred
- processed_data_no_overtakes.csv: Contains data points where no overtakes occurred
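
Because the positive and negative examples live in separate files, they need to be merged and labeled before a classifier can be trained on them. The following is a minimal sketch of that step using only the standard library. The label column name `overtake`, the helper `load_labeled_rows`, and the sample columns are assumptions made for illustration; the repository's own training script may combine the files differently.

```python
import csv
import io
import random


def load_labeled_rows(overtake_file, no_overtake_file, seed=42):
    """Read two CSV streams, tag each row with a binary label, and shuffle."""
    rows = []
    for handle, label in ((overtake_file, 1), (no_overtake_file, 0)):
        for row in csv.DictReader(handle):
            row["overtake"] = label  # hypothetical label column name
            rows.append(row)
    # Shuffle with a fixed seed so the two classes are interleaved reproducibly.
    random.Random(seed).shuffle(rows)
    return rows


if __name__ == "__main__":
    # In-memory stand-ins for the two processed files; the columns are made up.
    overtakes = io.StringIO("speed_delta,gap_s\n12.5,0.4\n9.1,0.7\n")
    no_overtakes = io.StringIO("speed_delta,gap_s\n1.2,1.9\n")
    data = load_labeled_rows(overtakes, no_overtakes)
    print(len(data), "rows,", sum(r["overtake"] for r in data), "labeled as overtakes")
```

With the real files, the two arguments would be handles opened with `open("processed_data_overtakes.csv", newline="")` and `open("processed_data_no_overtakes.csv", newline="")`. Note that `csv.DictReader` returns feature values as strings, so they would still need converting to numbers before training.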

The resources/ directory contains several visualizations to help understand the data:

- correlation_matrix.png: Shows feature correlations
- overtakes_by_circuit.png: Displays the distribution of overtakes across different circuits
- top_circuits_for_overtaking.png: Highlights the circuits with the most overtaking opportunities
- ham_all_laps.png and ham_lap_9.png: Visualizations of Lewis Hamilton's racing data
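
For readers who want to see what a feature correlation matrix contains, the sketch below computes pairwise Pearson correlations for a small set of numeric columns. It assumes Pearson correlation is the measure being plotted, which the source does not state. The column names are hypothetical, and this is not the code that produced correlation_matrix.png.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to their Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical feature columns, for illustration only.
    features = {
        "speed_delta": [1.0, 2.0, 3.0, 4.0],
        "gap_s": [4.0, 3.0, 2.0, 1.0],
        "tyre_age": [2.0, 4.0, 6.0, 8.0],
    }
    for name, row in correlation_matrix(features).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

Values near 1 or -1 indicate strongly related features, which may be redundant inputs to the model.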