Historic sports data and prediction of game winners
[Ulrike Anklam]
[Data Analytics Berlin, 09-10-2020]
This is my 10-day final project for the Ironhack Data Analytics bootcamp. The main goal of the project was to use a machine learning algorithm to make predictions about sports events.
I used an open-source dataset from Kaggle, created by Max Horowitz. He collected the data through the official NFL API and has updated the dataset with new season data since 2016.
I used the v5 dataset, which contains play-by-play data from the 2009 season through the 2018 season. It covers all games (more than 2,500) and a total of 316,538 plays.
- downloaded the dataset
- cleaned and wrangled the data (notebook: Data Wrangling and Cleaning)
- analysed different features
- created a games dataset from the play-by-play data, with features for each team
- trained an ML model on the games dataset
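
The step from play-by-play rows to one row per game can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the plays are made up, the column names (`game_id`, `home_team`, `away_team`, `posteam`, `yards_gained`, `total_home_score`, `total_away_score`) are assumed to follow the dataset's layout, and `build_games` is a hypothetical helper with a single example feature (total yards per side).

```python
import pandas as pd

# Made-up plays for two games. Column names are assumptions modelled on the
# play-by-play layout; real data has many more columns and rows.
plays = pd.DataFrame({
    "game_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "home_team": ["PIT"] * 4 + ["CLE"] * 4,
    "away_team": ["TEN"] * 4 + ["MIN"] * 4,
    "posteam": ["PIT", "TEN", "PIT", "TEN", "CLE", "MIN", "CLE", "MIN"],
    "yards_gained": [5, 3, 12, -2, 0, 8, 4, 20],
    "total_home_score": [0, 0, 7, 7, 0, 0, 3, 3],
    "total_away_score": [0, 0, 0, 3, 0, 7, 7, 14],
})


def build_games(plays: pd.DataFrame) -> pd.DataFrame:
    """Collapse play-by-play rows into one row per game (hypothetical helper)."""
    # Plays without a possessing team cannot be attributed to either side.
    with_pos = plays.dropna(subset=["posteam"])
    # Tag each play by whether the home or the away team had possession.
    is_home = with_pos["posteam"] == with_pos["home_team"]
    side = is_home.map({True: "home", False: "away"})
    # Sum the yards gained per game and side, one column per side.
    yards = (
        with_pos.assign(side=side)
        .pivot_table(index="game_id", columns="side",
                     values="yards_gained", aggfunc="sum")
        .rename(columns={"home": "home_yards", "away": "away_yards"})
    )
    # The running score columns hold the final score on a game's last play
    # (this assumes the plays are in chronological order).
    final = plays.groupby("game_id")[
        ["home_team", "away_team", "total_home_score", "total_away_score"]
    ].last()
    games = final.join(yards)
    # Target label: 1 if the home team won, 0 otherwise (ties count as 0).
    games["home_win"] = (
        games["total_home_score"] > games["total_away_score"]
    ).astype(int)
    return games.reset_index()


games = build_games(plays)
print(games)
```

A table of this shape, with one row per game, per-team feature columns and a `home_win` label, is the kind of input a classifier can be trained on.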
The .csv files are stored on Google Drive.
It is possible to predict whether the home team will win a game with 85% accuracy.
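
For reference, accuracy here means the share of games for which the predicted outcome matches the actual one. The sketch below shows the arithmetic on made-up labels and compares it with a naive "home team always wins" baseline. The labels, the `accuracy` helper and the baseline are illustrative assumptions, not the project's results.

```python
def accuracy(actual, predicted):
    """Fraction of games where the predicted label equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one game")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up labels for ten games: 1 = home team won, 0 = it did not.
actual = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
model_preds = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # hypothetical model output
always_home = [1] * len(actual)               # naive baseline

model_acc = accuracy(actual, model_preds)
baseline_acc = accuracy(actual, always_home)
print(f"model: {model_acc:.2f}, always-home baseline: {baseline_acc:.2f}")
```

Comparing against such a baseline helps put an accuracy figure in context, since home teams tend to win more than half of their games.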