This project is built around a dataset we found on Kaggle covering the outcomes of CSGO professional matches between 2015 and 2020. Our task was to load the dataset into a MySQL relational database and compute new and interesting variables that were not present in the original dataset.
To run the project, download the CSV files from the dataset using the link provided in the relevant sources below. Then place the CSV files inside a folder named "csgo_data", which the project reads from when entering data into the database.
For more information on using MySQL with Python and on setting up Python-Dotenv, please see the related links section.
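As a rough illustration of the folder layout described above, the sketch below reads one CSV file from "csgo_data" using only the standard library. The helper name `read_csv_rows` is hypothetical and is not taken from the project's code; the project itself may load the files differently.

```python
import csv
from pathlib import Path

DATA_DIR = Path("csgo_data")  # folder name given in the setup instructions


def read_csv_rows(filename, data_dir=DATA_DIR):
    """Return the rows of one CSV file as a list of dicts keyed by column name.

    Hypothetical helper for illustration only.
    """
    path = Path(data_dir) / filename
    if not path.exists():
        # Fail early with a hint instead of a bare traceback.
        raise FileNotFoundError(
            f"Expected {path}; download the dataset into {data_dir}/ first."
        )
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Every value comes back as a string, so numeric columns would still need converting before they are inserted into the database.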
- Players (name, team, country, eventID, eventName)
  - Gives an overview of all players, as well as the relevant events that they participated in
- Matches (date, matchID, eventID, team1, team2, bestOf, winner)
  - Provides insight on the involved teams in matches, who won, and when the match occurred
- Maps (mapName, pickRate, banRate, totalPicks, totalBans)
  - Displays the name of a particular map, as well as its percentage pick/ban rate and total number of picks/bans
- Player Analytics (playerName, teamName, matchID, kills, deaths)
  - Gives a summary of a particular player's performance for a particular match
When a team wins a match 2-0, no third map is played. In these cases, the third pick and ban columns in the CSV (t1_removed_3, t2_removed_3) contain '0.0' to indicate that no map was chosen. To handle this, we removed any '0.0' keys from the map dictionaries we used.
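A minimal sketch of that cleanup, assuming the maps are tallied in a dictionary keyed by map name. Only the column names `t1_removed_3` and `t2_removed_3` come from the dataset description above; the function name `count_maps` and the sample rows are made up for illustration.

```python
from collections import Counter

PLACEHOLDER = "0.0"  # value the CSV holds when no third map was chosen


def count_maps(rows, columns):
    """Count how often each map name appears in the given columns.

    Hypothetical helper: the placeholder key is dropped after counting,
    mirroring the "remove any '0.0' keys" step.
    """
    counts = Counter()
    for row in rows:
        for column in columns:
            # str() covers the case where the value was parsed as a float.
            counts[str(row[column])] += 1
    counts.pop(PLACEHOLDER, None)  # no error if the placeholder never occurred
    return dict(counts)


sample_rows = [
    {"t1_removed_3": "Nuke", "t2_removed_3": "Train"},
    {"t1_removed_3": "0.0", "t2_removed_3": "0.0"},  # 2-0 match, no third map
]
map_counts = count_maps(sample_rows, ["t1_removed_3", "t2_removed_3"])
```

Dropping the key once after counting keeps the counting loop free of special cases.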
We also found that match results were recorded inconsistently: when only one map was played, the result was listed as a pair such as (5, 16), which read literally would mean team 1 had won 5 maps and team 2 had won 16 (which is not correct). We handled this by summing the wins of both teams to get the number of maps played; if the sum was higher than 5 (the maximum number of maps possible), we assumed that only one map had been played.
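The rule above can be sketched as a small function. The name `maps_played` and its parameter names are hypothetical; the threshold of 5 is the maximum number of maps stated in the text.

```python
MAX_MAPS = 5  # maximum number of maps possible in one match


def maps_played(result_1, result_2):
    """Estimate how many maps a match had from the two result values.

    Hypothetical helper: if the combined total exceeds MAX_MAPS, the pair
    cannot be a map count, so the match is treated as a single map.
    """
    total = result_1 + result_2
    return 1 if total > MAX_MAPS else total


single_map = maps_played(5, 16)  # total of 21 is too high for map wins
clean_sweep = maps_played(2, 0)  # a 2-0 result
```

Here `single_map` evaluates to 1 and `clean_sweep` to 2.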
During some events, only one or two maps were played per match, leaving the remaining columns (player stats for maps 2 and 3) empty. If map 2 or 3 was not played during a match, the player stats for that map are stored as 0.
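One way to express that default is shown below. The function name `fill_unplayed` and the column names `m2_kills` and `m3_kills` are assumptions made for the example, not names confirmed by the dataset description above.

```python
def fill_unplayed(stats, columns):
    """Return a copy of a player's stats with empty values in `columns` set to 0.

    Hypothetical helper: a value counts as empty if it is missing, None,
    an empty string, or NaN.
    """
    filled = dict(stats)
    for column in columns:
        value = filled.get(column)
        # NaN is the only value that is not equal to itself.
        if value is None or value == "" or value != value:
            filled[column] = 0
    return filled


# Assumed column names, for illustration only.
row = {"m1_kills": "20", "m2_kills": "", "m3_kills": None}
cleaned = fill_unplayed(row, ["m2_kills", "m3_kills"])
```

The original row is left untouched, so the raw CSV values remain available for comparison.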
The earliest date in each CSV is different. Economy.csv goes back to 2017-04-04, picks to 2016-04-12, players to 2015-10-07, and results to 2015-11-03. Therefore, some matches cannot be joined with entries from the other tables.
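To make the coverage gap concrete, the sketch below records the earliest dates listed above and reports which files could contain a match played on a given date. The dictionary keys and the function name `sources_covering` are hypothetical, and falling within a file's date range does not guarantee that the match is actually present in that file.

```python
from datetime import date

# Earliest date in each CSV, as listed above.
EARLIEST = {
    "economy": date(2017, 4, 4),
    "picks": date(2016, 4, 12),
    "players": date(2015, 10, 7),
    "results": date(2015, 11, 3),
}


def sources_covering(match_date):
    """Return the names of the CSVs whose date range includes match_date."""
    return sorted(name for name, start in EARLIEST.items() if match_date >= start)


# From this date onward, all four CSVs overlap.
full_coverage_start = max(EARLIEST.values())
early_match_sources = sources_covering(date(2016, 1, 1))
```

With these dates, a match played on 2016-01-01 can only appear in the players and results files, so joins against economy or picks data for that match would come back empty.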
- Kaggle Dataset on CSGO Professional Matches
- Python-Dotenv
- MySql-Connector-Python