Project FIFA MoneyBall

Objective

The objective of this project is to build a linear regression model that predicts the market value of a FIFA 21 player. I also want to find out which indicators are most important for predicting a player's market value.

After data cleaning and before modeling, I analyze the dataset along several questions (see section 'Questions and visualization').

Dataset

I am using the FIFA 21 COMPLETE PLAYER DATASET from Kaggle: fifa21_male2.csv

This dataset contains 17125 rows and 107 columns. Each row holds the information for one FIFA 21 player. The columns cover the following topics:

General information
- Player Name
- Club of the Player
- Nationality
- Height
- Weight
- Contract
- Foot
- Best Position

Monetary values
- Market value
- Wage
- Release clause

Player's statistics
- Overall Rating
- Potential
- Growth
- Attacking
- Skill
- Movement
- Hits
- Dribbling
- Defending
- Physical

After data cleaning and dropping the columns that are not needed for the model, the dataframe contains 70 columns; the number of rows remained the same.

Data cleaning and wrangling

In this section I performed the following steps for data cleaning:

Changing the column header names (lowercase and no spaces)
Removing duplicate rows in dataframe
Changing column types
- Extracting only year from column 'joined'
- Changing column types to numerical
- Converting monetary columns (€) to numerical values
Dealing with NaN-values
- Overview of present NaN-values
- Overview of how to proceed with columns containing NaN-values
- Replacing NaN-values
Dropping unnecessary columns
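
The cleaning steps above could look roughly like the following minimal sketch. It is not the project's actual notebook code: the toy rows, the column names, the '€1.5M' / '€450K' string format, the 'Jul 1, 2018' date format, and the choice to fill a missing release clause with 0 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lowercase column names and spaces replaced by underscores."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out


def parse_money(value) -> float:
    """Convert strings such as '€1.5M' or '€450K' to a float in euros (assumed format)."""
    if pd.isna(value):
        return np.nan
    text = str(value).replace("€", "").strip()
    factor = 1.0
    if text.endswith("M"):
        factor, text = 1e6, text[:-1]
    elif text.endswith("K"):
        factor, text = 1e3, text[:-1]
    return float(text) * factor if text else np.nan


# Hypothetical toy rows; the real file has 107 columns.
raw = pd.DataFrame({
    "Name": ["A", "B", "B"],
    "Value": ["€1.5M", "€450K", "€450K"],
    "Joined": ["Jul 1, 2018", "Jan 15, 2020", "Jan 15, 2020"],
    "Release Clause": ["€3M", None, None],
})

# Rename headers, then remove exact duplicate rows.
df = clean_headers(raw).drop_duplicates().reset_index(drop=True)

# Monetary columns: string with currency symbol and suffix -> float.
df["value"] = df["value"].map(parse_money)
df["release_clause"] = df["release_clause"].map(parse_money)

# Keep only the year from 'joined'.
df["joined"] = pd.to_datetime(df["joined"], format="%b %d, %Y").dt.year

# One possible NaN strategy (an assumption): treat a missing release clause as 0.
df["release_clause"] = df["release_clause"].fillna(0.0)

print(df)
```

How NaN-values are best replaced depends on the column; filling with 0 is only one option next to the median, the mode, or dropping the column.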

Questions and visualization

I performed an analysis on four different topics to gain more insight into the relationship between monetary values and specific indicators. The idea is to find out which factors make it most likely for a player to have a high wage, market value and release clause:

Comparison of value/wage/release clause across the different field positions
Comparison of wage for left/right foot (overall vs. split by position)
Relationship of age and wage
Comparison of club and market value
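
As a rough illustration of how such comparisons can be computed with pandas before plotting, here is a small sketch. The data is made up, and the column names (`bp` for best position, `foot`, `age`, `wage`, `value`) are assumed for the example rather than taken from the cleaned dataframe.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned dataframe.
players = pd.DataFrame({
    "bp": ["ST", "ST", "CB", "CB", "GK"],
    "foot": ["Left", "Right", "Right", "Left", "Right"],
    "age": [22, 30, 27, 33, 25],
    "wage": [50000.0, 90000.0, 30000.0, 20000.0, 10000.0],
    "value": [8.0e6, 12.0e6, 4.0e6, 1.0e6, 2.0e6],
})

# Average market value and wage per field position.
mean_by_position = players.groupby("bp")[["value", "wage"]].mean()

# Wage by preferred foot, overall ...
median_wage_by_foot = players.groupby("foot")["wage"].median()

# ... and split by position (NaN where a combination does not occur).
wage_by_foot_and_position = players.pivot_table(
    index="bp", columns="foot", values="wage", aggfunc="mean"
)

# Linear association between age and wage (Pearson correlation).
age_wage_corr = players["age"].corr(players["wage"])

print(mean_by_position)
print(median_wage_by_foot)
print(wage_by_foot_and_position)
```

The resulting tables can be passed to seaborn or matplotlib, for example as bar plots per position.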

Linear regression model

To create a linear regression model the following steps were performed:

Pre-processing data for the linear regression model
- Correlation of numerical columns
- Insights after first exploration
- Box-Cox transformation
- Dealing with outliers
Normalization of the data
Dealing with categorical columns
Comparison of data sets by checking the R2-score
- Overview of different R2-scores
Modeling and model validation
Reporting
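
The steps above can be sketched end to end on synthetic data. This is an illustrative sketch, not the project's pipeline: the feature names (`ova`, `age`, `foot`), the generated values, and the use of scikit-learn's `PowerTransformer` for the Box-Cox step are assumptions, and outlier handling is left out for brevity. The R2-score it prints has nothing to do with the score reported for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer, StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the cleaned dataframe (all names are hypothetical).
data = pd.DataFrame({
    "ova": rng.integers(50, 95, size=n).astype(float),
    "age": rng.integers(17, 40, size=n).astype(float),
    "foot": rng.choice(["Left", "Right"], size=n),
})
# Right-skewed, strictly positive target, similar in shape to market values.
noise = rng.normal(0.0, 0.1, size=n)
data["value"] = 1000.0 * np.exp(0.1 * data["ova"] - 0.02 * data["age"] + noise)

# Categorical column -> dummy variable(s).
X = pd.get_dummies(data[["ova", "age", "foot"]], columns=["foot"],
                   drop_first=True, dtype=float)
y = data["value"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# Box-Cox on the target; it requires strictly positive values.
# Fitted on the training split only to avoid leaking test information.
boxcox = PowerTransformer(method="box-cox", standardize=False)
y_train_t = boxcox.fit_transform(y_train.to_numpy().reshape(-1, 1)).ravel()
y_test_t = boxcox.transform(y_test.to_numpy().reshape(-1, 1)).ravel()

# Normalize the numerical features, again fitted on the training split.
num_cols = ["ova", "age"]
scaler = StandardScaler().fit(X_train[num_cols])
X_train[num_cols] = scaler.transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

# Fit and validate on the held-out split (R2 on the transformed scale).
model = LinearRegression().fit(X_train, y_train_t)
r2 = r2_score(y_test_t, model.predict(X_test))
print(f"R2 on the test split: {r2:.3f}")
```

Comparing several pre-processed variants of the data then amounts to repeating the last block for each variant and collecting the R2-scores in one table.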

Main results


R2-score	0.930

Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import statsmodels
