This project predicts the price of diamonds from their characteristics (weight, color, quality of cut, etc.), putting machine learning techniques into practice.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
The following libraries must be installed (NumPy, pandas, seaborn, Matplotlib, scikit-learn); they are imported as follows:
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
Feature correlation was studied to select the predictors:
- Columns 'x', 'y' and 'z' were removed from the analysis and prediction, since they are highly correlated.
- Column 'id' was removed, since it carries no information useful for predicting the price.
- The remaining columns ('carat', 'cut', 'color', 'clarity', 'depth', 'table') were used to predict the price.
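The column selection described above can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the toy rows, the 'price' target name and the variable names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy rows that mimic the dataset's columns (not real data).
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "carat": [0.23, 0.31, 0.70, 1.01, 1.50],
    "cut": ["Ideal", "Good", "Premium", "Very Good", "Fair"],
    "color": ["E", "J", "G", "H", "D"],
    "clarity": ["SI2", "VS1", "VVS2", "SI1", "IF"],
    "depth": [61.5, 63.3, 60.9, 62.4, 64.0],
    "table": [55.0, 58.0, 57.0, 56.0, 59.0],
    "x": [3.95, 4.34, 5.70, 6.40, 7.30],
    "y": [3.98, 4.35, 5.72, 6.43, 7.26],
    "z": [2.43, 2.75, 3.48, 4.00, 4.66],
    "price": [326, 335, 2757, 4500, 9800],
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.select_dtypes(include="number").corr()
print(corr.loc[["x", "y", "z"], ["x", "y", "z"]].round(2))

# Drop the redundant dimensions and the identifier; keep the target apart.
features = df.drop(columns=["x", "y", "z", "id", "price"])
target = df["price"]
print(features.columns.tolist())
```

On the real data, a heatmap of `corr` (for example with `sns.heatmap`) makes the redundant columns easy to spot.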
The 'cut' column was one-hot encoded with pandas get_dummies.
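As a rough sketch of this step, `pd.get_dummies` turns each distinct 'cut' value into its own indicator column. The sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of the 'cut' column.
cuts = pd.DataFrame({"cut": ["Ideal", "Good", "Premium", "Ideal"]})

# One indicator column per distinct category, prefixed with the column name.
encoded = pd.get_dummies(cuts, columns=["cut"])
print(encoded)
```

One-hot encoding imposes no order on the categories, at the cost of one extra column per category.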
The 'color' and 'clarity' columns were converted to numerical values.
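One possible way to do this conversion is an ordinal mapping, sketched below. The README does not state which mapping was used, so the orderings here (the usual grading scales, from worst to best) and the names `COLOR_ORDER` and `CLARITY_ORDER` are assumptions.

```python
import pandas as pd

# Assumed orderings, worst to best; the project's actual mapping may differ.
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Map each grade to its rank in the ordering (0 = worst).
color_map = {grade: rank for rank, grade in enumerate(COLOR_ORDER)}
clarity_map = {grade: rank for rank, grade in enumerate(CLARITY_ORDER)}

# Hypothetical sample rows.
df = pd.DataFrame({"color": ["E", "J", "G"], "clarity": ["SI2", "IF", "VS1"]})
df["color"] = df["color"].map(color_map)
df["clarity"] = df["clarity"].map(clarity_map)
print(df)
```

Unlike one-hot encoding, an ordinal mapping keeps a single column per feature and preserves the grade order.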
The models, ranked from best to worst prediction:
- RandomForestRegressor
- HistGradientBoostingRegressor
- GradientBoostingRegressor
- Support Vector Regression (SVR)
- Kaggle: source of the dataset used
