Analysis of the Diamonds dataset to see how different parameters affect price. This analysis will later be used to build a model that estimates price.
Preliminary analysis and data processing in Python, plus a dashboard built in Tableau to show the main conclusions.
I carried out two main analyses:
- For numerical variables: for now, the model will only use carat. Price is also strongly correlated with x, y and z (the diamond's length, width and depth), but these are themselves correlated with carat.
- For categorical variables: analysing Cut, Color and Clarity separately did not lead to a clear conclusion. I therefore created a group for each combination of Cut, Color and Clarity and plotted the relationship between carat and price within each group. The relationship appears linear: the slope is almost constant across groups, and the only thing that varies is the line's vertical offset (its intercept).
Next steps: estimate the line's equation (slope and intercept) for each subgroup so that price can be estimated accurately.
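One possible way to approach this next step, sketched under assumptions: fit an ordinary least-squares line `price = slope * carat + intercept` within each Cut/Color/Clarity subgroup using `np.polyfit`, then look up the matching line to estimate a price. The helpers `fit_group_lines` and `predict_price`, the `min_rows=10` threshold and the synthetic data are all hypothetical illustrations, not the author's implementation.

```python
import numpy as np
import pandas as pd

GROUP_COLS = ["cut", "color", "clarity"]


def fit_group_lines(df: pd.DataFrame, min_rows: int = 10) -> pd.DataFrame:
    """Fit price = slope * carat + intercept by least squares within each subgroup."""
    rows = []
    for key, group in df.groupby(GROUP_COLS):
        if len(group) < min_rows:
            continue  # skip subgroups too small for a stable fit (threshold is arbitrary)
        # np.polyfit with deg=1 returns the coefficients as [slope, intercept]
        slope, intercept = np.polyfit(group["carat"], group["price"], deg=1)
        rows.append((*key, slope, intercept, len(group)))
    return pd.DataFrame(rows, columns=GROUP_COLS + ["slope", "intercept", "n"])


def predict_price(lines: pd.DataFrame, cut: str, color: str, clarity: str, carat: float) -> float:
    """Estimate price from the fitted line of the matching subgroup."""
    match = lines[(lines["cut"] == cut) & (lines["color"] == color) & (lines["clarity"] == clarity)]
    if match.empty:
        raise KeyError(f"no fitted line for {(cut, color, clarity)}")
    row = match.iloc[0]
    return float(row["slope"] * carat + row["intercept"])


# Hypothetical synthetic data: two subgroups sharing a slope but with different offsets,
# mimicking the pattern described above (numbers are made up, not from the dataset).
rng = np.random.default_rng(1)
true_lines = {
    ("Ideal", "E", "VS1"): (8000.0, -1500.0),
    ("Premium", "G", "SI1"): (8000.0, -2500.0),
}
frames = []
for (cut, color, clarity), (slope, intercept) in true_lines.items():
    carat = rng.uniform(0.3, 1.5, size=50)
    frames.append(pd.DataFrame({
        "cut": cut, "color": color, "clarity": clarity, "carat": carat,
        "price": slope * carat + intercept + rng.normal(0, 50, size=50),
    }))
sample = pd.concat(frames, ignore_index=True)

lines = fit_group_lines(sample)
print(lines.round(1))
# A small spread of slopes relative to their mean supports the "constant slope" reading.
print("slope std / mean:", round(lines["slope"].std() / lines["slope"].mean(), 3))
print(predict_price(lines, "Ideal", "E", "VS1", 1.0))
```

Fitting each subgroup independently keeps the sketch simple; since the slopes look almost constant, a single regression with one shared slope and a separate intercept per subgroup would be a natural alternative worth comparing.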
Tools used: Python (libraries: NumPy, Pandas, Matplotlib and Seaborn) and Tableau.
The dataset used can be found at the following link: https://www.kaggle.com/shivam2503/diamonds
The Tableau dashboard can be viewed here: https://public.tableau.com/profile/marta.p.rez.puerta#!/vizhome/DiamondsDataset/DiamondsDashboard