In this project we have analyzed a dataset with information about diamonds (carat, size, clarity, color...) using Pandas, Seaborn and Plotly. Second, we have built a data visualization dashboard using Tableau.
Ironhack Data Analytics Project Module 2: Statistics & Data Visualization.
This project uses Python, Pandas, Seaborn and Plotly to explore the data and to draw some explanatory plots.
We have also used SciPy to run some hypothesis tests.
You can find my Tableau Dashboard here.
After carrying out the exploratory data analysis in Jupyter Notebook and building the Tableau Dashboard, our main conclusions are:
- Carat is the characteristic that most influences a diamond's price.
- Fair is the cut with the highest average carat. This is why, despite being the worst cut, it has a higher average price.
- Premium cut diamonds with I or J color are the most expensive diamonds, even though these are two of the lowest color grades. This is explained by the higher carat weight of I and J diamonds.
- As for clarity, I1 and SI2 are two of the lowest clarity grades but, since many of these diamonds have I or J color, their price is higher.
- Also, J and I are the colors with the largest diamonds.
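The averages behind these conclusions can be reproduced with a couple of pandas aggregations. The sketch below assumes the data is loaded in a DataFrame with columns named `carat`, `cut`, `color` and `price` (as in the classic diamonds dataset); the helper names `summarize_by` and `price_correlations` are hypothetical and not taken from the notebook, and the sample rows are made up purely for illustration.

```python
import pandas as pd


def summarize_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Mean carat and mean price per category of `column`, highest mean carat first."""
    return (
        df.groupby(column)[["carat", "price"]]
        .mean()
        .sort_values("carat", ascending=False)
    )


def price_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every numeric column with price, highest first."""
    numeric = df.select_dtypes("number")
    return numeric.corr()["price"].drop("price").sort_values(ascending=False)


if __name__ == "__main__":
    # Made-up sample rows; replace with the real dataset (hypothetical columns).
    sample = pd.DataFrame({
        "carat": [0.3, 1.2, 0.5, 1.5, 0.7, 2.0],
        "cut": ["Ideal", "Fair", "Premium", "Fair", "Ideal", "Premium"],
        "color": ["E", "J", "G", "I", "G", "J"],
        "depth": [61.5, 64.0, 62.0, 65.1, 61.8, 60.9],
        "price": [500, 4000, 1500, 5200, 2300, 12000],
    })
    print(price_correlations(sample))
    print(summarize_by(sample, "cut"))
    print(summarize_by(sample, "color"))
```

Grouping by `cut` or `color` and comparing mean carat side by side with mean price is what makes the "worse grade, higher price" pattern visible: the categories with the heaviest stones also top the price ranking.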
The solution to the hypothesis-testing bonus is in data_analysis_report.ipynb.
Furthermore, the filters added to the Tableau Dashboard let us check that the results obtained in data_analysis_report.ipynb do hold.
For example, to examine the results of "Sub-Test 1: Fair cut + color G vs. Fair cut + color I", we can select the Fair cut and the two colors we are interested in. The Average Price by Color and Cut table then shows both mean prices.
As the t statistic and p-value indicated, the mean price of Fair cut + color G diamonds is slightly higher than that of Fair cut + color I diamonds. However, this difference is not statistically significant, because the p-value is greater than 0.05.
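A sub-test like this can be sketched with pandas and SciPy by applying the same filter as the dashboard (one cut, two colors) and comparing the two price samples. This assumes columns named `cut`, `color` and `price`; the function `compare_colors_within_cut` is hypothetical, and it uses Welch's t-test (`equal_var=False`), which may differ from the variant used in the notebook. The demo rows are made up.

```python
import pandas as pd
from scipy import stats


def compare_colors_within_cut(df, cut, color_a, color_b, alpha=0.05):
    """Two-sample t-test on price for (cut, color_a) vs. (cut, color_b).

    Welch's t-test (equal_var=False) is an assumption of this sketch.
    """
    subset = df[df["cut"] == cut]  # same as filtering one cut in Tableau
    prices_a = subset.loc[subset["color"] == color_a, "price"]
    prices_b = subset.loc[subset["color"] == color_b, "price"]
    t_stat, p_value = stats.ttest_ind(prices_a, prices_b, equal_var=False)
    return {
        "mean_a": float(prices_a.mean()),
        "mean_b": float(prices_b.mean()),
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        # Reject the null hypothesis of equal means only if p < alpha.
        "significant": bool(p_value < alpha),
    }


if __name__ == "__main__":
    # Made-up rows for illustration; the "Ideal" row is excluded by the filter.
    demo = pd.DataFrame({
        "cut": ["Fair"] * 8 + ["Ideal"],
        "color": ["G"] * 4 + ["I"] * 4 + ["G"],
        "price": [3000, 4200, 3900, 4100, 3800, 4000, 3700, 4300, 99999],
    })
    print(compare_colors_within_cut(demo, "Fair", "G", "I"))
```

Returning the two means together with the p-value mirrors the dashboard check: the means show which group is higher, and the p-value tells whether that gap is larger than sampling noise at the chosen `alpha`.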
- Python==3.8.5
- pandas==1.1.3
- seaborn==0.11.0
- numpy==1.19.2
- scipy==1.5.4
└── project
    ├── .gitignore
    ├── README.md
    ├── notebooks
    │   └── data_analysis_report.ipynb
    └── data
Doubts? Advice? Drop me a line!
