ssarahreyes/ih_datamadpt1120_project_m2

➡️ Data Analysis and Visualization of a Data Set

In this project we analyzed a data set with information about diamonds (carat, size, clarity, color...) using Pandas, Seaborn and Plotly. We then built a data visualization dashboard using Tableau.


✅ Status

Ironhack Data Analytics Project Module 2: Statistics & Data Visualization.

💻 Technology stack

This project uses Python, Pandas, Seaborn and Plotly to explore the data and draw some explanatory plots.

We also used SciPy to run some hypothesis tests.

📊 Tableau Dashboard

You can find my Tableau Dashboard here.

⚡ Main conclusions

After exploring the data in Jupyter Notebook and building the Tableau Dashboard, our main conclusions are:

  • Carat is the characteristic that most influences a diamond's price.
  • Fair is the cut with the highest-carat diamonds. This is why, despite being the worst cut, it has a higher average price.
  • Premium-cut diamonds in colors I and J are the most expensive, even though these are two of the lowest-quality colors. This is because of the carat weight of I and J diamonds.
  • As for clarity, I1 and SI2 are two of the lowest clarity grades but, because many of these diamonds are I or J color, their price is higher.
  • J and I are also the colors with the biggest diamonds.
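
The comparisons behind these conclusions can be sketched with pandas. This is only a minimal illustration, not the notebook's actual code: it assumes the column names `carat`, `cut`, `color` and `price` of the classic diamonds data set, the helper `summarize_by` is hypothetical, and the tiny DataFrame is made-up toy data so that the snippet runs on its own.

```python
import pandas as pd

# Toy rows for illustration only (NOT the real data set).
toy = pd.DataFrame({
    "carat": [1.0, 1.2, 0.4, 0.5, 0.3, 0.9],
    "cut":   ["Fair", "Fair", "Ideal", "Ideal", "Ideal", "Premium"],
    "color": ["J", "I", "E", "D", "E", "I"],
    "price": [4000, 5200, 900, 1100, 700, 3800],
})


def summarize_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Mean carat and mean price for each category of `column`,
    sorted from the most to the least expensive category."""
    return (
        df.groupby(column)[["carat", "price"]]
        .mean()
        .sort_values("price", ascending=False)
    )


# Pearson correlation between carat and price.
carat_price_corr = toy["carat"].corr(toy["price"])

by_cut = summarize_by(toy, "cut")
print(by_cut)
print(f"carat/price correlation: {carat_price_corr:.2f}")
```

Calling `summarize_by(df, "color")` or `summarize_by(df, "clarity")` on the real data would give the matching views by color and clarity, assuming those columns exist under these names.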

🚀 Bonus: Hypothesis Testing

The solution to the hypothesis testing bonus is in data_analysys_report.ipynb.

Furthermore, using the filters added to the Tableau Dashboard, we can check that the results obtained in data_analysys_report.ipynb hold.

For example, to examine the results of "Sub-Test 1: Fair cut + color G vs. Fair cut + color I", we can select the Fair cut and the two colors we are interested in. The Average Price by Color and Cut table then shows both mean prices.

As our t statistic and p-value indicated, the mean price of Fair cut + color G diamonds is slightly higher than that of Fair cut + color I diamonds. However, this difference is not significant because the p-value is greater than 0.05.
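
A two-sample t-test of this kind can be sketched with SciPy as follows. This is an illustration under assumptions, not the notebook's code: the column names `cut`, `color` and `price` are assumed, the helper `compare_mean_price` is hypothetical, Welch's variant (`equal_var=False`) is our choice and may differ from the test used in the notebook, and the prices are made-up toy values.

```python
import pandas as pd
from scipy import stats

# Toy prices for illustration only (NOT the real data set).
toy = pd.DataFrame({
    "cut":   ["Fair"] * 10,
    "color": ["G"] * 5 + ["I"] * 5,
    "price": [4200, 4300, 4100, 4500, 3900,
              4100, 4250, 4000, 4400, 3950],
})


def compare_mean_price(df, cut, color_a, color_b, alpha=0.05):
    """Two-sided t-test on the mean price of two colors within one cut."""
    subset = df[df["cut"] == cut]
    prices_a = subset.loc[subset["color"] == color_a, "price"]
    prices_b = subset.loc[subset["color"] == color_b, "price"]
    # Welch's t-test: does not assume equal variances (an assumption here).
    t_stat, p_value = stats.ttest_ind(prices_a, prices_b, equal_var=False)
    return {
        "mean_a": float(prices_a.mean()),
        "mean_b": float(prices_b.mean()),
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        # The difference counts as significant only if p < alpha.
        "significant": bool(p_value < alpha),
    }


result = compare_mean_price(toy, "Fair", "G", "I")
print(result)
```

A positive t statistic means the first group's mean is higher; the difference is only treated as significant when the p-value falls below the chosen `alpha`.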

🔧 Technology Stack

  • Python==3.8.5
  • pandas==1.1.3
  • seaborn==0.11.0
  • numpy==1.19.2
  • scipy==1.5.4

📁 Folder structure

└── project
    ├── .gitignore
    ├── README.md
    ├── notebooks
    │   ├── data_analysis_report.ipynb
    └── data

💌 Contact info

Doubts? Advice? Drop me a line! 😏
