GitHub - ssarahreyes/ih_datamadpt1120_project_m2: Ironhack Project Module 2: statistics and visualization

➡️ Data Analysis and Visualization of a Data Set

In this project we analyzed a data set with information about diamonds (carat, size, clarity, color...) using Pandas, Seaborn and Plotly. We then built a data visualization dashboard using Tableau.

✅ Status

Ironhack Data Analytics Project Module 2: Statistics & Data Visualization.

💻 Technology stack

This project uses Python, Pandas, Seaborn and Plotly to explore the data and to draw some explanatory plots.

We also used Scipy for hypothesis testing.

📊 Tableau Dashboard

You can find my Tableau Dashboard here.

⚡ Main conclusions

After exploring the data in Jupyter Notebook and building the Tableau Dashboard, our main conclusions are:

Carat is the most relevant characteristic for a diamond's price.
Fair is the cut with the highest-carat diamonds. This is why, despite being the worst cut, it has a higher average price.
Premium-cut diamonds with I and J colors are the most expensive, although these colors are two of the lowest quality. This is because of the carat weight of I and J diamonds.
As for clarity, I1 and SI2 are two of the lowest clarity grades but, because many of these diamonds are I or J color, their price is higher.
Also, J and I are the colors with the biggest diamonds.
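
The kind of check behind these conclusions can be sketched in Pandas. This is a minimal illustration, not the notebook's actual code: the rows are made-up toy values, and the column names (`carat`, `cut`, `color`, `price`) are assumptions about how the data set is laid out.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; the project uses the full data set.
# Column names are assumptions, not taken from the notebook.
diamonds = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 0.4, 0.9],
    "cut":   ["Ideal", "Ideal", "Premium", "Premium", "Fair", "Fair", "Ideal", "Fair"],
    "color": ["E", "G", "I", "J", "I", "J", "D", "G"],
    "price": [500, 1400, 2300, 4500, 5200, 8000, 800, 3600],
})

# Pearson correlation between carat and price.
carat_price_corr = diamonds["carat"].corr(diamonds["price"])

# Average carat and average price per cut, largest average carat first.
by_cut = (
    diamonds.groupby("cut")[["carat", "price"]]
    .mean()
    .sort_values("carat", ascending=False)
)

print(f"carat vs. price correlation: {carat_price_corr:.2f}")
print(by_cut)
```

Grouping by `color` or `clarity` instead of `cut` gives the same kind of summary for the other conclusions.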

🚀 Bonus: Hypothesis Testing

The solution to the hypothesis testing bonus is in data_analysis_report.ipynb.

Furthermore, using the filters added to the Tableau Dashboard, we can check that the results obtained in data_analysis_report.ipynb hold.

For example, to examine the results of "Sub-Test 1: Fair cut + color G vs. Fair cut + color I", we can select the Fair cut and the two colors we are interested in. The Average Price by Color and Cut table then shows both mean prices.

As our t statistic and p value told us, the mean price of Fair cut + color G diamonds is slightly higher than that of Fair cut + color I diamonds. However, this difference is not significant because the p value is greater than 0.05.
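
A sub-test like this could be run with Scipy roughly as follows. This is a sketch under stated assumptions: it uses an independent two-sample t-test (`scipy.stats.ttest_ind` with its default settings), which may differ from the exact call in the notebook, and the prices are made-up values rather than the project's data. The column names are also assumptions.

```python
import pandas as pd
from scipy import stats

# Hypothetical prices for illustration only; not the project's data.
diamonds = pd.DataFrame({
    "cut":   ["Fair"] * 10,
    "color": ["G"] * 5 + ["I"] * 5,
    "price": [4300, 3900, 5100, 4700, 4200, 4100, 3800, 5000, 4600, 4000],
})

# Select the two groups being compared: Fair + G and Fair + I.
fair = diamonds[diamonds["cut"] == "Fair"]
price_g = fair.loc[fair["color"] == "G", "price"]
price_i = fair.loc[fair["color"] == "I", "price"]

# Independent two-sample t-test on the prices of the two groups.
t_stat, p_value = stats.ttest_ind(price_g, price_i)

# Compare the p value with the 0.05 threshold used in the text.
alpha = 0.05
significant = p_value < alpha

print(f"mean G: {price_g.mean():.0f}, mean I: {price_i.mean():.0f}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, significant: {significant}")
```

A positive t statistic means the first group (color G) has the higher mean. Passing `equal_var=False` to `ttest_ind` switches to Welch's t-test, which does not assume equal variances in the two groups.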

🔧 Technology Stack

Python==3.8.5
pandas==1.1.3
seaborn==0.11.0
numpy==1.19.2
scipy==1.5.4

📁 Folder structure

└── project
    ├── .gitignore
    ├── README.md
    ├── notebooks
    │   └── data_analysis_report.ipynb
    └── data

💌 Contact info

Doubts? Advice? Drop me a line! 😏
