gonalvarez05/Project-2-Diamonds-Kaggle

Diamonds Project

Overview

In this project, I carried out an exploratory analysis of a diamonds dataset with pandas, matplotlib, seaborn and Tableau. The main goal was to analyse how the different variables affect the price of a diamond, and to build a model that predicts prices for a second dataset of diamonds whose prices were not provided.

Data

The dataset has 40,455 diamonds and ten columns describing their characteristics:

  • Carat: weight of the diamond
  • Cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
  • Color: diamond colour, from J (worst) to D (best)
  • Clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
  • Depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)
  • Table: width of top of diamond relative to widest point (43--95)
  • Price: price in USD
  • x: length in mm
  • y: width in mm
  • z: depth in mm
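
As a minimal sketch of how the definitions above fit together, the snippet below computes the depth percentage from x, y and z and maps the ordered categories to integer ranks. It is illustrative only and is not taken from the project notebooks: the function names and the sample diamond are hypothetical, the factor of 100 is assumed because the variable is described as a percentage with a 43--79 range, and the intermediate colour letters between J and D are assumed to follow alphabetical order.

```python
# Ordered quality scales, worst to best, as listed in the Data section.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
# Assumption: the colour grades between J and D are I, H, G, F, E.
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def depth_percentage(x: float, y: float, z: float) -> float:
    """Total depth percentage: 100 * z / mean(x, y) = 100 * 2 * z / (x + y)."""
    if x + y <= 0:
        raise ValueError("x + y must be positive")
    return 100.0 * 2.0 * z / (x + y)


def ordinal_rank(value: str, order: list[str]) -> int:
    """Position of a category on its scale (0 = worst)."""
    return order.index(value)


# Hypothetical sample diamond, used only to exercise the helpers.
diamond = {"cut": "Premium", "color": "E", "clarity": "SI1",
           "x": 3.89, "y": 3.84, "z": 2.31}

depth = depth_percentage(diamond["x"], diamond["y"], diamond["z"])
ranks = {
    "cut": ordinal_rank(diamond["cut"], CUT_ORDER),
    "color": ordinal_rank(diamond["color"], COLOR_ORDER),
    "clarity": ordinal_rank(diamond["clarity"], CLARITY_ORDER),
}
print(round(depth, 1), ranks)
```

Encoding the categories as ranks keeps their worst-to-best order, which one-hot encoding would discard. Whether that suits a given model is a judgment call.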

Steps

  • Explore the dataset with pandas.
  • Analyse the summary statistics.
  • Plot the categorical and numerical variables with seaborn and matplotlib.
  • Run hypothesis tests on grouped variables.
  • Build a report with Tableau.
  • Create new features.
  • Train several ML models and compare them to see which one performs best.

πŸ“ Folder structure

└── ih_datamadpt1120_project_m2
    β”œβ”€β”€ .gitignore
    β”œβ”€β”€ README.md
    β”œβ”€β”€ notebooks
               β”œβ”€β”€ data_analysis_report.ipynb
               β”œβ”€β”€ ML-models.iypnb     
