AmuKanda99/Project-1

Project 5: Car Price Analysis

Dataset content

This dataset was downloaded from Kaggle. It contains 205 entries, each a unique car, with columns for features such as engine size, car body type and horsepower.

Business Requirements

An automobile company aims to enter the US market and compete with local and European manufacturers. It needs to understand the factors influencing car prices in the US market to adjust its strategies and design accordingly.

Hypothesis and how to validate it

Certain features, such as engine size, car body type and horsepower, seem to have more impact on the price than others. Visualisations will therefore be made to examine the strength of the correlations between these features and the price of the car. The car brand can also be examined to see whether the brand name has an impact on the final price. To do this, a new column should be created that groups the individual cars by brand.
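The brand grouping described above could be sketched as follows. This is a minimal illustration, not the project's actual code: the `CarName` and `price` column names, the sample rows, and the rule that the brand is the first word of the car name are all assumptions.

```python
import pandas as pd

# Hypothetical sample rows standing in for the 205-entry dataset.
# The column names "CarName" and "price" are assumptions.
cars = pd.DataFrame({
    "CarName": ["alfa-romero giulia", "audi 100 ls", "bmw 320i", "audi 5000"],
    "price": [13495.0, 13950.0, 16430.0, 17450.0],
})

# Assumption: the brand is the first whitespace-separated word of the car name.
cars["brand"] = cars["CarName"].str.split().str[0].str.lower()

# Mean price per brand, highest first, ready for a bar plot.
brand_prices = cars.groupby("brand")["price"].mean().sort_values(ascending=False)
print(brand_prices)
```

Lower-casing the brand keeps differently capitalised spellings of the same brand in one group. Misspelled brand names, if any exist in the data, would still need a manual mapping.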

Project Plan

  • Extract: Data is extracted from a CSV file into a pandas DataFrame
  • Transform: Using the NumPy and pandas libraries, data is cleaned and processed
  • Load: Cleaned data is saved as a CSV file so that it can be visualised
  • Visualisation: Processed dataset is extracted and manipulated for visualisation
  • Conclusion: The data visualisations help draw conclusions on the hypothesis
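The extract, transform and load steps above might look roughly like this. It is a sketch under stated assumptions: the in-memory CSV, its column names, the duplicate row, the `?` placeholder and the output file name are all hypothetical, and the cleaning the project actually performed may differ.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical raw data standing in for the Kaggle CSV file.
RAW_CSV = io.StringIO(
    "car_ID,CarName,enginesize,horsepower,price\n"
    "1,audi 100 ls,109,102,13950\n"
    "1,audi 100 ls,109,102,13950\n"
    "2,bmw 320i,108,101,16430\n"
    "3,toyota corolla,92,?,6338\n"
)


def extract(source):
    """Read the raw CSV into a pandas DataFrame."""
    return pd.read_csv(source)


def transform(df):
    """Drop duplicate rows, coerce horsepower to numeric, drop incomplete rows."""
    df = df.drop_duplicates().copy()
    # Non-numeric placeholders such as "?" become NaN.
    df["horsepower"] = pd.to_numeric(df["horsepower"], errors="coerce")
    return df.dropna().reset_index(drop=True)


def load(df, path):
    """Save the cleaned data as a CSV file for the visualisation step."""
    df.to_csv(path, index=False)


# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "cars_cleaned.csv")
cleaned = transform(extract(RAW_CSV))
load(cleaned, out_path)
```

Keeping each step in its own function lets the visualisation notebook start from the saved CSV without repeating the cleaning.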

The rationale to map the business requirements to the Data Visualisations

  • Correlation between enginesize and price - scatter plot
  • Car brands and their prices - bar plot
  • Correlation between the numerical values - heatmap
  • Relationship between drivewheel and horsepower - box plot
  • Average car price per car body type - bar plot

Analysis techniques used

  • Exploratory analysis: Used to summarise the data
  • Correlation analysis: Measured correlation between numerical variables
  • Data visualisation: Used to look at correlations between variables
  • Generative AI: Used to explain certain car features, such as symboling, and to optimise code
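The correlation analysis listed above can be illustrated with a short pandas sketch. The sample values and column names are hypothetical, and Pearson correlation (the pandas default) is an assumption about the method used.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions.
cars = pd.DataFrame({
    "enginesize": [92, 108, 109, 130, 152],
    "horsepower": [62, 101, 102, 111, 154],
    "price": [6338.0, 16430.0, 13950.0, 13495.0, 16500.0],
    "carbody": ["hatchback", "sedan", "sedan", "convertible", "sedan"],
})

# Keep only numerical columns, then compute pairwise Pearson correlations.
corr = cars.select_dtypes(include="number").corr()

# Rank the features by their correlation with price.
price_corr = corr["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

The `corr` matrix is the kind of input a Seaborn heatmap takes, for example `sns.heatmap(corr, annot=True)`.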

Ethical considerations

The data is public, so there were no privacy concerns. It was collected by an automobile consulting company, so it is assumed to be a fair dataset that should not be biased. There were no legal or societal issues.

Unfixed bugs

  • Plotly graphs do not render on GitHub. Saving the plots as HTML files would work around this.
  • Overlapping labels on graphs were fixed by using plt.xticks(), which gives a tidier graph

Development roadmap

As mentioned in the unfixed bugs section:

  • I learned that Plotly graphs can be saved as HTML files, which is something I would like to include in the next project
  • I learned to fix overlapping labels on graphs to get a tidier graph

Data Analysis libraries

  • NumPy
  • pandas
  • Matplotlib
  • Seaborn
  • Plotly

Credits

  • Data used from Kaggle.
  • Class, facilitators and data coaches
  • Code Institute content (LMS)

About

Analytics Project 1 - Analyse how different variables impact US car pricing to help a new company be competitive against other manufacturers.
