This dataset was downloaded from Kaggle. It contains 205 entries, each a unique car, with columns for features such as engine size, car body type and horsepower.
An automobile company aims to enter the US market and compete with local and European manufacturers. It needs to understand the factors influencing car prices in the US market to adjust its strategies and design accordingly.
Certain features, such as engine size, car body type and horsepower, appear to have a greater impact on price. Visualisations will therefore be made to examine the strength of the correlation between these features and the price of the car. The car brand will also be examined to see whether a brand name affects the final price. To do this, a new column will be created that groups the individual cars by brand.
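The brand column could be derived roughly as in the minimal sketch below. It assumes the model name lives in a column called `CarName` (as in the Kaggle car price CSV) and that the brand is the first word of that name. The rows and prices are made-up toy values, and `add_brand_column` is a hypothetical helper, not the project's actual code.

```python
import pandas as pd


def add_brand_column(df: pd.DataFrame, name_col: str = "CarName") -> pd.DataFrame:
    """Return a copy of df with a 'brand' column taken from the first word of name_col.

    Assumption: the brand is the first whitespace-separated word of the car name.
    """
    out = df.copy()
    # e.g. "alfa-romero giulia" -> "alfa-romero"; lower-casing merges "Audi" and "audi"
    out["brand"] = out[name_col].str.split().str[0].str.lower()
    return out


# Toy rows (made-up prices) standing in for the real dataset
cars = pd.DataFrame({
    "CarName": ["alfa-romero giulia", "audi 100 ls", "Audi 5000", "bmw 320i"],
    "price": [10000.0, 20000.0, 30000.0, 40000.0],
})
cars = add_brand_column(cars)

# Average price per brand, which is what a brand bar plot would show
brand_avg = cars.groupby("brand")["price"].mean()
print(brand_avg)
```

Lower-casing is a deliberate choice so that differently capitalised spellings of the same brand end up in one group. Other spelling variants in real data may still need to be mapped by hand.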
- Extract: Data is extracted from a CSV file into a pandas DataFrame
- Transform: Using the NumPy and pandas libraries, the data is cleaned and processed
- Load: The cleaned data is saved as a CSV file so that it can be visualised
- Visualisation: The processed dataset is loaded and manipulated for visualisation
- Conclusion: The data visualisations help draw conclusions about the hypothesis
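The extract, transform and load steps above can be sketched with pandas as three small functions. This is a minimal sketch and not the project's exact pipeline. The cleaning rules (dropping duplicates and rows with missing values, trimming and lower-casing text) are assumptions, the column names `CarName` and `carbody` are assumed from the Kaggle CSV, and the output file name is hypothetical. An in-memory CSV with made-up values replaces the real file so that the example runs on its own.

```python
import io
import os
import tempfile

import pandas as pd


def extract(source) -> pd.DataFrame:
    """Extract: read the raw CSV (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def transform(df: pd.DataFrame, text_cols=("CarName", "carbody")) -> pd.DataFrame:
    """Transform: example cleaning steps (assumed, not the project's exact ones)."""
    # Drop exact duplicate rows, then any row with a missing value
    out = df.drop_duplicates().dropna().copy()
    # Normalise the listed text columns: trim whitespace and lower-case
    for col in text_cols:
        out[col] = out[col].str.strip().str.lower()
    return out.reset_index(drop=True)


def load(df: pd.DataFrame, path: str) -> None:
    """Load: save the cleaned data as CSV for the visualisation step."""
    df.to_csv(path, index=False)


# Tiny in-memory stand-in for the raw CSV (made-up values):
# one duplicated row and one row with a missing price
raw = io.StringIO(
    "CarName,carbody,price\n"
    "Audi 100 LS, sedan ,20000\n"
    "Audi 100 LS, sedan ,20000\n"
    "bmw 320i,sedan,\n"
)

clean = transform(extract(raw))

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "cars_cleaned.csv")  # hypothetical file name
    load(clean, out_path)
    # The visualisation step would start by reading this file back in
    reloaded = pd.read_csv(out_path)
```

Keeping each stage in its own function lets the cleaned CSV be regenerated on its own, independently of the notebook that draws the plots.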
- Correlation between enginesize and price - scatter plot
- Car brands and their prices - bar plot
- Correlation between the numerical values - heatmap
- Correlation between drivewheel and horsepower - box plot
- Average car price per car body type - bar plot
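Two of the plots listed above, the engine size scatter plot and the average price per car body type bar plot, could be drawn with Matplotlib roughly as in the minimal sketch below. The column names `enginesize`, `carbody` and `price` are assumed from the dataset, and the rows are made-up toy values. The figures are rendered to in-memory PNGs with the off-screen `Agg` backend so that the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the cleaned dataset; column names are assumed
cars = pd.DataFrame({
    "carbody": ["sedan", "sedan", "hatchback", "convertible", "wagon"],
    "enginesize": [110, 130, 90, 160, 120],
    "price": [15000.0, 17000.0, 9000.0, 30000.0, 12000.0],
})

# Scatter plot: engine size against price
scatter_fig = plt.figure()
plt.scatter(cars["enginesize"], cars["price"])
plt.xlabel("Engine size")
plt.ylabel("Price")
plt.title("Engine size vs price")

# Bar plot: average price per car body type, highest first
avg_price = cars.groupby("carbody")["price"].mean().sort_values(ascending=False)
bar_fig = plt.figure()
plt.bar(list(avg_price.index), avg_price.values)
plt.xticks(rotation=45, ha="right")  # rotate tick labels so category names don't overlap
plt.ylabel("Average price")
plt.title("Average car price per car body type")
plt.tight_layout()

# Render both figures to PNG bytes in memory (pass a file name to save to disk instead)
pngs = []
for fig in (scatter_fig, bar_fig):
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    pngs.append(buf.getvalue())
```

Sorting the group means before plotting makes the bar plot easier to read, because the most expensive body type appears first.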
- Exploratory analysis: Used to summarise the data
- Correlation analysis: Measured the correlation between numerical variables
- Data visualisation: Used to look at correlations between variables
- Generative AI: Used to explain certain car features, such as symboling, and to optimise code.
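The exploratory and correlation analysis steps could look roughly like the minimal sketch below. It uses made-up rows and assumed column names, and it is not necessarily the project's exact approach. `describe()` summarises each numeric column. The Pearson correlation matrix is computed over numeric columns only, which is also why the text column `carbody` drops out. A single coefficient is then cross-checked with NumPy.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names are assumed from the dataset
cars = pd.DataFrame({
    "enginesize": [90, 110, 130, 160, 200],
    "horsepower": [70, 95, 110, 150, 180],
    "price": [9000.0, 12000.0, 16000.0, 25000.0, 36000.0],
    "carbody": ["hatchback", "sedan", "sedan", "convertible", "sedan"],
})

# Exploratory analysis: count, mean, std, min, quartiles and max of each numeric column
summary = cars.describe()

# Correlation analysis: Pearson correlation matrix of the numeric columns only
corr = cars.select_dtypes("number").corr()

# Correlation of each feature with price, strongest first
price_corr = corr["price"].drop("price").sort_values(ascending=False)

# Cross-check one coefficient with NumPy's Pearson implementation
r_np = np.corrcoef(cars["enginesize"], cars["price"])[0, 1]
```

A correlation matrix like `corr` is what a heatmap such as `sns.heatmap(corr, annot=True)` in Seaborn would display.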
The data is public, so there were no privacy concerns. It was collected by an automobile consulting company, so it is considered a fair dataset that should not be biased. There were no legal or societal issues.
- Plotly graphs do not display on GitHub. The plots should be saved as HTML files instead.
- Overlapping labels on graphs were fixed using `plt.xticks()`, giving a tidier graph
As mentioned in the unfixed bugs section:
- Learned that Plotly graphs can be saved as HTML files, which is something I would like to include in my next project
- Learned how to fix overlapping labels to get a tidier graph
- NumPy
- pandas
- Matplotlib
- Seaborn
- Plotly
- Data sourced from Kaggle.
- Class, facilitators and data coaches
- Code Institute content (LMS)