Car Price Analysis is an interactive dashboard for exploring, analyzing, and visualizing car price data. It is built with Streamlit, takes CSV data as input, and presents the results through an intuitive interface.
- The dataset contains car listings with features such as brand, model, engine size, and price. It is sourced from Kaggle and is well within the repository's maximum size of 100 GB.
- Enable users to compare average car prices by brand.
- Allow exploration of price distributions for individual brands.
- Provide insights into how engine size affects car price.
- Support interactive data exploration for business decision-making.
- Hypothesis 1: Certain car brands have consistently higher average prices.
  - Validation: Visualize average price by brand using horizontal bar charts.
- Hypothesis 2: Engine size is positively correlated with car price.
  - Validation: Scatter plot and regression analysis of engine size vs. price.
- Hypothesis 3: Price distributions vary significantly between brands.
  - Validation: Display histograms for selected brands.
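
The check for Hypothesis 2 can be sketched numerically before any chart is drawn. The snippet below is a minimal illustration, not the project's notebook code: the rows are made up, and the column names `enginesize` and `price` are assumptions. It computes the Pearson correlation and a least-squares line with NumPy.

```python
import numpy as np
import pandas as pd

# Made-up sample rows for illustration only; not taken from the project dataset.
# The column names `enginesize` and `price` are assumed.
df = pd.DataFrame({
    "enginesize": [92, 98, 108, 109, 136, 164],
    "price": [5348.0, 6338.0, 16430.0, 13950.0, 17450.0, 24565.0],
})

x = df["enginesize"].to_numpy(dtype=float)
y = df["price"].to_numpy(dtype=float)

# Pearson correlation coefficient between engine size and price.
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line: price is approximated by slope * enginesize + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

print(f"r = {r:.3f}, slope = {slope:.1f}, intercept = {intercept:.1f}")
```

A positive `r` together with a positive `slope` is consistent with the hypothesis. It does not establish causation, and a small sample like this one says little about statistical significance.
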
- Data collection from Kaggle.
- Data cleaning and preprocessing in Jupyter notebooks.
- Exploratory analysis and feature engineering.
- Dashboard development in Streamlit.
- Iterative testing and refinement based on feedback.
- Average price by brand → Horizontal bar chart for clear comparison.
- Price distribution per brand → Histogram for selected brand.
- Engine size vs. price → Scatter plot for correlation analysis.
- Groupby and aggregation for summary statistics.
- Value counts and histograms for distribution analysis.
- Scatter plots and regression for correlation.
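
The first two techniques above can be sketched with pandas as follows. This is an illustrative example on made-up rows. The column names `carBrand` and `price` follow the plotting example elsewhere in this README, and the choice of statistics is an assumption.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the project dataset.
df = pd.DataFrame({
    "carBrand": ["audi", "audi", "bmw", "bmw", "bmw", "toyota"],
    "price": [13950.0, 17450.0, 16430.0, 24565.0, 20970.0, 5348.0],
})

# Groupby and aggregation: several summary statistics per brand,
# ordered from the highest to the lowest mean price.
summary = (
    df.groupby("carBrand")["price"]
    .agg(["count", "mean", "median", "min", "max"])
    .sort_values("mean", ascending=False)
)

# Value counts: number of listings per brand, most frequent first.
listings_per_brand = df["carBrand"].value_counts()

print(summary)
print(listings_per_brand)
```

Reporting `count` next to `mean` helps show when a brand's average rests on only a few listings.
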
- Used generative AI tools (GitHub Copilot) for code suggestions, design thinking, and optimization.
- The dataset does not contain personal or sensitive information.
- The dataset may be biased because some brands or market segments are over- or under-represented; this is addressed by reporting the limitation transparently.
- No legal or societal issues identified.
- Home Page: Project overview and navigation.
- Price vs Brand: Horizontal bar chart of average price by brand; dropdown to select brand for histogram.
- Price vs. Enginesize: Scatter plot and regression line.
- Horsepower Distribution: Analysis of horsepower data.
- Brand Explorer: Interactive exploration of brands.
- Data insights are communicated using clear visualizations and concise text, suitable for both technical and non-technical audiences.
- No major unfixed bugs. Minor limitations include:
  - Streamlit does not support direct click events on Plotly charts.
  - Some brand names may be truncated if the chart height is not sufficiently increased.
- Feedback from peers led to improved chart sizing and widget selection.
- Challenges included handling categorical data and ensuring consistent visualization sizing.
- Strategies: Used session state for data sharing, standardized histogram bins/range, and increased chart height.
- Next steps: Explore advanced interactivity (e.g., Dash callbacks) and learn more about deploying and scaling Streamlit apps.
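
The idea behind standardizing histogram bins and range can be sketched with NumPy: every brand is binned against the same edges, so the histograms stay directly comparable when the user switches brands. The prices and the bin count below are made up for illustration, and the dashboard itself may implement this differently.

```python
import numpy as np

# Made-up prices per brand, for illustration only.
prices = {
    "audi": np.array([13950.0, 17450.0, 15250.0, 18920.0]),
    "toyota": np.array([5348.0, 6338.0, 7898.0, 9988.0]),
}

# One global range and bin count, derived from all brands together.
all_prices = np.concatenate(list(prices.values()))
price_range = (all_prices.min(), all_prices.max())
n_bins = 5  # assumed value; tune to the real data

# Each brand is binned against the same edges.
histograms = {
    brand: np.histogram(values, bins=n_bins, range=price_range)
    for brand, values in prices.items()
}

for brand, (counts, edges) in histograms.items():
    print(brand, counts, edges)
```

With Plotly Express, a similar effect can likely be achieved by passing the same `range_x` and `nbins` to every `px.histogram` call. Note that `nbins` is treated as an upper bound on the bin count, not an exact value.
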
- pandas: Data loading, cleaning, and aggregation.
  - Example: `df = pd.read_csv('CarPrice_Working.csv')`
- numpy: Numerical operations.
- plotly: Interactive visualizations.
  - Example: `px.bar(df_byBrand, x='price', y='carBrand', orientation='h')`
- streamlit: Dashboard interface and widgets.
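
The plotting example above relies on a `df_byBrand` table that is not defined in this README. The sketch below shows one way such a table could be derived with pandas and passed to `px.bar`. The helper names, the sample rows, and the height rule are all hypothetical, so treat this as an assumption-laden illustration, not the project's actual code.

```python
import pandas as pd


def average_price_by_brand(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per brand with its mean price, sorted ascending."""
    return (
        df.groupby("carBrand", as_index=False)["price"]
        .mean()
        .sort_values("price")
        .reset_index(drop=True)
    )


def chart_height(n_brands: int, px_per_bar: int = 25, minimum: int = 400) -> int:
    """Hypothetical sizing rule: grow the figure with the number of bars."""
    return max(minimum, n_brands * px_per_bar)


def brand_bar_chart(df_by_brand: pd.DataFrame):
    """Build the horizontal bar chart of average price by brand."""
    # Imported here so the helpers above also work without plotly installed.
    import plotly.express as px

    return px.bar(
        df_by_brand,
        x="price",
        y="carBrand",
        orientation="h",
        height=chart_height(len(df_by_brand)),
    )


# Made-up rows for illustration only; the real data comes from the project's CSV.
sample = pd.DataFrame({
    "carBrand": ["audi", "audi", "bmw", "bmw", "toyota", "toyota"],
    "price": [13950.0, 17450.0, 16430.0, 24565.0, 5348.0, 6338.0],
})
df_byBrand = average_price_by_brand(sample)
print(df_byBrand)
```

In a Streamlit page, the figure returned by `brand_bar_chart(df_byBrand)` would typically be rendered with `st.plotly_chart`. Scaling the height with the number of brands is one way to keep long brand names from being truncated.
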
- Data cleaning steps adapted from Kaggle tutorials.
- Widget usage and dashboard layout inspired by Streamlit documentation.
- CI logo from Code Institute.
- Example images from open-source image repositories.
- Thanks to Code Institute instructors and peers for feedback and support.
