This project takes raw retail sales data and transforms it into actionable insights and a working sales prediction tool. I handled the entire process — from cleaning and exploring the data to building a predictive machine learning model and visualizing the results.
This repository contains two versions of the notebook: one with interactive Plotly visuals and one with static Matplotlib and Seaborn plots. GitHub does not render the Plotly visuals, and the saved prediction model (.pkl file) cannot be run there. To use the dashboard and generate predictions, download the notebook(s) and model files and run them locally.
In this analysis, I set out to answer:
- What kind of products sell more and which ones sell less?
- Does the type of outlet or its age affect sales?
- Do item features like fat content, weight, or type influence sales?
- Can we build a reliable model to predict sales?
- Which features matter most when predicting sales?
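As a rough illustration of how the first two questions can be approached, the sketch below totals sales per group with pandas. The column names (`Item_Type`, `Outlet_Type`, `Item_Outlet_Sales`), the helper `total_sales_by`, and every value in the toy table are assumptions made up for this example; they are not taken from the project's dataset or results.

```python
import pandas as pd

# Toy rows invented purely for illustration; not the project's data.
sales = pd.DataFrame({
    "Item_Type": ["Household", "Seafood", "Household", "Snack Foods", "Seafood"],
    "Outlet_Type": ["Supermarket Type3", "Grocery Store", "Grocery Store",
                    "Supermarket Type3", "Supermarket Type3"],
    "Item_Outlet_Sales": [3000.0, 400.0, 500.0, 2500.0, 1200.0],
})


def total_sales_by(frame, column, target="Item_Outlet_Sales"):
    """Sum the target column per group, largest total first (hypothetical helper)."""
    return frame.groupby(column)[target].sum().sort_values(ascending=False)


by_item = total_sales_by(sales, "Item_Type")
by_outlet = total_sales_by(sales, "Outlet_Type")
print(by_item)
print(by_outlet)
```

The same helper can be pointed at any categorical column, such as a binned outlet age, to compare groups in the same way.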
• Top-selling categories: Household, Fruits & Vegetables, Snack Foods
• Lowest sales: Seafood, Breakfast, Hard Drinks
• Outlet effect: Supermarket Type3 had the highest sales; Grocery Stores the lowest
• Outlet age: Medium-aged outlets performed best
• Feature impact: Outlet Type, Item MRP, and Item Type were the strongest predictors
• Model used: Gradient Boosting Regressor (best accuracy among the models compared)
• Outcome: Accurately predicts sales based on product and outlet details
• Pipeline: Preprocessing + Model saved for future use
• Python (Pandas, NumPy, Scikit-learn)
• Matplotlib / Seaborn / Plotly for visualization
• Dash for interactive prediction interface