This Django application forecasts profits for a retail store based on date, gender, product category, and age. It provides visual insights through historical data trends, forecasted values with confidence intervals, and a comparison of actual vs. forecasted profits.
- Getting Started
- Setup Instructions
- Dataset and Forecasting Model
- Challenges Faced
Follow these steps to set up the project, run the application, and view the retail store profit forecasting dashboard.
-
Clone the Repository
Clone this repository to your local machine using:
git clone git@github.com:TEBIAN/Retail-Store.git
-
Navigate to the Project Directory
cd retail-store-profit-forecasting -
Create and Activate a Virtual Environment
It’s recommended to use a virtual environment for dependency management:
python3 -m venv env source env/bin/activate # For Windows: env\Scripts\activate
-
Install Dependencies
Install the necessary packages using
pip:pip install -r requirements.txt
-
Set Up Django Application
Run the following to apply migrations and create the initial database
python forecasting_app/forecasting/manage.py migrate
-
Load Sample Data
Place your dataset file (e.g.,
sales_data.csv) in the project root directory. This dataset should include the following columns:date(date of sale)gender(customer gender)product_category(type of product)age(customer age)profit(profit generated from the sale)
-
Train the Model
Run the
train_modelfunction inml_model.pyto train and save the forecasting model. This function loads the dataset, preprocesses data, trains a Random Forest model, and saves the model tomodel.pklfor reuse.python forecasting_app/forecasting/manage.py shell >>> from app_name.ml_model import train_model >>> train_model()
-
Run the Django Server
Start the server using:
python manage.py runserver
-
Access the Application
Go to
http://127.0.0.1:8000/visualizations/to view visualizations for historical trends, forecasted values, and actual vs. forecasted comparisons.
The dataset used in this application contains records of retail sales, with each row representing a sale. Key columns include:
- Date: Date of sale, which we use to derive
year,month, andday. - Gender: Customer gender, an important categorical feature.
- Product Category: Type of product purchased.
- Age: Customer age, allowing for customer segmentation.
- Profit: Profit generated from each sale, the target variable for forecasting.
The forecasting model is built using Random Forest Regressor, which is robust for regression tasks involving numerical and categorical data. The model takes date-derived features, gender, age, and product category to predict the profit from future sales. Key visualizations provided by the app include:
- Historical Profit Trends: Shows past profit trends to help identify patterns.
- Forecasted Profit with Confidence Intervals: Provides 30-day profit forecasts, displaying upper and lower confidence limits.
- Actual vs. Forecasted Profit Comparison: Highlights the accuracy of the model by comparing actual profits with predicted values.
During the implementation, several challenges arose:
-
Data Preprocessing:
- Handling date features required extracting
year,month, andday, which was essential for the model to understand time-related patterns. - Encoding categorical features such as
genderandproduct categorywas necessary for using them in the machine learning pipeline.
- Handling date features required extracting
-
Pipeline and Transformer Compatibility:
- Initial errors were encountered due to differences between using
DataFramevs.NumPy arraysin theColumnTransformer. Ensuring that input data remained as a DataFrame resolved these issues.
- Initial errors were encountered due to differences between using
-
Model Interpretation and Confidence Intervals:
- Confidence intervals were approximated at ±10% of the forecasted values to provide an initial sense of forecast reliability. In production, more robust methods like quantile regression could enhance accuracy.
This application demonstrates an end-to-end solution for profit forecasting in a retail setting, combining machine learning, data visualization, and web deployment with Django. Future improvements could include additional model optimization, improved forecasting methods, and further refinement of confidence intervals.