- Overview
- Dataset
- Methodology
- Results & Insights
- Project Structure
- Installation & Usage
- API Endpoints
- Notebooks
- Requirements
- License
This project aims to segment mall customers into distinct groups using unsupervised machine learning techniques. The goal is to help businesses understand customer behavior, target marketing efforts, and personalize services. The solution includes:
- Data preprocessing and cleaning
- Clustering model development and evaluation
- API for serving segmentation results
- Visualizations and reporting
- File: backend/data/raw/Mall_Customers.csv
- Features:
- CustomerID: Unique identifier
- Gender: Male/Female
- Age: Customer age
- Annual Income (k$): Annual income in thousands
- Spending Score (1-100): Score assigned by the mall
- Description:
- The dataset contains 200 records of mall customers, capturing demographic and behavioral information. It is widely used for clustering and segmentation tasks in retail analytics.
- Handle missing values and outliers
- Encode categorical variables (e.g., Gender)
- Scale numerical features for fair clustering
- Save processed data to backend/data/processed/preprocessed_data.csv
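The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the median fill, the 1.5 × IQR clipping rule, the 0/1 Gender mapping, and the helper name `preprocess` are all assumptions, and a tiny hand-written frame stands in for Mall_Customers.csv.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

NUMERIC_COLS = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean, encode, and scale a Mall_Customers-style DataFrame (hypothetical helper)."""
    out = df.copy()

    # Missing values: fill numeric gaps with the column median (assumed policy),
    # then drop rows that have no Gender value.
    out[NUMERIC_COLS] = out[NUMERIC_COLS].fillna(out[NUMERIC_COLS].median())
    out = out.dropna(subset=["Gender"])

    # Outliers: clip each numeric column to 1.5 * IQR beyond its quartiles (assumed rule).
    q1 = out[NUMERIC_COLS].quantile(0.25)
    q3 = out[NUMERIC_COLS].quantile(0.75)
    iqr = q3 - q1
    out[NUMERIC_COLS] = out[NUMERIC_COLS].clip(
        lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1
    )

    # Encode Gender as 0/1 (assumed mapping).
    out["Gender"] = out["Gender"].map({"Male": 0, "Female": 1})

    # Scale numeric features to zero mean and unit variance so that no single
    # feature dominates the distance computation during clustering.
    out[NUMERIC_COLS] = StandardScaler().fit_transform(out[NUMERIC_COLS])
    return out.reset_index(drop=True)


# Illustrative rows only; these are not taken from the real dataset.
raw = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Age": [19, 35, None, 52, 27],
    "Annual Income (k$)": [15, 60, 78, 120, 40],
    "Spending Score (1-100)": [39, 81, 6, 20, 55],
})
processed = preprocess(raw)
```

In the project, the input would presumably come from `pd.read_csv` on the raw file, and the result would be written with `processed.to_csv(..., index=False)` to the processed path listed above.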
- Algorithm: K-Means Clustering
- Cluster Selection:
- The optimal number of clusters was determined using the Silhouette Method, which measures how similar an object is to its own cluster compared to other clusters.
- The highest average silhouette score was achieved with 4 clusters.
- Model Output:
- Trained model saved as backend/models/clustering_model.pkl
- Cluster assignments saved in backend/data/processed/clustered_data.csv
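As a rough illustration of the cluster selection and model saving described above, the sketch below fits K-Means for several values of k, keeps the k with the highest average silhouette score, and round-trips the fitted model through joblib. The helper name `select_k`, the candidate range 2 to 8, and the random seeds are assumptions, and synthetic blobs stand in for the scaled customer features.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def select_k(X: np.ndarray, k_values=range(2, 9), seed: int = 42):
    """Return (best_k, scores), where scores maps each k to its average silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the preprocessed features (not the real dataset).
X, _ = make_blobs(
    n_samples=200,
    centers=[(-5, -5), (-5, 5), (5, -5), (5, 5)],
    cluster_std=0.6,
    random_state=0,
)

best_k, scores = select_k(X)
model = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X)

# Persist and reload the fitted model; a temporary directory is used here
# instead of backend/models/ so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "clustering_model.pkl"
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)
```

The silhouette score ranges from -1 to 1, and higher values indicate tighter, better-separated clusters, which is why the maximum is taken.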
- 2D and 3D cluster visualizations:
- backend/reports/figures/Customer Segmentation 2D.png
- backend/reports/figures/Customer Segmentation 3D.png
- Detailed summary in backend/reports/summary.txt
- Cluster Profiles:
- Cluster 1: Young, moderate income, moderate spending
- Cluster 2: Older, high income, low spending
- Cluster 3: High spending, varied income, often younger
- Cluster 4: Low income, low spending
- Business Value:
- Enables targeted marketing and personalized offers
- Helps identify high-value and at-risk customer segments
- Model Selection Rationale:
- The Silhouette Method favors well-separated, cohesive clusters
- 4 clusters provide actionable segmentation without splitting customers into groups too small to act on
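Cluster profiles like the ones listed above are typically derived by averaging each feature per cluster. The sketch below shows one way to do that with pandas. The `Cluster` column name and the helper `profile_clusters` are assumptions, and the rows are made up for illustration; they are not the project's results.

```python
import pandas as pd

# Made-up rows standing in for clustered_data.csv; the "Cluster" column name is assumed.
clustered = pd.DataFrame({
    "Age": [22, 25, 58, 61, 30, 28, 45, 50],
    "Annual Income (k$)": [45, 50, 95, 102, 70, 30, 20, 25],
    "Spending Score (1-100)": [50, 48, 15, 20, 90, 85, 12, 18],
    "Cluster": [1, 1, 2, 2, 3, 3, 4, 4],
})


def profile_clusters(df: pd.DataFrame, label_col: str = "Cluster") -> pd.DataFrame:
    """Return per-cluster feature means plus the number of customers in each cluster."""
    summary = df.groupby(label_col).mean(numeric_only=True).round(1)
    summary["Size"] = df.groupby(label_col).size()
    return summary


profiles = profile_clusters(clustered)
print(profiles)
```

Profiles should be computed on the original, unscaled features so that the averages read directly as ages, incomes, and spending scores.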
Mall Customer Segmentation
├── backend/
│ ├── data/
│ │ ├── raw/
│ │ └── processed/
│ ├── models/
│ ├── notebooks/
│ ├── reports/
│ │ ├── figures/
│ │ └── summary.txt
│ ├── requirements.txt
│ └── src/
├── frontend/
│ ├── app/
│ ├── public/
│ ├── package.json
│ └── README.md
├── LICENSE
└── README.md
pip install -r backend/requirements.txt
python backend/src/main.py
uvicorn backend.src.main:app --reload

The backend uses FastAPI to serve clustering results and other functionality. Example endpoints:
- /predict: Get the cluster assignment for new customer data
- /clusters: Retrieve information about the clusters
- Preprocessing: backend/notebooks/preprocessing.ipynb
- Modeling: backend/notebooks/modeling.ipynb
Main dependencies (see backend/requirements.txt):
- pandas
- numpy
- fastapi[standard]
- joblib
- matplotlib
- scikit-learn
MIT