Mall Customer Segmentation and Predictive Modeling
Overview
This project explores customer segmentation using the Mall Customer dataset. It involves data exploration, clustering, dimensionality reduction, and predictive modeling of customer behavior.
Dataset
200 rows × 5 columns
Features:
CustomerID (if used)
Gender
Age
Annual Income (k$)
Spending Score (1–100)
Preprocessing
Encoded Gender: Male → 1, Female → 0
Scaled numerical features for models when appropriate
Created labels for age_group and spending_score (low, medium, high)
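The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the column names, bin edges, and the `preprocess` helper are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 200-row dataset; values are random, not real.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=200),
    "Age": rng.integers(18, 71, size=200),
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})


def preprocess(frame):
    """Hypothetical helper: encode gender, bin age and spending, scale numerics."""
    out = frame.copy()
    # Encoding stated in the README: Male -> 1, Female -> 0.
    out["gender"] = out["Gender"].map({"Male": 1, "Female": 0})
    # Bin edges below are assumptions; the README does not state them.
    out["age_group"] = pd.cut(
        out["Age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
    )
    out["spending_score_label"] = pd.cut(
        out["Spending Score (1-100)"],
        bins=[0, 33, 66, 100],
        labels=["low", "medium", "high"],
    )
    # Standardize numeric features (zero mean, unit variance) for scale-sensitive models.
    numeric = ["Age", "Annual Income (k$)"]
    scaled = pd.DataFrame(
        StandardScaler().fit_transform(out[numeric]),
        columns=numeric,
        index=out.index,
    )
    return out, scaled


prepared, scaled = preprocess(df)
print(prepared[["gender", "age_group", "spending_score_label"]].head())
```

Scaling is kept separate from the labeled frame here so that tree-based models can use the unscaled values while distance-based or linear models use the standardized ones.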
Exploratory Data Analysis
Visualized distributions using histograms, boxplots, and heatmaps
Key insights:
Females have a higher average spending score
Age and income are not strongly correlated with spending
Spending patterns vary significantly across age groups
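The numeric checks behind insights like these could look something like the sketch below. It runs on synthetic data with assumed column names and age bins, so its output will not reproduce the findings listed above; it only shows the kind of summaries involved.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; values are random, not the real dataset.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=200),
    "Age": rng.integers(18, 71, size=200),
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})
# Age bins are an assumption made for this example.
df["age_group"] = pd.cut(
    df["Age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
)

score = "Spending Score (1-100)"

# Average spending score per gender.
mean_by_gender = df.groupby("Gender")[score].mean()

# Pairwise Pearson correlations between the numeric features.
corr = df[["Age", "Annual Income (k$)", score]].corr()

# Spread of spending within each age group.
by_age = df.groupby("age_group", observed=True)[score].agg(["mean", "std", "count"])

print(mean_by_gender, corr, by_age, sep="\n\n")
```

A correlation matrix like `corr` is what a heatmap would typically display, for example via `seaborn.heatmap(corr, annot=True)`.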
PCA (Dimensionality Reduction)
Applied Principal Component Analysis to visualize high-dimensional clusters
The first 2–3 components explained the majority of the variance
PCA plots helped show cluster separation visually
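As a rough sketch of the PCA step, the snippet below standardizes four features and inspects the cumulative explained variance. The feature matrix is synthetic, and the choice of four input columns is an assumption, so the variance ratios it prints are not the project's results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix: gender, age, income, spending score (assumed layout).
rng = np.random.default_rng(2)
X = np.column_stack([
    rng.integers(0, 2, size=200),
    rng.integers(18, 71, size=200),
    rng.integers(15, 138, size=200),
    rng.integers(1, 101, size=200),
]).astype(float)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)

# Cumulative share of variance captured by the first 1, 2, and 3 components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative)
```

The first two columns of `components` can be used for a 2D scatter plot, and all three for a 3D plot colored by cluster label.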
Clustering
KMeans Clustering
Evaluated using the Elbow Method and Silhouette Score
The silhouette score peaked at k=10, but k=5 was chosen for interpretability
Visualized clusters in 2D and 3D with centroids
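One way to run the elbow and silhouette evaluation is sketched below. The data comes from `make_blobs` rather than the mall dataset, and the range of k values is an assumption, so the selected k here says nothing about the real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2D data with five blobs, standing in for income vs. spending score.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=0)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette coefficient per k

for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# k with the highest silhouette score; a smaller k may still be preferred
# when clusters need to be easy to interpret.
best_k = max(silhouettes, key=silhouettes.get)
print(best_k, silhouettes[best_k])
```

Plotting `inertias` against k gives the elbow curve. The silhouette score rewards tight, well-separated clusters, which is why it can favor a larger k than is practical for describing customer segments.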
Agglomerative Clustering
Dendrogram used to determine cut-off (y=80)
Cluster centers manually computed
Silhouette scores compared across linkage strategies (ward, complete, average)
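The linkage comparison and the manual center computation might be sketched as follows. The data is synthetic, the number of clusters is fixed at 5 for the example, and `cluster_centers` is a hypothetical helper, needed because `AgglomerativeClustering` does not expose centroids.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2D data standing in for the real features.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=0)


def cluster_centers(X, labels):
    """Hypothetical helper: mean of the points assigned to each cluster."""
    return np.vstack([X[labels == c].mean(axis=0) for c in np.unique(labels)])


scores = {}
centers = {}
for linkage in ("ward", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=5, linkage=linkage).fit_predict(X)
    scores[linkage] = silhouette_score(X, labels)
    centers[linkage] = cluster_centers(X, labels)

for linkage, score in scores.items():
    print(f"{linkage}: silhouette={score:.3f}")
```

To mirror a dendrogram cut instead of a fixed cluster count, `AgglomerativeClustering` also accepts `n_clusters=None` together with a `distance_threshold` value.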
Cluster Interpretation
| Cluster | Profile | Strategy |
| --- | --- | --- |
| 0 | Practical Buyers (avg spenders) | Loyalty offers, practical product focus |
| 1 | Young Big Spenders | Premium campaigns, influencers |
| 2 | Young, Low-Income Spenders | Discounts, trend-driven promotions |
| 3 | Rich but Frugal | Quality/value-focused campaigns |
| 4 | Low Spend, Low Income | Basic essentials, budget-focused ads |
Predictive Modeling
Task: Predict spending_score_label (0 = low, 1 = medium, 2 = high)
Features used: age_label, gender, annual_income
Models trained:
Logistic Regression (with scaling)
Random Forest
XGBoost
Evaluation:
Accuracy, Precision, F1-score (weighted)
Used Stratified K-Fold Cross-Validation due to small dataset size
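A minimal sketch of the stratified cross-validation setup is shown below. The features and target are synthetic, the fold count and hyperparameters are assumptions, and XGBoost is left out to keep the example dependency-free; an `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features named after the README; values are random.
rng = np.random.default_rng(3)
n = 200
X = pd.DataFrame({
    "age_label": rng.integers(0, 3, size=n),
    "gender": rng.integers(0, 2, size=n),
    "annual_income": rng.integers(15, 138, size=n),
})
# Synthetic 3-class target (0 = low, 1 = medium, 2 = high), loosely tied to income.
signal = X["annual_income"] + rng.normal(0, 20, size=n)
y = pd.qcut(signal, q=3, labels=False).to_numpy()

models = {
    # Scaling lives inside the pipeline so it is fit on training folds only.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratification keeps class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision_weighted", "f1_weighted"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

print(pd.DataFrame(results).T.round(3))
```

With only 200 rows, a single train/test split gives noisy estimates, so averaging the metrics over stratified folds is a more stable way to compare models.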
Tools
Python, pandas, scikit-learn, XGBoost, matplotlib, seaborn