Customer_Segmentation/README.md at master · MC993/Customer_Segmentation

🛍️ Mall Customer Segmentation and Predictive Modeling

📊 Overview

This project explores customer segmentation using the Mall Customer dataset. It involves data exploration, clustering, dimensionality reduction, and predictive modeling of customer behavior.

📁 Dataset

200 rows × 5 columns

Features:

CustomerID (if used)

Gender

Age

Annual Income (k$)

Spending Score (1–100)

🧹 Preprocessing

Encoded Gender: Male → 1, Female → 0

Scaled numerical features for models when appropriate

Created labels for age_group and spending_score (low, medium, high)
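The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on a few made-up rows, not the project's actual code: the column names are assumed to match the public Mall Customers CSV, and the bin edges for `age_group` and `spending_score_label` are hypothetical, since the cut points are not stated here.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample rows; column names are assumed, values are not real data.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 52, 67],
    "Annual Income (k$)": [15, 60, 88, 120],
    "Spending Score (1-100)": [39, 81, 12, 55],
})

# Encode Gender as described: Male -> 1, Female -> 0.
df["gender"] = df["Gender"].map({"Male": 1, "Female": 0})

# Hypothetical bin edges (assumption); intervals are right-closed, e.g. (33, 66].
df["age_group"] = pd.cut(
    df["Age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
)
df["spending_score_label"] = pd.cut(
    df["Spending Score (1-100)"], bins=[0, 33, 66, 100],
    labels=["low", "medium", "high"],
)

# Standardize numeric columns to zero mean and unit variance.
num_cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
scaled = StandardScaler().fit_transform(df[num_cols])
```

Scaling matters mainly for distance-based and linear models (KMeans, PCA, Logistic Regression); tree-based models such as Random Forest are largely insensitive to it.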

📈 Exploratory Data Analysis

Visualized distributions using histograms, boxplots, and heatmaps

Key insights:

Females have a higher average spending score

Age and income are not strongly correlated with spending

Spending patterns vary significantly across age groups
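The kind of check behind these insights can be sketched with a group-wise mean and a correlation matrix. The rows below are invented for illustration and do not reproduce the project's findings; the lowercase column names are assumptions.

```python
import pandas as pd

# Invented rows for illustration only (0 = Female, 1 = Male).
df = pd.DataFrame({
    "gender": [1, 0, 0, 1, 0, 1],
    "age": [19, 35, 52, 67, 23, 41],
    "annual_income": [15, 60, 88, 120, 30, 75],
    "spending_score": [39, 81, 12, 55, 77, 40],
})

# Average spending score per gender.
mean_by_gender = df.groupby("gender")["spending_score"].mean()

# Pearson correlation matrix; this is the table a heatmap would display.
corr = df[["age", "annual_income", "spending_score"]].corr()
```

With seaborn available, `seaborn.heatmap(corr, annot=True)` would render the matrix as a heatmap.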

📉 PCA (Dimensionality Reduction)

Applied Principal Component Analysis to visualize high-dimensional clusters

The first 2–3 components explained the majority of the variance

PCA plots helped show cluster separation visually
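A minimal sketch of the PCA step, assuming the features are standardized first. The input here is random synthetic data standing in for the real feature matrix, so the variance ratios it produces say nothing about the actual dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (200 rows, 4 numeric features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# PCA is scale-sensitive, so standardize before projecting.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Cumulative share of variance captured by the first 1, 2, and 3 components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
```

The first two or three columns of `X_pca` are what would be passed to a 2D or 3D scatter plot, colored by cluster label.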

📦 Clustering

✅ KMeans Clustering

Evaluated using Elbow Method and Silhouette Score

Best silhouette score at k=10, but k=5 chosen for interpretability

Visualized clusters in 2D and 3D with centroids
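The k-selection procedure can be sketched as a loop that records inertia (for the elbow plot) and the silhouette score for each candidate k. The data comes from `make_blobs` as a synthetic stand-in, and the range 2 to 10 is an assumption based on the values of k mentioned above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: 200 points, 3 features.
X, _ = make_blobs(n_samples=200, centers=5, n_features=3, random_state=0)

inertias, silhouettes = {}, {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                       # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, km.labels_)

# k with the highest silhouette score; the final choice may still differ
# if a smaller k is easier to interpret.
best_k = max(silhouettes, key=silhouettes.get)

# Final model with k=5 and its centroids, used for plotting.
km5 = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
centroids = km5.cluster_centers_
```

Plotting `inertias` against k gives the elbow curve; the silhouette score offers a second opinion when the elbow is ambiguous.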

✅ Agglomerative Clustering

Dendrogram used to determine cut-off (y=80)

Cluster centers manually computed

Silhouette scores compared across linkage strategies (ward, complete, average)
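A sketch of the linkage comparison and the manual center computation, again on synthetic `make_blobs` data. Fixing `n_clusters=5` for every linkage is an assumption made here for a like-for-like comparison.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: 200 points, 2 features.
X, _ = make_blobs(n_samples=200, centers=5, n_features=2, random_state=0)

# Silhouette score for each linkage strategy at the same number of clusters.
scores = {}
for linkage in ("ward", "complete", "average"):
    model = AgglomerativeClustering(n_clusters=5, linkage=linkage)
    scores[linkage] = silhouette_score(X, model.fit_predict(X))

# AgglomerativeClustering exposes no cluster_centers_ attribute,
# so centers are computed manually as the per-cluster feature means.
labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)
centers = np.array([X[labels == c].mean(axis=0) for c in np.unique(labels)])
```

A dendrogram cut at a fixed height can alternatively be expressed by passing `n_clusters=None` together with `distance_threshold` to `AgglomerativeClustering`, which lets the threshold determine the number of clusters.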

🧠 Cluster Interpretation

| Cluster | Profile | Strategy |
|---------|---------|----------|
| 0 | Practical Buyers (avg spenders) | Loyalty offers, practical product focus |
| 1 | Young Big Spenders | Premium campaigns, influencers |
| 2 | Young, Low-Income Spenders | Discounts, trend-driven promotions |
| 3 | Rich but Frugal | Quality/value-focused campaigns |
| 4 | Low Spend, Low Income | Basic essentials, budget-focused ads |

🧪 Predictive Modeling

Task: Predict spending_score_label (0 = low, 1 = medium, 2 = high)

Features used: age_label, gender, annual_income

Models trained:

Logistic Regression (with scaling)

Random Forest

XGBoost

Evaluation:

Accuracy, Precision, F1-score (weighted)

Used Stratified K-Fold Cross-Validation due to small dataset size
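The evaluation setup can be sketched with `cross_validate` and a `StratifiedKFold` splitter. The features and labels below are random placeholders, so the scores are meaningless; the fold count of 5 and all hyperparameters are assumptions. XGBoost is left out to keep the sketch dependency-free, though `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be added to the same dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random placeholder data shaped like the described task (not real data).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(0, 3, n),     # age_label
    rng.integers(0, 2, n),     # gender
    rng.uniform(15, 140, n),   # annual_income
])
y = rng.integers(0, 3, n)      # spending_score_label: 0, 1, 2

models = {
    # Scaling lives inside the pipeline so it is fit on training folds only.
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratification keeps class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision_weighted", "f1_weighted"]

results = {}
for name, model in models.items():
    cv_out = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: cv_out[f"test_{m}"].mean() for m in scoring}
```

Putting the scaler inside the pipeline avoids leaking statistics from the validation fold into training, which matters more than usual on a 200-row dataset.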

🛠 Tools

Python, pandas, scikit-learn, matplotlib, seaborn
