GitHub - MC993/Customer_Segmentation

🛍️ Mall Customer Segmentation and Predictive Modeling

📊 Overview

This project explores customer segmentation using the Mall Customer dataset. It involves data exploration, clustering, dimensionality reduction, and predictive modeling of customer behavior.

📁 Dataset

200 rows × 5 columns

Features:

CustomerID (if used)

Gender

Age

Annual Income (k$)

Spending Score (1–100)

🧹 Preprocessing

Encoded Gender: Male → 1, Female → 0

Scaled numerical features for models when appropriate

Created labels for age_group and spending_score (low, medium, high)
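The preprocessing steps above could look roughly like the sketch below. It runs on synthetic stand-in data rather than the real dataset, and the column names, the bin edges for the low/medium/high labels, and the age brackets are all assumptions; the README does not state them.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Mall Customer data (NOT the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=200),
    "Age": rng.integers(18, 71, size=200),
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})

# Encode Gender as stated in the README: Male -> 1, Female -> 0.
df["gender"] = df["Gender"].map({"Male": 1, "Female": 0})

# Bin the spending score into 0 = low, 1 = medium, 2 = high.
# The cut points (33, 66) are assumed for illustration.
df["spending_score_label"] = pd.cut(
    df["Spending Score (1-100)"],
    bins=[0, 33, 66, 100],
    labels=[0, 1, 2],
).astype(int)

# Bin age into groups; these brackets are also assumed.
df["age_group"] = pd.cut(
    df["Age"],
    bins=[17, 30, 45, 60, 100],
    labels=["18-30", "31-45", "46-60", "61+"],
)

# Standardize the numerical features (zero mean, unit variance).
num_cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
scaled = StandardScaler().fit_transform(df[num_cols])
```

Scaling matters mostly for distance-based and linear models (KMeans, PCA, Logistic Regression); tree-based models such as Random Forest generally do not need it.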

📈 Exploratory Data Analysis

Visualized distributions using histograms, boxplots, and heatmaps

Key insights:

Females have a higher average spending score

Age and income are not strongly correlated with spending

Spending patterns vary significantly across age groups
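Insights like these can be checked numerically with a group-wise mean and a correlation matrix. The sketch below uses synthetic stand-in data with assumed column names, so its numbers say nothing about the real dataset; it only shows the shape of the computation.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the real dataset); column names are assumed.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=200),
    "Age": rng.integers(18, 71, size=200),
    "Annual Income (k$)": rng.integers(15, 138, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})

# Average spending score per gender.
mean_by_gender = df.groupby("Gender")["Spending Score (1-100)"].mean()

# Pairwise Pearson correlations between the numerical features;
# this is the matrix a correlation heatmap would display.
num_cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
corr = df[num_cols].corr()

print(mean_by_gender)
print(corr.round(2))
```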

📉 PCA (Dimensionality Reduction)

Applied Principal Component Analysis to visualize high-dimensional clusters

The first 2–3 components explained the majority of the variance

PCA plots helped show cluster separation visually
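A minimal sketch of the PCA step, assuming the features are standardized first and using a random synthetic matrix in place of the real features. The cumulative explained variance is what tells you how many components are worth keeping.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix (200 samples, 4 features); NOT the real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)

# Fraction of total variance captured by the first 1, 2, 3 components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative)
```

The first two columns of `components` can be passed to a scatter plot, colored by cluster label, to inspect cluster separation in 2D.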

📦 Clustering

✅ KMeans Clustering

Evaluated using Elbow Method and Silhouette Score

Best silhouette score at k=10, but k=5 chosen for interpretability

Visualized clusters in 2D and 3D with centroids
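The Elbow Method and Silhouette Score evaluation can be sketched as a loop over candidate values of k. The data here is synthetic (generated with `make_blobs`), and the range of k and the `n_init` / `random_state` settings are assumptions, not the notebook's actual values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2D data with five blobs; NOT the real customer data.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=0)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette coefficient per k
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)

# k with the highest silhouette score; the final k may still be chosen
# differently (e.g. for interpretability, as the README does with k=5).
best_k = max(silhouettes, key=silhouettes.get)

# Final model with the chosen k; centroids are available for plotting.
final = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
centroids = final.cluster_centers_
```

Plotting `inertias` against k gives the elbow curve; the silhouette score favors compact, well-separated clusters, which can reward a larger k than is practical to interpret.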

✅ Agglomerative Clustering

Dendrogram used to determine cut-off (y=80)

Cluster centers manually computed

Silhouette scores compared across linkage strategies (ward, complete, average)
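The agglomerative steps above might be sketched as follows, on synthetic data. The cut height here is set relative to the tallest merge purely for illustration (the README's y=80 applies to its own data), and fixing `n_clusters=5` for the linkage comparison is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2D data; NOT the real customer data.
X, _ = make_blobs(n_samples=200, centers=5, random_state=0)

# Build the Ward linkage matrix (the same structure a dendrogram plots)
# and cut it at a distance threshold to obtain flat cluster labels.
Z = linkage(X, method="ward")
cut_height = 0.5 * Z[:, 2].max()  # illustrative threshold, assumed
labels_cut = fcluster(Z, t=cut_height, criterion="distance")

# Compare silhouette scores across linkage strategies.
scores = {}
for method in ("ward", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=5, linkage=method).fit_predict(X)
    scores[method] = silhouette_score(X, labels)

# Agglomerative clustering has no centroids attribute, so cluster
# centers are computed manually as the mean of each cluster's points.
ward_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)
centers = np.vstack(
    [X[ward_labels == c].mean(axis=0) for c in np.unique(ward_labels)]
)
```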

🧠 Cluster Interpretation

| Cluster | Profile | Strategy |
|---|---|---|
| 0 | Practical Buyers (avg spenders) | Loyalty offers, practical product focus |
| 1 | Young Big Spenders | Premium campaigns, influencers |
| 2 | Young, Low-Income Spenders | Discounts, trend-driven promotions |
| 3 | Rich but Frugal | Quality/value-focused campaigns |
| 4 | Low Spend, Low Income | Basic essentials, budget-focused ads |

🧪 Predictive Modeling

Task: Predict spending_score_label (0 = low, 1 = mid, 2 = high)

Features used: age_label, gender, annual_income

Models trained:

Logistic Regression (with scaling)

Random Forest

XGBoost

Evaluation:

Accuracy, Precision, F1-score (weighted)

Used Stratified K-Fold Cross-Validation due to small dataset size
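A hedged sketch of the evaluation setup: stratified k-fold cross-validation with weighted metrics, shown for Logistic Regression (with scaling) and Random Forest. The features and labels are random synthetic values, so the scores are meaningless; the number of folds, the feature encodings, and the hyperparameters are assumptions. XGBoost is left out to keep the example dependency-light; if the `xgboost` package is installed, an `XGBClassifier` could be added to the `models` dict in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and labels; NOT the real data.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 4, size=200),     # age_label (hypothetical encoding)
    rng.integers(0, 2, size=200),     # gender (Male = 1, Female = 0)
    rng.integers(15, 138, size=200),  # annual_income (k$)
])
y = rng.integers(0, 3, size=200)      # spending_score_label: 0, 1, 2

models = {
    # Scaling lives inside the pipeline so it is fit on each training
    # fold only, avoiding leakage into the validation fold.
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratification keeps class proportions similar in every fold,
# which helps when the dataset is small.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision_weighted", "f1_weighted"]

results = {}
for name, model in models.items():
    out = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: out[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```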

🛠 Tools

Python, pandas, scikit-learn, matplotlib, seaborn
