This project performs customer segmentation on the Mall Customers dataset using K-Means clustering and classifies new customers into segments using supervised machine learning models.
- Dataset: Mall_Customers.csv
- Techniques:
  - Outlier removal using IQR
  - Feature scaling with StandardScaler
  - Dimensionality reduction with PCA
  - Clustering using K-Means
  - Cluster evaluation using Silhouette Score & Davies-Bouldin Index
  - Classification using Random Forest and LightGBM
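
The IQR outlier removal listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `remove_outliers_iqr`, the conventional multiplier of 1.5, and the toy `income` column are all assumptions made for the example.

```python
import pandas as pd


def remove_outliers_iqr(df, columns, k=1.5):
    """Keep rows where every listed column lies within [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; k=1.5 is the conventional multiplier, assumed here.
    """
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        # between() is inclusive on both ends by default
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Toy data made up for illustration (not taken from Mall_Customers.csv)
toy = pd.DataFrame({"income": [15, 16, 17, 18, 19, 20, 137]})
cleaned = remove_outliers_iqr(toy, ["income"])
print(cleaned["income"].tolist())
```

On this toy column the quartiles are 16.5 and 19.5, so the accepted range is 12 to 24 and only the value 137 is dropped.
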
- Preprocessing: Handled nulls, duplicates, and outliers; label-encoded categorical features.
- Scaling: Standardized numeric features.
- PCA: Reduced features to 2D for visualization.
- Clustering: Applied K-Means and determined the optimal `k` using the Elbow method.
- Evaluation: Used Silhouette Score and Davies-Bouldin Index to evaluate clustering.
- Classification: Trained RandomForest and LightGBM classifiers to predict clusters.
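
The steps above might look roughly like the sketch below. It runs on synthetic data rather than Mall_Customers.csv, and several choices are assumptions made for illustration: clustering on the scaled features (with PCA used only for the 2D view), `k = 3`, the train/test split, and all hyperparameters. LightGBM is left out to keep the example dependent on scikit-learn only; an `LGBMClassifier` could be trained in the same way as the random forest.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric customer features (3 well-separated groups)
rng = np.random.default_rng(0)
centers = np.array([[20, 20, 30], [60, 50, 45], [100, 80, 25]])
X = np.vstack([c + rng.normal(scale=3.0, size=(50, 3)) for c in centers])

# Scaling: zero mean, unit variance per feature
X_scaled = StandardScaler().fit_transform(X)

# PCA: 2D projection, used here for visualization only
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Elbow method: record inertia for a range of k and look for the bend
inertias = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

# k is picked by inspecting the elbow curve; 3 is assumed for this toy data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Evaluation: higher silhouette is better, lower Davies-Bouldin is better
sil = silhouette_score(X_scaled, labels)
dbi = davies_bouldin_score(X_scaled, labels)

# Classification: learn to predict the cluster label for unseen customers
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, labels, test_size=0.25, random_state=0, stratify=labels
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"silhouette={sil:.3f}  davies_bouldin={dbi:.3f}  accuracy={accuracy:.3f}")
```

Because the classifier is trained on labels produced by K-Means, its accuracy measures how well it reproduces the clustering, not how meaningful the segments are.
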
- pandas
- seaborn
- matplotlib
- scikit-learn
- lightgbm
Install all dependencies with:
pip install -r requirements.txt