Applied K-Means clustering on Mall Customers dataset with PCA for dimensionality reduction. Cluster labels were added to the dataset and used to train Random Forest and LightGBM classifiers to predict customer segments on new data.

hashibk/KaggleMallCustomersDataset

Mall Customer Segmentation & Classification

This project performs customer segmentation on the Mall Customers dataset using K-Means clustering and classifies new customers into segments using supervised machine learning models.

📊 Project Overview

  • Dataset: Mall_Customers.csv
  • Techniques:
    • Outlier removal using IQR
    • Feature scaling with StandardScaler
    • Dimensionality reduction with PCA
    • Clustering using K-Means
    • Cluster evaluation using Silhouette Score & Davies-Bouldin Index
    • Classification using Random Forest and LightGBM
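
The first two techniques can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper `remove_iqr_outliers`, the 1.5 × IQR fence, the synthetic data, and the column names are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def remove_iqr_outliers(df, columns, factor=1.5):
    """Keep rows whose values fall inside [Q1 - factor*IQR, Q3 + factor*IQR]
    for every listed column. (Hypothetical helper, not from the repository.)"""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - factor * iqr, q3 + factor * iqr)
    return df[mask]


# Synthetic stand-in for Mall_Customers.csv; the column names are assumed.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": rng.integers(18, 70, size=200),
    "Annual Income (k$)": rng.normal(60, 15, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})
demo.loc[0, "Annual Income (k$)"] = 1000.0  # planted outlier for the demo

numeric_cols = list(demo.columns)
cleaned = remove_iqr_outliers(demo, numeric_cols)

# StandardScaler centres each column at 0 and rescales it to unit variance.
scaled = StandardScaler().fit_transform(cleaned[numeric_cols])
```

Scaling after outlier removal keeps extreme values from distorting the mean and variance that StandardScaler estimates.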

πŸ” Workflow

  1. Preprocessing: Handled null values, duplicates, and outliers, then label-encoded categorical features.
  2. Scaling: Standardized numeric features.
  3. PCA: Reduced features to 2D for visualization.
  4. Clustering: Applied K-Means, with the optimal k determined using the Elbow method.
  5. Evaluation: Used Silhouette Score and Davies-Bouldin Index to evaluate clustering.
  6. Classification: Trained Random Forest and LightGBM classifiers to predict the cluster label of new customers.
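
Steps 3 to 6 might look roughly like the sketch below. It runs on synthetic blobs rather than the real dataset, and several choices are assumptions made for illustration: clustering on the 2D PCA projection (the project may instead cluster on the scaled features), the k range tried, `k_chosen = 4`, and the train/test split. Only Random Forest is shown; `lightgbm.LGBMClassifier` exposes the same `fit`/`predict` interface and could be swapped in.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed customer features (assumption).
X_raw, _ = make_blobs(n_samples=300, centers=4, n_features=3,
                      cluster_std=1.0, random_state=42)
X_scaled = StandardScaler().fit_transform(X_raw)

# Step 3: project the scaled features onto two principal components.
X_2d = PCA(n_components=2, random_state=42).fit_transform(X_scaled)

# Step 4: Elbow method. Record inertia (within-cluster sum of squares)
# for each candidate k and look for the point where the curve flattens.
inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_2d)
    inertias[k] = model.inertia_

k_chosen = 4  # assumed here; in practice read it off the elbow plot
kmeans = KMeans(n_clusters=k_chosen, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_2d)

# Step 5: higher silhouette is better; lower Davies-Bouldin is better.
sil = silhouette_score(X_2d, labels)
dbi = davies_bouldin_score(X_2d, labels)

# Step 6: treat the cluster labels as targets for a supervised classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X_2d, labels, test_size=0.25, stratify=labels, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"silhouette={sil:.3f}  davies_bouldin={dbi:.3f}  accuracy={accuracy:.3f}")
```

Because the classifier learns labels that K-Means itself produced, its accuracy measures how well it reproduces the cluster boundaries, not how meaningful the segments are.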

📦 Requirements

  • pandas
  • seaborn
  • matplotlib
  • scikit-learn
  • lightgbm

Install all dependencies with:

pip install -r requirements.txt
