This project focuses on segmenting customers based on their purchasing behavior using the K-Means clustering algorithm. The dataset used is a real-world online retail dataset, and the project walks through data cleaning, feature engineering, clustering, and visualization.
- Cleans and preprocesses transactional data
- Aggregates customer-level features: Frequency, Quantity, Total Spend
- Applies standard scaling to normalize data
- Uses the Elbow Method to find the optimal number of clusters
- Performs K-Means clustering to group customers
- Visualizes the customer segments using PCA (Principal Component Analysis)
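
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it runs on a small synthetic table rather than the real dataset. The column names (`CustomerID`, `InvoiceNo`, `Quantity`, `UnitPrice`), the helper `build_customer_features`, and the choice of `n_clusters=3` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the transactional data (NOT the real dataset).
# Column names are assumed for illustration.
rng = np.random.default_rng(0)
n_rows = 600
transactions = pd.DataFrame({
    "CustomerID": rng.integers(1, 61, size=n_rows),
    "InvoiceNo": rng.integers(1000, 1300, size=n_rows),
    "Quantity": rng.integers(1, 20, size=n_rows),
    "UnitPrice": rng.uniform(0.5, 50.0, size=n_rows).round(2),
})


def build_customer_features(df):
    """Hypothetical helper: clean rows, then aggregate to one row per customer."""
    # Drop rows without a customer and rows with non-positive quantity or price.
    df = df.dropna(subset=["CustomerID"])
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)].copy()
    df["TotalSpend"] = df["Quantity"] * df["UnitPrice"]
    return df.groupby("CustomerID").agg(
        Frequency=("InvoiceNo", "nunique"),   # number of distinct invoices
        Quantity=("Quantity", "sum"),         # total items purchased
        TotalSpend=("TotalSpend", "sum"),     # total amount spent
    )


features = build_customer_features(transactions)

# Standardize so each feature has zero mean and unit variance.
scaled = StandardScaler().fit_transform(features)

# Cluster the customers; k=3 is an arbitrary choice for this sketch.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
features["Cluster"] = kmeans.fit_predict(scaled)

# Project the scaled features onto two principal components for plotting.
components = PCA(n_components=2).fit_transform(scaled)
print(features.head())
```

Scaling before clustering matters here because K-Means relies on Euclidean distance, so an unscaled Total Spend column would otherwise dominate the other two features.
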
- Python
- Pandas
- Scikit-learn
- Matplotlib
- Seaborn
- Google Colab / Jupyter Notebook
- Customer_Segmentation.ipynb: The main notebook containing the complete pipeline
- Online Retail.xlsx: The dataset used in the project
- README.md: You're reading it
- Elbow Curve to choose k
- PCA scatterplot showing customer clusters
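
As a rough illustration of how an elbow curve is produced, the sketch below fits K-Means for a range of k values and records the inertia (within-cluster sum of squared distances) for each. It uses synthetic blobs from `make_blobs` rather than the project's data, and the range of k from 1 to 8 is an assumption chosen for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data with three features, standing in for the scaled customer table.
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

k_values = list(range(1, 9))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(model.inertia_)  # within-cluster sum of squares for this k

for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `k_values` against `inertias` with Matplotlib gives the elbow curve. The "elbow" is the point where adding more clusters stops reducing inertia by much, which is a visual heuristic rather than a strict rule.
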
- Data cleaning and preprocessing
- Unsupervised machine learning with K-Means
- Dimensionality reduction using PCA
- How to interpret and present clustering results
Built during my AIML internship at Dlithe.
Feel free to reach out or connect on LinkedIn (www.linkedin.com/in/fragan-d-souza-64626a29b)