JavieraAlmendrasVilla/Customer-Segmentation

ML-powered customer analytics

Project Overview

This project applies unsupervised ML (KMeans clustering) to segment customers based on lifetime value (LTV) and provide actionable business insights. The goal is to help e-commerce companies identify high-value customers, understand purchasing behavior, and inform personalized marketing strategies.

The platform uses the Online Retail Dataset (UCI Machine Learning Repository) and was developed and deployed as a proof-of-concept in one day.

Dataset

| Column | Description |
|---|---|
| InvoiceNo | Unique invoice number |
| StockCode | Unique product code |
| Description | Product description |
| Quantity | Number of items purchased |
| InvoiceDate | Date and time of invoice |
| UnitPrice | Price per unit |
| CustomerID | Unique customer identifier |
| Country | Customer country |

  • Data preprocessing steps:

    • Removed rows without CustomerID
    • Removed negative UnitPrice rows
    • Aggregated at customer-level to compute RFM metrics: Recency, Frequency, Monetary (TotalRevenue)
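The preprocessing steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the function name `compute_rfm`, the `Revenue` helper column, the choice of snapshot date (one day after the last invoice), and the toy rows are all assumptions made for the example. Only the column names come from the dataset table.

```python
import pandas as pd

def compute_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """Clean invoice lines and aggregate them to one RFM row per customer."""
    df = df.dropna(subset=["CustomerID"])   # remove rows without CustomerID
    df = df[df["UnitPrice"] >= 0].copy()    # remove negative UnitPrice rows
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]  # hypothetical helper column

    # Assumption: recency is counted from the day after the last invoice in the data.
    snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)

    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        TotalRevenue=("Revenue", "sum"),
    )

# Toy rows (invented for illustration, not taken from the real dataset)
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1"],
    "Quantity": [2, 1, 3, 5, 1],
    "InvoiceDate": ["2011-01-01", "2011-01-01", "2011-01-10", "2011-01-05", "2011-01-07"],
    "UnitPrice": [10.0, 5.0, 2.0, 1.0, 4.0],
    "CustomerID": [1.0, 1.0, 1.0, 2.0, None],
})
rfm = compute_rfm(toy)
print(rfm)
```

Counting distinct `InvoiceNo` values rather than rows keeps Frequency at the order level, since one invoice usually spans several product lines.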

Methodology

  1. Customer Segmentation

    • Calculated RFM metrics:

      • Recency: Days since last purchase
      • Frequency: Number of invoices/orders
      • Monetary: Total spend
    • Due to heavy skew in the Monetary and Frequency distributions, standard IQR-based outlier detection was ineffective. Instead, thresholds were set empirically, guided by Elbow and Silhouette analysis, to ensure well-separated, meaningful customer segments.

    • Applied log transformation and standard scaling

    • Used KMeans clustering to identify 3 segments:

      • VIP (Blue)
      • Middle Class (Turquoise)
      • Inactive (Brown)
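A rough sketch of the transformation and clustering steps is shown below. It assumes non-negative RFM values (so that `log1p` is defined), uses synthetic data in place of the real RFM table, and picks illustrative settings (`n_init=10`, `random_state=42`, k from 2 to 6) that are not confirmed by the project. The helper names `scale_rfm` and `evaluate_k` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def scale_rfm(rfm_values: np.ndarray) -> np.ndarray:
    """Log-transform (log1p tolerates zeros) and standardize the RFM columns."""
    return StandardScaler().fit_transform(np.log1p(rfm_values))

def evaluate_k(X: np.ndarray, k_values=range(2, 7), seed: int = 42) -> dict:
    """Return {k: (inertia, silhouette)} as input for Elbow and Silhouette analysis."""
    scores = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        scores[k] = (km.inertia_, silhouette_score(X, km.labels_))
    return scores

# Synthetic stand-in for (Recency, Frequency, Monetary); values are invented.
rng = np.random.default_rng(0)
rfm_values = np.vstack([
    rng.lognormal(mean=[1.0, 3.0, 8.0], sigma=0.3, size=(50, 3)),  # recent, frequent, high spend
    rng.lognormal(mean=[3.0, 1.5, 6.0], sigma=0.3, size=(50, 3)),  # mid-range on all three
    rng.lognormal(mean=[5.0, 0.3, 4.0], sigma=0.3, size=(50, 3)),  # long inactive, low spend
])

X = scale_rfm(rfm_values)
scores = evaluate_k(X)
for k, (inertia, sil) in scores.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")

# Final model with the 3 segments used in this project
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
```

KMeans assigns arbitrary integer labels, so mapping cluster numbers to names such as VIP or Inactive has to be done afterwards by inspecting each cluster's average RFM values.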

[Figure: clusters]

  2. KPIs Computed per Segment

    • Total Revenue per Segment
    • Average Order Value (AOV)
    • Purchase Frequency
    • Recency
    • Segment Size
    • Revenue Contribution (%)
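One way these KPIs could be derived from a customer-level table is sketched below. The input layout (one row per customer with `Segment`, `Recency`, `Frequency`, `TotalRevenue`), the output column names, and the toy numbers are assumptions for illustration. AOV is taken here as segment revenue divided by segment order count, which may differ from the definition used in the dashboard.

```python
import pandas as pd

def segment_kpis(customers: pd.DataFrame) -> pd.DataFrame:
    """Aggregate customer-level RFM rows (with a Segment label) into segment KPIs."""
    kpis = customers.groupby("Segment").agg(
        TotalRevenue=("TotalRevenue", "sum"),
        Orders=("Frequency", "sum"),
        AvgFrequency=("Frequency", "mean"),
        AvgRecency=("Recency", "mean"),
        SegmentSize=("TotalRevenue", "count"),
    )
    # Assumed AOV definition: revenue per order within the segment
    kpis["AOV"] = kpis["TotalRevenue"] / kpis["Orders"]
    kpis["RevenueSharePct"] = 100 * kpis["TotalRevenue"] / kpis["TotalRevenue"].sum()
    return kpis

# Toy customers (invented values)
customers = pd.DataFrame({
    "Segment": ["VIP", "VIP", "Inactive", "Inactive"],
    "Recency": [5, 15, 200, 300],
    "Frequency": [10, 6, 1, 1],
    "TotalRevenue": [1000.0, 600.0, 150.0, 250.0],
})
kpis = segment_kpis(customers)
print(kpis)
```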

See the Analytics Dashboard on Tableau


Key Findings

| Segment | % of Customers | % of Revenue | Notes |
|---|---|---|---|
| VIP | 21% | 57% | High frequency, high spend. Most profitable segment. |
| Middle Class | 40% | 35% | Highest AOV, lower frequency than VIP. Opportunity for upselling. |
| Inactive | 39% | 8% | Rarely purchase; low revenue contribution. |

Insights:

  • 61% of customers (VIP + Middle Class) generate 92% of total revenue
  • Middle Class customers have higher AOV but half the frequency of VIPs
  • Targeted campaigns should focus on retention and upsell for VIP and Middle Class customers
  • Inactive segment can be targeted with win-back campaigns or cost-efficient marketing strategies
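The first insight follows directly from the Key Findings table. The short check below reproduces that arithmetic and adds a simple ratio of revenue share to customer share per segment. The ratio (called `revenue_index` here) is an illustrative measure introduced for this example, not one of the project's KPIs.

```python
# (percent of customers, percent of revenue) from the Key Findings table
segments = {
    "VIP": (21, 57),
    "Middle Class": (40, 35),
    "Inactive": (39, 8),
}

active = ["VIP", "Middle Class"]
active_customers = sum(segments[s][0] for s in active)  # 21 + 40
active_revenue = sum(segments[s][1] for s in active)    # 57 + 35
print(f"{active_customers}% of customers generate {active_revenue}% of revenue")

# Revenue share divided by customer share: values above 1 mean a segment
# contributes more revenue than its headcount alone would suggest.
revenue_index = {name: rev / cust for name, (cust, rev) in segments.items()}
for name, ratio in revenue_index.items():
    print(f"{name}: {ratio:.2f}")
```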

How to Run

  1. Clone the repository and move into it:
     git clone https://github.com/JavieraAlmendrasVilla/Customer-Segmentation.git
     cd Customer-Segmentation
  2. Install dependencies:
     pip install -r requirements.txt
  3. Run the notebook:
     jupyter notebook customer_segmentation.ipynb

Technologies Used

  • Python (pandas, numpy, matplotlib, seaborn, scikit-learn)
  • Jupyter Notebook
  • KMeans clustering for segmentation
  • PCA for visualization
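As an illustration of the PCA step, the sketch below projects scaled features onto two principal components so that clusters can be drawn in a scatter plot. The helper names `project_2d` and `plot_segments`, the output file name, the color map, and the random stand-in data are assumptions. The plotting function is defined but not called here, so the example runs without a display.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_2d(X_scaled: np.ndarray, seed: int = 42) -> np.ndarray:
    """Reduce scaled features to two principal components for plotting."""
    return PCA(n_components=2, random_state=seed).fit_transform(X_scaled)

def plot_segments(coords: np.ndarray, labels: np.ndarray, path: str = "clusters.png") -> None:
    """Save a scatter plot of the 2D projection, colored by cluster label."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(6, 5))
    scatter = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=15)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend(*scatter.legend_elements(), title="Cluster")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)

# Random stand-in for scaled RFM features and cluster labels (invented data)
rng = np.random.default_rng(0)
X_scaled = rng.standard_normal((100, 3))
labels = rng.integers(0, 3, size=100)

coords = project_2d(X_scaled)
print(coords.shape)
```

Calling `plot_segments(coords, labels)` would then write the figure to `clusters.png`.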

Portfolio Takeaway

This project demonstrates the ability to:

  • Preprocess and clean real-world e-commerce data
  • Compute RFM metrics and detect outliers
  • Segment customers with unsupervised learning
  • Derive actionable business insights
  • Visualize results clearly for stakeholders

About

Customer segmentation of the Online Retail Dataset (UCI) using K-means, an unsupervised ML clustering algorithm. Findings are presented on a Tableau dashboard linked in the README.
