Marketing Analytics: Customer Segmentation & Spending Dynamics

📌 Project Overview

This project is a comprehensive data analytics study of a customer database for a retail food company. The goal was to identify distinct customer segments, analyze spending behaviors, and optimize marketing campaign performance.

Using Python, we transitioned from raw data cleaning to K-Means Clustering and Linear Regression Residual Analysis to uncover "hidden" spending patterns that demographics alone could not explain.

📂 Dataset

The dataset (ifood_df.csv) contains 2,205 customer records (after cleaning) with 39 features, including:

Demographics: Age, Income, Marital Status, Education, Household Structure (Kids/Teens).
Behavioral: Recency (days since last purchase), Complaints, Web/Store/Catalog Visits.
Spending (Mnt): Monetary value spent on Wines, Fruits, Meat, Fish, Sweets, and Gold.
Campaign History: Acceptance of 5 previous marketing campaigns and the current response.

🛠️ Tech Stack

Python 3.x
Pandas & NumPy: Data manipulation and statistical analysis.
Seaborn & Matplotlib: Advanced data visualization.
Scikit-Learn: K-Means Clustering, StandardScaler, Linear Regression, Random Forest.

⚙️ Methodology

1. Data Cleaning & Preprocessing

Before analysis, the raw data underwent a rigorous health check:

Duplicate Removal: Identified and removed 184 duplicate rows so that repeated records would not be over-weighted in the statistics and models.
Feature Pruning: Dropped constant columns (Z_CostContact, Z_Revenue), which have zero variance and therefore carry no information.
Data Integrity: Recalculated the MntTotal column to ensure it mathematically equaled the sum of all individual product categories (Wines + Meat + Fruits + etc.), fixing discrepancies in 2,000+ rows.
Feature Engineering: Created new features such as Children (Total Kids + Teens) and Has_Child for family structure analysis.
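
The four cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (MntWines, Kidhome, Teenhome, and so on) are assumed to match the CSV header, the helper clean_customers is hypothetical, and the small input frame is synthetic.

```python
import pandas as pd

# Assumed spending column names; adjust to match the actual CSV header.
SPEND_COLS = ["MntWines", "MntFruits", "MntMeatProducts",
              "MntFishProducts", "MntSweetProducts", "MntGoldProds"]


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the four cleaning steps to a raw frame."""
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates().reset_index(drop=True)
    # 2. Drop columns that hold a single value for every row.
    constant = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant)
    # 3. Recompute total spend as the sum of the category columns.
    out["MntTotal"] = out[SPEND_COLS].sum(axis=1)
    # 4. Family-structure features.
    out["Children"] = out["Kidhome"] + out["Teenhome"]
    out["Has_Child"] = (out["Children"] > 0).astype(int)
    return out


# Tiny synthetic frame (not real data): rows 0 and 1 are duplicates,
# MntTotal is deliberately wrong, and the Z_ columns are constant.
raw = pd.DataFrame({
    "Income": [50000, 50000, 82000],
    "Kidhome": [1, 1, 0],
    "Teenhome": [1, 1, 0],
    "MntWines": [100, 100, 600],
    "MntFruits": [10, 10, 40],
    "MntMeatProducts": [50, 50, 300],
    "MntFishProducts": [5, 5, 60],
    "MntSweetProducts": [8, 8, 30],
    "MntGoldProds": [12, 12, 70],
    "MntTotal": [999, 999, 5],
    "Z_CostContact": [3, 3, 3],
    "Z_Revenue": [11, 11, 11],
})
clean = clean_customers(raw)
print(clean[["Income", "MntTotal", "Children", "Has_Child"]])
```

Detecting constant columns generically, rather than naming them, means the same helper would also catch any other zero-variance column that appears after filtering.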

2. Exploratory Data Analysis (EDA)

We utilized NumPy and Seaborn to understand the shape of the business:

The "Income Effect": Identified a strong positive relationship between Income and Spending (Correlation: 0.82).
Product Mix: Discovered that Wines (50%) and Meat (27%) together account for 77% of total revenue.
The "Child Penalty": Spending drops by ~50% with one child and craters with 2+ children.
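
The three EDA figures above (income correlation, product mix, spend by number of children) can each be computed with a line or two of pandas. The sketch below assumes columns named Income, MntTotal, and Children; the helper names and the data are made up for illustration and do not reproduce the project's numbers.

```python
import pandas as pd


def product_mix(df: pd.DataFrame, spend_cols: list) -> pd.Series:
    """Share of total spend contributed by each category (shares sum to 1)."""
    totals = df[spend_cols].sum()
    return (totals / totals.sum()).sort_values(ascending=False)


def spend_by_children(df: pd.DataFrame) -> pd.Series:
    """Mean total spend for each number of children in the household."""
    return df.groupby("Children")["MntTotal"].mean()


# Synthetic example data (not from ifood_df.csv).
demo = pd.DataFrame({
    "Income": [20000, 40000, 60000, 80000],
    "Children": [2, 1, 1, 0],
    "MntWines": [10, 100, 200, 500],
    "MntMeatProducts": [10, 50, 100, 300],
})
spend_cols = ["MntWines", "MntMeatProducts"]
demo["MntTotal"] = demo[spend_cols].sum(axis=1)

# Series.corr computes the Pearson correlation by default.
income_corr = demo["Income"].corr(demo["MntTotal"])
shares = product_mix(demo, spend_cols)
by_children = spend_by_children(demo)

print(round(income_corr, 2))
print(shares)
print(by_children)
```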

3. Customer Segmentation (K-Means Clustering)

We used K-Means Clustering on standardized features (Income, Total Spending, Recency) to identify 4 distinct personas. The optimal $K=4$ was selected via the Elbow Method.

Cluster Name	Profile	Strategy
Active Whales	High Income, High Spend, Active (<30 days).	Retain: Cross-sell premium items.
Churning VIPs	High Income, High Spend, Inactive (>70 days).	Win-Back: Target with "Campaign 5" (their favorite).
Promising Recent	Low Income, Low Spend, Active (<30 days).	Nurture: Offer lower-ticket deals to build habits.
At-Risk Budget	Low Income, Low Spend, Inactive (>70 days).	Automate: Move to low-cost drip campaigns.
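
As a rough illustration of the segmentation step, the sketch below standardizes the three features, records the K-Means inertia for a range of K values (the input to an elbow plot), and assigns cluster labels with K=4. The data is synthetic, the helper names are hypothetical, and settings such as n_init and random_state are assumptions rather than the notebook's configuration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["Income", "MntTotal", "Recency"]


def elbow_inertias(X, k_values, seed=42):
    """Within-cluster sum of squares for each candidate K."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    }


def assign_clusters(df: pd.DataFrame, k: int = 4, seed: int = 42) -> pd.Series:
    """Standardize the features, then label each row with a cluster id."""
    X = StandardScaler().fit_transform(df[FEATURES])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return pd.Series(labels, index=df.index, name="Cluster")


# Synthetic data: four well-separated groups of 50 customers each.
rng = np.random.default_rng(0)
centers = [(90000, 1500, 15), (90000, 1500, 85),
           (30000, 100, 15), (30000, 100, 85)]
rows = [
    (rng.normal(inc, 3000), rng.normal(spend, 50), rng.normal(rec, 5))
    for inc, spend, rec in centers
    for _ in range(50)
]
demo = pd.DataFrame(rows, columns=FEATURES)

X = StandardScaler().fit_transform(demo[FEATURES])
inertias = elbow_inertias(X, range(1, 8))
demo["Cluster"] = assign_clusters(demo)

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
print(demo.groupby("Cluster")[FEATURES].mean().round(0))
```

Standardizing first matters here because Income is measured in tens of thousands while Recency is measured in days; without scaling, Income would dominate the distance calculation.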

4. Advanced "True Spender" Analysis (Residuals)

To find value beyond simple income brackets, we built a Linear Regression model to predict "Expected Spending" based on Income. We then analyzed the Residuals (Actual - Expected) to find over-performers.

Key Findings:

Education: "Basic" education customers over-spend relative to their low income (+$233), while PhDs actually under-spend relative to their high income (-$22).
Family: Having Teens is associated with a larger drop in discretionary food spending than having Toddlers.
The "PhD Diet": Deep-dive category analysis revealed that PhDs significantly over-spend on Wine (+$61) but under-spend on Meat, Fish, and Fruits.
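
The residual approach described in this section can be sketched as below: fit spending on income, subtract the prediction from the actual value, and average the residuals per group. The column names, the helper spending_residuals, and the six-row frame are illustrative assumptions; the numbers do not correspond to the findings above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def spending_residuals(df: pd.DataFrame) -> pd.Series:
    """Actual minus expected spend, where expected is a linear fit on Income."""
    model = LinearRegression().fit(df[["Income"]], df["MntTotal"])
    expected = model.predict(df[["Income"]])
    return df["MntTotal"] - expected


# Synthetic example data (not from ifood_df.csv).
demo = pd.DataFrame({
    "Income": [20000, 30000, 40000, 60000, 80000, 90000],
    "MntTotal": [300, 200, 500, 700, 1100, 1000],
    "Education": ["Basic", "Graduation", "Basic", "PhD", "Graduation", "PhD"],
})
demo["Residual"] = spending_residuals(demo)

# Positive mean residual: the group spends more than its income predicts.
by_education = (
    demo.groupby("Education")["Residual"].mean().sort_values(ascending=False)
)
print(by_education.round(1))
```

Because the regression includes an intercept, the residuals sum to zero across the whole dataset, so a group's mean residual reads directly as over- or under-spending relative to the average customer at the same income.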

📊 Visualizations

The notebook includes several intricate visualizations:

Cluster Scatter Plot: Visualizing the income gap between VIPs and Mass Market.
Campaign Acceptance Heatmap: Showing the drastic difference in conversion rates between "Active Whales" (27%) and "At-Risk Budget" (4%).
Residual Heatmap: A color-coded matrix showing which Education levels over/under-spend on specific food categories.
Normalized Profile Bar Charts: Comparing the relative strengths of Income vs. Recency across clusters.
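
The matrix behind a campaign acceptance heatmap is a per-cluster mean of the 0/1 acceptance flags. The sketch below assumes flag columns named AcceptedCmp5 and Response and a Cluster column holding the persona name; the helper name and the data are synthetic, so the rates shown are not the project's.

```python
import pandas as pd


def acceptance_by_cluster(df: pd.DataFrame, campaign_cols: list) -> pd.DataFrame:
    """Acceptance rate in percent for each cluster and campaign flag."""
    return df.groupby("Cluster")[campaign_cols].mean().mul(100).round(1)


# Synthetic example data (not from ifood_df.csv).
demo = pd.DataFrame({
    "Cluster": ["Active Whales"] * 4 + ["At-Risk Budget"] * 4,
    "AcceptedCmp5": [1, 1, 0, 0, 0, 0, 0, 0],
    "Response": [1, 0, 0, 0, 0, 1, 0, 0],
})
rates = acceptance_by_cluster(demo, ["AcceptedCmp5", "Response"])
print(rates)
```

The resulting frame can be passed straight to seaborn's heatmap function (for example with annot=True) to draw the color-coded grid.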

🚀 Strategic Recommendations

Based on the data, the following actions are recommended:

The "Win-Back" Campaign: Currently, your highest-value segment ("Churning VIPs") is drifting away. They historically responded best to Campaign 5. Re-launch a lookalike of Campaign 5 targeted specifically at this cluster.
Stop "Grocery" Ads for Families: Families with kids are not buying Meat/Fish from you (likely due to price). Pivot their marketing to "Treats" (Sweets/Gold) or Bulk Deals.
Target PhDs with Wine: PhDs are "Liquid Dieters." Stop sending them fruit baskets. Market exclusive vintage wines to unlock their wallet share.
Catalog is King: Analysis showed that NumCatalogPurchases is the strongest predictor of a customer spending more than their income suggests. Invest in the print catalog for high-income prospects.
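
One way a "strongest predictor" ranking like the catalog finding could be produced is to fit a Random Forest (listed in the Tech Stack) on the spending residual and inspect its feature importances. Whether the notebook did exactly this is an assumption; the helper name is hypothetical, and the synthetic data is deliberately built so that catalog purchases drive the residual.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_residual_drivers(X: pd.DataFrame, residual: pd.Series,
                          seed: int = 42) -> pd.Series:
    """Rank features by Random Forest importance for predicting the residual."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, residual)
    return pd.Series(
        forest.feature_importances_, index=X.columns
    ).sort_values(ascending=False)


# Synthetic data: the residual depends on catalog purchases plus noise.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "NumCatalogPurchases": rng.integers(0, 10, 300),
    "NumWebPurchases": rng.integers(0, 10, 300),
    "NumStorePurchases": rng.integers(0, 10, 300),
})
residual = 50 * X["NumCatalogPurchases"] + rng.normal(0, 20, 300)

importances = rank_residual_drivers(X, residual)
print(importances.round(3))
```

Impurity-based importances like these show which features the model relied on, not that the feature causes higher spending, so the ranking is best treated as a lead for follow-up testing.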

How to Run

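Install Requirements

pip install numpy pandas matplotlib seaborn plotly scikit-learn notebook

The notebook then imports the libraries as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px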

Launch the Notebook

jupyter notebook Marketing_Analytics_Project.ipynb

Download the CSV Used in This Project

This can be found here (Kaggle account may be required)
