Oheb/Marketing_Analytics_Project

Marketing Analytics: Customer Segmentation & Spending Dynamics

📌 Project Overview

This project is a comprehensive data analytics study of a customer database for a retail food company. The goal was to identify distinct customer segments, analyze spending behaviors, and optimize marketing campaign performance.

Using Python, we moved from raw data cleaning through K-Means Clustering and Linear Regression Residual Analysis to uncover "hidden" spending patterns that demographics alone could not explain.

📂 Dataset

The dataset (ifood_df.csv) contains 2,205 customer records (after cleaning) with 39 features, including:

  • Demographics: Age, Income, Marital Status, Education, Household Structure (Kids/Teens).
  • Behavioral: Recency (days since last purchase), Complaints, Web/Store/Catalog Visits.
  • Spending (Mnt): Monetary value spent on Wines, Fruits, Meat, Fish, Sweets, and Gold.
  • Campaign History: Acceptance of 5 previous marketing campaigns and the current response.

🛠️ Tech Stack

  • Python 3.x
  • Pandas & NumPy: Data manipulation and statistical analysis.
  • Seaborn & Matplotlib: Advanced data visualization.
  • Scikit-Learn: K-Means Clustering, StandardScaler, Linear Regression, Random Forest.

⚙️ Methodology

1. Data Cleaning & Preprocessing

Before analysis, the raw data underwent a rigorous health check:

  • Duplicate Removal: Identified and removed 184 duplicate rows so that repeated records would not bias the statistics or be over-weighted by the models.
  • Feature Pruning: Dropped constant columns (Z_CostContact, Z_Revenue), which have zero variance and therefore carry no information.
  • Data Integrity: Recalculated the MntTotal column to ensure it mathematically equaled the sum of all individual product categories (Wines + Meat + Fruits + etc.), fixing discrepancies in 2,000+ rows.
  • Feature Engineering: Created new features such as Children (Total Kids + Teens) and Has_Child for family structure analysis.
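The cleaning steps above could be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the notebook's actual code: the column names (`Kidhome`, `Teenhome`, `MntWines`, `MntMeatProducts`) and the helper `clean` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for ifood_df.csv. Column names and values are assumed for
# illustration; they are not taken from the notebook.
raw = pd.DataFrame({
    "Income": [50000, 50000, 30000, 80000],
    "Kidhome": [1, 1, 0, 0],
    "Teenhome": [0, 0, 1, 0],
    "MntWines": [100, 100, 20, 500],
    "MntMeatProducts": [50, 50, 10, 300],
    "MntTotal": [150, 150, 25, 790],  # two totals are deliberately wrong
    "Z_CostContact": [3, 3, 3, 3],
    "Z_Revenue": [11, 11, 11, 11],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the four cleaning steps."""
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates().reset_index(drop=True)
    # 2. Drop columns that hold a single constant value.
    constant_cols = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant_cols)
    # 3. Recompute the total from the individual spending columns.
    mnt_cols = [c for c in out.columns if c.startswith("Mnt") and c != "MntTotal"]
    out["MntTotal"] = out[mnt_cols].sum(axis=1)
    # 4. Derive the family-structure features.
    out["Children"] = out["Kidhome"] + out["Teenhome"]
    out["Has_Child"] = (out["Children"] > 0).astype(int)
    return out


cleaned = clean(raw)
print(cleaned)
```

Detecting constant columns by counting unique values, rather than hard-coding their names, keeps the sketch usable if the raw file contains other zero-variance columns.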

2. Exploratory Data Analysis (EDA)

We utilized NumPy and Seaborn to understand the shape of the business:

  • The "Income Effect": Identified an exponential relationship between Income and Spending (Correlation: 0.82).
  • Product Mix: Discovered that Wines (50%) and Meat (27%) account for nearly 80% of total revenue.
  • The "Child Penalty": Spending drops by ~50% with one child and craters with 2+ children.
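As a rough sketch, the three EDA figures above (income correlation, category share, spending by number of children) can each be computed in a line or two of pandas. The data below is invented and the column names are assumptions; the resulting numbers are not the project's results.

```python
import pandas as pd

# Invented data for illustration only; column names are assumed.
df = pd.DataFrame({
    "Income": [20000, 35000, 50000, 65000, 80000, 95000],
    "Children": [2, 1, 1, 0, 0, 0],
    "MntWines": [10, 40, 90, 300, 500, 700],
    "MntMeatProducts": [5, 20, 50, 150, 300, 400],
    "MntFruits": [5, 10, 10, 50, 100, 100],
})
mnt_cols = ["MntWines", "MntMeatProducts", "MntFruits"]
df["MntTotal"] = df[mnt_cols].sum(axis=1)

# Pearson correlation between income and total spending.
income_corr = df["Income"].corr(df["MntTotal"])

# Share of total spending contributed by each product category.
category_share = df[mnt_cols].sum() / df["MntTotal"].sum()

# Average total spending grouped by number of children.
spend_by_children = df.groupby("Children")["MntTotal"].mean()

print(f"Income vs. spending correlation: {income_corr:.2f}")
print(category_share.round(2))
print(spend_by_children.round(1))
```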

3. Customer Segmentation (K-Means Clustering)

We used K-Means Clustering on standardized features (Income, Total Spending, Recency) to identify 4 distinct personas. The optimal $K=4$ was selected via the Elbow Method.

| Cluster Name | Profile | Strategy |
| --- | --- | --- |
| Active Whales | High Income, High Spend, Active (<30 days). | Retain: Cross-sell premium items. |
| Churning VIPs | High Income, High Spend, Inactive (>70 days). | Win-Back: Target with "Campaign 5" (their favorite). |
| Promising Recent | Low Income, Low Spend, Active (<30 days). | Nurture: Offer lower-ticket deals to build habits. |
| At-Risk Budget | Low Income, Low Spend, Inactive (>70 days). | Automate: Move to low-cost drip campaigns. |
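The segmentation step might look something like the sketch below: standardize the three features, record K-Means inertia over a range of K for the elbow plot, then fit the chosen K and profile each cluster. The customers here are synthetic, generated around four made-up centers, and the column names are assumptions; the notebook's settings may differ.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic customers around four invented centers: (Income, MntTotal, Recency).
centers = np.array([
    [80000, 1400, 15],
    [80000, 1400, 85],
    [30000, 100, 15],
    [30000, 100, 85],
])
spread = np.array([5000, 80, 5])
rows = np.vstack([c + rng.normal(0, spread, size=(50, 3)) for c in centers])
df = pd.DataFrame(rows, columns=["Income", "MntTotal", "Recency"])

# Standardize so that no single feature dominates the Euclidean distance.
X = StandardScaler().fit_transform(df)

# Elbow method: record inertia for each K and look for the bend in the curve.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Fit the chosen K and summarize each cluster in the original units.
final = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
df["Cluster"] = final.labels_
profile = df.groupby("Cluster")[["Income", "MntTotal", "Recency"]].mean()
print(profile.round(0))
```

Cluster labels from K-Means are arbitrary integers, so persona names such as "Active Whales" have to be assigned afterwards by reading the profile table.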

4. Advanced "True Spender" Analysis (Residuals)

To find value beyond simple income brackets, we built a Linear Regression model to predict "Expected Spending" based on Income. We then analyzed the Residuals (Actual - Expected) to find over-performers.

Key Findings:

  • Education: "Basic" education customers over-spend relative to their low income (+$233), while PhDs actually under-spend relative to their high income (-$22).
  • Family: Having Teens is associated with a larger drop in discretionary food spending than having Toddlers.
  • The "PhD Diet": Deep-dive category analysis revealed that PhDs significantly over-spend on Wine (+$61) but under-spend on Meat, Fish, and Fruits.
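The residual approach described above can be illustrated with a short sketch: fit spending on income, subtract the prediction from the actual value, and average the residuals per group. The six rows below are invented, so the output does not reproduce the findings listed; the column names are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented rows for illustration; not the project's data.
df = pd.DataFrame({
    "Income": [20000, 25000, 60000, 65000, 90000, 95000],
    "Education": ["Basic", "Basic", "Graduation", "Graduation", "PhD", "PhD"],
    "MntTotal": [300, 350, 700, 760, 900, 980],
})

# Expected spending predicted from income alone.
model = LinearRegression().fit(df[["Income"]], df["MntTotal"])
df["Expected"] = model.predict(df[["Income"]])

# Residual = actual - expected; positive means over-spending for that income.
df["Residual"] = df["MntTotal"] - df["Expected"]

# Average residual per education level, lowest first.
residual_by_education = df.groupby("Education")["Residual"].mean().sort_values()
print(residual_by_education.round(1))
```

Because the regression includes an intercept, the residuals sum to zero across the whole dataset, so a group can only over-spend if another group under-spends.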

📊 Visualizations

The notebook includes several intricate visualizations:

  1. Cluster Scatter Plot: Visualizing the income gap between VIPs and Mass Market.
  2. Campaign Acceptance Heatmap: Showing the drastic difference in conversion rates between "Active Whales" (27%) and "At-Risk Budget" (4%).
  3. Residual Heatmap: A color-coded matrix showing which Education levels over/under-spend on specific food categories.
  4. Normalized Profile Bar Charts: Comparing the relative strengths of Income vs. Recency across clusters.
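The table behind a campaign acceptance heatmap could be built along these lines, assuming each campaign is stored as a 0/1 flag per customer. The column names (`AcceptedCmp1`, `AcceptedCmp5`, `Response`) and the values are assumptions for this sketch, not the notebook's data.

```python
import pandas as pd

# Invented flags for illustration; 1 means the customer accepted the campaign.
df = pd.DataFrame({
    "Cluster": ["Active Whales", "Active Whales", "Churning VIPs",
                "Churning VIPs", "At-Risk Budget", "At-Risk Budget"],
    "AcceptedCmp1": [1, 0, 0, 0, 0, 0],
    "AcceptedCmp5": [1, 0, 1, 1, 0, 0],
    "Response": [1, 1, 0, 1, 0, 0],
})
campaign_cols = ["AcceptedCmp1", "AcceptedCmp5", "Response"]

# The mean of a 0/1 flag is the acceptance rate.
# Rows are clusters, columns are campaigns.
acceptance = df.groupby("Cluster")[campaign_cols].mean()
print((acceptance * 100).round(1))
```

A matrix in this shape can be passed to `seaborn.heatmap(acceptance, annot=True)` to render the color-coded view.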

🚀 Strategic Recommendations

Based on the data, the following actions are recommended:

  1. The "Win-Back" Campaign: Currently, your highest value segment ("Churning VIPs") is drifting away. They historically loved Campaign 5. Re-launch a lookalike of Campaign 5 targeted specifically at this cluster.
  2. Stop "Grocery" Ads for Families: Families with kids are not buying Meat/Fish from you (likely due to price). Pivot their marketing to "Treats" (Sweets/Gold) or Bulk Deals.
  3. Target PhDs with Wine: PhDs are "Liquid Dieters." Stop sending them fruit baskets. Market exclusive vintage wines to unlock their wallet share.
  4. Catalog is King: Analysis showed that NumCatalogPurchases is the strongest predictor of a customer spending more than their income suggests. Invest in the print catalog for high-income prospects.

How to Run

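  1. Install Requirements

The notebook imports the libraries below, so each must be installed first (Scikit-Learn, listed in the Tech Stack, is needed as well):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
  2. Launch the Notebook
jupyter notebook Marketing_Analytics_Project.ipynb
  3. Download the CSV Used in This Project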

This can be found here (Kaggle account may be required)

