MohamedZOUABI/insurance-risk-segmentation-ml

🚗 Insurance Risk Segmentation using Machine Learning

📌 Project Overview

This project applies unsupervised machine learning techniques to segment auto insurance policyholders based on risk characteristics, claims behavior, and financial attributes.

The objective is to identify distinct customer risk profiles that support:

  • Smarter underwriting decisions
  • Premium optimization
  • Loss ratio management
  • Targeted marketing strategies
  • Profitability improvement

The project clusters policyholder data with K-Means, Agglomerative Clustering, and HDBSCAN, then profiles each resulting segment to draw pricing, underwriting, and marketing conclusions.


🧠 Problem Statement

Insurance companies manage diverse customer portfolios with varying levels of risk exposure. Traditional pricing approaches may overlook nuanced behavioral and financial patterns.

This project answers:

  • Can we segment policyholders into meaningful risk groups?
  • Which clusters are the most and least profitable?
  • How do claims frequency, credit score, and driving experience influence risk?
  • Are there geographic risk concentrations?

🔬 Methodology

The workflow includes:

  1. Data Cleaning & Preprocessing

    • Missing value imputation
    • Credit score normalization
    • Date feature engineering (days since last claim)
    • Feature scaling and encoding
  2. Exploratory Data Analysis (EDA)

    • Distribution analysis
    • Correlation matrix
    • Geographic risk visualization
    • Claims vs premium analysis
  3. Feature Engineering

    • Derived temporal features
    • Standardization of numerical variables
    • One-hot encoding of categorical variables
  4. Clustering Algorithms

    • K-Means
    • Agglomerative Clustering
    • HDBSCAN (density-based clustering)
    • Evaluation using silhouette score and the elbow method
  5. Cluster Profiling & Business Insights

    • Risk characteristics by segment
    • Loss ratio & profitability analysis
    • Geographic cluster mapping
    • Vehicle type and driving experience distribution
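The preprocessing and evaluation steps above can be sketched roughly as follows. This is an assumption-laden example on randomly generated toy data: the column names (`credit_score`, `annual_premium`, `driving_experience_years`, `last_claim_date`, `vehicle_type`), the reference date, and the range of `k` values are all hypothetical and do not come from the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200
reference_date = pd.Timestamp("2024-01-01")  # hypothetical snapshot date

# Toy data with hypothetical column names (not the project's schema).
df = pd.DataFrame({
    "credit_score": rng.normal(650, 80, n),
    "annual_premium": rng.normal(1200, 300, n),
    "driving_experience_years": rng.integers(0, 40, n).astype(float),
    "last_claim_date": reference_date
    - pd.to_timedelta(rng.integers(1, 1500, n), unit="D"),
    "vehicle_type": rng.choice(["sedan", "suv", "truck"], n),
})
# Inject a few missing credit scores so the imputer has something to do.
df.loc[rng.choice(n, 10, replace=False), "credit_score"] = np.nan

# Date feature engineering: days since last claim.
df["days_since_last_claim"] = (reference_date - df["last_claim_date"]).dt.days

numeric_cols = ["credit_score", "annual_premium",
                "driving_experience_years", "days_since_last_claim"]
categorical_cols = ["vehicle_type"]

# Median imputation + standardization for numeric columns,
# one-hot encoding for categorical columns.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)
X = preprocess.fit_transform(df)

# Elbow (inertia) and silhouette score over a range of candidate k values.
results = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    results[k] = {"inertia": km.inertia_,
                  "silhouette": silhouette_score(X, km.labels_)}

best_k = max(results, key=lambda k: results[k]["silhouette"])
print(pd.DataFrame(results).T)
print("k with highest silhouette:", best_k)
```

Picking `k` by the highest silhouette score is only one heuristic; in practice the inertia curve and the interpretability of the resulting segments would also weigh in.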

📊 Key Outcomes

  • Identification of distinct low-risk and high-risk policyholder segments
  • Profitability analysis at the cluster level
  • Data-driven pricing and underwriting recommendations
  • Geographic visualization of risk concentration
  • Clear segmentation strategy to support business decision-making
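For a sense of what cluster-level profitability analysis can look like, the sketch below computes a loss ratio (claims paid divided by premium earned) per cluster. The table and its column names (`cluster`, `annual_premium`, `claims_paid`) are made up for illustration and are not results from this project.

```python
import pandas as pd

# Hypothetical policy-level table with cluster labels already assigned.
policies = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2],
    "annual_premium": [1000.0, 1200.0, 900.0, 1100.0, 1500.0, 1300.0],
    "claims_paid": [200.0, 350.0, 900.0, 1300.0, 600.0, 520.0],
})

# Aggregate premium and claims per cluster.
profile = policies.groupby("cluster").agg(
    n_policies=("annual_premium", "size"),
    total_premium=("annual_premium", "sum"),
    total_claims=("claims_paid", "sum"),
)

# Loss ratio = claims paid / premium earned; lower is more profitable.
profile["loss_ratio"] = profile["total_claims"] / profile["total_premium"]
profile = profile.sort_values("loss_ratio")

print(profile.round(2))
```

A loss ratio above 1 means a segment paid out more in claims than it brought in as premium, which makes it a natural candidate for repricing or stricter underwriting.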

🛠 Tech Stack

  • Python
  • Pandas & NumPy
  • Scikit-learn
  • HDBSCAN
  • Matplotlib & Seaborn
  • Folium (geospatial visualization)
