> **Note:** This repository was archived by the owner on Mar 13, 2026. It is now read-only.


# insurance-eqrn

Extreme Quantile Regression Neural Networks for insurance pricing.

## The problem

Your EVT model gives you the 1-in-200 claim for the portfolio. EQRN gives you the 1-in-200 claim for the Kensington flat vs the Somerset farmhouse. That difference is your reinsurance margin.

The standard approach to extreme severity modelling — fit a GPD to all claims above a threshold, read off the 99.5th percentile — pools everything together. It gives you one shape parameter and one scale parameter for the whole book. If your TPBI claims have a heavier tail for younger injured parties and lighter for older ones, the pooled model averages those tails away. Your per-segment VaR is wrong and your XL pricing is wrong.

The solution is covariate-dependent GPD parameters: xi(x) and sigma(x) as functions of risk characteristics, not pooled scalars. This is what EQRN does.
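
As a toy illustration of why pooling matters, here is a minimal numpy sketch comparing per-segment and pooled GPD quantiles. The `sigma` and `xi` values are hypothetical, not fitted to any book:

```python
import numpy as np

def gpd_quantile(p, sigma, xi, threshold=0.0):
    """Quantile of exceedances above `threshold` for a GPD(sigma, xi)."""
    return threshold + sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

# Two hypothetical segments with the same scale but different tail shapes
sigma = 50_000.0
xi_heavy, xi_light = 0.5, 0.1   # e.g. young vs older injured parties
xi_pooled = 0.3                 # what a single pooled fit might return

p = 0.995
q_heavy = gpd_quantile(p, sigma, xi_heavy)
q_light = gpd_quantile(p, sigma, xi_light)
q_pooled = gpd_quantile(p, sigma, xi_pooled)

# The pooled quantile understates the heavy segment and overstates the light one
print(f"heavy: {q_heavy:,.0f}  light: {q_light:,.0f}  pooled: {q_pooled:,.0f}")
```

With these numbers the pooled 99.5th percentile sits between the two segment values, so the heavy segment's VaR is materially understated.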

EQRN (Pasche & Engelke 2024, Annals of Applied Statistics) is the first method to estimate covariate-dependent GPD parameters using a neural network. This library is the first Python implementation.

## What this library provides

- `EQRNModel`: two-step fitting, LightGBM intermediate quantile + GPD neural network
- `EQRNDiagnostics`: QQ plot, threshold stability, calibration, xi scatter
- Out-of-fold intermediate quantile estimation (prevents leakage into the GPD step)
- Orthogonal GPD reparameterisation for stable gradient training
- `predict_quantile`: conditional VaR at any extreme level (0.99, 0.995, ...)
- `predict_tvar`: conditional TVaR / expected shortfall
- `predict_exceedance_prob`: P(claim > threshold | risk profile)
- `predict_xl_layer`: expected loss in a per-risk XL layer (attachment, limit)

## Install

```
pip install insurance-eqrn
```

PyTorch is required. For CPU-only:

```
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install insurance-eqrn
```

## Quickstart

```python
import numpy as np
from insurance_eqrn import EQRNModel, EQRNDiagnostics

# X_train, X_val, X_test: covariate matrices (risk characteristics)
# y_train, y_val: claim severity values (above the basic threshold)
model = EQRNModel(
    tau_0=0.85,             # intermediate quantile level
    hidden_sizes=(32, 16, 8),
    n_epochs=300,
    shape_fixed=False,      # covariate-dependent xi
    seed=42,
)
model.fit(X_train, y_train, X_val=X_val, y_val=y_val)

# Per-segment 99.5th percentile severity
var_995 = model.predict_quantile(X_test, q=0.995)

# TVaR for reinsurance pricing
tvar_99 = model.predict_tvar(X_test, q=0.99)

# XL layer: £500k xs £500k
xl_loss = model.predict_xl_layer(X_test, attachment=500_000, limit=500_000)

# Fitted GPD parameters per observation
params = model.predict_params(X_test)
# DataFrame with columns: xi, sigma, nu, threshold
```

## The two-step method

### Step 1: Intermediate quantile (LightGBM, out-of-fold)

Fits a quantile regression at level tau_0 (default 0.8) using K-fold cross-validation. Out-of-fold predictions are mandatory here. If you use in-sample predictions, the GPD network in Step 2 sees artificially clean thresholds and learns the wrong exceedance set.
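
The out-of-fold mechanics can be sketched with scikit-learn's quantile gradient boosting standing in for LightGBM. This is a toy illustration of the idea, not the library's internals:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.exp(X[:, 0]) * rng.pareto(3.0, size=500)   # heavy-tailed toy severities

tau_0 = 0.8
oof_q = np.empty_like(y)
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    gbm = GradientBoostingRegressor(loss="quantile", alpha=tau_0, n_estimators=100)
    gbm.fit(X[train_idx], y[train_idx])
    # Each observation's threshold comes from a model that never saw it
    oof_q[test_idx] = gbm.predict(X[test_idx])

exceed = y > oof_q
print(f"exceedance rate: {exceed.mean():.2f}")   # should be roughly 1 - tau_0
```

The exceedance indicator `exceed` defines the sample the GPD network trains on; in-sample thresholds would bias that sample toward observations the quantile model happened to fit well.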

### Step 2: GPD neural network on exceedances

Identifies observations above their predicted threshold (~20% of training data at tau_0=0.8). Trains a feedforward network mapping (X, Q_hat(tau_0)) → (nu(x), xi(x)) using the orthogonal GPD deviance loss.

The orthogonal parameterisation (nu = sigma * (xi + 1)) makes the Fisher information matrix diagonal, which stabilises Adam training substantially compared to the direct (sigma, xi) parameterisation.
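
A minimal numpy sketch of the orthogonal GPD deviance as a loss function (the library trains a torch network against this; the simulated exceedances and parameter values here are illustrative):

```python
import numpy as np

def gpd_deviance_orthogonal(z, nu, xi):
    """Mean GPD negative log-likelihood of exceedances z, parameterised by
    (nu, xi) with nu = sigma * (1 + xi), assuming xi > 0."""
    sigma = nu / (1.0 + xi)
    return np.mean(np.log(sigma) + (1.0 + 1.0 / xi) * np.log1p(xi * z / sigma))

rng = np.random.default_rng(1)
# Exceedances from a GPD with sigma=2, xi=0.3 via inverse-transform sampling
u = rng.uniform(size=2000)
z = 2.0 / 0.3 * (u ** (-0.3) - 1.0)

# The deviance is lower near the true parameters than far from them
loss_true = gpd_deviance_orthogonal(z, 2.0 * (1 + 0.3), 0.3)
loss_off = gpd_deviance_orthogonal(z, 4.0 * (1 + 0.8), 0.8)
print(loss_true, loss_off)
```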

### Prediction

For a new observation x at target level tau > tau_0:

```
Q_x(tau) = Q_hat_x(tau_0) + sigma(x)/xi(x) * [((1-tau_0)/(1-tau))^xi(x) - 1]
```

At xi ≈ 0 (exponential limit), this is Q_hat + sigma * log((1-tau_0)/(1-tau)).
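
This extrapolation, including the xi → 0 branch, can be written in a few lines of numpy. The function name and defaults are illustrative, not the library's API:

```python
import numpy as np

def extrapolate_quantile(q_tau0, sigma, xi, tau, tau_0=0.8, eps=1e-6):
    """Extrapolate from the intermediate quantile q_tau0 to level tau > tau_0
    using fitted GPD parameters (sigma, xi)."""
    ratio = (1.0 - tau_0) / (1.0 - tau)
    xi = np.asarray(xi, dtype=float)
    near_zero = np.abs(xi) < eps
    # General GPD branch (guard the divisor so the dead branch stays finite)
    heavy = sigma / np.where(near_zero, 1.0, xi) * (ratio ** xi - 1.0)
    # Exponential limit as xi -> 0
    light = sigma * np.log(ratio)
    return q_tau0 + np.where(near_zero, light, heavy)

q = extrapolate_quantile(q_tau0=100_000, sigma=40_000, xi=0.3, tau=0.995)
```

With `tau_0 = 0.8` and `tau = 0.995` the ratio is 40, so a heavier shape inflates `40^xi` sharply; that is the whole leverage of the shape parameter at extreme levels.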

## Parameters

| Parameter | Default | Description |
|---|---|---|
| `tau_0` | 0.8 | Intermediate quantile level. Increase for smaller datasets |
| `hidden_sizes` | (32, 16, 8) | Network hidden layer widths |
| `n_epochs` | 500 | Maximum training epochs |
| `patience` | 50 | Early stopping patience |
| `shape_fixed` | False | If True, xi is a scalar. Start here before fitting the full model |
| `l2_pen` | 1e-4 | L2 weight decay |
| `shape_penalty` | 0 | Penalty on the variance of xi(x); smooths the shape surface |
| `p_drop` | 0 | Dropout probability. Try 0.1–0.2 for small datasets |
| `n_folds` | 5 | K-fold folds for the OOF intermediate quantile |
| `seed` | None | Random seed |

## Diagnostics

```python
from insurance_eqrn import EQRNDiagnostics

diag = EQRNDiagnostics(model)

# GPD QQ plot — should track the diagonal if the tail model is correct
diag.qq_plot(X_test, y_test)

# Predicted vs empirical coverage at each quantile level
diag.calibration_plot(X_test, y_test, levels=[0.9, 0.95, 0.99, 0.995])

# Mean residual life plot — linearity onset shows where the GPD approximation holds
diag.mean_residual_life_plot(y_train)

# Threshold stability — fit shape_fixed models at each tau_0, look for a plateau
diag.threshold_stability_plot(X_train, y_train)

# Summary table: predicted vs empirical exceedance rates
diag.summary_table(X_test, y_test)
```

## Insurance applications

### Motor TPBI (Third-Party Bodily Injury)

Young injured parties have longer annuity streams and heavier tails. EQRN lets you model xi(x) as a function of injured party age, claim type, and solicitor involvement. Output: P(claim > £500k | risk profile) per policy.

### Property large loss

Commercial property fire severity varies by construction class, sum insured, and sprinkler status. EQRN provides the 1-in-200 loss conditional on risk characteristics — an input to CAT reinsurance models.

### Per-risk XL pricing

```python
# Price layer: £1M xs £500k, conditional on risk
xl = model.predict_xl_layer(X_test, attachment=500_000, limit=1_000_000)
```

### Solvency II SCR

EQRN provides per-segment 99.5th percentile severity, which is the correct input for simulation-based SCR calculations on heterogeneous portfolios. Segment-level conditional VaR is more conservative than pooled EVT for high-risk segments and more accurate for low-risk segments.

## When not to use EQRN

- **Frequency modelling**: EQRN models severity above a threshold. Frequency is a separate model.
- **Attritional claims**: Claims below `tau_0` are not modelled by EQRN.
- **Small books (n_exceedances < 200)**: Set `shape_fixed=True` as a minimum. Below ~100 exceedances, fall back to marginal EVT.
- **No covariates**: Use `insurance-evt` directly.
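
The exceedance-count guidance above can be checked before fitting anything. A back-of-envelope sketch (`n` is illustrative; the 200/100 cut-offs follow the bullet above):

```python
# Expected tail sample size at each candidate tau_0
n = 1_500  # claims above the basic threshold
for tau_0 in (0.8, 0.85, 0.9, 0.95):
    n_exceed = round(n * (1 - tau_0))
    if n_exceed >= 200:
        verdict = "full covariate-dependent model"
    elif n_exceed >= 100:
        verdict = "shape_fixed=True"
    else:
        verdict = "marginal EVT"
    print(f"tau_0={tau_0}: ~{n_exceed} exceedances -> {verdict}")
```

Raising `tau_0` gives a cleaner GPD approximation but shrinks the exceedance set, so on small books the two recommendations pull in opposite directions.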

## Performance

No formal benchmark against a fixed public dataset yet. The relevant comparison is with marginal EVT (a single pooled GPD) on the same data. Pasche & Engelke (2024) show that EQRN produces better-calibrated extreme quantiles than marginal EVT when covariate effects on the tail are present (e.g., younger injured parties have heavier tails in TPBI).

On simulated data with a known covariate-dependent shape parameter xi(x), EQRN with `shape_fixed=False` recovers the true xi(x) surface; a pooled GPD produces a single xi that averages across the variation.

The practical question is always whether your book has enough heterogeneity in tail shape to justify the extra complexity. Use `diag.threshold_stability_plot()` and compare calibration plots for `shape_fixed=True` vs `shape_fixed=False`: if the covariate-dependent model doesn't improve calibration, use the simpler marginal EVT approach. Below 200 tail observations, the covariate-dependent model will overfit regardless of regularisation.

## Reference

Pasche, O.C. & Engelke, S. (2024). "Neural networks for extreme quantile regression with an application to forecasting of flood risk." Annals of Applied Statistics, 18(4), 2818–2839. DOI:10.1214/24-AOAS1907.

R reference implementation: opasche/EQRN (CRAN, March 2025).
