Extreme Quantile Regression Neural Networks for insurance pricing.
Your EVT model gives you the 1-in-200 claim for the portfolio. EQRN gives you the 1-in-200 claim for the Kensington flat vs the Somerset farmhouse. That difference is your reinsurance margin.
The standard approach to extreme severity modelling — fit a GPD to all claims above a threshold, read off the 99.5th percentile — pools everything together. It gives you one shape parameter and one scale parameter for the whole book. If your TPBI claims have a heavier tail for younger injured parties and lighter for older ones, the pooled model averages those tails away. Your per-segment VaR is wrong and your XL pricing is wrong.
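To make the pooling problem concrete, here is a toy simulation (illustrative only, not part of this library): two segments share a scale parameter but differ in tail shape, and a single pooled GPD fit compromises between them.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
# Two segments, same scale, different tail shapes
heavy = genpareto.rvs(c=0.4, scale=50_000, size=2_000, random_state=rng)  # xi = 0.4
light = genpareto.rvs(c=0.1, scale=50_000, size=2_000, random_state=rng)  # xi = 0.1

# Pooled fit: one (xi, sigma) for the whole book
xi_pool, _, sigma_pool = genpareto.fit(np.concatenate([heavy, light]), floc=0)

q_pool = genpareto.ppf(0.995, c=xi_pool, scale=sigma_pool)  # pooled 1-in-200
q_heavy = genpareto.ppf(0.995, c=0.4, scale=50_000)         # true heavy-segment 1-in-200
```

The pooled 99.5th percentile typically sits well below the true heavy-segment value, which is exactly the error that propagates into per-segment VaR and XL pricing.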
The solution is covariate-dependent GPD parameters: xi(x) and sigma(x) as functions of risk characteristics, not pooled scalars. This is what EQRN does.
EQRN (Pasche & Engelke 2024, Annals of Applied Statistics) is the first method to estimate covariate-dependent GPD parameters using a neural network. This library is the first Python implementation.
- `EQRNModel` — two-step fitting: LightGBM intermediate quantile + GPD neural network
- `EQRNDiagnostics` — QQ plot, threshold stability, calibration, xi scatter
- Out-of-fold intermediate quantile estimation (prevents leakage into the GPD step)
- Orthogonal GPD reparameterisation for stable gradient training
- `predict_quantile` — conditional VaR at any extreme level (0.99, 0.995, ...)
- `predict_tvar` — conditional TVaR / expected shortfall
- `predict_exceedance_prob` — P(claim > threshold | risk profile)
- `predict_xl_layer` — expected loss in a per-risk XL layer (attachment, limit)
```bash
pip install insurance-eqrn
```

PyTorch is required. For a CPU-only install:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install insurance-eqrn
```

```python
import numpy as np
from insurance_eqrn import EQRNModel, EQRNDiagnostics

# X: covariate matrix (e.g. risk characteristics)
# y: claim severity values (above basic threshold)
model = EQRNModel(
    tau_0=0.85,          # intermediate quantile level
    hidden_sizes=(32, 16, 8),
    n_epochs=300,
    shape_fixed=False,   # covariate-dependent xi
    seed=42,
)
model.fit(X_train, y_train, X_val=X_val, y_val=y_val)

# Per-segment 99.5th percentile severity
var_995 = model.predict_quantile(X_test, q=0.995)

# TVaR for reinsurance pricing
tvar_99 = model.predict_tvar(X_test, q=0.99)

# XL layer: £500k xs £500k
xl_loss = model.predict_xl_layer(X_test, attachment=500_000, limit=500_000)

# Fitted GPD parameters per observation
params = model.predict_params(X_test)
# DataFrame with columns: xi, sigma, nu, threshold
```

### Step 1: Intermediate quantile (LightGBM, out-of-fold)
Fits a quantile regression at level tau_0 (default 0.8) using K-fold cross-validation. Out-of-fold predictions are mandatory here. If you use in-sample predictions, the GPD network in Step 2 sees artificially clean thresholds and learns the wrong exceedance set.
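A minimal sketch of the out-of-fold mechanics (illustrative, not the library's code; it uses scikit-learn's `GradientBoostingRegressor` with a quantile loss as a stand-in for LightGBM):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = np.exp(1.0 + 0.5 * X[:, 0] + rng.gumbel(size=500))  # skewed severities

# Each observation's threshold comes from folds that never saw it,
# so the exceedance set passed to the GPD step is leakage-free.
qreg = GradientBoostingRegressor(loss="quantile", alpha=0.8, random_state=0)
thresholds = cross_val_predict(qreg, X, y, cv=5)

exceedances = y > thresholds  # the GPD network trains on y - thresholds here
```

By construction roughly 20% of observations exceed their out-of-fold threshold at `alpha=0.8`.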
### Step 2: GPD neural network on exceedances
Identifies observations above their predicted threshold (~20% of training data at tau_0=0.8). Trains a feedforward network mapping (X, Q_hat(tau_0)) → (nu(x), xi(x)) using the orthogonal GPD deviance loss.
The orthogonal parameterisation (nu = sigma * (xi + 1)) makes the Fisher information matrix diagonal, which stabilises Adam training substantially compared to the direct (sigma, xi) parameterisation.
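The loss being minimised is the GPD deviance (negative log-likelihood) written in the orthogonal parameters. A numpy sketch for xi > 0 (illustrative; `gpd_deviance` is not a library function):

```python
import numpy as np

def gpd_deviance(z, nu, xi):
    """Negative GPD log-likelihood of exceedances z > 0 in the
    orthogonal parameterisation nu = sigma * (1 + xi). Valid for xi > 0."""
    sigma = nu / (1.0 + xi)
    return np.sum(np.log(sigma) + (1.0 + 1.0 / xi) * np.log1p(xi * z / sigma))

# Simulated exceedances from GPD(sigma=100, xi=0.3) via inverse transform
rng = np.random.default_rng(0)
z = 100.0 / 0.3 * (rng.uniform(size=5_000) ** -0.3 - 1.0)

nll_true = gpd_deviance(z, nu=130.0, xi=0.3)   # truth: nu = 100 * (1 + 0.3)
nll_wrong = gpd_deviance(z, nu=130.0, xi=0.8)  # mis-specified shape
```

The deviance at the true parameters is lower than at a mis-specified shape, which is what gradient descent on this loss exploits.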
### Prediction
For a new observation x at target level tau > tau_0:
```
Q_x(tau) = Q_hat_x(tau_0) + sigma(x)/xi(x) * [((1-tau_0)/(1-tau))^xi(x) - 1]
```
At xi ≈ 0 (exponential limit), this is Q_hat + sigma * log((1-tau_0)/(1-tau)).
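The same extrapolation formula, including the exponential-limit branch, in a few lines of numpy (illustrative; `extrapolate_quantile` is not a library function):

```python
import numpy as np

def extrapolate_quantile(q_tau0, sigma, xi, tau, tau_0=0.8, eps=1e-6):
    """GPD quantile extrapolation from the intermediate level tau_0
    to a more extreme level tau > tau_0."""
    ratio = (1.0 - tau_0) / (1.0 - tau)
    exp_branch = q_tau0 + sigma * np.log(ratio)  # xi ≈ 0 (exponential) limit
    safe_xi = np.where(np.abs(xi) < eps, 1.0, xi)  # avoid division by zero
    gpd_branch = q_tau0 + sigma / safe_xi * (ratio ** xi - 1.0)
    return np.where(np.abs(xi) < eps, exp_branch, gpd_branch)

# One risk: intermediate 80% quantile of 100k, sigma = 40k, xi = 0.3
q995 = extrapolate_quantile(100_000.0, 40_000.0, 0.3, tau=0.995)
```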
| Parameter | Default | Description |
|---|---|---|
| tau_0 | 0.8 | Intermediate quantile level. Increase for smaller datasets |
| hidden_sizes | (32, 16, 8) | Network hidden layer widths |
| n_epochs | 500 | Maximum training epochs |
| patience | 50 | Early stopping patience |
| shape_fixed | False | If True, xi is a scalar. Start here before fitting full model |
| l2_pen | 1e-4 | L2 weight decay |
| shape_penalty | 0 | Penalty on variance of xi(x) — smooths the shape surface |
| p_drop | 0 | Dropout probability. Try 0.1–0.2 for small datasets |
| n_folds | 5 | K-fold folds for OOF intermediate quantile |
| seed | None | Random seed |
```python
from insurance_eqrn import EQRNDiagnostics

diag = EQRNDiagnostics(model)

# GPD QQ plot — should track the diagonal if the tail model is correct
diag.qq_plot(X_test, y_test)

# Predicted vs empirical coverage at each quantile level
diag.calibration_plot(X_test, y_test, levels=[0.9, 0.95, 0.99, 0.995])

# Mean residual life plot — linearity onset shows where the GPD approximation holds
diag.mean_residual_life_plot(y_train)

# Threshold stability — fit shape_fixed models at each tau_0, look for a plateau
diag.threshold_stability_plot(X_train, y_train)

# Summary table: predicted vs empirical exceedance rates
diag.summary_table(X_test, y_test)
```

### Motor TPBI (Third-Party Bodily Injury)
Young injured parties have longer annuity streams and heavier tails. EQRN lets you model xi(x) as a function of injured party age, claim type, solicitor involvement. Output: P(claim > £500k | risk profile) per policy.
### Property large loss
Commercial property fire severity varies by construction class, sum insured, sprinkler status. EQRN provides 1-in-200 loss conditional on risk characteristics — input to CAT reinsurance models.
### Per-risk XL pricing

```python
# Price a layer: £1M xs £500k, conditional on risk
xl = model.predict_xl_layer(X_test, attachment=500_000, limit=1_000_000)
```

### Solvency II SCR
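A layer expected loss of this kind can be written as the integral of the conditional survival function over the layer. A numerical sketch under an assumed GPD tail (illustrative; these function names are not the library's API):

```python
import numpy as np

def gpd_survival(y, u, p_u, sigma, xi):
    """P(Y > y) for y >= u, given P(Y > u) = p_u and GPD(sigma, xi) excesses."""
    return p_u * (1.0 + xi * (y - u) / sigma) ** (-1.0 / xi)

def layer_expected_loss(attachment, limit, u, p_u, sigma, xi, n=10_000):
    """E[min(max(Y - attachment, 0), limit)]: integral of the survival
    function over [attachment, attachment + limit], trapezoidal rule."""
    ys = np.linspace(attachment, attachment + limit, n)
    s = gpd_survival(ys, u, p_u, sigma, xi)
    dy = ys[1] - ys[0]
    return (s.sum() - 0.5 * (s[0] + s[-1])) * dy

# £500k xs £500k; tail fitted above u = £100k with P(Y > u) = 0.05
el = layer_expected_loss(500_000, 500_000, u=100_000, p_u=0.05,
                         sigma=80_000, xi=0.25)
```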
EQRN provides per-segment 99.5th percentile severity, which is the correct input for simulation-based SCR calculations on heterogeneous portfolios. Segment-level conditional VaR is more conservative than pooled EVT for high-risk segments and more accurate for low-risk segments.
- Frequency modelling: EQRN models severity above a threshold; frequency is a separate model.
- Attritional claims: claims below the `tau_0` threshold are not modelled by EQRN.
- Small books (`n_exceedances` < 200): at minimum set `shape_fixed=True`. Below ~100 exceedances, fall back to marginal EVT.
- No covariates: use `insurance-evt` directly.
No formal benchmark against a fixed public dataset yet. The relevant comparison is with marginal EVT (a single pooled GPD) on the same data. Pasche & Engelke (2024) show that EQRN produces better-calibrated extreme quantiles than marginal EVT when covariate effects on the tail are present (e.g., younger injured parties have heavier tails in TPBI). On simulated data with a known covariate-dependent shape parameter xi(x), EQRN with `shape_fixed=False` recovers the true xi(x) surface; a pooled GPD produces a single xi that averages across the variation.

The practical question is always whether your book has enough heterogeneity in tail shape to justify the extra complexity. Use `diag.threshold_stability_plot()` and compare calibration plots for `shape_fixed=True` vs `shape_fixed=False` — if the covariate-dependent model doesn't improve calibration, use the simpler marginal EVT approach. Below 200 tail observations, the covariate-dependent model will overfit regardless of regularisation.
Pasche, O.C. & Engelke, S. (2024). "Neural networks for extreme quantile regression with an application to forecasting of flood risk." Annals of Applied Statistics, 18(4), 2818–2839. DOI:10.1214/24-AOAS1907.
R reference implementation: opasche/EQRN (CRAN, March 2025).