This repo (see `xai_protein_modified.ipynb`) aims to reproduce the results of the paper *Explainable AI for Trees: From Local Explanations to Global Understanding* by S. M. Lundberg et al. (2020).
It leverages some code from this repository and focuses exclusively on the NHANES I dataset, which is one of the three datasets used in the original study.
🔹 1. Data Loading → Load the dataset following the original approach.
🔹 2. Problem Exploration → Understand key features and distributions.
🔹 3. Model Training → Train:
- 🌳 an XGBoost model
- a linear model
(hyperparameter tuning via random search; a pipeline sketch follows this list)
🔹 4. Algorithm Implementation → Code Algorithm 1 (Explainer) from scratch (a minimal sketch also follows the list).
🔹 5. SHAP Comparison → Compare our implementation with TreeExplainer from the SHAP library.
🔹 6. Complexity Analysis → Evaluate computational efficiency.
🔹 7. Dependence Plots → Analyze feature dependencies & interactions.
🔹 8. Global vs. Local SHAP → Compare aggregated SHAP values with the Gain method.
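As a reference point for steps 1, 3 and 5, here is a minimal sketch of the pipeline, assuming the packaged NHANES I subset from `shap.datasets.nhanesi()`. The notebook follows the original study, which fits a Cox proportional hazards objective; the plain squared-error regressor and the hyperparameter ranges below are simplifications chosen only for illustration.

```python
import shap
import xgboost
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Load the packaged NHANES I subset (features X, survival-style labels y).
X, y = shap.datasets.nhanesi()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random search over a small, illustrative XGBoost hyperparameter grid.
param_distributions = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 4, 6],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.5, 0.8, 1.0],
}
search = RandomizedSearchCV(
    xgboost.XGBRegressor(objective="reg:squarederror", random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# TreeExplainer computes exact SHAP values for the trained tree ensemble.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
```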
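For step 4, Algorithm 1 estimates E[f(x) | x_S] by recursing through a tree: splits on observed features are followed, splits on unobserved features are averaged over both children weighted by their training coverage, and exact Shapley values then follow from the brute-force subset sum. The sketch below illustrates that idea on a hand-built toy tree; the `feature`/`threshold`/`cover`/`value` node layout is a hypothetical structure chosen for this example, not the notebook's actual implementation.

```python
from itertools import combinations
from math import factorial

# Toy regression tree: internal nodes carry a feature index, a threshold and the
# number of training samples ("cover") in each child; leaves carry a value.
tree = {
    "feature": 0, "threshold": 50.0, "cover": 100,
    "left":  {"value": 1.0, "cover": 60},
    "right": {
        "feature": 1, "threshold": 120.0, "cover": 40,
        "left":  {"value": 2.0, "cover": 25},
        "right": {"value": 5.0, "cover": 15},
    },
}

def expected_value(node, x, S):
    """Estimate E[f(x) | x_S] (Algorithm 1): follow the split when its feature
    is in S, otherwise average the children weighted by their cover."""
    if "value" in node:                      # leaf
        return node["value"]
    if node["feature"] in S:                 # feature observed: follow x
        child = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        return expected_value(child, x, S)
    left, right = node["left"], node["right"]  # feature unobserved: average
    w_l, w_r = left["cover"], right["cover"]
    return (w_l * expected_value(left, x, S) +
            w_r * expected_value(right, x, S)) / (w_l + w_r)

def shap_values_bruteforce(node, x, n_features):
    """Exact Shapley values via the (exponential) subset sum over E[f(x) | x_S]."""
    phi = [0.0] * n_features
    for i in range(n_features):
        rest = [j for j in range(n_features) if j != i]
        for size in range(len(rest) + 1):
            for S in combinations(rest, size):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                gain = (expected_value(node, x, set(S) | {i})
                        - expected_value(node, x, set(S)))
                phi[i] += weight * gain
    return phi

x = [65.0, 140.0]  # e.g. [age, systolic BP] for one individual
print(shap_values_bruteforce(tree, x, n_features=2))
```

Step 5 then checks such values against `shap.TreeExplainer`, whose default path-dependent mode uses the same cover-weighted recursion but runs in polynomial time, which is what the complexity analysis in step 6 measures.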
SHAP interaction values let us decompose each feature's contribution into a main effect and pairwise interaction effects. Below we analyze how systolic blood pressure influences mortality risk depending on age.
- Left: Total SHAP effect (main + interaction), colored by age.
- Middle: Isolated main effect of systolic BP.
- Right: Interaction term between systolic BP and age.
🎯 Interpretation: High systolic blood pressure increases mortality risk mainly for younger patients. For older individuals, the effect may plateau or even reverse, revealing a strong interaction.
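A sketch of how the three panels can be produced with the SHAP library, assuming the `explainer`, `shap_values` and `X_test` objects from the training sketch above, and assuming the column names 'Systolic BP' and 'Age' used by the packaged NHANES I subset:

```python
import shap

# Pairwise SHAP interaction values: one (n_features, n_features) matrix per sample;
# the diagonal holds main effects, off-diagonal entries the interactions.
shap_interaction_values = explainer.shap_interaction_values(X_test)

# Left: total effect of systolic BP (main + interactions), colored by age.
shap.dependence_plot("Systolic BP", shap_values, X_test, interaction_index="Age")

# Middle: isolated main effect of systolic BP (diagonal of the interaction matrix).
shap.dependence_plot(("Systolic BP", "Systolic BP"), shap_interaction_values, X_test)

# Right: interaction term between systolic BP and age.
shap.dependence_plot(("Systolic BP", "Age"), shap_interaction_values, X_test)
```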
To move from local explanations to a global understanding, we aggregated SHAP values over the entire NHANES I dataset and compared the results with classical XGBoost Gain-based feature importance.
- SHAP bar plot (left) ranks features by their average impact on model output.
- SHAP summary plot (middle) shows both importance and directionality (positive/negative impact).
- Gain plot (right) lacks directionality and overemphasizes features that appear deeper in the tree.
Key Insight: While Gain highlights age, SHAP confirms this ranking and adds interpretability by showing how, and for whom, features matter.
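These plots can be reproduced along the following lines, again assuming the objects from the sketches above:

```python
import matplotlib.pyplot as plt
import shap
import xgboost

# Left: mean |SHAP| per feature (global importance with a consistent ranking).
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Middle: beeswarm summary showing both importance and direction of each feature's effect.
shap.summary_plot(shap_values, X_test)

# Right: classical gain-based importance from XGBoost, for comparison.
xgboost.plot_importance(model, importance_type="gain", show_values=False)
plt.show()
```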

