This repo (see `xai_protein_modified.ipynb`) aims to reproduce the results of the paper *Explainable AI for Trees: From Local Explanations to Global Understanding* by S. M. Lundberg et al. (2020).
It leverages some code from this repository and focuses exclusively on the NHANES I dataset, which is one of the three datasets used in the original study.
🔹 1. Data Loading → Load the dataset following the original approach.
🔹 2. Problem Exploration → Understand key features and distributions.
🔹 3. Model Training → Train:
- 🌳 an XGBoost model
- a linear model
(hyperparameter tuning via random search; a pipeline sketch follows this list)
🔹 4. Algorithm Implementation → Code Algorithm 1 (Explainer) from scratch (a minimal sketch also follows the list).
🔹 5. SHAP Comparison → Compare our implementation with TreeExplainer from the SHAP library.
🔹 6. Complexity Analysis → Evaluate computational efficiency.
🔹 7. Dependence Plots → Analyze feature dependencies & interactions.
🔹 8. Global vs. Local SHAP → Compare aggregated SHAP values with the Gain method.
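As a reference point for steps 1, 3 and 5, here is a minimal sketch of the pipeline, assuming the packaged NHANES I subset from `shap.datasets.nhanesi()`. The notebook follows the original study, which fits a Cox proportional hazards objective; the plain squared-error regressor and the hyperparameter ranges below are simplifications chosen only for illustration.

```python
import shap
import xgboost
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Load the packaged NHANES I subset (features X, survival-style labels y).
X, y = shap.datasets.nhanesi()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random search over a small, illustrative XGBoost hyperparameter grid.
param_distributions = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 4, 6],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.5, 0.8, 1.0],
}
search = RandomizedSearchCV(
    xgboost.XGBRegressor(objective="reg:squarederror", random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# TreeExplainer computes exact SHAP values for the trained tree ensemble.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
```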
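For step 4, Algorithm 1 estimates E[f(x) | x_S] by recursing through a tree: splits on observed features are followed, splits on unobserved features are averaged over both children weighted by their training coverage, and exact Shapley values then follow from the brute-force subset sum. The sketch below illustrates that idea on a hand-built toy tree; the `feature`/`threshold`/`cover`/`value` node layout is a hypothetical structure chosen for this example, not the notebook's actual implementation.

```python
from itertools import combinations
from math import factorial

# Toy regression tree: internal nodes carry a feature index, a threshold and the
# number of training samples ("cover") in each child; leaves carry a value.
tree = {
    "feature": 0, "threshold": 50.0, "cover": 100,
    "left":  {"value": 1.0, "cover": 60},
    "right": {
        "feature": 1, "threshold": 120.0, "cover": 40,
        "left":  {"value": 2.0, "cover": 25},
        "right": {"value": 5.0, "cover": 15},
    },
}

def expected_value(node, x, S):
    """Estimate E[f(x) | x_S] (Algorithm 1): follow the split when its feature
    is in S, otherwise average the children weighted by their cover."""
    if "value" in node:                      # leaf
        return node["value"]
    if node["feature"] in S:                 # feature observed: follow x
        child = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        return expected_value(child, x, S)
    left, right = node["left"], node["right"]  # feature unobserved: average
    w_l, w_r = left["cover"], right["cover"]
    return (w_l * expected_value(left, x, S) +
            w_r * expected_value(right, x, S)) / (w_l + w_r)

def shap_values_bruteforce(node, x, n_features):
    """Exact Shapley values via the (exponential) subset sum over E[f(x) | x_S]."""
    phi = [0.0] * n_features
    for i in range(n_features):
        rest = [j for j in range(n_features) if j != i]
        for size in range(len(rest) + 1):
            for S in combinations(rest, size):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                gain = (expected_value(node, x, set(S) | {i})
                        - expected_value(node, x, set(S)))
                phi[i] += weight * gain
    return phi

x = [65.0, 140.0]  # e.g. [age, systolic BP] for one individual
print(shap_values_bruteforce(tree, x, n_features=2))
```

Step 5 then checks such values against `shap.TreeExplainer`, whose default path-dependent mode uses the same cover-weighted recursion but runs in polynomial time, which is what the complexity analysis in step 6 measures.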
SHAP interaction values let us decompose each feature's contribution into a main effect and pairwise interaction effects. Below we analyze how systolic blood pressure influences mortality risk depending on age.
- Left: Total SHAP effect (main + interaction), colored by age.
- Middle: Isolated main effect of systolic BP.
- Right: Interaction term between systolic BP and age.
🎯 Interpretation: High systolic blood pressure increases mortality risk mainly for younger patients. For older individuals, the effect may plateau or even reverse, revealing a strong interaction.
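A sketch of how the three panels can be produced with the SHAP library, assuming the `explainer`, `shap_values` and `X_test` objects from the training sketch above, and assuming the column names 'Systolic BP' and 'Age' used by the packaged NHANES I subset:

```python
import shap

# Pairwise SHAP interaction values: one (n_features, n_features) matrix per sample;
# the diagonal holds main effects, off-diagonal entries the interactions.
shap_interaction_values = explainer.shap_interaction_values(X_test)

# Left: total effect of systolic BP (main + interactions), colored by age.
shap.dependence_plot("Systolic BP", shap_values, X_test, interaction_index="Age")

# Middle: isolated main effect of systolic BP (diagonal of the interaction matrix).
shap.dependence_plot(("Systolic BP", "Systolic BP"), shap_interaction_values, X_test)

# Right: interaction term between systolic BP and age.
shap.dependence_plot(("Systolic BP", "Age"), shap_interaction_values, X_test)
```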
To move from local explanations to a global understanding, we aggregated SHAP values over the entire NHANES I dataset and compared the results with classical XGBoost Gain-based feature importance.
- SHAP bar plot (left) ranks features by their average impact on model output.
- SHAP summary plot (middle) shows both importance and directionality (positive/negative impact).
- Gain plot (right) lacks directionality and overemphasizes features that appear deeper in the tree.
Key Insight: While Gain highlights age, SHAP confirms this ranking and adds interpretability by showing how, and for whom, features matter.
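These plots can be reproduced along the following lines, again assuming the objects from the sketches above:

```python
import matplotlib.pyplot as plt
import shap
import xgboost

# Left: mean |SHAP| per feature (global importance with a consistent ranking).
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Middle: beeswarm summary showing both importance and direction of each feature's effect.
shap.summary_plot(shap_values, X_test)

# Right: classical gain-based importance from XGBoost, for comparison.
xgboost.plot_importance(model, importance_type="gain", show_values=False)
plt.show()
```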

