This repository contains code accompanying the FAccT 2025 paper "Evaluating Model Explanations without Ground Truth". It implements AXE, a ground-truth Agnostic eXplanation Evaluation framework, which provides the first principled strategy to measure the quality of local feature-importance explanations such as those produced by SHAP, LIME, or Integrated Gradients. AXE is unique because it does not evaluate explanation quality by comparing real explanations against ideal ones; in practical scenarios that lack access to an oracle capable of producing ideal explanations, AXE can be used seamlessly to measure explanation quality.
Please see the paper for details, or the accompanying slides for a quick overview.
In this paper, we propose three foundational principles that any evaluation of local feature-importance explanations must satisfy.
- Local Contextualization: Explanations should depend on the datapoint they seek to explain.
- Model Relativism: Explanations should depend on the model they seek to explain.
- On-Manifold Evaluation: Explanations should be independent of off-manifold model behavior.
Surprisingly, no previous evaluation strategy satisfies all three of these basic principles. AXE satisfies all three, and has the added benefit of not requiring access to ground-truth explanations.
AXE operationalises the idea that the highest-quality explanation is the one that best helps a user predict the model's output. In accordance with the three foundational principles above, AXE constructs a separate KNN model for each datapoint being explained, using only a subset of the most important features as ranked by the explanation. The accuracy of this KNN model is then used to infer the quality of the explanation.
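The core loop can be sketched in a few lines of NumPy. This is a simplified illustration, not the repository's implementation: the function name `axe_quality`, the toy data, and the brute-force neighbour search are all our own assumptions, and `y` stands in for the model's predictions.

```python
import numpy as np

def axe_quality(X, y, importances, k=3, n_top=2):
    """Illustrative AXE-style score (hypothetical helper, not the repo's API).
    For each point, keep its n_top most important features per its explanation,
    find the k nearest neighbours in that feature subspace, and check whether
    their majority label matches the model output y[i]. Returns the fraction
    of points predicted correctly this way."""
    hits = 0
    for i in range(len(X)):
        # features ranked most important by this point's explanation
        feats = np.argsort(-np.abs(importances[i]))[:n_top]
        # brute-force KNN restricted to those features
        d = np.linalg.norm(X[:, feats] - X[i, feats], axis=1)
        nbrs = np.argsort(d)[1:k + 1]  # skip index 0: the point itself
        hits += int(np.bincount(y[nbrs]).argmax() == y[i])
    return hits / len(X)

# toy data where the label depends only on features 0 and 1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# an explanation flagging the informative features vs. one flagging noise
good = np.tile([1.0, 1.0, 0.0, 0.0, 0.0], (200, 1))
bad = np.tile([0.0, 0.0, 1.0, 1.0, 1.0], (200, 1))
print(axe_quality(X, y, good), axe_quality(X, y, bad))
```

The explanation that highlights the truly informative features yields a higher score, because neighbours in the right feature subspace share the model's output.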
An implementation using FAISS can be found in Sec 4.2; this is recommended for high-throughput applications where the use case requires efficiency.
An implementation using exact KNN models from sklearn can be found in Sec 4.1; this is recommended for specialised applications where the use case requires accuracy.
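For a single datapoint, the exact-KNN variant can be sketched with sklearn's `KNeighborsClassifier`. This is a minimal illustration under our own assumptions (toy data, a hypothetical importance vector, two retained features), not the code in Sec 4.1, and `y` again stands in for the model's predictions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = (X[:, 2] > 0).astype(int)  # stand-in for the model's outputs

i = 0  # index of the datapoint being explained
importance = np.array([0.1, 0.0, 0.9, 0.2, 0.0, 0.1])  # hypothetical explanation
top = np.argsort(-np.abs(importance))[:2]  # keep the 2 most important features

# fit an exact KNN on all other points, restricted to the selected features
mask = np.arange(len(X)) != i
knn = KNeighborsClassifier(n_neighbors=5).fit(X[mask][:, top], y[mask])

# does the KNN built from this explanation recover the model's output at i?
agrees = bool(knn.predict(X[i, top].reshape(1, -1))[0] == y[i])
```

Per-point agreement values like `agrees`, aggregated over the dataset, give the quality score described above.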
The repository contains all the code needed to replicate the experiments in the paper. The "explanation disagreement" example and the "illustrative example comparing with PGI" are self-contained in subdirectories Sec1 and Sec3 respectively. The code for the "fairwashing adversarial attack" experiment in Sec 4.1 and the "comparison with OpenXAI baselines" in Sec 4.2 build on existing repositories and are therefore included as submodules.
If you have any issues with the code, please open a GitHub issue. For direct suggestions, error reports, or improvements, please contact first author Kai Rawal. Please address any other feedback or comments to the author emails provided in the paper.
If you find this paper useful or use AXE in your research, please cite us using the citation below.
TBD
All code is released under the MIT License.