XAI-AttackBench is a modular research benchmark for evaluating the robustness of explanation methods against black-box adversarial attacks on tabular data. The goal of these attacks is to maximize the explanation drift while keeping the model prediction nearly unchanged.
Given a configuration (dataset, model, explainer, attack, metric), the benchmark:
- trains a model on tabular data
- fits a model-agnostic explainer (LIME / SHAP Kernel)
- generates adversarial samples `X_adv` from `X`
- enforces the prediction fidelity constraint (prediction change `<= epsilon`)
- measures explanation drift (e.g. L2 / Cosine / Spearman)
- exports results as `JSON` (scores, timings, counters)
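To make the constraint concrete, here is a minimal sketch of how a single adversarial candidate could be scored; `model`, `explainer.explain`, and `epsilon` are illustrative names, not the benchmark's actual API.

```python
import numpy as np

def score_candidate(model, explainer, x, x_adv, epsilon=0.05):
    """Illustrative check: the prediction must stay close, the explanation may drift."""
    # Prediction fidelity: class probabilities may shift by at most epsilon.
    p_orig = model.predict_proba(x.reshape(1, -1))[0]
    p_adv = model.predict_proba(x_adv.reshape(1, -1))[0]
    if np.abs(p_adv - p_orig).max() > epsilon:
        return None  # candidate violates the fidelity constraint

    # Explanation drift, here measured as cosine distance between attribution vectors.
    e_orig = np.asarray(explainer.explain(x), dtype=float)
    e_adv = np.asarray(explainer.explain(x_adv), dtype=float)
    cos = e_orig @ e_adv / (np.linalg.norm(e_orig) * np.linalg.norm(e_adv) + 1e-12)
    return 1.0 - cos  # larger value = stronger drift
```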
Important: All attacks in this repository are black-box, i.e. they only require access to model outputs (`predict`, `predict_proba`) and do not rely on gradients or other model internals.
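In other words, an attack only ever talks to a query interface such as the hypothetical wrapper sketched below (not part of the package), which also makes it easy to count how many model queries an attack consumes.

```python
class QueryCounter:
    """Hypothetical black-box wrapper: exposes only model outputs, nothing internal."""

    def __init__(self, model):
        self._model = model
        self.num_queries = 0

    def predict(self, X):
        self.num_queries += len(X)
        return self._model.predict(X)

    def predict_proba(self, X):
        self.num_queries += len(X)
        return self._model.predict_proba(X)

    # No weights, gradients, or layer activations are exposed to the attack.
```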
It is recommended to install the package into a clean virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Install in editable mode (`-e`) so that the source code can be modified without reinstalling the package.
After the installation is finished, benchmark experiments can be run right away.
To start an experiment, run the benchmark script with a specific configuration (dataset, model, attack, explainer). The script will:
- load and preprocess the dataset
- train the selected model
- fit the selected explainer
- generate adversarial samples using the chosen attack
- evaluate prediction fidelity (epsilon constraint)
- compute explanation drift scores using the available metrics
- write the results to a `JSON` file in the `results/` directory
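These steps roughly correspond to the loop sketched below; the method names on `attack` and `explainer` and the JSON keys are assumptions for illustration, not the script's actual interface.

```python
import json
import numpy as np

def evaluate(model, explainer, attack, X_test, epsilon, out_path):
    """Illustrative end-to-end evaluation for one (dataset, model, attack, explainer) setup."""
    # Generate adversarial samples with the chosen attack (assumed interface).
    X_adv = attack.run(X_test)

    # Prediction fidelity: fraction of samples whose probabilities moved by at most epsilon.
    delta = np.abs(model.predict_proba(X_adv) - model.predict_proba(X_test)).max(axis=1)
    fidelity_rate = float((delta <= epsilon).mean())

    # Explanation drift: mean L2 distance between original and adversarial attributions.
    E_orig = np.asarray([explainer.explain(x) for x in X_test])
    E_adv = np.asarray([explainer.explain(x) for x in X_adv])
    l2_drift = float(np.linalg.norm(E_orig - E_adv, axis=1).mean())

    # Export the scores as JSON, mirroring the results/ output described above.
    with open(out_path, "w") as f:
        json.dump({"fidelity_rate": fidelity_rate, "l2_drift": l2_drift}, f, indent=2)
```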
The argument format is as follows:
```bash
python skripts/run_benchmark.py <dataset> <model> <attack> <explainer> --seed <int>
```

- `<dataset>`: dataset name (e.g. `credit`, `heart_uci`, `forest`, `housing`, `prisoners`)
- `<model>`: model type (e.g. `RF`, `MLP`, `CNN1D`)
- `<attack>`: attack method (e.g. `RandomWalkAttack`, `MonteCarloAttack`, `ColumnSwitchAttack`, `DataPoisoningAttack`, `GreedyHillClimb`)
- `<explainer>`: explanation method (e.g. `Lime`, `Shap`)
- `--seed`: random seed for reproducibility (controls model initialization, explainer sampling, and attack randomness). Defaults to `42`.
- `--num_samples`: number of samples from the test set used for the evaluation. Defaults to `1000`.
- `--smoke-test`: if set, runs a quick test over all experiment combinations.
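For orientation, here is a hedged `argparse` sketch that mirrors the arguments listed above; the repository's actual parser may differ in details.

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Run one XAI-AttackBench experiment.")
    parser.add_argument("dataset", help="dataset name, e.g. credit or heart_uci")
    parser.add_argument("model", help="model type, e.g. RF, MLP, CNN1D")
    parser.add_argument("attack", help="attack method, e.g. GreedyHillClimb")
    parser.add_argument("explainer", help="explanation method, e.g. Lime or Shap")
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed for model, explainer, and attack randomness")
    parser.add_argument("--num_samples", type=int, default=1000,
                        help="number of test samples used for the evaluation")
    parser.add_argument("-s", "--smoke-test", action="store_true",
                        help="run a quick test over all experiment combinations")
    return parser.parse_args()
```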
After running the command, the benchmark prints its progress and saves all results as a `JSON` file in the `results/` directory. For example:

```bash
python skripts/run_benchmark.py credit RF GreedyHillClimb Lime --seed 42 --num_samples 500
```

To quickly check that everything is working, run a smoke test:

```bash
python skripts/run_benchmark.py -s
```

When prompted, either press Enter to select all combinations or choose the specific parts that should be included in the smoke test. All selected combinations are then run, and a report is created in `results/smoketest`.
Available attacks:

- RandomWalk
- RandomWalkWithMemory
- MonteCarlo
- TrainLookup
- ColumnSwitch
- DataPoisoning
- GreedyHillClimb
Available explainers:

- LIME Tabular
- SHAP KernelExplainer
Available explanation drift metrics:

- L2
- Cosine
- Spearman
- Kendall-Tau
- Distortion (L1 + Kendall-Tau)
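The rank-based metrics compare attribution orderings rather than raw magnitudes. Below is a minimal sketch of how they can be computed with `scipy`; the exact way Distortion combines L1 and Kendall-Tau in this benchmark may differ from the simple sum shown here.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def spearman_drift(e_orig, e_adv):
    # 1 - rank correlation: 0 means identical ranking, 2 means fully reversed ranking.
    rho, _ = spearmanr(e_orig, e_adv)
    return 1.0 - rho

def kendall_tau_drift(e_orig, e_adv):
    tau, _ = kendalltau(e_orig, e_adv)
    return 1.0 - tau

def distortion(e_orig, e_adv):
    # Illustrative combination of magnitude change (L1) and ranking change (Kendall-Tau).
    l1 = float(np.abs(np.asarray(e_orig) - np.asarray(e_adv)).sum())
    return l1 + kendall_tau_drift(e_orig, e_adv)
```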
This repository is designed to be extended easily via inheritance:
- add new attacks via `BaseAttack`
- add new explainers via `BaseExplainer`
- add new metrics via `BaseMetric`
- add new datasets via `BaseDataset`

Most additions only require a single new file and a registration in `skripts/run_benchmark.py`.
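As an illustration, a new attack might look roughly like the sketch below. The `BaseAttack` constructor arguments, the attributes it provides (`model`, `explainer`, `epsilon`), and the `explain` call are assumptions that should be checked against the actual base class.

```python
import numpy as np

# BaseAttack is assumed to be importable from the package; its interface here is illustrative.
class GaussianNoiseAttack(BaseAttack):
    """Toy attack: sample noisy candidates and keep the one with the largest drift."""

    def __init__(self, model, explainer, epsilon=0.05, n_candidates=50, scale=0.1):
        super().__init__(model, explainer, epsilon)
        self.n_candidates = n_candidates
        self.scale = scale

    def attack(self, x):
        p_orig = self.model.predict_proba(x.reshape(1, -1))[0]
        e_orig = np.asarray(self.explainer.explain(x), dtype=float)
        best_x, best_drift = x, 0.0
        for _ in range(self.n_candidates):
            candidate = x + np.random.normal(0.0, self.scale, size=x.shape)
            p_new = self.model.predict_proba(candidate.reshape(1, -1))[0]
            if np.abs(p_new - p_orig).max() > self.epsilon:
                continue  # fidelity constraint violated, discard candidate
            e_new = np.asarray(self.explainer.explain(candidate), dtype=float)
            drift = float(np.linalg.norm(e_orig - e_new))
            if drift > best_drift:
                best_x, best_drift = candidate, drift
        return best_x
```

The new class would then be registered in `skripts/run_benchmark.py` so that it becomes selectable via the `<attack>` argument.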