
XAI-AttackBench

Black-box attacks on explanations (LIME / SHAP) under prediction constraints.

XAI-AttackBench is a modular research benchmark for evaluating the robustness of explanation methods against black-box adversarial attacks on tabular data. The goal of these attacks is to maximize the explanation drift while keeping the model prediction nearly unchanged.


✅ What it does

Given a configuration (dataset, model, explainer, attack, metric), the benchmark:

  • trains a model on tabular data
  • fits a model-agnostic explainer (LIME / SHAP Kernel)
  • generates adversarial samples X_adv from X
  • enforces prediction fidelity constraints (prediction change <= epsilon)
  • measures explanation drift (e.g. L2 / Cosine / Spearman)
  • exports results as JSON (scores, timings, counters)

Important: All attacks in this repository are black-box, i.e. they only require access to model outputs (predict, predict_proba) and do not rely on gradients or other model internals.
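
The fidelity constraint can be illustrated in a few lines. The sketch below is a minimal black-box check, assuming a scikit-learn-style predict_proba and a max-probability-difference criterion; the benchmark's exact constraint may differ:

import numpy as np

def is_fidelity_preserving(model, x, x_adv, epsilon=0.05):
    # Accept x_adv only if the model's output distribution barely moves.
    # Queries only predict_proba, so the check itself is black-box.
    # The epsilon value and the max-difference criterion are illustrative.
    p_orig = model.predict_proba(x.reshape(1, -1))[0]
    p_adv = model.predict_proba(x_adv.reshape(1, -1))[0]
    return bool(np.abs(p_orig - p_adv).max() <= epsilon)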

🚀 Installation (fast & simple)

It is recommended to install the package into a clean virtual environment:

1) Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate

2) Install the project in editable mode

pip install -e .

Editable mode (-e) lets you modify the source code without reinstalling the package.

After the installation is finished, benchmark experiments can be run right away.

▶️ Run a benchmark

To start an experiment, run the benchmark script with a specific configuration (dataset, model, attack, explainer). The script will:

  • load and preprocess the dataset
  • train the selected model
  • fit the selected explainer
  • generate adversarial samples using the chosen attack
  • evaluate prediction fidelity (epsilon constraint)
  • compute explanation drift scores using the available metrics
  • write the results to a JSON file in the results/ directory

Arguments

The argument format is as follows:

python skripts/run_benchmark.py <dataset> <model> <attack> <explainer> --seed <int>
  • <dataset>: dataset name (e.g. credit, heart_uci, forest, housing, prisoners)
  • <model>: model type (e.g. RF, MLP, CNN1D)
  • <attack>: attack method (e.g. RandomWalkAttack, MonteCarloAttack, ColumnSwitchAttack, DataPoisoningAttack, GreedyHillClimb)
  • <explainer>: explanation method (e.g. Lime, Shap)
  • --seed: random seed for reproducibility (controls model init, explainer sampling, and attack randomness). Defaults to 42.
  • --num_samples: number of samples from the test set used for the evaluation. Defaults to 1000.
  • --smoke-test: If set, runs a quick test over all experiment combinations.

After running the command, the benchmark prints its progress and saves all results as a JSON file.

Example Run

python skripts/run_benchmark.py credit RF GreedyHillClimb Lime --seed 42 --num_samples 500
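
The results file can then be loaded back into Python for post-processing. The snippet below is a sketch; the filename scheme and JSON keys are assumptions based on the description above (scores, timings, counters):

import json

# Hypothetical output path; check results/ for the actual filename scheme.
with open("results/credit_RF_GreedyHillClimb_Lime_seed42.json") as f:
    results = json.load(f)

# Print the top-level keys to see the actual schema (scores, timings, counters).
print(sorted(results.keys()))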

▶️ Run a Smoketest

To quickly check that everything is working, run:

python skripts/run_benchmark.py -s

When prompted, either press Enter to select everything or pick the specific parts to include in the smoke test. All selected combinations are then run, and a report is written to results/smoketest.

💣 Attacks (included)

  • RandomWalk
  • RandomWalkWithMemory
  • MonteCarlo
  • TrainLookup
  • ColumnSwitch
  • DataPoisoning
  • GreedyHillClimb
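
As a rough mental model of how these attacks operate, here is a minimal random-walk-style sketch. The explainer API (explain), the drift_metric signature, and the acceptance rule are all assumptions for illustration, not the repository's exact implementation:

import numpy as np

def random_walk_attack(x, model, explainer, drift_metric,
                       step=0.05, n_steps=200, epsilon=0.05, seed=42):
    # Keep a candidate only if the prediction stays within epsilon,
    # and track the perturbation with the largest explanation drift.
    rng = np.random.default_rng(seed)
    e_orig = explainer.explain(x)                       # hypothetical API
    p_orig = model.predict_proba(x.reshape(1, -1))[0]
    best, best_drift = x.copy(), 0.0
    for _ in range(n_steps):
        cand = best + rng.normal(scale=step, size=x.shape)
        p_cand = model.predict_proba(cand.reshape(1, -1))[0]
        if np.abs(p_orig - p_cand).max() > epsilon:
            continue                                    # fidelity violated
        drift = drift_metric(e_orig, explainer.explain(cand))
        if drift > best_drift:
            best, best_drift = cand, drift
    return best, best_drift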

🔍 Explainers

  • LIME Tabular
  • SHAP KernelExplainer
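
Both are standard model-agnostic explainers. A minimal sketch of how they are typically instantiated, assuming a trained model, X_train, and feature_names from the benchmark setup (the benchmark wraps this internally):

import shap
from lime.lime_tabular import LimeTabularExplainer

# LIME on tabular data: explain one instance via local sampling.
lime_explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names, mode="classification"
)
lime_exp = lime_explainer.explain_instance(X_train[0], model.predict_proba)

# SHAP KernelExplainer: use a small background sample to keep it tractable.
background = shap.sample(X_train, 100)
shap_explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = shap_explainer.shap_values(X_train[:5])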

📏 Metrics (explanation drift)

  • L2
  • Cosine
  • Spearman
  • Kendall-Tau
  • Distortion (L1 + Kendall-Tau)
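
For reference, drift between an original attribution vector e1 and an adversarial one e2 can be computed roughly as below. Whether the benchmark reports distances or similarities (and how Distortion combines L1 with Kendall-Tau) is not shown here, so treat these definitions as assumptions:

import numpy as np
from scipy.stats import spearmanr, kendalltau

def l2_drift(e1, e2):
    return float(np.linalg.norm(e1 - e2))

def cosine_drift(e1, e2):
    # 1 - cosine similarity, so larger values mean more drift.
    sim = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
    return float(1.0 - sim)

def spearman_drift(e1, e2):
    # 1 - rank correlation: 0 for identical rankings, 2 for fully reversed.
    return float(1.0 - spearmanr(e1, e2).correlation)

def kendall_drift(e1, e2):
    return float(1.0 - kendalltau(e1, e2).correlation)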

🔧 Extending

This repository is designed to be extended easily via inheritance:

  • add new attacks via BaseAttack
  • add new explainers via BaseExplainer
  • add new metrics via BaseMetric
  • add new datasets via BaseDataset

Most additions require only a single new file plus a registration in skripts/run_benchmark.py.
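
As an example, a new attack could look like the following sketch. The import path and the attack(X) interface are assumptions about BaseAttack, so check the actual base class before extending:

import numpy as np
from src.attacks.base import BaseAttack  # hypothetical module path

class NoiseBurstAttack(BaseAttack):
    # Toy attack for illustration: one burst of Gaussian noise per sample.
    # Prediction-fidelity checking is left to the benchmark harness.

    def __init__(self, scale=0.1, seed=42, **kwargs):
        super().__init__(**kwargs)
        self.scale = scale
        self.rng = np.random.default_rng(seed)

    def attack(self, X):
        return X + self.rng.normal(scale=self.scale, size=X.shape)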
