In this paper, we first introduce ZeroCF, a faithful approach that leverages important words derived from feature attribution methods to generate counterfactual examples in a zero-shot setting. Second, we present FitCF, a framework that verifies these counterfactuals via label-flip verification and then inserts them as demonstrations for few-shot prompting, outperforming three state-of-the-art baselines.
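As a rough illustration of the attribute-generate-verify pipeline (not the paper's actual implementation), the sketch below uses a toy keyword classifier, leave-one-out word deletion as a stand-in for the feature attribution methods, and a hard-coded edit in place of the LLM generation step; only candidate counterfactuals that flip the classifier's label would be kept as demonstrations.

```python
# Toy sketch of the ZeroCF/FitCF idea: attribute -> generate -> verify flip.
# The classifier, attribution method, and edit below are illustrative
# stand-ins, not the models or methods used in the paper.

def toy_sentiment(text: str) -> str:
    """Keyword-based stand-in for a fine-tuned sentiment classifier."""
    positive = {"great", "good", "wonderful"}
    negative = {"terrible", "bad", "awful"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative"

def leave_one_out_importance(text: str) -> list[tuple[str, int]]:
    """Score each word by whether deleting it changes the predicted label
    (a crude stand-in for attribution methods such as gradient saliency)."""
    words = text.split()
    base = toy_sentiment(text)
    scores = []
    for i, w in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])
        scores.append((w, int(toy_sentiment(ablated) != base)))
    return scores

def flips_label(original: str, counterfactual: str) -> bool:
    """Label-flip verification: keep a counterfactual only if the
    classifier assigns it a different label than the original."""
    return toy_sentiment(original) != toy_sentiment(counterfactual)

original = "a great movie"
important = [w for w, s in leave_one_out_importance(original) if s]
# A real system would prompt an LLM to edit the important words; here we
# hard-code one plausible edit for illustration.
candidate = "a terrible movie"
print(important)                       # words whose removal flips the label
print(flips_label(original, candidate))
```

Running this prints `['great']` and `True`: the word "great" is flagged as important, and the edited sentence passes label-flip verification.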

We use two widely used NLP datasets for counterfactual example generation:
- AG News: news topic classification (https://paperswithcode.com/dataset/ag-news)
- SST2: sentiment analysis (https://huggingface.co/datasets/stanfordnlp/sst2)
We employ three LLMs with varying model sizes:
- Llama3-8B (https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- Qwen2.5-32B (https://huggingface.co/Qwen/Qwen2.5-32B)
- Qwen2.5-72B (https://huggingface.co/Qwen/Qwen2.5-72B)
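Regardless of which of these models is used, FitCF-style few-shot prompting amounts to inserting verified (original, counterfactual) pairs as demonstrations before the query. A minimal sketch of such prompt assembly follows; the template wording here is our assumption, not the paper's exact prompt.

```python
# Minimal sketch of assembling a few-shot counterfactual-generation prompt
# from already-verified (original, counterfactual) demonstration pairs.
# The instruction and field names are illustrative placeholders.

def build_prompt(demos: list[tuple[str, str]], query: str) -> str:
    lines = [
        "Rewrite the text with minimal edits so that its label flips.",
        "",
    ]
    for original, counterfactual in demos:
        lines.append(f"Text: {original}")
        lines.append(f"Counterfactual: {counterfactual}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Counterfactual:")
    return "\n".join(lines)

demos = [
    ("a great movie", "a terrible movie"),
    ("the service was awful", "the service was wonderful"),
]
print(build_prompt(demos, "an utterly forgettable plot"))
```

The resulting string would then be passed to the chosen LLM's generation API; the model completes the final "Counterfactual:" line.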
We compare against three state-of-the-art baselines:
- Polyjuice (Wu et al., 2021); [link to paper]
- BAE (Garg and Ramakrishnan, 2020); [link to paper]
- FIZLE (Bhattacharjee et al., 2024); [link to paper]
```
pip install -r requirements.txt
```
In addition, Polyjuice requires the `en_core_web_sm` spaCy model:
```
python -m spacy download en_core_web_sm
```

```bibtex
@misc{wang2025fitcfframeworkautomaticfeature,
      title={FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation},
      author={Qianli Wang and Nils Feldhus and Simon Ostermann and Luis Felipe Villa-Arenas and Sebastian Möller and Vera Schmitt},
      year={2025},
      eprint={2501.00777},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.00777},
}
```