In this paper, we first introduce ZeroCF, a faithful approach that leverages important words derived from feature attribution methods to generate counterfactual examples in a zero-shot setting. Second, we present FitCF, a framework that verifies these counterfactuals via label-flip verification and then inserts them as demonstrations for few-shot prompting, outperforming three state-of-the-art baselines.
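As a rough illustration of the attribute-generate-verify pipeline (not the paper's actual implementation), the sketch below uses a toy keyword classifier, leave-one-out word deletion as a stand-in for the feature attribution methods, and a hard-coded edit in place of the LLM generation step; only candidate counterfactuals that flip the classifier's label would be kept as demonstrations.

```python
# Toy sketch of the ZeroCF/FitCF idea: attribute -> generate -> verify flip.
# The classifier, attribution method, and edit below are illustrative
# stand-ins, not the models or methods used in the paper.

def toy_sentiment(text: str) -> str:
    """Keyword-based stand-in for a fine-tuned sentiment classifier."""
    positive = {"great", "good", "wonderful"}
    negative = {"terrible", "bad", "awful"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative"

def leave_one_out_importance(text: str) -> list[tuple[str, int]]:
    """Score each word by whether deleting it changes the predicted label
    (a crude stand-in for attribution methods such as gradient saliency)."""
    words = text.split()
    base = toy_sentiment(text)
    scores = []
    for i, w in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])
        scores.append((w, int(toy_sentiment(ablated) != base)))
    return scores

def flips_label(original: str, counterfactual: str) -> bool:
    """Label-flip verification: keep a counterfactual only if the
    classifier assigns it a different label than the original."""
    return toy_sentiment(original) != toy_sentiment(counterfactual)

original = "a great movie"
important = [w for w, s in leave_one_out_importance(original) if s]
# A real system would prompt an LLM to edit the important words; here we
# hard-code one plausible edit for illustration.
candidate = "a terrible movie"
print(important)                       # words whose removal flips the label
print(flips_label(original, candidate))
```

Running this prints `['great']` and `True`: the word "great" is flagged as important, and the edited sentence passes label-flip verification.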

We use two widely used NLP datasets for counterfactual example generation:
- AG News: news topic classification (https://paperswithcode.com/dataset/ag-news)
- SST2: sentiment analysis (https://huggingface.co/datasets/stanfordnlp/sst2)
We employ three LLMs with varying model sizes:
- Llama3-8B (https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- Qwen2.5-32B (https://huggingface.co/Qwen/Qwen2.5-32B)
- Qwen2.5-72B (https://huggingface.co/Qwen/Qwen2.5-72B)
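Regardless of which of these models is used, FitCF-style few-shot prompting amounts to inserting verified (original, counterfactual) pairs as demonstrations before the query. A minimal sketch of such prompt assembly follows; the template wording here is our assumption, not the paper's exact prompt.

```python
# Minimal sketch of assembling a few-shot counterfactual-generation prompt
# from already-verified (original, counterfactual) demonstration pairs.
# The instruction and field names are illustrative placeholders.

def build_prompt(demos: list[tuple[str, str]], query: str) -> str:
    lines = [
        "Rewrite the text with minimal edits so that its label flips.",
        "",
    ]
    for original, counterfactual in demos:
        lines.append(f"Text: {original}")
        lines.append(f"Counterfactual: {counterfactual}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Counterfactual:")
    return "\n".join(lines)

demos = [
    ("a great movie", "a terrible movie"),
    ("the service was awful", "the service was wonderful"),
]
print(build_prompt(demos, "an utterly forgettable plot"))
```

The resulting string would then be passed to the chosen LLM's generation API; the model completes the final "Counterfactual:" line.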
We compare against three state-of-the-art baselines:
- Polyjuice (Wu et al., 2021); [link to paper]
- BAE (Garg and Ramakrishnan, 2020); [link to paper]
- FIZLE (Bhattacharjee et al., 2024); [link to paper]
```
pip install -r requirements.txt
```
In addition, Polyjuice requires the `en_core_web_sm` spaCy model:
```
python -m spacy download en_core_web_sm
```

```bibtex
@misc{wang2025fitcfframeworkautomaticfeature,
      title={FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation},
      author={Qianli Wang and Nils Feldhus and Simon Ostermann and Luis Felipe Villa-Arenas and Sebastian Möller and Vera Schmitt},
      year={2025},
      eprint={2501.00777},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.00777},
}
```