This repository contains a multilingual version of the code for the paper Explaining NLP Models via Minimal Contrastive Editing (MiCE), adapted to PyTorch + accelerator for better long-term compatibility and reproducibility. It also removes the dependency on the now-deprecated allennlp and allennlp-models packages.
This program was developed on Windows 10 using PyTorch, transformers, nltk, and the common Python libraries. We are currently adapting the code to Unix environments. Development was centered on GPU execution for efficiency; CPU-only use is possible, but as of 2023 the authors do not consider it an optimal approach.
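Because the code targets GPU execution but can fall back to CPU, it can help to check which device PyTorch sees before starting a long run. The following is a minimal sketch and not part of this repository: `pick_device` is a hypothetical helper name, and the only PyTorch call it relies on is `torch.cuda.is_available()`.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment, so there is nothing to query.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running on: {pick_device()}")
```

If this reports `cpu` on a machine that has a GPU, the installed PyTorch build is probably a CPU-only one.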
- Domingo Benoit Cea, Msc. Universidad Técnica Federico Santa María
-
Clone the repository.
git clone https://github.com/allenai/mice.git
cd mice
-
Create a Python virtual environment with Python >= 3.10 (for more information, follow the link).
python -m venv .env
-
Activate the environment (on Windows, run .env\Scripts\activate instead).
source .env/bin/activate
-
Install the requirements.
pip install -r requirements.txt
-
Download Task Data: If you want to work with the RACE dataset, download it here: Link. The commands below assume that this data, once downloaded, is stored in data/RACE/. All other task-specific datasets are downloaded automatically by the commands below.
-
Download Pretrained Models: You can download pretrained models by running:
bash download_models.sh
For each task (IMDB/Newsgroups/RACE), this script saves the:
- Predictor model to: trained_predictors/{TASK}/model/
- Editor checkpoint to: results/{TASK}/editors/{EXPERIMENT-NAME}/{TASK}_editor.pth
-
Generate Edits: Run the following command to generate edits for a particular task with our pretrained editor. It will write edits to results/{TASK}/edits/{STAGE2EXP}/edits.csv.
python run_stage_two.py -task {TASK} -stage2_exp {STAGE2EXP} -editor_path results/{TASK}/editors/mice/{TASK}_editor.pth
For instance, to generate edits for the IMDB task, the following command will save edits to results/imdb/edits/mice_binary/edits.csv:
python run_stage_two.py -task imdb -stage2_exp mice_binary -editor_path results/imdb/editors/mice/imdb_editor.pth
-
Inspect Edits: Inspect these edits with the demo notebook notebooks/evaluation.ipynb.
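Before generating edits, it can be useful to confirm that the pretrained files landed where the steps above expect them. The sketch below only encodes the paths given in this README; `expected_artifacts` and `missing_artifacts` are hypothetical helper names, the experiment name `mice` is taken from the example command, and the lowercase task names `newsgroups` and `race` are an assumption modelled on `imdb`.

```python
from pathlib import Path

# `imdb` appears in the example command; the other two names are assumed
# to follow the same lowercase convention.
TASKS = ("imdb", "newsgroups", "race")


def expected_artifacts(task: str, experiment: str = "mice", root: Path = Path(".")) -> dict[str, Path]:
    """Return the paths where download_models.sh is described as saving files."""
    return {
        "predictor": root / "trained_predictors" / task / "model",
        "editor": root / "results" / task / "editors" / experiment / f"{task}_editor.pth",
    }


def missing_artifacts(root: Path = Path(".")) -> list[Path]:
    """List every expected predictor directory or editor checkpoint that is absent."""
    missing = []
    for task in TASKS:
        for path in expected_artifacts(task, root=root).values():
            if not path.exists():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for path in missing_artifacts():
        print(f"missing: {path}")
```

Any path printed here points to a download that did not complete or to a task whose folder name differs from the assumed one.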
The following command will train an editor (i.e. run Stage 1 of MiCE) for a particular task. It saves checkpoints to results/{TASK}/editors/{STAGE1EXP}/checkpoints/.
python run_stage_one.py -task {TASK} -stage1_exp {STAGE1EXP}
The following command will find MiCE edits (i.e. run Stage 2 of MiCE) for a particular task. It saves edits to results/{TASK}/edits/{STAGE2EXP}/edits.csv. The -editor_path argument determines which Editor model to use; it defaults to our pretrained Editor.
python run_stage_two.py -task {TASK} -stage2_exp {STAGE2EXP} -editor_path results/{TASK}/editors/mice/{TASK}_editor.pth
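When running several tasks or experiments, the two commands can be assembled programmatically. This is a rough sketch that uses only the script names, flags, and output path given above; the helper names are hypothetical, and nothing is executed unless you pass the resulting list to `subprocess.run` yourself.

```python
import shlex
import sys
from typing import Optional


def stage_one_command(task: str, stage1_exp: str) -> list[str]:
    """Build the Stage 1 (editor training) command as an argument list."""
    return [sys.executable, "run_stage_one.py", "-task", task, "-stage1_exp", stage1_exp]


def stage_two_command(task: str, stage2_exp: str, editor_path: Optional[str] = None) -> list[str]:
    """Build the Stage 2 (edit search) command; omit -editor_path to use the default."""
    cmd = [sys.executable, "run_stage_two.py", "-task", task, "-stage2_exp", stage2_exp]
    if editor_path is not None:
        cmd += ["-editor_path", editor_path]
    return cmd


def edits_path(task: str, stage2_exp: str) -> str:
    """Return the file that Stage 2 is described as writing its edits to."""
    return f"results/{task}/edits/{stage2_exp}/edits.csv"


cmd = stage_two_command("imdb", "mice_binary", "results/imdb/editors/mice/imdb_editor.pth")
print(shlex.join(cmd))
print("edits will be written to:", edits_path("imdb", "mice_binary"))
```

Using `sys.executable` instead of a bare `python` keeps the call inside the active virtual environment.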
The notebook notebooks/evaluation.ipynb contains some code to inspect edits.
To compute fluency of edits, see the EditEvaluator class in src/edit_finder.py.
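For a quick look at an edits file outside the notebook, something like the following may be enough. It is a sketch under stated assumptions: `summarize_edits` is a hypothetical helper, the column layout of edits.csv is not described here, so the code only reports whatever header it finds, and the delimiter is left as a parameter because it is not stated either.

```python
import csv


def summarize_edits(path: str, delimiter: str = ",") -> tuple[list[str], int]:
    """Return the column names and the number of data rows in an edits file.

    The delimiter is an assumption; try "\t" if all columns come back
    as a single field.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, len(rows)
```

For example, `summarize_edits("results/imdb/edits/mice_binary/edits.csv")` returns the header and row count for the IMDB run shown earlier.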
Follow the steps below to extend this repo for your own task.
-
Create a subfolder within src/predictors/{TASK}.
-
Dataset reader: Create a task-specific dataset reader in a file {TASK}_dataset_reader.py within that subfolder. It should have methods: text_to_instance(), _read(), and get_inputs().
-
Train Predictor: Create a training config (see src/predictors/imdb/imdb_roberta.json for an example). Then train the Predictor using AllenNLP (see above commands or commands in run_all.sh for examples).
-
Train Editor Model: Depending on the task, you may have to create a new StageOneDataset subclass (see RaceStageOneDataset in src/dataset.py for an example of how to inherit from StageOneDataset).
- For classification tasks, the existing base StageOneDataset class should work.
- For new multiple-choice QA tasks with dataset readers patterned after the RaceDatasetReader (src/predictors/race/race_dataset_reader.py), the existing RaceStageOneDataset class should work.
-
Generate Edits: Depending on the task, you may have to create a new Editor subclass (see RaceEditor in src/editor.py for an example of how to inherit from Editor).
- For classification tasks, the existing base Editor class should work.
- For multiple-choice QA with dataset readers patterned after RaceDatasetReader, the existing RaceEditor class should work.
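To make the dataset reader step above more concrete, here is a rough skeleton with the three required methods. It is only an illustration: the base class and exact signatures this repository expects are not shown in this README, so the class name `MyTaskDatasetReader`, the `label<TAB>text` file format, the dictionary-based instance, and the `return_labels` parameter are all assumptions. Compare against the existing readers under src/predictors/ before relying on it.

```python
from typing import Iterator, Optional


class MyTaskDatasetReader:
    """Hypothetical reader for a TSV file with one `label<TAB>text` pair per line."""

    def text_to_instance(self, text: str, label: Optional[str] = None) -> dict:
        """Wrap one raw example in a plain dict (assumed instance format)."""
        instance = {"text": text.strip()}
        if label is not None:
            instance["label"] = label
        return instance

    def _read(self, file_path: str) -> Iterator[dict]:
        """Yield one instance per non-empty line of the file."""
        with open(file_path, encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                label, text = line.rstrip("\n").split("\t", maxsplit=1)
                yield self.text_to_instance(text, label)

    def get_inputs(self, file_path: str, return_labels: bool = False):
        """Return the raw input strings, optionally together with their labels."""
        instances = list(self._read(file_path))
        texts = [inst["text"] for inst in instances]
        if return_labels:
            return texts, [inst["label"] for inst in instances]
        return texts
```

Keeping `get_inputs()` built on top of `_read()` means the file format is parsed in one place only.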
If you use this repository, please cite the original paper:
@inproceedings{Ross2020ExplainingNM,
title = "Explaining NLP Models via Minimal Contrastive Editing (MiCE)",
author = "Ross, Alexis and Marasovi{\'c}, Ana and Peters, Matthew E.",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2021",
publisher = "Association for Computational Linguistics",
url= "https://arxiv.org/abs/2012.13985",
}