
PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing


Fine-Tuning

We show how to fine-tune a Hugging Face model on the ECHR dataset (i) without defenses, and (ii) with differentially private training.
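For orientation, the undefended setting is ordinary causal-LM fine-tuning. The sketch below uses the Hugging Face Trainer; the model name, data file, and hyperparameters are illustrative placeholders, not the values used by finetune.sh.

```python
# Minimal sketch of undefended fine-tuning with the Hugging Face Trainer.
# Model name, data file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; finetune.sh selects the actual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# ECHR decisions as raw text; the repo's own data pipeline may differ.
dataset = load_dataset("text", data_files={"train": "echr_train.txt"})["train"]  # hypothetical file

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="echr_undefended", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```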

Build & Run

We recommend setting up a conda environment for this project.

$ conda create -n pii-leakage python=3.10
$ conda activate pii-leakage
$ pip install -e .

To install fastDP,

cd libs/fast-differential-privacy
python -m setup develop
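Conceptually, differentially private training wraps the optimizer so that per-sample gradients are clipped and noised before each update. The sketch below assumes fastDP exposes an Opacus-style PrivacyEngine with an attach method; check libs/fast-differential-privacy for the exact interface and the arguments used by finetune.sh.

```python
# Sketch of DP fine-tuning: attach a privacy engine to the optimizer so that
# gradients are clipped per sample and noised before each step.
# Assumes fastDP provides an Opacus-style PrivacyEngine; argument names and
# values below are illustrative, not the project's actual configuration.
import torch
from fastDP import PrivacyEngine
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

privacy_engine = PrivacyEngine(
    model,
    batch_size=32,          # logical batch size
    sample_size=50_000,     # number of training examples
    epochs=3,
    target_epsilon=8.0,     # privacy budget
    clipping_fn="automatic",
)
privacy_engine.attach(optimizer)  # subsequent optimizer.step() calls are DP
```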

Run fine-tuning

All fine-tuning commands are collected in finetune.sh. Add your wandb key to the export WANDB_API_KEY= line in ./finetune.sh, then run:

./finetune.sh

Mechanistic Interpretability Analysis

To install EAP-IG,

conda create -n py312 python=3.12
conda activate py312
cd libs
git clone https://github.com/hannamw/EAP-IG.git
cd EAP-IG
pip install .
pip install cmapy
pip install seaborn==0.13.2
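EAP-IG localizes circuits with edge attribution patching: the effect of swapping a corrupted activation into a clean run is approximated to first order by (corrupted activation minus clean activation) times the gradient of a task metric with respect to that activation. The self-contained PyTorch toy below illustrates this approximation; it does not use the EAP-IG library's actual API.

```python
# Toy illustration of the edge attribution patching idea: the effect of
# patching a corrupted activation into a clean run is approximated by
# (a_corrupt - a_clean) * d(metric)/d(a_clean). Standalone sketch, not EAP-IG.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
clean, corrupt = torch.randn(4, 8), torch.randn(4, 8)

acts = {}
handle = model[0].register_forward_hook(
    lambda mod, inp, out: acts.__setitem__("layer0", out)
)

# Clean pass: cache the activation and get the metric's gradient w.r.t. it.
metric = model(clean).mean()
a_clean = acts["layer0"]
grad = torch.autograd.grad(metric, a_clean)[0]

# Corrupted pass: cache the corresponding activation (no gradients needed).
with torch.no_grad():
    model(corrupt)
a_corrupt = acts["layer0"]
handle.remove()

# First-order estimate of how much patching this activation changes the metric.
attribution = ((a_corrupt - a_clean.detach()) * grad).sum().item()
print(f"attribution score for layer0: {attribution:.4f}")
```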

Analyzing Impact of DP on General Circuits

Run the shell script below to execute all the commands and generate the relevant CSV files:

./gencircuits.sh
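Once the CSVs are generated, a typical analysis is to measure how much the circuit recovered from the DP model overlaps with the one from the undefended model. The snippet below is a hypothetical example of such a comparison; the file names and the "edge"/"score" column names are assumptions about the CSV layout, not what gencircuits.sh actually emits.

```python
# Hypothetical comparison of circuits from an undefended and a DP model:
# Jaccard overlap of the top-k edges ranked by absolute attribution score.
# File names and column names ("edge", "score") are assumed, not guaranteed.
import pandas as pd

def top_k_edges(csv_path: str, k: int = 100) -> set[str]:
    """Return the k edges with the largest absolute attribution scores."""
    df = pd.read_csv(csv_path)
    df = df.reindex(df["score"].abs().sort_values(ascending=False).index)
    return set(df["edge"].head(k))

undefended = top_k_edges("circuits_undefended.csv")  # hypothetical file
dp = top_k_edges("circuits_dp.csv")                  # hypothetical file

jaccard = len(undefended & dp) / len(undefended | dp)
print(f"top-100 edge overlap (Jaccard): {jaccard:.2f}")
```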

Attack

Assuming your fine-tuned model is located at ../echr_undefended, run the following attacks. Otherwise, edit the model_ckpt attribute in ../configs/<ATTACK>/echr-gpt2-small-undefended.yml to point to the location of the model.

PII Extraction

This will extract PII from the model's generated text.

$ python extract_pii.py --config_path ../configs/pii-extraction-echr-qwen3-17-baseline-loc.yml
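Conceptually, the attack samples text from the fine-tuned model and tags any personally identifiable entities that appear in the generations. The sketch below illustrates this with generic Hugging Face pipelines (text generation plus NER); it is not the repo's extract_pii.py implementation, and the model names and sampling settings are placeholders.

```python
# Illustrative sketch of PII extraction: sample text from the fine-tuned model,
# then tag person entities in the generations with an off-the-shelf NER model.
# Not extract_pii.py; model names and sampling settings are placeholders.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder: your fine-tuned checkpoint
ner = pipeline("ner", aggregation_strategy="simple")   # generic NER tagger

extracted = Counter()
for sample in generator("The court heard that", max_new_tokens=64,
                        num_return_sequences=8, do_sample=True):
    for entity in ner(sample["generated_text"]):
        if entity["entity_group"] == "PER":            # person names only
            extracted[entity["word"].strip()] += 1

print(extracted.most_common(10))                       # most frequently extracted names
```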

Credits
