
Suppression or Deletion: A Restoration-Based Representation-Level Analysis of Machine Unlearning (WWW '26)



Overview

As pretrained models are increasingly shared on the web, ensuring models can forget sensitive, copyrighted, or private information has become crucial. Current unlearning evaluations rely on output-based metrics, which cannot verify whether information is truly deleted or merely suppressed at the representation level.

This repository provides a restoration-based analysis framework using Sparse Autoencoders to:

  • Identify class-specific expert features in intermediate layers
  • Apply inference-time steering to restore unlearned information
  • Quantitatively distinguish between suppression and deletion

Installation

Prerequisites

  • Python 3.9 or higher
  • CUDA-capable GPU (recommended)

Setup

# Clone the repository
git clone https://github.com/Yurim990507/suppression-or-deletion.git
cd suppression-or-deletion

# Install dependencies
pip install -r requirements.txt

Quick Start

1. Download Pretrained Assets

Download the pretrained SAE models, expert features, and original model from Hugging Face:

# Install Hugging Face CLI
pip install huggingface_hub

# Download all pretrained files
huggingface-cli download Yurim0507/suppression-or-deletion --local-dir ./pretrained --repo-type=model

The files will be organized as:

pretrained/
├── cifar10/
│   ├── vit_base_16_original.pth          # Original ViT model
│   ├── sae_layer9_k16.pt                 # SAE model (layer 9, k=16)
│   ├── activations_layer9_stats.npy      # Normalization statistics
│   └── expert_features_layer9_k16.pt     # Expert features per class
└── imagenette/
    ├── vit_base_16_original.pth          # Original ViT model
    ├── sae_layer9_k32.pt                 # SAE model (layer 9, k=32)
    ├── activations_layer9_stats.npy      # Normalization statistics
    └── expert_features_layer9_k32.pt     # Expert features per class

Note: Dataset-specific SAE configurations:

  • CIFAR-10: k=16 (TopK sparsity)
  • Imagenette: k=32 (TopK sparsity)
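Before running anything, it can help to confirm that the download produced the layout shown above. The following is a minimal sketch, not part of the repository: the helper names `expected_assets` and `missing_assets` are hypothetical, and the file names are taken directly from the directory tree above.

```python
from pathlib import Path

# TopK values per dataset, as listed in the note above.
SAE_K = {"cifar10": 16, "imagenette": 32}


def expected_assets(dataset: str, root: str = "./pretrained", layer: int = 9) -> dict:
    """Return the expected asset paths for one dataset (hypothetical helper)."""
    if dataset not in SAE_K:
        raise ValueError(f"unknown dataset: {dataset!r}")
    k = SAE_K[dataset]
    base = Path(root) / dataset
    return {
        "model": base / "vit_base_16_original.pth",
        "sae": base / f"sae_layer{layer}_k{k}.pt",
        "stats": base / f"activations_layer{layer}_stats.npy",
        "experts": base / f"expert_features_layer{layer}_k{k}.pt",
    }


def missing_assets(dataset: str, root: str = "./pretrained") -> list:
    """List the names of expected files that are not present on disk."""
    assets = expected_assets(dataset, root)
    return [name for name, path in assets.items() if not path.exists()]


if __name__ == "__main__":
    for name in SAE_K:
        print(name, "missing:", missing_assets(name))
```

An empty list for both datasets means every file from the tree above was found under `./pretrained`.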

2. Prepare Your Unlearned Model

Train an unlearned model using your preferred method (CF-k, SALUN, SCRUB, etc.) and save it as a .pth checkpoint.

Example checkpoint format:

{
    'model_state_dict': model.state_dict(),
    # ... other optional keys
}
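To check that your own checkpoint matches this format before passing it to the scripts, a small helper along the following lines may be useful. This is a sketch under assumptions: `extract_state_dict` is a hypothetical name, and the fallback for a bare state dict is a convenience for your own checks, not something the repository's scripts are known to accept.

```python
def extract_state_dict(checkpoint):
    """Return the weight mapping from a loaded checkpoint object (hypothetical helper)."""
    if not isinstance(checkpoint, dict):
        raise TypeError(f"expected a dict-like checkpoint, got {type(checkpoint).__name__}")
    # Documented format: weights stored under the 'model_state_dict' key.
    if "model_state_dict" in checkpoint:
        return checkpoint["model_state_dict"]
    # Assumption: otherwise treat the object as a bare state dict.
    return checkpoint


# Typical usage with PyTorch (not executed here):
#   torch.save({"model_state_dict": model.state_dict()}, "unlearned_model.pth")
#   ckpt = torch.load("unlearned_model.pth", map_location="cpu")
#   state = extract_state_dict(ckpt)
```

Saving under the `model_state_dict` key, as in the commented usage, is the form that matches the example above.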

3. Run Restoration Test

Simple Demo

python demo.py \
    --dataset cifar10 \
    --unlearned_model path/to/your/unlearned_model.pth \
    --target_class 0

Full Control

python recovery_test.py \
    --dataset cifar10 \
    --unlearned_model path/to/your/unlearned_model.pth \
    --target_class 0 \
    --layer 9 \
    --alpha 1.0 5.0 10.0 \
    --save_dir ./results
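The `--alpha` flag above takes several values in one invocation. The sketch below shows how such a command line can be parsed with `argparse`. It is an illustration only: the flag names come from the command above, but the defaults, choices, and required settings are assumptions and may differ from the actual `recovery_test.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the flags shown above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Restoration test (illustrative)")
    parser.add_argument("--dataset", choices=["cifar10", "imagenette"], required=True)
    parser.add_argument("--unlearned_model", required=True)
    parser.add_argument("--target_class", type=int, required=True)
    parser.add_argument("--layer", type=int, default=9)
    # nargs="+" collects one or more steering strengths into a list of floats.
    parser.add_argument("--alpha", type=float, nargs="+", default=[1.0])
    parser.add_argument("--save_dir", default="./results")
    return parser


args = build_parser().parse_args([
    "--dataset", "cifar10",
    "--unlearned_model", "path/to/your/unlearned_model.pth",
    "--target_class", "0",
    "--layer", "9",
    "--alpha", "1.0", "5.0", "10.0",
    "--save_dir", "./results",
])
print(args.alpha)
```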

4. View Results

Results are saved in the --save_dir directory:

  • restoration_class{X}.png: Line plot showing restoration performance
  • restoration_class{X}_results.json: Detailed numerical results

Methodology

Sparse Autoencoder (SAE)

We train a sparse autoencoder on ViT layer activations with the following architecture:

  • Input: ViT hidden states (768-dim for ViT-Base)
  • Latent: 768-dim (hidden_mul=1)
  • Activation: TopK sparsity (only top K features active per sample)
  • Output: Reconstructed hidden states

Dataset-specific configurations:

Dataset       K value (TopK sparsity)
CIFAR-10      16
Imagenette    32
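The architecture described above can be sketched in a few lines. The following is a minimal NumPy illustration of a TopK sparse autoencoder with the listed dimensions, not the repository's implementation: the class name `TopKSAE`, the random initialization, the bias terms, and the ReLU applied after the TopK selection are all assumptions.

```python
import numpy as np


class TopKSAE:
    """Illustrative TopK sparse autoencoder (random weights, not the trained SAE)."""

    def __init__(self, d_in: int = 768, hidden_mul: int = 1, k: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_latent = d_in * hidden_mul  # 768 when hidden_mul=1
        self.k = k
        self.W_enc = rng.normal(0.0, 0.02, size=(d_in, d_latent))
        self.b_enc = np.zeros(d_latent)
        self.W_dec = rng.normal(0.0, 0.02, size=(d_latent, d_in))
        self.b_dec = np.zeros(d_in)

    def encode(self, x: np.ndarray) -> np.ndarray:
        """Map hidden states (n, d_in) to latents with at most k non-zeros per row."""
        pre = x @ self.W_enc + self.b_enc
        # Indices of the k largest pre-activations in each row.
        idx = np.argpartition(pre, -self.k, axis=-1)[..., -self.k:]
        z = np.zeros_like(pre)
        np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
        # Assumption: clamp any negative survivors to zero.
        return np.maximum(z, 0.0)

    def decode(self, z: np.ndarray) -> np.ndarray:
        """Reconstruct hidden states (n, d_in) from sparse latents."""
        return z @ self.W_dec + self.b_dec


sae = TopKSAE(d_in=768, hidden_mul=1, k=16)  # CIFAR-10 setting from the table
x = np.random.default_rng(1).normal(size=(4, 768))
z = sae.encode(x)
x_hat = sae.decode(z)
```

Switching to the Imagenette setting only changes `k` to 32; the latent width stays at 768.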

Restoration mode:

  • direct_injection: Gradual addition method (default, used in experiments)
    • Only target class samples are restored
    • Non-target samples remain untouched
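One plausible reading of this restoration mode is sketched below: the decoder directions of the target class's expert features are added to the hidden states, scaled by a strength `alpha`, and only for target-class samples. This is an assumption-laden illustration, not the authors' exact procedure. The function name, argument names, and the choice to sum the expert directions are hypothetical.

```python
import numpy as np


def direct_injection(hidden, labels, target_class, expert_directions, alpha):
    """Add alpha-scaled expert directions to target-class hidden states only.

    hidden:            (n, d) hidden states at the steered layer
    labels:            (n,)   class label of each sample
    expert_directions: (m, d) assumed SAE decoder rows of the m expert features
    alpha:             steering strength (e.g. 1.0, 5.0, 10.0)
    """
    steer = alpha * expert_directions.sum(axis=0)  # (d,)
    mask = (labels == target_class)[:, None]       # (n, 1), True for target samples
    # Non-target rows are multiplied by False (0) and stay untouched.
    return hidden + mask * steer


hidden = np.zeros((3, 4))
labels = np.array([0, 1, 0])
directions = np.ones((2, 4))
steered = direct_injection(hidden, labels, target_class=0,
                           expert_directions=directions, alpha=5.0)
```

Sweeping `alpha` over increasing values, as in the `--alpha 1.0 5.0 10.0` example, would then show how restoration changes with steering strength.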

Datasets

This repository supports CIFAR-10 and Imagenette datasets used in our experiments.

Citation

If you find this work useful, please cite our paper:

@article{jang2026suppression,
  title={{Suppression or Deletion}: A Restoration-Based Representation-Level Analysis of Machine Unlearning},
  author={Jang, Yurim and Lee, Jaeung and Kim, Dohyun and Jo, Jaemin and Woo, Simon S},
  journal={arXiv preprint arXiv:2602.18505},
  year={2026}
}

About

[WWW '26 Short Paper] Suppression or Deletion: A Restoration-Based Representation-Level Analysis of Machine Unlearning
