
attack: added Wanda Pruning (attack) #26

Open
esveee wants to merge 2 commits into main from esveee/wanda-final

Conversation

Collaborator

@esveee esveee commented Aug 29, 2025

Changes

Added Wanda pruning as an attack paradigm. Pruning is normally benign (done for efficiency), but applied maliciously it can serve as a form of model tampering.
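For context, Wanda (Sun et al., arXiv:2306.11695) scores each weight by |W_ij| * ||X_j||_2 (weight magnitude times the norm of the corresponding input activation) and zeroes the lowest-scoring weights within each output row. A minimal pure-Python sketch of that per-row criterion (illustrative only, not the PR's implementation):

```python
def wanda_prune_row(weights, act_norms, sparsity=0.5):
    """Zero the lowest-scoring weights in one output row.

    weights: weights for one output neuron (one row of W).
    act_norms: per-input-feature activation norms ||X_j||_2.
    sparsity: fraction of weights in the row to zero out.
    """
    # Wanda importance score: |W_ij| * ||X_j||_2
    scores = [abs(w) * n for w, n in zip(weights, act_norms)]
    k = int(len(weights) * sparsity)  # number of weights to prune
    # Indices of the k lowest-importance weights in this row
    prune_idx = sorted(range(len(scores)), key=scores.__getitem__)[:k]
    pruned = list(weights)
    for i in prune_idx:
        pruned[i] = 0.0
    return pruned
```

Note the per-row comparison group: unlike plain magnitude pruning, a large weight feeding a near-zero activation can still be pruned.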

Testing

[Test in Progress]


@tomtseng tomtseng left a comment


Looks like most of this is code copied from elsewhere, which I won't carefully review.
Tagging Saad to review the remaining file src/safetunebed/whitebox/attacks/wanda_pruning/wanda_pruning.py


can add a citation and link to the original code here, like punya did in his PR: src/safetunebed/whitebox/attacks/gcg/__init__.py

@tomtseng tomtseng requested review from sdhossain and tomtseng August 29, 2025 20:58

@sdhossain sdhossain left a comment


@esveee could you please add a test script we can run, similar to what we have in our tests folder currently? Also add a custom config if it's relevant here.

Otherwise LGTM (I also didn't look too closely at the ported-over code; would recommend adding the citation + link to the source code, as is done with other attacks).

def run_attack(self) -> None:
    cfg = self.attack_config

    print(f"[WandA] Loading model from: {cfg.base_input_checkpoint_path}")

I know we don't have unified logging logic for the repository just yet, but I do think we should use the standard logging module (logging.getLogger) rather than bare print() so that we can control the logging level.

Not necessarily something to scope for this PR.
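A minimal sketch of what that switch could look like (the helper name is illustrative, not from this PR):

```python
import logging

# Module-level logger; the level can then be controlled centrally,
# e.g. logging.basicConfig(level=logging.INFO) at the entry point.
logger = logging.getLogger(__name__)


def load_model_message(checkpoint_path: str) -> str:
    # Replaces the bare print() call with a leveled log record.
    message = f"[WandA] Loading model from: {checkpoint_path}"
    logger.info(message)
    return message
```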

cfg = self.attack_config

print(f"[WandA] Loading model from: {cfg.base_input_checkpoint_path}")
model = AutoModelForCausalLM.from_pretrained(cfg.base_input_checkpoint_path, torch_dtype=torch.float16)

Suggestion: torch_dtype=torch.float16 -> torch_dtype=torch.bfloat16 (only relevant if we are hitting numerical errors here; bfloat16 keeps float32's exponent range, so it is less prone to overflow than float16).

"""Implements weight-space tampering via WandA pruning."""

def run_attack(self) -> None:
cfg = self.attack_config

nit: our current style is to use config instead of cfg; I personally prefer that we use self.attack_config explicitly where we use it (so that it is not ambiguous with other configs)

StrongRejectEvaluationConfig,
)

class WandaPruningAttack(TamperAttack[TamperAttackConfig]):

do we need a custom WandaPruningAttackConfig for this attack?
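If a dedicated config is wanted, it might look something like the hypothetical sketch below (all names and fields are illustrative assumptions, not from this PR; a plain dataclass stands in for the repository's real TamperAttackConfig):

```python
from dataclasses import dataclass


@dataclass
class TamperAttackConfig:
    # Stand-in for the repository's actual base config.
    base_input_checkpoint_path: str = ""


@dataclass
class WandaPruningAttackConfig(TamperAttackConfig):
    # Hypothetical Wanda-specific knobs.
    sparsity_ratio: float = 0.5  # fraction of weights to prune
    nsamples: int = 128          # calibration samples for activation norms
    prune_n: int = 0             # N in N:M structured sparsity (0 = unstructured)
    prune_m: int = 0             # M in N:M structured sparsity


cfg = WandaPruningAttackConfig(sparsity_ratio=0.5)
```

Subclassing keeps the attack usable wherever a TamperAttackConfig is expected while making the pruning hyperparameters explicit and typed.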

@@ -0,0 +1,398 @@
import time

can we add a note on where the code was sourced from in the header docstring? (that is, if it was sourced externally; ping me if it wasn't)
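For example, a sourcing note in the module docstring could look like this (the citation below is the actual Wanda paper and its official code release, included here as a plausible attribution target):

```python
"""Wanda pruning attack.

Pruning logic adapted from the reference implementation of:
    Sun et al., "A Simple and Effective Pruning Approach for Large
    Language Models", arXiv:2306.11695.
    Code: https://github.com/locuslab/wanda
"""
```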

@sdhossain sdhossain added the attack Adds or modifies attacks label Dec 1, 2025