
gcg: Actually run the optimization in evaluation #82

Merged
tomtseng merged 7 commits into main from tomtseng/gcg-eval on Feb 7, 2026

Conversation

@tomtseng tomtseng commented Feb 7, 2026

Changes

The GCG code was not actually runnable in evaluation: it only implemented an attack class and did not call the implementation anywhere. This PR adds src/tamperbench/whitebox/evals/gcg/gcg.py, written in the same style as the embedding_attack evaluation.

This also fixes the remaining pyright errors in main.
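
The gap described above can be sketched as follows. This is a minimal illustration and not the code in this PR: `ToyGCGConfig`, `ToyGCGAttack`, `ToyGCGEvaluation`, and the `attack_success_rate` key are hypothetical stand-ins, and the "optimization" is a placeholder that only counts steps.

```python
from dataclasses import dataclass


@dataclass
class ToyGCGConfig:
    """Hypothetical config; only num_steps mirrors a value shown in this PR."""

    num_steps: int = 250


class ToyGCGAttack:
    """Stand-in for an attack implementation that nothing used to call."""

    def __init__(self, config: ToyGCGConfig) -> None:
        self.config = config
        self.steps_run = 0

    def run(self, prompt: str, target: str) -> str:
        # Placeholder for the real optimization loop: just count the steps.
        for _ in range(self.config.num_steps):
            self.steps_run += 1
        return f"{prompt} <optimized-suffix>"


class ToyGCGEvaluation:
    """Stand-in for an evaluation that actually invokes the attack."""

    def __init__(self, attack: ToyGCGAttack) -> None:
        self.attack = attack

    def evaluate(self) -> dict[str, float]:
        original_prompt = "example prompt"
        adversarial_prompt = self.attack.run(original_prompt, "example target")
        # Hypothetical metric: 1.0 if the attack produced a modified prompt.
        succeeded = adversarial_prompt != original_prompt
        return {"attack_success_rate": float(succeeded)}
```

The point of the sketch is only that `evaluate()` has to call into the attack for any optimization steps to run; defining the attack class alone does nothing.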

class GCGAttackConfig(TamperAttackConfig):
    """Config for GCG attack."""

    num_steps: int = 250

These params were originally defined in https://github.com/GraySwanAI/nanoGCG/blob/v0.3.0-release/nanogcg/gcg.py, so I'm just moving them back into implementation.py, which is based on that file.

    name: AttackName = AttackName.GCG_ATTACK

    @override
    def evaluate(self) -> dict[str, float]:

Rewriting the rest of this file to look more like embedding_attack.

GCG was not having much success on small tests. Claude seems to think it
could be due to transformers modifying DynamicCache in a way we don't
expect.

This reverts commit b470044.
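
If the cause is what the message suspects, the failure mode would be ordinary aliasing of a mutable cache. The toy below does not use transformers and makes no claim about what the reverted commit did: `ToyCache` and `toy_forward` are hypothetical stand-ins that show how a cache extended in place can carry state from one candidate evaluation to the next, and how a per-call copy avoids that.

```python
import copy


class ToyCache:
    """Hypothetical stand-in for a key/value cache that grows in place."""

    def __init__(self, entries: list[str]) -> None:
        self.entries = list(entries)


def toy_forward(tokens: list[str], cache: ToyCache) -> int:
    """Hypothetical forward pass: appends to the cache it was given."""
    cache.entries.extend(tokens)  # in-place mutation of the caller's object
    return len(cache.entries)


# Reusing one cache object: each call sees leftovers from the previous call.
shared_prefix = ToyCache(["p1", "p2"])
shared_lengths = [toy_forward(["x"], shared_prefix) for _ in range(3)]

# Copying the prefix cache per call: every candidate starts from the same state.
clean_prefix = ToyCache(["p1", "p2"])
copied_lengths = [toy_forward(["x"], copy.deepcopy(clean_prefix)) for _ in range(3)]
```

In the shared case the reported lengths drift upward across calls, while the copied case stays constant and leaves the original prefix untouched.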

We don't want to pass, since evaluate() calls `super`, which does need
that attribute to be set
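
One way to read this message, sketched with hypothetical names (`BaseEvaluation`, `load_dataset`, `dataset`, and the metric keys are not taken from the codebase): if the parent's `evaluate()` reads an attribute that a setup method populates, replacing that method's body with `pass` leaves the attribute unset, and the `super().evaluate()` call fails.

```python
class BaseEvaluation:
    """Hypothetical parent whose evaluate() relies on self.dataset."""

    def load_dataset(self) -> None:
        self.dataset = ["prompt a", "prompt b"]

    def evaluate(self) -> dict[str, float]:
        self.load_dataset()
        return {"num_examples": float(len(self.dataset))}


class BrokenEvaluation(BaseEvaluation):
    def load_dataset(self) -> None:
        pass  # the attribute is never set

    def evaluate(self) -> dict[str, float]:
        # Fails with AttributeError: the parent reads self.dataset.
        return super().evaluate()


class WorkingEvaluation(BaseEvaluation):
    def load_dataset(self) -> None:
        super().load_dataset()  # keep the parent's setup so the attribute exists

    def evaluate(self) -> dict[str, float]:
        results = super().evaluate()
        results["extra_metric"] = 0.0  # hypothetical subclass-specific metric
        return results
```

Calling `BrokenEvaluation().evaluate()` raises `AttributeError`, while `WorkingEvaluation().evaluate()` returns the parent's metrics plus its own.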
@tomtseng tomtseng marked this pull request as ready for review February 7, 2026 08:23

tomtseng commented Feb 7, 2026

This probably deserves review, but I'm merging it anyway to clean up the codebase quickly. I want to fix all the type errors in the code and get CircleCI green.

@tomtseng tomtseng merged commit 708bbbe into main Feb 7, 2026
1 of 2 checks passed
@tomtseng tomtseng deleted the tomtseng/gcg-eval branch February 7, 2026 08:24