gcg: Actually run the optimization in evaluation #82
Merged
tomtseng commented Feb 7, 2026
class GCGAttackConfig(TamperAttackConfig):
    """Config for GCG attack."""

    num_steps: int = 250
These params were originally defined in https://github.com/GraySwanAI/nanoGCG/blob/v0.3.0-release/nanogcg/gcg.py, so I'm just moving them back into implementation.py, which is based on that file.
    name: AttackName = AttackName.GCG_ATTACK

    @override
    def evaluate(self) -> dict[str, float]:
rewriting the rest of this file to look more like embedding_attack
d750383 to 4829e09
GCG was not having much success on small tests; Claude seems to think it could be due to transformers modifying DynamicCache in a way we don't expect.
This reverts commit b470044. We don't want to pass, since evaluate() calls `super`, which does need that attribute to be set.
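To illustrate the kind of failure this revert guards against, here is a minimal sketch. All names in it (`BaseAttack`, `BrokenAttack`, `FixedAttack`, `run_attack`, `output_checkpoint_path`) are hypothetical stand-ins rather than the actual tamperbench classes or attributes; the only thing taken from the commit message is the shape of the problem, where a parent `evaluate()` reads an attribute that the subclass is expected to set.

```python
class BaseAttack:
    """Hypothetical base class; NOT the real tamperbench attack base class."""

    def __init__(self, output_dir: str) -> None:
        self.output_dir = output_dir

    def evaluate(self) -> dict[str, float]:
        # The parent evaluate() reads an attribute that subclasses must set.
        # `output_checkpoint_path` is an assumed name used for illustration.
        return {"checkpoint_path_length": float(len(self.output_checkpoint_path))}


class BrokenAttack(BaseAttack):
    """Subclass whose attack step is a no-op, so the attribute is never set."""

    def run_attack(self) -> None:
        pass

    def evaluate(self) -> dict[str, float]:
        self.run_attack()
        return super().evaluate()  # raises AttributeError


class FixedAttack(BaseAttack):
    """Subclass that sets the attribute before delegating to the parent."""

    def run_attack(self) -> None:
        self.output_checkpoint_path = f"{self.output_dir}/checkpoint"

    def evaluate(self) -> dict[str, float]:
        self.run_attack()
        return super().evaluate()


if __name__ == "__main__":
    try:
        BrokenAttack("out").evaluate()
    except AttributeError as err:
        print(f"broken subclass failed as expected: {err}")
    print(FixedAttack("out").evaluate())
```

The sketch only shows why the subclass cannot skip setting state that the parent's `evaluate()` depends on; which attribute is involved in the real code is not stated in the commit message.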
This probably deserves review, but I'm merging it anyway to try to clean up the codebase quickly. I want to fix all the type errors in the code and get CircleCI green.
Changes
The GCG code was not actually runnable in evaluation: it only implemented an attack class and did not call the implementation anywhere. This PR adds `src/tamperbench/whitebox/evals/gcg/gcg.py` in the same style as the `embedding_attack` evaluation.

Also, this fixes the remaining pyright errors in `main`.