Conversation
@psyonp please make sure to rebase on main. My suggestion is `git rebase main -X theirs` after pulling changes from main with `git pull origin main`.
Also, if you could follow the template of #12, that would be extremely helpful (i.e. implementing it as an eval as well).
Please also add a script as a sanity checker that we can run in a similar way.
src/safetunebed/whitebox/attacks/embedding_attack/embedding_attack.py
Tried to rebase on main again and it said the current branch is up to date. The template is now updated, lmk your thoughts.
sdhossain left a comment
LGTM. If the tests work and we run pre-commit checks, I'm OK with merging (but I would like a bit more documentation on the ported code; I think it's OK not to lint it).
```python
        outputs = self.module(*inputs, **kwargs)
        return self.adversary(outputs)


def deepspeed_add_hooks(
```
I don't know if we are supporting DeepSpeed just yet, or transformer_lens, so we might not need these hooks? (IMO it might make sense to remove them until we add support, to avoid confusion / redundant code, but I don't have very strong opinions on it.)
Yeah, sounds good. I kept them in case we migrate to DeepSpeed on CAIS. Can remove as needed.
```python
        else:
            return self.module(*args, **kwargs)


# a hook for transformer-lens models
```
Does this method work for non-transformer-lens models? (If it does, I think it might make sense to use it for everything, for consistency.)
```python
import einops


# The attacker is parameterized by a low-rank MLP (not used by default)
class LowRankAdversary(torch.nn.Module):
```
I see we are only using GDAdversary in the attack; do we want to make the adversary configurable so these can be used too?
I tried to keep the helpers that might be needed. Can remove as needed.
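If we do want it configurable, one lightweight pattern is a registry keyed by a config string. This is only a sketch: the stand-in classes below abbreviate the real `torch.nn.Module` adversaries, and `build_adversary` is a hypothetical helper, not something in this PR.

```python
# Hypothetical sketch: stand-in classes for the real torch.nn.Module adversaries.
class GDAdversary:
    """Standard projected gradient attack (the current default)."""


class LowRankAdversary:
    """Attack parameterized by a low-rank MLP."""


# Registry mapping config names to adversary classes.
ADVERSARIES = {
    "gd": GDAdversary,
    "low_rank": LowRankAdversary,
}


def build_adversary(name: str, **kwargs):
    """Look up an adversary class by its config name and instantiate it."""
    try:
        cls = ADVERSARIES[name]
    except KeyError:
        raise ValueError(
            f"unknown adversary {name!r}; choose from {sorted(ADVERSARIES)}"
        )
    return cls(**kwargs)
```

That way the eval config only needs one extra string field, and unknown names fail loudly.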
```python
        return self.m(x) + x


# Standard projected gradient attack (used by default)
```
nit: can we have the comment as a docstring describing the class?
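For example (a sketch; the real `GDAdversary` is a `torch.nn.Module` with an `__init__` and `forward`, elided here):

```python
class GDAdversary:
    """Standard projected gradient attack (used by default)."""
```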
```python
        return x

    def clip_attack(self):
```
nit: docstring (we should probably run pre-commit hooks; they currently force us to docstring methods).
Done, ran pre-commit hooks
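For readers of the ported code: a `clip_attack` method in projected-gradient attacks typically projects the perturbation back into an epsilon ball after each step. A plain-Python sketch of the idea for the L2 case (the names and the list-based representation are illustrative; the real method operates on torch tensors):

```python
import math


def clip_to_l2_ball(vec, epsilon):
    """Scale vec down so its L2 norm is at most epsilon.

    This is the projection step of projected gradient descent: points inside
    the ball are untouched, points outside are rescaled onto its surface.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= epsilon:
        return list(vec)
    scale = epsilon / norm
    return [x * scale for x in vec]
```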
remark: if code is copy-pasted directly from another source with minimal changes, there's a case to be made that we shouldn't lint their code and should keep it as-is, so that:
- it's easier to diff to find the logical changes between our version and theirs
- if the upstream source changes, it's easier to copy those changes into our version
- less work for us lol

If the upstream code hasn't been updated in a long time, or we're making a decent number of changes, then linting is reasonable.
Linting this is a nightmare. I just got hit with 150+ errors because I accidentally ran `ruff check`, and it won't let me commit until I've resolved them. Trying to figure out the best way to approach this; I will only stage the files necessary.
Edit: much better. Do not run `ruff check --fix`.
Agree that in general we don't need to lint copy-pasted files (I've generally flagged those files to be ignored by pyright checks); we can do the same for ruff checks. We should definitely indicate this in the file header.
I personally like including docstrings, just so that it's easier to debug sometimes, and they could help potential users of our toolkit, but they aren't too important (especially if we are porting over a lot of files).
@psyonp - just so you don't have to debug a bunch of linting errors, you can add the file to the excludes in our pyproject.toml. (150+ unfixable errors is a bit odd with our rule set.)
```toml
exclude = [
    ".venv",
    "**/__pycache__",
    "_legacy",
    "src/safetunebed/whitebox/evals/embedding_attack/softopt.py",
    "path/to/your/file"
]
```
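If we'd rather keep the file linted for most rules but skip the noisy ones, ruff also supports per-file ignores in pyproject.toml. The glob and rule codes below are just an illustration:

```toml
[tool.ruff.lint.per-file-ignores]
# Ported file: skip docstring (D) and line-length (E501) rules
# instead of excluding it from linting entirely.
"src/safetunebed/whitebox/evals/embedding_attack/softopt.py" = ["D", "E501"]
```

A file-level `# ruff: noqa` comment at the top of a file is another way to suppress all ruff diagnostics for just that file.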
Hey, made the requested changes. @tomtseng feel free to look it over and lmk your thoughts.
LGTM. I didn't look too closely at the ported-over files (only noting that we should mention the source in the docstrings / file headers).
I think it's good to merge once the tests pass.
@psyonp - if the linting is a bit too messy, I can fix it up in a follow-up cosmetic PR.
```python
    for row in tqdm(dataset, total=len(dataset)):
        prompt_text = row["Goal"]

        inputs = tokenizer([prompt_text], return_tensors="pt", padding=True).to(
```
Curious whether we could do `.cuda()` here? I'm not fully sure; I just want to make sure we are still able to support multi-GPU (if we use `device = "auto"`).
^ Not suggesting a change, just a question. Perhaps leave a comment unless you have a definitive answer for this?
```python
            device=model.device,
        )

        parent_path, attr = self.eval_config.attack_layer.rsplit(".", 1)
```
e.g. for `transformer.h.0.mlp`.
nit: can we rename `parent_path` to something like `parent_layer`? (`path` could imply it's a file path)
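For reference, `rsplit(".", 1)` splits on the last dot only, so for a dotted module path the parent and the final attribute come apart cleanly:

```python
# rsplit with maxsplit=1 splits from the right, at the last "." only.
parent_path, attr = "transformer.h.0.mlp".rsplit(".", 1)
# parent_path == "transformer.h.0", attr == "mlp"
```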
```diff
@@ -0,0 +1,221 @@
+"""Standard imports."""
```
Could we add a link to the file where most of this code is sourced from?
```diff
@@ -0,0 +1,207 @@
+"""Hooks for the latent attack evaluation."""
```
same here - can we add a link to where the code for this file is sourced from?
src/safetunebed/whitebox/attacks/latent_attack/latent_attack.py
Co-authored-by: Saad Hossain <68381793+sdhossain@users.noreply.github.com>
Force-pushed from 574b238 to 408b9bd.
I rebased and resolved merge conflicts.
Hey,
Feel free to leave comments on the overall structure of the code and don't hold back from making any suggestions. Next steps involve creating tests as needed.
Most comments you'll see in the supporting files come directly from the source code and provide clarity into how the latent perturbation attack functions.
Thanks,
Punya