Conversation
Will look today!
…optuna based hparam sweeps
These internal instructions I think feel separate from a CONTRIBUTING.md which can be more public facing.
Doesn't change the rendered Markdown of the tables, but makes them easier to read as plaintext
tomtseng left a comment
read through the docs, looks good & thorough
These scripts should give a non-zero exit code when they fail, instead of printing an error message but exiting with exit code 0.
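The pattern the comment asks for can be sketched as a standard Python entry-point idiom. Here `run_benchmark` and `BenchmarkError` are hypothetical stand-ins for whatever work the scripts actually do:

```python
import sys


class BenchmarkError(Exception):
    """Placeholder for whatever failure the script can hit."""


def run_benchmark() -> None:
    # Hypothetical stand-in for the script's real work.
    raise BenchmarkError("grid point failed")


def main() -> int:
    try:
        run_benchmark()
    except BenchmarkError as e:
        # Print the error message AND return non-zero, so CI and shell
        # callers see the failure instead of a misleading exit code 0.
        print(f"error: {e}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```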
Fixing a few nits I have about the benchmark_grid script:
1. Duplicate line (lines 117 and 121)
attack_config_cls, attack_cls = ATTACKS_MAP[attack_name] # line 117
for config_name, attack_config_dict in whitebox_attack_config_grids[...]:
attack_config_cls, attack_cls = ATTACKS_MAP[attack_name] # line 121 - duplicate
Line 121 is redundant since the same unpacking already happens at line 117.
2. Missing required=True for --attacks argument (line 61-70)
If --attacks is not provided, args.attacks will be None, causing a crash at line 113 when iterating.
3. Redundant ATTACKS_MAP alias (line 42)
ATTACKS_MAP = ATTACKS_REGISTRY isn't necessary, just use ATTACKS_REGISTRY
4. Unnecessary global variable (lines 45-47)
whitebox_attack_config_grids is declared at module scope but only used within the if __name__ == "__main__" block.
5. Inconsistent argument naming style
Arguments mix hyphen (--random-seed) and underscore (--results_dir) styles. I've changed `--results_dir` to `--results-dir`.
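Points 2 and 5 together suggest an argparse setup along these lines. The flag names come from the comments above; `nargs` and the default values are assumptions for illustration:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # required=True makes argparse error out (exit code 2) when the flag
    # is missing, instead of leaving args.attacks as None and crashing later.
    parser.add_argument("--attacks", nargs="+", required=True)
    # Hyphenated flag for consistency; argparse exposes it as
    # args.results_dir (hyphens map to underscores in the namespace).
    parser.add_argument("--results-dir", default="results")
    parser.add_argument("--random-seed", type=int, default=0)
    return parser


args = build_parser().parse_args(["--attacks", "gcg", "soft_prompt"])
```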
- Class comment is wrong
- `setdefault` doesn't need an `if` guard
- Making one error message slightly more informative
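A minimal illustration of the `setdefault` point (the config dict and keys here are made up):

```python
config = {"lr": 3e-4}

# No guard needed: setdefault only writes when the key is absent.
config.setdefault("lr", 1e-3)    # key exists, value left untouched
config.setdefault("epochs", 10)  # key missing, default inserted

# Equivalent to the redundant guarded form:
# if "epochs" not in config:
#     config["epochs"] = 10
```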
…hs.from_existing(). StudyPaths attribute order differed from StudyPaths.from_existing() argument order, let's keep them consistent.
tomtseng
left a comment
OK, I think this is about as much as I'm going to review this. I was skimming and skipping a lot of stuff and asking Claude Code to look at a bunch of files.
def register_defense(
    name: DefenseName, config_cls: type[H]
I think H is not necessary? could we just do type[AlignmentDefenseConfig] here and delete H?
Weirdly, got type errors when I tried that, put it as a TODO to resolve.
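One guess at why deleting H produces type errors: if register_defense returns the class it registers (e.g. for decorator use), the TypeVar preserves the concrete subclass type for callers, whereas type[AlignmentDefenseConfig] would widen it. This is a sketch under that assumption, not the repo's actual signature; DEFENSES_REGISTRY and the plain str key are made up for illustration:

```python
from typing import TypeVar


class AlignmentDefenseConfig:
    """Stand-in for the repo's real defense-config base class."""


# TypeVar bound to the base class, mirroring the H in the diff above.
H = TypeVar("H", bound=AlignmentDefenseConfig)

DEFENSES_REGISTRY: dict[str, type[AlignmentDefenseConfig]] = {}


def register_defense(name: str, config_cls: type[H]) -> type[H]:
    # Returning type[H] rather than type[AlignmentDefenseConfig] keeps the
    # concrete subclass type visible to callers, which may be what breaks
    # when H is removed.
    DEFENSES_REGISTRY[name] = config_cls
    return config_cls
```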
src/safetunebed/whitebox/evals/embedding_attack/embedding_attack.py (outdated; resolved)
"sentencepiece>=0.2.0",
"tokenizers>=0.22.0",
"pytest>=8.4.2",
"flash-attention>=1.0.0",
The correct package for this is flash-attn rather than flash-attention. However, flash-attn takes some work to install, so I wouldn't try to switch to it in this PR; I'd just delete flash-attention here.
I spent an hour looking at how to install it in PR #55, but currently we don't actually use flash-attn anywhere except in GCG, which we're not using; and even if we want to use GCG, we could probably turn flash attention off. So this is not a priority to fix. I think vllm already handles flash attention internally without needing flash-attn installed.
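Applying the suggestion would leave the quoted pyproject.toml dependency fragment looking roughly like this (surrounding entries elided; this is a sketch of the proposed change, not the actual file):

```toml
dependencies = [
    "sentencepiece>=0.2.0",
    "tokenizers>=0.22.0",
    "pytest>=8.4.2",
    # "flash-attention>=1.0.0" removed: wrong package name (the real one
    # is flash-attn), and nothing outside GCG needs it right now.
]
```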
Changes
This PR contains a number of changes, including:
- infra refactors: these have been kind of reviewed already, but not merged, so I will close those PRs in favour of this one (they were split from a previous commit of this PR and are a bit scattered). I've put TODOs for unresolved things. Could use some attention, but I think it's mostly been looked at, so a skim is probably fine.
- attacks: a lot of these have already been reviewed in other currently open PRs, and I've looked to address some of the comments from over there here, so the code in whitebox/attacks has already mostly been reviewed; for unresolved items, I've put a TODO. Needs a bit of a skim, but I think most of the code here was approved, so a skim is fine here too.
- evals: I added MBPP, Minerva MATH, and IFEval. They loosely match metrics I'd seen in the technical reports, but these were put together pretty quickly, and some files are just copied over. I'll leave it to the reviewer's discretion how much attention to give these; I don't believe we intend to use them for anything major, but if we do, perhaps we can have a follow-up review just for these.
- analysis: lots of this code is just for plotting and filtering. It can probably be skimmed; I don't think users will really use it, as it is a bit tailored to our use case.
- benchmark: these scripts are the entry points to the repo (how to use it).
- docs: these are the docs, which I'm hoping are sufficiently clear and comprehensive. These probably need to be quite high quality.
Testing
Describe how you tested the changes in this PR. E.g., added tests, or ran command `foo` and checked the results looked good.