Conversation
How does the diff work here? I know these files already exist on …
@tomtseng I created a new branch where I kind of did a fresh start and sequentially added stuff. For this PR, I changed the base back to … If we change the base branch back to …
Oh I see, so the commit history has a new commit for each PR (#33, #32, #35) which stack on top of each other, but they also each share a parent commit that deleted a whole bunch of files. So we either point the base branch to … I think the way I'll make this more reviewable for myself is to locally make a version where I rebase to drop that mass-deletion commit; then the diff for each of the three remaining commits will match up to the diff I should review for each PR.
```python
@classmethod
def serialize_data(cls, data: dict[str, Any]) -> dict[str, Any]:  # pyright: ignore[reportExplicitAny]
    """Serialize data from a dictionary such that it can be used to construct nested objects.
```
I think this function comment (and function name) could be clearer about what this function does. I think of serializing data as converting data into a printable, transmittable form (often a string), but the implementation looks more like it's deserializing the nested `model_config`, converting it from a dict into its proper class. I'm not sure that strictly counts as deserialization either, but it doesn't fit my mental model of what serialization is.
Ah, agreed (and your understanding is correct): it was a poor choice of name (as you mentioned, I used it mainly to get nested dictionaries converted to dataclasses).
Will look to change the name and elaborate on the documentation; perhaps `decode_dicts_to_dataclasses` or `decode_to_dataclasses` is a more appropriate name.
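For reference, a dict-to-dataclass decoder along those lines could look like the sketch below. The function name follows the suggested rename, and the config classes are made-up stand-ins, not the PR's actual models:

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


def decode_to_dataclasses(cls: type, data: dict[str, Any]) -> Any:
    """Recursively instantiate a dataclass from a (possibly nested) dict."""
    kwargs = {}
    for f in fields(cls):
        value = data[f.name]
        # recurse when the field is itself a dataclass but is still stored as a dict
        if is_dataclass(f.type) and isinstance(value, dict):
            value = decode_to_dataclasses(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


# hypothetical stand-in configs, just to show the nesting
@dataclass
class ModelConfig:
    name: str


@dataclass
class TrainConfig:
    lr: float
    model_config: ModelConfig
```

With this, `decode_to_dataclasses(TrainConfig, {"lr": 1e-4, "model_config": {"name": "gpt2"}})` yields a `TrainConfig` whose `model_config` field is a real `ModelConfig` instance rather than a dict.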
```python
H = TypeVar(name="H", bound="FullParameterFinetuneConfig")
```
```python
DATASET_SIZE = 64
```
Should this be configurable rather than a constant?
Or document where it came from; I recall you chose this value from a paper.
```python
processing_class=tokenizer,
train_dataset=ds,
args=training_arguments,
```

```python
p = multiprocessing.Process(
```
Hmm, since we're going to be using this pattern a lot, where we wrap things into a separate process in order to address memory leaks, we should document it somehow.
E.g., make it a separate function to make it clearer that the same thing is happening everywhere:

```python
def run_with_isolation(target, args):
    """Runs a function in a separate process to avoid memory leaks."""
```
```python
    Returns:
        datasets.Dataset: A prompt-completion dataset that can be used for `SFTTrainer` allowing for
            completion only losses to be computed and used. Data points must be in the following format:
```
Suggested change:

```diff
- completion only losses to be computed and used. Data points must be in the following format:
+ completion only losses to be computed and used. Data points will be in the following format:
```
```python
tokenizer.save_pretrained(save_directory=self.output_checkpoint_path)
```

```python
trainer.accelerator.free_memory()
model.resize_token_embeddings(new_num_tokens=len(tokenizer))
```
To clarify: this is because in the `if` statement above we may expand the number of tokens?
```python
name: AttackName = AttackName.BENIGN_LORA_FINETUNE
```

```python
@override
def load_prompt_completions_dataset(self) -> datasets.Dataset:
```
The code looks very similar to `full_finetune.py`; I wonder if we can consolidate somehow.
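One way to consolidate could be a template-method base class: the shared driver lives in one place and each attack only overrides the dataset hook. A toy sketch (all names here are hypothetical, not the repo's actual API, and the `run` body is a stand-in for the shared training logic):

```python
from abc import ABC, abstractmethod


class FinetuneAttack(ABC):
    """Shared driver: subclasses only supply their own dataset."""

    @abstractmethod
    def load_prompt_completions_dataset(self) -> list[dict[str, str]]:
        """Return prompt/completion records for this attack."""

    def run(self) -> int:
        # the shared training / checkpointing logic would live here;
        # this toy stand-in just counts the records it would train on
        dataset = self.load_prompt_completions_dataset()
        return len(dataset)


class BenignLoraFinetuneAttack(FinetuneAttack):
    def load_prompt_completions_dataset(self) -> list[dict[str, str]]:
        return [{"prompt": "2+2=", "completion": "4"}]
```

The same base class could then back `run_lora_attack`-style entry points too, so the per-attack modules shrink to just their dataset loaders.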
```python
return completions_dataset
```

```python
def run_lora_attack(
```
This also has some similarities to `run_full_finetune_attack`; I wonder if we can consolidate.
```python
        training_arguments,
        prompt_completions_dataset,
        self.output_checkpoint_path,
    ),
```
Suppose the child process that performs the training crashes: will the parent also crash and give a non-zero exit code? I think we would want the parent to crash, but I'm not actually sure what happens with `multiprocessing`.
```python
# USER_PREFIX = "[INST] "  # note the trailing space
# ASSISTANT_PREFIX = " [/INST] "  # note the leading & trailing spaces
# END_TURN = "</s>"  # or: tokenizer.eos_token
BENIGN_DATASET_SIZE = 128
```
Document where this came from; I think it's not intuitive why it's different from the `DATASET_SIZE` for `full_finetune.py`. Also it's named differently (`BENIGN_DATASET_SIZE` vs. `DATASET_SIZE`); not sure if that's intentional.
```python
def to_completions(data_point: list[dict[str, str]]) -> dict[str, str]:
    sample = {}
    for message in data_point["messages"]:
```
`data_point` is typed as a list, but we're indexing into it like it's a dict here.
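Flipping the annotation to match the access pattern might look like the sketch below. The `role`/`content` message shape and the `prompt`/`completion` output fields are assumptions about the dataset format, not taken from the PR:

```python
def to_completions(data_point: dict[str, list[dict[str, str]]]) -> dict[str, str]:
    """Map one chat-format record onto prompt/completion fields."""
    sample: dict[str, str] = {}
    for message in data_point["messages"]:
        # assumed chat format: {"role": ..., "content": ...}
        if message["role"] == "user":
            sample["prompt"] = message["content"]
        elif message["role"] == "assistant":
            sample["completion"] = message["content"]
    return sample
```

With the corrected annotation, `data_point["messages"]` type-checks, and each record like `{"messages": [{"role": "user", "content": ...}, {"role": "assistant", "content": ...}]}` maps to a `{"prompt": ..., "completion": ...}` sample.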
```python
from safetunebed.whitebox.utils.names import AttackName
from safetunebed.whitebox.utils.ops.dealloc import dealloc_model_and_tokenizer
```

```python
DATASET_SIZE = 300
```
Is this the number used in the multilingual fine-tuning paper? Again, I'm wondering if we should document why we chose this number differently from the `full_finetune.py` `DATASET_SIZE`.
Closing as we merged in #53
Changes
Refactored and added the fine-tuning attacks outside of jailbreak tuning, including full fine-tune, LoRA, and multilingual. This primarily used a newer version of the `sft` library, along with the `DataCollatorForCompletionOnlyLM` functionality, which makes casting datasets easier.

Testing

Ran runs for ICLR.