
fix: optim_string_condense #20

Open

pipiPdesu wants to merge 1 commit into GraySwanAI:main from pipiPdesu:fix_concatenating_letters

Conversation

@pipiPdesu
Contributor

Referring to issue #15, problem 2, I address it by interspersing spaces between the letters and systematically removing the characters '!', '?', and '.' from INIT_CHAR.

It should work well with the Llama3 and InternLM2 tokenizers, which previously encountered issues (I have not yet tested other models). I suspect that under these circumstances, the filter_ids parameter might be redundant. Even if we successfully optimize a suffix, it seems infeasible to identify a string that tokenizes back to the suffix's tokens (unless we directly pass input_embeds). It might be beneficial to enhance the warning messages to guide users on reporting model incompatibilities, or to suggest modifications to INIT_CHAR to resolve tokenization problems.
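For concreteness, here is a minimal sketch of the idea (the character pool, helper name, and model name are placeholders, not this PR's actual code):

```python
from transformers import AutoTokenizer

# Hypothetical character pool with '!', '?', and '.' excluded, since some tokenizers
# merge them into multi-character tokens such as "?!".
INIT_CHARS = [c for c in ["x", "y", "@", "&", "%"] if c not in ("!", "?", ".")]

def build_init_str(num_tokens: int) -> str:
    # Intersperse spaces so adjacent characters are less likely to merge into one token.
    return " ".join(INIT_CHARS[i % len(INIT_CHARS)] for i in range(num_tokens))

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in for Llama3 / InternLM2
init = build_init_str(20)
print(len(tokenizer(init, add_special_tokens=False).input_ids))  # ideally 20
```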

Let's discuss it here and figure it out :)

Collaborator

@justinwangx left a comment

there are a few things i'm worried about with this approach, which are:

  • it's not a guarantee that each string in init_str will tokenize to the same length
  • even if all the strings in init_str tokenize to the same length, i don't think it's necessarily the case that this length will be the intended sequence length

in the prior approach, indexing into the ids after tokenization gives us this guarantee. i think it would be rare for a space-separated sequence of n characters to tokenize to m != n tokens, but i'm not sure that this won't ever occur with some tokenizers.
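To spell out the difference, here is a hedged sketch (the function names and stand-in tokenizer are illustrative, not the repo's API): the prior approach slices the first n ids after tokenization, which pins the length, while the space-separated approach only yields n ids if nothing merges across the spaces.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

def init_ids_by_slicing(tokenizer, char: str, n: int) -> list[int]:
    # Prior approach: tokenize a long run of the character and keep the first n ids,
    # which guarantees exactly n ids no matter how the characters merge.
    ids = tokenizer(char * (20 * n), add_special_tokens=False).input_ids
    assert len(ids) >= n, "character run too short for this tokenizer"
    return ids[:n]

def init_ids_by_spacing(tokenizer, char: str, n: int) -> list[int]:
    # New approach: n space-separated characters, which yields n ids only if the
    # tokenizer never merges across (or splits off) the spaces.
    return tokenizer(" ".join([char] * n), add_special_tokens=False).input_ids

print(len(init_ids_by_slicing(tok, "!", 20)), len(init_ids_by_spacing(tok, "x", 20)))
```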

does using the prior approach, but with '!', '?', and '.' removed, still fix our problem?

re: identifying strings that are tokenized as the proper suffix: if filter_ids=True, then this should be guaranteed, but, yeah, if not, then the string could be tokenized differently

@justinwangx
Collaborator

i do think it would be good courtesy, if filter_ids=False, to check whether the best string at the end of the run actually tokenizes to the best optimized ids (and report to the user if this isn't the case) -- what do you think of that?
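Something like the following sketch could implement that check (the function name and warning wording are hypothetical, not existing nanoGCG code):

```python
import warnings

def check_best_string_roundtrip(tokenizer, best_str: str, best_ids: list[int]) -> None:
    # After the run, re-tokenize the best string and compare with the optimized ids.
    retokenized = tokenizer(best_str, add_special_tokens=False).input_ids
    if retokenized != best_ids:
        warnings.warn(
            "Best optim_str does not re-tokenize to the best optimized ids; the "
            "reported loss may not match what the model sees for this string. "
            "Consider setting filter_ids=True or using a different optim_str_init."
        )
```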

@pipiPdesu
Contributor Author

Hello, I understand your concern. I have kept the original method and tried to add spaces before and after the letters in INIT_CHAR. This works for some tokenizers (essentially changing "!" to " !", making it a new token), but for others the space becomes an additional token. So perhaps we can add a note to the existing warning ("Consider setting `filter_ids=False` or trying a different `optim_str_init`") suggesting that users can solve the problem by adding spaces or deleting some letters. Also, we will recheck the best string when filter_ids=False.
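The per-tokenizer behaviour is easy to inspect directly; a quick illustration (gpt2 is only a stand-in, the tokenizers in question are Llama3/InternLM2):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in; behaviour differs per tokenizer
print(tok("!", add_special_tokens=False).input_ids)   # '!' on its own
print(tok(" !", add_special_tokens=False).input_ids)  # ' !' may be one new token or space + '!'
```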

@justinwangx
Collaborator

sorry, i mean -- it seems like the characters that get condensed are '!', '?', and '.' (with emphasis on '!'). if we simply remove these from being used in the initialization (without adding any spaces or thinking about spaces), do the strings generally work fine without getting condensed?

@pipiPdesu
Contributor Author

The characters that get condensed are not only '!', '?', and '.'; it depends on the tokenizer. For example, as long as the tokenizer encodes "!" and " !" as two separate tokens and the vocabulary also contains other tokens containing "!", such as "?!", this issue will occur. We need to find a better generation method based on the tokenizer.
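A quick way to see the condensation (gpt2 is used here only as a stand-in; the exact merges depend on the vocabulary):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
sep_ids = tok("?", add_special_tokens=False).input_ids + tok("!", add_special_tokens=False).input_ids
merged_ids = tok(tok.decode(sep_ids), add_special_tokens=False).input_ids
# '?' + '!' decodes to "?!", which may re-tokenize as a single merged token,
# so a two-token optimized sequence can condense into one token when round-tripped.
print(sep_ids, merged_ids)
```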

Moreover, this characteristic of the tokenizer, combined with the implementation of NanoGCG, can cause more serious problems. The issue arises because we embed the input as four separate parts rather than embedding the whole sentence together, so the embeddings at the junctions of each part can differ from the embeddings of the entire sentence. Here are two examples that may cause problems:

  1. Taking the llama2 tokenizer as an example, at the junction of after_str and target, consider the embedding results for [/INST], [/INST]Sure, [/INST] Sure, Sure, and [space]Sure:
    (screenshot of the embedding results omitted)

Although we provide the add_space_before_target parameter, the embedding results still differ between embedding the parts separately and embedding the whole sentence together, whether or not a space is added.

This issue varies depending on the tokenizer. Using internlm as an example, the ideal situation is like this:

(screenshot of the internlm tokenization omitted)

  2. Taking the internlm tokenizer as an example, at the junction of before_str and optim_str, optim_str may collapse together with before_str. Suppose before_str ends with '?' and optim_str starts with '!'; they will collapse together, just like during buffer initialization.

These issues, like the buffer problem, can be summarized as inconsistencies between direct embeddings and separate embeddings. I am exploring whether this issue affects the overall algorithm.
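A small check along these lines makes the inconsistency easy to reproduce (the helper is hypothetical, not nanoGCG code, and gpt2 only stands in for the llama2/internlm tokenizers): tokenize the four parts separately, concatenate the ids, and compare against tokenizing the full prompt at once.

```python
from transformers import AutoTokenizer

def piecewise_vs_whole(tokenizer, parts: list[str]) -> tuple[list[int], list[int]]:
    # ids from tokenizing each part separately (as the four-part embedding effectively
    # does) versus ids from tokenizing the concatenated prompt in one pass.
    piecewise = [i for p in parts for i in tokenizer(p, add_special_tokens=False).input_ids]
    whole = tokenizer("".join(parts), add_special_tokens=False).input_ids
    return piecewise, whole

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
pw, wh = piecewise_vs_whole(tok, ["[INST] hi ", "? !", " [/INST]", "Sure,"])
print(pw == wh, len(pw), len(wh))  # any mismatch means junction tokens merged or split
```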

It might be a bit confusing to discuss here (◎_◎;). Can we open a related issue? If you have any questions or want to see more examples, feel free to discuss with me!

@justinwangx
Collaborator

This is indeed an issue... if this occurs frequently, then I think the best fix would be a refactor to use the slices approach that the original repo uses. We adapted our current approach from HarmBench, which means the HarmBench implementation would also suffer from the same problem.
