attacks + evals: minor refactor for strongreject and embedding (review not important rn)#35

Closed
sdhossain wants to merge 3 commits into main from sh/td_sr_embedding

Conversation

@sdhossain
Collaborator

Changes

Small refactor for the StrongREJECT and embedding attacks, but no logic changed, so probably no need for review.

Testing

ICLR submission.

@sdhossain sdhossain changed the base branch from main to sh/td_templates October 7, 2025 19:33
@tomtseng
Collaborator

Same Q as #33: How does the diff work here? I know these files already exist on main, so why is the diff showing these files as entirely new files rather than showing lines added/removed? I think seeing that diff would make this easier to review

@sdhossain sdhossain changed the base branch from sh/td_templates to main November 2, 2025 15:33
@sdhossain sdhossain changed the base branch from main to sh/td_templates November 2, 2025 15:37
@sdhossain sdhossain changed the base branch from sh/td_templates to main November 4, 2025 19:58
def from_dict(cls, data: dict[str, Any]) -> Self:  # pyright: ignore[reportExplicitAny]
    """Construct config from a dictionary."""

def serialize_data(cls, data: dict[str, Any]) -> dict[str, Any]:  # pyright: ignore[reportExplicitAny]
    """Serialize data from a dictionary such that it can be used to construct nested objects."""
Collaborator

@tomtseng tomtseng Nov 6, 2025

same comment as in https://github.com/sdhossain/SafeTuneBed/pull/33/files#r2496773669, I think this function name/comment doesn't sufficiently describe what this class does.

though this does make me wonder if we even need a proper function comment here or whether we just write """See parent class.""" or something since the behavior doesn't substantially change — not sure what best practices here are!

Comment on lines +65 to +71
q = multiprocessing.Queue()
p = multiprocessing.Process(
target=_instantiate_model_and_infer, args=(self.eval_config, jbb_dataset, q)
)
p.start()
_inferences_dataframe = q.get() # blocks until child puts result
p.join()
Collaborator

What's the purpose of multiprocessing here? I'm not familiar with the library but it looks like this is putting one item in the queue and blocking until it completes — needs a comment explaining why we're doing this rather than just doing the evaluation in the same process

same q for multiprocessing in strongREJECT

Collaborator Author

The purpose of the multiprocessing is just to isolate the run, so there is no chance a memory leak or similar issue can crash subsequent runs.

Sometimes when random bugs happen and memory hangs, even generic methods to clear the cache don't work, and I have to wrap the run in its own process.

Since we're orchestrating sweeps through Python, I used this approach, which is perhaps a bit hacky, and I'm very open to suggestions.

Same for StrongREJECT: memory leaks there often crash the whole sweep.
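For reference, the isolation pattern being described can be sketched like this (a minimal sketch mirroring the quoted snippet; `_worker` and its payload are hypothetical stand-ins for the real model-loading and inference call):

```python
import multiprocessing


def _worker(x, queue):
    # Hypothetical stand-in for "instantiate model and run inference".
    # Any memory this process leaks is reclaimed when it exits.
    queue.put(x * 2)


def run_isolated(x):
    """Run _worker in its own process so a leak can't poison the sweep."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_worker, args=(x, queue))
    proc.start()
    result = queue.get()  # blocks until the child puts its result
    proc.join()
    return result
```

Note the ordering: `queue.get()` happens before `proc.join()`, since joining first can deadlock if the child blocks while flushing a large item to the queue.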

Collaborator

oof ok, in that case I would leave a comment and a TODO but yeah this seems like a pragmatic solution if there's a mysterious memory leak that we can't easily track down


model = model.to(config.device)
user_prefix = "[INST] "
assistant_prefix = " [/INST] "
Collaborator

am curious whether it's appropriate to make an individual evaluation pick the default user & assistant prefix, I haven't really thought about it. (In this case these prefixes look like the llama ones.)
is there a particular motivation for why SoftOpt prefers these default prefixes (e.g., this is what the paper does)?
or maybe the function could take the model_config as an arg and read the prefixes from there?
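The `model_config` suggestion might look roughly like this (a hypothetical sketch; `ModelConfig` and its field names are assumptions, not the real API — the defaults shown mirror the hard-coded Llama-style prefixes from the snippet above):

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical fields; defaults mirror the hard-coded values above.
    user_prefix: str = "[INST] "
    assistant_prefix: str = " [/INST] "


def build_prompt(model_config: ModelConfig, user_msg: str) -> str:
    # Read the prefixes from the config instead of letting an individual
    # evaluation pick model-specific defaults.
    return f"{model_config.user_prefix}{user_msg}{model_config.assistant_prefix}"
```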

target_ids = tokenizer([target], add_special_tokens=False)["input_ids"]

embedding_layer = model.get_input_embeddings()
first_device = embedding_layer.weight.device # this is where inputs must live
Collaborator

does model.device not always work? and why would the device be inconsistent with config.device, like why did we move away from config.device?

Collaborator Author

I believe it was because it was failing on multi-GPU (as config.device only allowed one device).

Collaborator

oh i see, like if we specify config.device as something like cuda:0 then that's just not correct for multi-gpu since e.g., if we are using two GPUs then one process should have cuda:0 and the other should have cuda:1?

hmm yeah this makes me wonder if we should get rid of config.device and do something different in order to clean this up, I'm not actually sure what best practices here are though
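One possible direction (a sketch, not the codebase's actual API) is to derive the input device from the model itself rather than from `config.device` — which is essentially what the `first_device` line above already does:

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Stand-in for a transformers-style model exposing get_input_embeddings."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 4)

    def get_input_embeddings(self) -> nn.Embedding:
        return self.emb


def input_device(model) -> torch.device:
    # With sharded multi-GPU loading (e.g. device_map="auto"), parameters
    # span several devices, so a single config.device can't describe the
    # placement; inputs must live where the embedding weights live.
    return model.get_input_embeddings().weight.device
```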

optim_ids = tokenizer(
    config.optim_str_init, return_tensors="pt", add_special_tokens=False
-)["input_ids"].to(config.device)
+)["input_ids"].cuda()
Collaborator

why not .to(first_device)?

Collaborator Author

Handles multi-GPU better.

# USER_PREFIX = "[INST] " # note the trailing space
# ASSISTANT_PREFIX = " [/INST] " # note the leading & trailing spaces
# END_TURN = "</s>" # or: tokenizer.eos_token
multiprocessing.set_start_method("spawn", force=True)
Collaborator

this looks like it has a global effect, could lead to unexpected results if we use multiprocessing elsewhere and forget that we've configured things here, so am wondering if there are alternative non-global options here
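One non-global alternative worth considering (a sketch, untested against this codebase) is `multiprocessing.get_context`, which scopes the start method to a single call site instead of mutating process-wide state:

```python
import multiprocessing


def _child(queue):
    queue.put("ok")


def run_with_spawn():
    # Unlike multiprocessing.set_start_method("spawn", force=True), this
    # context affects only objects created from it; other modules keep
    # whatever start method they expect.
    ctx = multiprocessing.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue,))
    proc.start()
    result = queue.get()
    proc.join()
    return result
```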

Collaborator Author

Hmm, I've put this as a TODO as I'm not fully sure yet.

@sdhossain
Collaborator Author

Replying to comments here as I look to close this and consolidate into the one mega PR. Putting TODOs for things that aren't necessarily resolved.

@sdhossain
Collaborator Author

Closing as we merged in #53

@sdhossain sdhossain closed this Jan 22, 2026
@sdhossain sdhossain deleted the sh/td_sr_embedding branch February 5, 2026 22:36