A generic token-optimization solver designed for "hacking" models using Greedy Coordinate Gradient (GCG) strategies.
- Install Python 3.13 (managed automatically by `uv`).
- Clone the repo and install dependencies: `uv sync`
- Run scripts through `uv run …` to use the project environment.
An example is provided in `example.py`. It demonstrates how to optimize a prefix to satisfy a specific objective (in this case, a "condense" task).
Run it with `uv run example.py`. This will:
- Load the configuration from `configs/gcg/test.json`.
- Load the target data from `data/condense/test.json` (a hypothetical shape is sketched after this list).
- Optimize a prefix using the GCG strategy.
- Save results to `outputs/condense_program`.
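The data file's exact schema is defined by the repo's loaders, so treat the following as an illustration only; every field name below is an assumption, and `data/condense/test.json` is the authoritative example.

```python
# Hypothetical shape of a "condense" data file. The field names here are
# assumptions for illustration, not the repo's actual schema.
example_task = {
    "text": "A long passage that the optimized prefix should condense...",
    "constraints": {"charset": "ascii"},  # e.g., restrict candidate tokens
}
```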
The solver optimizes a set of variables (learnable token sequences) to improve a loss built from compositions: weighted sums of cross-entropy objectives.
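In symbols (this notation is mine, not the repo's), writing $v$ for the variable tokens and $w_i$ for the weight of each composition, the solver maximizes or minimizes

$$\mathcal{L}(v) = \sum_i w_i \,\mathrm{CE}\big(\mathrm{seq}_i \mid \mathrm{ctx}_i(v)\big),$$

where each term is the cross-entropy of some sequence given some context built from the variables, as in the examples further down.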
- Define Data: A JSON file containing the text targets and constraints (see `data/condense/test.json`).
- Build a Spec: A Python function that translates the data into a `ProgramSpecTemplate`. This defines the variables (e.g., the prefix) and the objective function (e.g., maximize cross-entropy).
- Run: The `run_generic` function handles the optimization loop, distributing work across GPUs if available. A minimal sketch of the whole workflow follows this list.
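The sketch below strings the steps together, assuming the `Symbols`/objective API documented in the next section. The import paths and the `run_generic` call are guesses about the layout under `src/`, so adjust them to the real modules.

```python
# Workflow sketch. Import paths and the run_generic signature are
# assumptions; the Symbols/objective API matches the examples below.
from dsl import ProgramSpecTemplate, Symbols, Variable, Fixed, xed  # hypothetical paths
from engine import run_generic                                      # hypothetical path

def build_spec(data: dict) -> ProgramSpecTemplate:
    s = Symbols({
        "prefix": Variable(20),       # 20 learnable tokens
        "text": Fixed(data["text"]),  # the passage to condense
    })
    # "Condense" objective: maximize xent(text) - xent(text | prefix).
    return ProgramSpecTemplate(objective=xed(s["text"], s["prefix"]),
                               goal="maximize")

# run_generic(build_spec, config="configs/gcg/test.json")  # hypothetical call
```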
The `ProgramSpecTemplate` defines the optimization problem using symbols and compositions.
First, define the parts of your sequence. Symbols can be `Fixed` (text/ids) or `Variable` (optimized tokens).

```python
s = Symbols({
    "prefix": Variable(20),               # 20 learnable tokens
    "prompt": Fixed("Tell me a joke: "),  # Constant text
    "target": Fixed("Why did the..."),    # Constant text
})
```

Objectives are built using helper functions that represent cross-entropy terms:
- `xent(seq, ctx)`: Minimize cross-entropy of `seq` given `ctx` (standard language modeling loss).
- `nex(seq, ctx)`: Maximize cross-entropy (negative xent); used to make text less likely.
- `xed(seq, ctx)`: Difference between `xent(seq)` (prior) and `xent(seq | ctx)`. Maximizing this maximizes the mutual information, or "condensing" power.
- `dex(seq, ctx)`: The reverse of `xed` (difference between conditional and prior).
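To make the algebra concrete, here is what each helper evaluates to, written as eager PyTorch over a HuggingFace-style causal LM. This is an illustration only: the repo builds these as symbolic compositions, and the `ce` helper and `bos` argument below are my inventions (the unconditional prior `xent(seq)` is approximated by conditioning on a BOS token).

```python
import torch
import torch.nn.functional as F

def ce(model, seq_ids: torch.Tensor, ctx_ids: torch.Tensor) -> torch.Tensor:
    """Mean cross-entropy of seq_ids given a non-empty ctx_ids, for a
    HuggingFace-style causal LM where logits[i] predicts ids[i + 1]."""
    ids = torch.cat([ctx_ids, seq_ids]).unsqueeze(0)
    logits = model(ids).logits[0]
    n = ctx_ids.numel()
    return F.cross_entropy(logits[n - 1 : -1], ids[0, n:])

def xent(model, seq, ctx):      # minimized when seq is likely given ctx
    return ce(model, seq, ctx)

def nex(model, seq, ctx):       # maximized when seq is likely given ctx
    return -ce(model, seq, ctx)

def xed(model, seq, ctx, bos):  # CE(seq | bos) - CE(seq | ctx): info gain
    return ce(model, seq, bos) - ce(model, seq, ctx)

def dex(model, seq, ctx, bos):  # the reverse: CE(seq | ctx) - CE(seq | bos)
    return -xed(model, seq, ctx, bos)
```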
Examples:

Standard "jailbreak" (maximize the probability of the target given prompt + prefix):

```python
# We want to minimize Loss(target | prompt + prefix),
# which is equivalent to maximizing nex(target, prompt + prefix).
objective = nex(s["target"], s["prompt"] + s["prefix"])
```

Condense (find a prefix that summarizes the text):
```python
# Maximize mutual information: xent(text) - xent(text | prefix).
# This uses the 'xed' helper, which returns [xent(text), nex(text, prefix)].
objective = xed(s["text"], s["prefix"])
```

You can combine multiple terms linearly:
```python
# Minimize loss on target1 while also keeping target2 unlikely.
objective = nex(s["target1"], s["prefix"]) + 0.5 * xent(s["target2"], s["prefix"])
return ProgramSpecTemplate(objective=objective, goal="maximize")
```

The optimization behavior is controlled by JSON config files (e.g., `configs/gcg/test.json`). Key parameters include:
- `model_names`: List of models to optimize against.
- `top_k`: Number of candidate tokens to consider per position, based on gradient information (see the sketch after this list).
- `batch_size`: (Implicitly handled) affects memory usage and search quality.
- `selection`: Parameters for the selection strategy (e.g., epsilon-greedy).
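For intuition about `top_k`, here is a generic sketch of one GCG step: rank candidate token swaps by the gradient of the loss with respect to the one-hot token encoding, then evaluate random swaps and keep the best. This describes the GCG family in general, not this repo's exact implementation (which, per `selection`, may use epsilon-greedy selection rather than the plain greedy choice shown here); all names are illustrative.

```python
import torch
import torch.nn.functional as F

def gcg_step(loss_fn, embed, prefix_ids, top_k=256, n_cand=64):
    """One greedy coordinate step. `loss_fn` maps prefix embeddings to a
    scalar loss; `embed` is the [vocab, dim] embedding matrix."""
    vocab = embed.shape[0]
    onehot = F.one_hot(prefix_ids, vocab).to(embed.dtype).requires_grad_(True)
    loss = loss_fn(onehot @ embed)
    (grad,) = torch.autograd.grad(loss, onehot)
    # Per position: the top_k tokens whose substitution most decreases the
    # linearized loss (most negative gradient coordinate).
    top = (-grad).topk(top_k, dim=-1).indices  # [n_pos, top_k]

    best_ids, best_loss = prefix_ids, loss.detach()
    for _ in range(n_cand):                    # random swap proposals
        pos = torch.randint(len(prefix_ids), ()).item()
        tok = top[pos, torch.randint(top_k, ()).item()]
        cand = prefix_ids.clone()
        cand[pos] = tok
        with torch.no_grad():
            cand_loss = loss_fn(F.one_hot(cand, vocab).to(embed.dtype) @ embed)
        if cand_loss < best_loss:
            best_ids, best_loss = cand, cand_loss
    return best_ids, best_loss
```

In practice the `n_cand` proposals are evaluated as one parallel batch on the GPU, which is where a batch-size setting trades memory for search quality.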
- `src/`: Core solver logic.
  - `dsl/`: Domain-specific language for defining programs (Spec, Symbols, Builder).
  - `engine/`: Optimization engine and task management.
  - `strategies/`: Optimization algorithms (e.g., GCG).
  - `data/`: Data loading and constraint policies.
  - `utils/`: Utilities for models, logging, and hardware.