Fluent dreaming for language models.

The code here implements the discrete prompt optimization algorithms in the paper "Fluent student-teacher redteaming".

Please also see the companion page that demonstrates using the code here.

The demo.ipynb file here is the source for that companion page.

Key modules:

flrt.attack: The main attack entry point, including the AttackConfig object.
flrt.victim: Code for managing attack "victims", the models that will be forced to misbehave.
flrt.templates: Attack templates specifying which subset of the prompt can be optimized by the discrete optimization.
flrt.util: Tools for loading models and tokenizers and generating.

The remaining code is either internal to the algorithm (flrt.objective, flrt.operators) or is scaffolding for running on Modal (flrt.modal_defs, flrt.modal_download) or running evaluations (flrt.judge).
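
To make the idea of a template concrete, here is a minimal, self-contained sketch of a prompt with fixed text around one optimizable slot, plus a toy discrete search over that slot. It is not the flrt API and not the paper's algorithm: `PromptTemplate`, `mutate`, `optimize`, and `toy_score` are hypothetical names invented for illustration, and the keyword-counting score stands in for a real model-based objective.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical template: fixed prefix and suffix around an optimizable slot."""

    prefix: str
    suffix: str

    def render(self, optimized: Sequence[str]) -> str:
        # Only the tokens passed in here are ever changed by the search.
        return self.prefix + " ".join(optimized) + self.suffix


def mutate(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random) -> List[str]:
    """Swap one randomly chosen token for a random vocabulary entry."""
    out = list(tokens)
    out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def optimize(
    template: PromptTemplate,
    tokens: Sequence[str],
    vocab: Sequence[str],
    score: Callable[[str], float],
    steps: int = 200,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Greedy random search: keep a mutation only if it improves the score."""
    rng = random.Random(seed)
    best = list(tokens)
    best_score = score(template.render(best))
    for _ in range(steps):
        candidate = mutate(best, vocab, rng)
        candidate_score = score(template.render(candidate))
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(prompt: str) -> float:
    """Toy objective (an assumption): count occurrences of 'please'."""
    return float(prompt.count("please"))


template = PromptTemplate(prefix="User: ", suffix="\nAssistant:")
vocab = ["please", "kindly", "now", "explain"]
initial = ["now", "now", "now"]
best, best_score = optimize(template, initial, vocab, toy_score)
print(template.render(best))
```

The prefix and suffix never change, so the search can only affect the slot between them. A real attack would replace `toy_score` with an objective computed from a victim model and would operate on tokenizer IDs rather than whitespace-separated words.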

Confirm-Solutions/flrt
