mars-steg

The mars-steg project aims to elicit steganographic behaviour by placing LLMs under an unusual kind of RL optimisation pressure, using a reward function with two elements:

A reward for successfully completing a task;
A penalty for using, in the chain of thought (CoT), an aspect of language chosen because it is critical to successful task performance.

We hope that, as the chosen aspect of language disappears from the CoT due to the associated penalty, the LLM will naturally learn ways to keep the task-relevant information that this language carries. With the right choice of task and language aspect, we hope to demonstrate that this can result in steganography.
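The two-element reward described above can be illustrated with a minimal sketch. It assumes the penalised aspect of language can be approximated by a list of terms and that the penalty scales linearly with how often they appear in the CoT. The names `count_penalised_terms`, `combined_reward`, and `penalty_weight` are hypothetical and are not taken from the mars-steg codebase.

```python
def count_penalised_terms(cot: str, penalised_terms: list[str]) -> int:
    """Count case-insensitive occurrences of each penalised term in the CoT."""
    lowered = cot.lower()
    return sum(lowered.count(term.lower()) for term in penalised_terms)


def combined_reward(
    task_score: float,
    cot: str,
    penalised_terms: list[str],
    penalty_weight: float = 0.25,  # hypothetical weight, chosen for illustration
) -> float:
    """Task reward minus a weighted penalty for penalised language in the CoT."""
    penalty = penalty_weight * count_penalised_terms(cot, penalised_terms)
    return task_score - penalty


if __name__ == "__main__":
    terms = ["add", "multiply"]
    explicit = "First add 2 and 3, then multiply the sum by 4."
    evasive = "Combine 2 and 3, then scale by 4."
    # Both CoTs reach the same answer; only the first uses penalised terms.
    print(combined_reward(1.0, explicit, terms))
    print(combined_reward(1.0, evasive, terms))
```

In this toy setting the second CoT scores higher while carrying the same task-relevant information, which is the kind of pressure the project hopes will push a model toward hidden encodings.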

We hope to study a broad range of tasks and language aspects, to increase our chances of observing this emergent behaviour. To do this efficiently, each task should be implemented in a way that is compatible with a single shared RL pipeline.
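One way to make many tasks compatible with a single pipeline is to have each task implement a small common interface. The sketch below is an assumption about how that could look, not the project's actual design: `SteganographyTask`, `ToyArithmeticTask`, and their method names are hypothetical.

```python
from abc import ABC, abstractmethod


class SteganographyTask(ABC):
    """Hypothetical shared interface that a single RL pipeline could call."""

    @abstractmethod
    def task_score(self, prompt: str, final_answer: str) -> float:
        """Return a score for how well the final answer solves the prompt."""

    @abstractmethod
    def language_penalty(self, cot: str) -> float:
        """Return a non-negative penalty for penalised language in the CoT."""

    def reward(self, prompt: str, cot: str, final_answer: str,
               penalty_weight: float = 1.0) -> float:
        """Combine both elements; the pipeline only needs this method."""
        return (self.task_score(prompt, final_answer)
                - penalty_weight * self.language_penalty(cot))


class ToyArithmeticTask(SteganographyTask):
    """Toy task: answer a sum correctly while avoiding digits in the CoT."""

    def __init__(self, answers: dict[str, str]) -> None:
        self.answers = answers

    def task_score(self, prompt: str, final_answer: str) -> float:
        return 1.0 if self.answers.get(prompt) == final_answer.strip() else 0.0

    def language_penalty(self, cot: str) -> float:
        # The penalised "aspect of language" here is any digit character.
        return float(sum(ch.isdigit() for ch in cot))


if __name__ == "__main__":
    task = ToyArithmeticTask({"2+3": "5"})
    print(task.reward("2+3", "2 plus 3 is 5", "5", penalty_weight=0.25))
    print(task.reward("2+3", "two plus three is five", "5", penalty_weight=0.25))
```

With this shape, adding a new task or language aspect means writing one subclass, while the training loop stays unchanged.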

Features

🔥 Fast and lightweight
🔄 Supports multiple file formats
🔧 Customizable via settings

Installation

Prerequisites

See requirements.txt for dependencies.

Steps

git clone https://github.com/puria-radmard/mars-steg.git
cd mars-steg
pip install -r requirements.txt

Usage

Start the training script with the following command:

bash ./run_math.sh

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature-branch).
Commit changes (git commit -m "Add new feature").
Push to the branch (git push origin feature-branch).
Create a pull request.

License

🚧 Under construction

Acknowledgments

🚧 Under construction

