Experiments with representation engineering. There's been a bunch of recent work (1, 2, 3) on using a neural network's latent representations to control and interpret models.
This repository contains utilities for running experiments (the repeng package) and a bunch of experiments (the notebooks in experiments).
git clone https://github.com/mishajw/repeng
cd repeng
pip install -e .
# Or if using poetry:
poetry install
- Install the repository, as described above.
- Optional: Check out c99e9aa. This shouldn't be necessary, unless I introduce breaking changes.
- Create a dataset of activations: python experiments/comparison_dataset.py
  - This will upload the experiments to S3. Some tinkering may be required to change the upload location - sorry about that!
- Run the analysis: python experiments/comparison.py
  - This will write plots to ./output/comparison.
This is split into two scripts as only the first requires a GPU for LLM inference.
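To make the reasoning behind the two-script split concrete, here is a minimal sketch of the pattern. It is not the code in this repository. The function names (create_activation_dataset, analyze) and the local JSON file are hypothetical, random vectors stand in for real LLM activations, and a temporary directory stands in for S3.

```python
import json
import random
import tempfile
from pathlib import Path


def create_activation_dataset(path: Path, num_rows: int = 8, dim: int = 4) -> None:
    """Stage 1 (expensive): stand-in for running LLM inference on a GPU.

    Random vectors take the place of real activations (assumption for this sketch).
    """
    rng = random.Random(0)
    rows = [
        {"label": i % 2 == 0, "activation": [rng.gauss(0.0, 1.0) for _ in range(dim)]}
        for i in range(num_rows)
    ]
    path.write_text(json.dumps(rows))


def analyze(path: Path) -> dict[str, float]:
    """Stage 2 (cheap): reads the saved activations; no model or GPU needed."""
    rows = json.loads(path.read_text())
    norms: dict[bool, list[float]] = {True: [], False: []}
    for row in rows:
        # Euclidean norm of each stored activation vector.
        norm = sum(x * x for x in row["activation"]) ** 0.5
        norms[row["label"]].append(norm)
    # Mean activation norm per label.
    return {str(label): sum(v) / len(v) for label, v in norms.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset_path = Path(tmp) / "activations.json"
        create_activation_dataset(dataset_path)  # run once, on the GPU machine
        print(analyze(dataset_path))  # re-run as often as needed, anywhere
```

Because the second stage only reads saved data, the analysis can be re-run and tweaked on a machine without a GPU, without repeating inference.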