Reproducing experiments from this paper. The code is organized with configs in conf/, experiment functions in src/harm_refuse/experiments/, and an entry point at run.py.
- Set up environment with:
    uv sync
- Hugging Face login for gated models/datasets:
    huggingface-cli login
- Run experiment:
    uv run run.py -cn [EXPERIMENT]

Alternatively, with pip instead of uv:
- Create and activate a Python 3.12 virtual env.
- Install this project and deps:
    pip install -e .
- (Optional) Hugging Face login:
    huggingface-cli login
- Run:
    python run.py -cn [EXPERIMENT]
- Large models can be memory-heavy; use smol to validate setup.
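
To see which names could be passed as [EXPERIMENT], it may help to list the config files in conf/. The snippet below is a minimal sketch, not part of this repository: it assumes that -cn selects a config by file name (as Hydra's --config-name shorthand does) and that experiment configs are top-level YAML files in conf/. The function name list_experiment_configs is hypothetical.

```python
from pathlib import Path


def list_experiment_configs(conf_dir="conf"):
    """Return the sorted names (without extension) of top-level YAML files.

    Assumption: each top-level .yaml/.yml file in conf_dir is a config
    that can be selected with `-cn <name>`. Subdirectories are skipped.
    """
    root = Path(conf_dir)
    if not root.is_dir():
        # No config directory found; nothing to list.
        return []
    return sorted(
        p.stem
        for p in root.iterdir()
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )


if __name__ == "__main__":
    # Run from the repository root so that "conf" resolves correctly.
    for name in list_experiment_configs():
        print(name)
```

Run from the repository root, this prints one candidate config name per line; whether each name is a runnable experiment depends on how the repository lays out conf/.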