Revealing hidden biases by finding steering vectors for neutrality
This project is part of Algoverse, Summer 2025.
Completed experiment logs can be found in experiments/past_logs. Experimentation notebooks can be found in experiments/farhan_experimentation.ipynb and experiments/aryaman_experimentation.ipynb.
- examples. Contains examples of how to use the functions in src, on toy examples or datasets.
- experiments. Contains scripts/notebooks for running actual experiments.
- logs. Contains logs of experiments.
- src. Contains the source code.
- BBQ_Prompt_Sets. Contains the BBQ bias datasets.
conda create -n algo-neutrality python=3.12
conda activate algo-neutrality
pip install -e .