
oninvis/Algoverse_Mech_Interp


Revealing hidden biases by finding steering vectors for neutrality

This project is part of Algoverse, Summer 2025.

Completed logs from experiments can be found in experiments/past_logs. Experimentation notebooks can be found in experiments/farhan_experimentation.ipynb and experiments/aryaman_experimentation.ipynb.
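For readers new to the idea, a steering vector is commonly built as the difference between mean activations collected on two contrasting prompt sets, then added (scaled by a coefficient) to a model's hidden state at inference time. The sketch below shows only that difference-of-means arithmetic on toy vectors in plain Python. It is an illustrative assumption, not this repository's implementation in src; the function names (`mean_vector`, `steering_vector`, `apply_steering`), the scale `alpha`, and the toy activation values are all hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(activations: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(activations)
    dim = len(activations[0])
    return [sum(a[i] for a in activations) / n for i in range(dim)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Hypothetical difference-of-means steering vector: mean(pos) - mean(neg)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def apply_steering(hidden: Vector, steer: Vector, alpha: float) -> Vector:
    """Add the steering vector, scaled by alpha, to a single hidden state."""
    return [h + alpha * s for h, s in zip(hidden, steer)]


# Toy 3-dimensional "activations" (made-up values, not real model data).
neutral = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
biased = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(neutral, biased)
print(v)  # [2.0, -2.0, 0.0]
print(apply_steering([0.0, 0.0, 0.0], v, 0.5))  # [1.0, -1.0, 0.0]
```

In a real model the vectors would be residual-stream activations taken at a chosen layer, and the sign and size of `alpha` control how strongly generation is pushed toward the "neutral" side.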

Directory structure

  • examples. Contains examples of how to use the functions in src on toy examples or datasets.
  • experiments. Contains scripts/notebooks for running the actual experiments.
  • logs. Contains logs of experiments.
  • src. Contains the source code.
  • BBQ_Prompt_Sets. Contains the BBQ bias datasets.

Setup instructions

  • conda create -n algo-neutrality python=3.12
  • conda activate algo-neutrality
  • pip install -e .
