Introspective-Reasoning

Use classifiers (probes), intervention modules, and related tooling to analyze and improve model reasoning.

0. Set up honest_llama and introspective-reasoning

git clone git@github.com:likenneth/honest_llama.git
git clone git@github.com:YourSaDady/Introspective_Reasoning.git
chmod +x ./Introspective_Reasoning/setup.sh
./Introspective_Reasoning/setup.sh
cd honest_llama
chmod +x ./on-the-fly_probe_training.sh
chmod +x ./probes_comparable_training.sh

1. Set up the ITI environment

Follow the installation guide in https://github.com/likenneth/honest_llama

2. Build classifiers

Sample data splits from MATH-Shepherd: python ./get_activations/prepare_math_shepherd.py
Build hidden_states-label pairs for each data split: ./on-the-fly_probe_training.sh
Train and evaluate classifiers: ./probes_comparable_training.sh
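
As a rough illustration of what a classifier over hidden_states-label pairs does, here is a minimal sketch of a linear probe trained with logistic regression. It is not the code run by the scripts above: the data is synthetic, and the names `train_linear_probe` and `probe_accuracy`, the hidden size, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np


def train_linear_probe(hidden_states, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, d = hidden_states.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = hidden_states @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of binary cross-entropy w.r.t. logits
        w -= lr * (hidden_states.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(w, b, hidden_states, labels):
    """Fraction of examples whose predicted label matches the true label."""
    preds = (hidden_states @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())


# Synthetic stand-in for (hidden state, step label) pairs (assumption:
# real pairs would come from the activation-extraction step above).
rng = np.random.default_rng(0)
dim = 16
direction = rng.normal(size=dim)
X = rng.normal(size=(400, dim))
y = (X @ direction > 0).astype(float)

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = train_linear_probe(X_train, y_train)
test_acc = probe_accuracy(w, b, X_test, y_test)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

The probe is evaluated on a held-out slice so that the reported accuracy reflects generalization, not memorization of the training pairs.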

4. Build intervention modules

Build hs-hs'-label pairs (hidden states before and after intervention and the step label) for each data split: python ./get_activations/build_intervention_datsets.py --split_num {n}, with split index n ranging from 1 to 10
Datasets for training the intervention module / PEFT are available on Hugging Face: Lo-Fi-gahara/intervene_{n}k
Train the intervention module: python ./probes_training/probe_training.py --split_num {n} --probe_type intervention_module
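
Since both commands take a split index from 1 to 10, the full sweep can be scripted. The sketch below only builds and prints the command lines, using the script paths and flags exactly as written above; the helper name `intervention_commands` is hypothetical, and the script assumes it is run from the directory where those relative paths resolve.

```python
import shlex


def intervention_commands(split_num):
    """Return the build and train commands for one data split."""
    build = [
        "python", "./get_activations/build_intervention_datsets.py",
        "--split_num", str(split_num),
    ]
    train = [
        "python", "./probes_training/probe_training.py",
        "--split_num", str(split_num),
        "--probe_type", "intervention_module",
    ]
    return [build, train]


# Split indices range from 1 to 10 inclusive.
all_commands = [cmd for n in range(1, 11) for cmd in intervention_commands(n)]

for cmd in all_commands:
    # Dry run: print each command instead of executing it.
    print(shlex.join(cmd))
```

To execute the sweep instead of printing it, each `cmd` list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing split.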

5. Analysis

FactCheckMate-style analysis:
- Online probe training (and analysis): python ./probing_survey/analysis.py --mode specified --online_training True
- Analysis in frozen mode (use a probe trained on a single type of hidden states, e.g. I+O, to analyze different types of hidden states): python ./probing_survey/analysis.py --mode freezed --probe_path {...}
- Analysis in specified mode (use probes trained on different hidden-state types to analyze the corresponding hidden-state types): python ./probing_survey/analysis.py --mode specified (specify the probes' paths in the code)
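
The difference between the frozen (`--mode freezed`) and specified modes can be sketched on toy data. This is only an illustration under stated assumptions, not what `analysis.py` does internally: the hidden-state type names `I` and `O` are hypothetical companions to the `I+O` type mentioned above, the data is synthetic, and a simple mean-difference probe stands in for the trained probes.

```python
import numpy as np


def fit_probe(X, y):
    """Mean-difference probe: direction between class means, midpoint threshold."""
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    w = mu_pos - mu_neg
    b = -w @ (mu_pos + mu_neg) / 2
    return w, b


def accuracy(probe, X, y):
    """Fraction of examples the probe classifies correctly."""
    w, b = probe
    return float(((X @ w + b > 0).astype(int) == y).mean())


rng = np.random.default_rng(0)
dim = 8
y = rng.integers(0, 2, size=200)

# Hypothetical hidden-state types; each encodes the label along its own direction.
data = {}
for name in ["I", "O", "I+O"]:
    direction = rng.normal(size=dim)
    X = rng.normal(size=(200, dim)) + np.outer(2 * y - 1, direction)
    data[name] = (X, y)

# Frozen mode: one probe, trained on I+O, is applied to every type.
frozen_probe = fit_probe(*data["I+O"])
frozen = {name: accuracy(frozen_probe, X, lab) for name, (X, lab) in data.items()}

# Specified mode: each type is analyzed with a probe trained on that same type.
# For brevity, accuracy is measured on the data the probe was fitted on.
specified = {name: accuracy(fit_probe(X, lab), X, lab) for name, (X, lab) in data.items()}

for name in data:
    print(f"{name}: frozen={frozen[name]:.2f} specified={specified[name]:.2f}")
```

In this toy setup the frozen probe transfers poorly to types whose label direction differs from that of I+O, which is the kind of gap the two modes are meant to expose.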
