Sanity Checks for Self-Preference in LLM Evaluators

Installation

# Clone the repository
git clone https://github.com/username/self-preference-llm.git
cd self-preference-llm

# Create a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install matplotlib openai seaborn numpy scipy

Project results

For results on the verifiable datasets, go to judge_swap_null_verif_smoke2 and open the analysis folder inside each dataset.

For the Quality results, go to judge_swap_null_author_obfuscation/quality/analysis.

For the chain-of-thought experiments, go to judge_swap_null_verif_cot and open the analysis folder inside each dataset.

For the DBG score datasets, go to judge_swap_null_dbg_results and open the analysis folder inside each dataset.

For CNN and XSUM, go to CNN_and_XSUM results/cnn_results/cnn/analysis (CNN) and CNN_and_XSUM results/xsum_result/xsum/analysis (XSUM). Note that the top-level directory name contains a space.
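Since most result directories keep one analysis folder per dataset, a small helper can list them all at once. This is only a sketch: the function name find_analysis_dirs is hypothetical and not part of the repository, and it assumes nothing about the layout beyond folders literally named analysis.

```python
from pathlib import Path


def find_analysis_dirs(results_root):
    """Return every directory named 'analysis' below results_root, sorted by path."""
    root = Path(results_root)
    if not root.is_dir():
        # A missing results directory simply yields no matches.
        return []
    # rglob walks the tree recursively; keep directories only, not files.
    return sorted(p for p in root.rglob("analysis") if p.is_dir())


# Example usage (run from the repository root):
# for d in find_analysis_dirs("judge_swap_null_verif_smoke2"):
#     print(d)
```

The same call should work for judge_swap_null_verif_cot and judge_swap_null_dbg_results, since they are described as following the same per-dataset pattern.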

For the entropy results, see per_reference_entropy.json.
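The structure of per_reference_entropy.json is not described here, so a cautious first step is to inspect its top-level shape before writing any analysis code. The helper below is a hypothetical sketch (summarize_json is not part of the repository) and makes no assumption about the keys the file contains.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # Show only a few keys so large files stay readable.
        return {"type": "dict", "size": len(data), "sample_keys": list(data)[:5]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "sample_keys": []}
    # Scalars (number, string, bool, null) have no meaningful length.
    return {"type": type(data).__name__, "size": 1, "sample_keys": []}


# Example usage (run from the repository root):
# print(summarize_json("per_reference_entropy.json"))
```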

Before running any of the Python scripts, check which arguments it requires.

To make the proxy_robustness figure from the main figure, run analyze_proxy_robustness.py, proxy_robustness_plot1.py, and proxy_robustness_plot2.py.
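The three scripts can be chained with a small runner that stops at the first failure. This is a sketch under two assumptions that the README does not state: that the scripts should run in the order listed, and that each script's arguments are supplied by the caller. The helper run_scripts and its extra_args parameter are hypothetical.

```python
import subprocess
import sys


def run_scripts(scripts, extra_args=None):
    """Run each script with the current interpreter, stopping at the first failure.

    extra_args maps a script path to the list of command-line arguments to pass
    to it; scripts without an entry are run with no arguments.
    """
    extra_args = extra_args or {}
    commands = []
    for script in scripts:
        cmd = [sys.executable, script, *extra_args.get(script, [])]
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands


# Example usage (run from the repository root; fill in the arguments each script needs):
# run_scripts([
#     "analyze_proxy_robustness.py",
#     "proxy_robustness_plot1.py",
#     "proxy_robustness_plot2.py",
# ])
```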

To make the scatter plots of harmful self-preference versus task accuracy, before and after applying the evaluator quality baseline, run analyze_judge_self_preference_scatter.py (for CNN and XSUM, run analyze_judge_self_preference_scatter_cnn_xsum.py).

