# LLMs for CSS

## How to run the tests

1. Install ConvoKit:

   ```bash
   git clone https://github.com/CornellNLP/ConvoKit.git
   cd ConvoKit
   pip3 install -e .
   ```

2. Download and pre-process the datasets:

   ```bash
   python data_loader.py -d power --save_dir ./css_data/wiki_corpus
   ```

3. Install the remaining dependencies:

   ```bash
   pip3 install -r requirements.txt
   ```

4. Add your OpenAI API key to your environment (e.g. `export OPENAI_API_KEY=...`).

5. Run the evaluation:

   ```bash
   python test_official_chat_css.py --model [MODEL_NAME_HERE] --dataset wiki_corpus
   ```

We evaluated the following models, but any model that can be loaded with the HuggingFace `AutoModelForSeq2SeqLM` class should work out of the box.

```python
choices=[
    "chatgpt",
    "google/flan-t5-small",
    "google/flan-t5-base",
    "google/flan-t5-large",
    "google/flan-t5-xl",
    "google/flan-t5-xxl",
    "google/flan-ul2",
    "text-davinci-001",
    "text-curie-001",
    "text-babbage-001",
    "text-ada-001",
    "text-davinci-002",
    "text-davinci-003",
],
```

## File Roadmap

`mappings.py` - The configuration used for each dataset in the paper. It describes the type of each dataset, how it should be processed from the raw format, and how the task should be formatted into a prompt following our prompting guidelines.
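For intuition only, a dataset entry of this kind can be sketched as a dictionary pairing metadata with a prompt template. The field names and values below are illustrative assumptions, not the real schema in `mappings.py`:

```python
# Hypothetical sketch of a per-dataset configuration entry; the actual
# schema, field names, and values are defined in mappings.py.
WIKI_CORPUS_CONFIG = {
    "dataset_type": "conversation",        # how the raw data is structured
    "source": "convokit:wiki-corpus",      # where data_loader.py would fetch it
    "label_set": ["agent", "no_agent"],    # illustrative label names
    "prompt_template": (
        "Here is a conversation:\n{context}\n"
        "Question: {question}\nAnswer:"
    ),
}

def build_prompt(config, context, question):
    """Fill the dataset's prompt template with one example."""
    return config["prompt_template"].format(context=context, question=question)
```

The point of centralizing this per-dataset configuration is that the loader and the evaluation script can stay dataset-agnostic.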

`data_loader.py` - Downloads the raw datasets and converts them into the seq2seq format used by the LLMs.
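A seq2seq format is, at its core, a flat list of (input text, target text) pairs. A hypothetical converter might look like the following; the field names and prompt wording are illustrative, not the actual logic in `data_loader.py`:

```python
# Hypothetical sketch: flatten raw labeled examples into the (input, target)
# pairs a seq2seq LLM consumes. The real conversion lives in data_loader.py
# and is driven by the per-dataset configuration in mappings.py.
def to_seq2seq(raw_examples):
    pairs = []
    for ex in raw_examples:
        prompt = f"{ex['text']}\nQuestion: {ex['question']}\nAnswer:"
        pairs.append({"input": prompt, "target": ex["label"]})
    return pairs

raw = [{"text": "A: hi\nB: hello", "question": "Is B polite?", "label": "yes"}]
print(to_seq2seq(raw)[0]["target"])  # yes
```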

`test_official_chat_css.py` - Runs the zero-shot LLM of your choice; contains code for HuggingFace models, the ChatGPT API, and the legacy GPT completion API.

`eval_significance.py` - Computes paired bootstrap significance between the answer files of two models.
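For intuition, the general paired bootstrap technique can be sketched as follows: resample the test set with replacement many times and count how often model A's accuracy advantage over model B disappears. This is an illustration of the technique, not the exact code in `eval_significance.py`:

```python
import random

def paired_bootstrap_pvalue(gold, preds_a, preds_b, n_resamples=10_000, seed=0):
    """One-sided paired bootstrap: estimate the probability that model A's
    observed accuracy advantage over model B would vanish under resampling."""
    rng = random.Random(seed)
    n = len(gold)
    # Per-example score difference: +1 where only A is correct, -1 where only B is.
    deltas = [int(a == g) - int(b == g) for g, a, b in zip(gold, preds_a, preds_b)]
    hits = 0
    for _ in range(n_resamples):
        resample = [deltas[rng.randrange(n)] for _ in range(n)]
        if sum(resample) <= 0:  # A fails to beat B in this resample
            hits += 1
    return hits / n_resamples

# A is always correct and B always wrong, so A wins in every resample.
print(paired_bootstrap_pvalue([1] * 50, [1] * 50, [0] * 50))  # 0.0
```

Small p-values (e.g. below 0.05) suggest the accuracy gap is unlikely to be a sampling artifact of the particular test set.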

`eval_agreement.py` - Computes the kappa agreement between the LLM predictions and the gold labels.
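Cohen's kappa corrects raw agreement for the agreement two raters would reach by chance. A minimal pure-Python version of the statistic, for reference (the repository may compute it with a library implementation instead):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters independently pick the same label.
    expected = sum(
        (counts_a[lab] / n) * (counts_b[lab] / n)
        for lab in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

print(cohens_kappa(["y", "y", "n", "n"], ["y", "y", "n", "y"]))  # 0.5
```

A kappa of 1.0 means perfect agreement, 0.0 means agreement no better than chance.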

## Citation

If you find this work useful, please cite it as follows!

```bibtex
@article{salt-2023-llms-for-css,
  title = {Can Large Language Models Transform Computational Social Science?},
  author = {Ziems, Caleb and Held, William and Shaikh, Omar and Chen, Jiaao and Zhang, Zhehao and Yang, Diyi},
  journal = {arXiv submission 4840038},
  year = {2023},
  month = apr,
}
```
