Skip to content

WarmBuns/LLMEval

Repository files navigation

LLMEval

Setup Instructions

  1. Download and install Ollama: https://ollama.com/
  2. Make sure Ollama is running when you run these models
    • The code will handle downloading the models for you, test using something small like SMOLLM2!!
  3. Make sure Ollama is running when you run the code
  4. Setup your virtual environment according to instructions below

Python version: 3.11.9, you can use pyenv to manage your local python installs

Linux/macOS

  1. Create a virtual environment:
python -m venv venv
  1. Activate the virtual environment:
source venv/bin/activate
  1. Install requirements:
pip install -r requirements.txt

Windows

  1. Create a virtual environment:
python -m venv venv
  1. Activate the virtual environment:
.\venv\Scripts\activate
  1. Install requirements:
pip install -r requirements.txt

D3

Downloading Data

For SAM-Sum:

curl -L -O https://huggingface.co/datasets/Samsung/samsum/resolve/main/data/corpus.7z 
7z x corpus.7z

For Webis:

for i in {0..9};
    do curl -L -O https://huggingface.co/datasets/webis/tldr-17/resolve/refs%2Fconvert%2Fparquet/default/partial-train/000$i.parquet; 
done

EDA

  • samsum/test.json (Source)

    • 819 instances
    • 3 fields: id, summary, dialogue
    • dialogue ranges from 3-30 utterances (newline-separated)
  • webis/data.json (Source)

    • 3,848,330 instances in full
    • We can consider using the partial-train branch data (9 parquets) -- see download instructions above
    • relevant fields: id, content, summary, subreddit

A single parquet of the Webis data is massive compared to SAM-Sum test split. Another repo includes analysis on duplicate rows and removing problematic rows, which we may want to do as well. We may also want to remove noisy/graphic data from certain subreddits, etc. (maybe only keep TrueReddit, or AskReddit?)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •