AutoLibra is designed to facilitate the evaluation of agents through metrics derived from human feedback. This document outlines the steps for contributors to prepare data, annotate it, and run experiments.
Install Git LFS if you haven't already. It is required to download the large files in the dataset repo.
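A minimal setup, assuming a Debian-based system (see https://git-lfs.com for other platforms):

```bash
# Install the git-lfs package (Debian/Ubuntu)
sudo apt-get install git-lfs
# Enable Git LFS for your user account (one-time setup)
git lfs install
```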
For contributors, it is best to use our shared data repo on Hugging Face: open-social-world/autolibra. Upload new datasets to this shared repo.
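Note that pushing to a Hugging Face dataset repo requires write access and authentication; one common setup (assuming you have an access token with write permission) is:

```bash
# Log in with a Hugging Face access token so git push to the repo is authorized
huggingface-cli login
```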
To prepare a new dataset:

```bash
# Download and preprocess <dataset>
uv run python -m autolibra_core.datasets.<dataset>
```

Then add it to the shared data repo:

```bash
git clone https://huggingface.co/datasets/open-social-world/autolibra .data
# cd into .data
# git add your data
# git commit -m "Add <dataset>"
git push
```

To annotate a dataset in the terminal (here, webarena):

```bash
uv run python src/tty/tty_annotation.py .data/webarena .data/annotations/webarena --annotator-id <your name>
```

Or annotate through the Streamlit web interface (here, sotopia):

```bash
uv run streamlit run src/tty/tty_annotation.py .data/sotopia .data/annotations/sotopia -- --annotator-id <your name> --use-streamlit
```

To view collected annotations:

```bash
streamlit run src/tty/view_annotations.py -- .data/annotations/sotopia/annotations
```

To run the evaluation generator:

```bash
uv run python -m autolibra_core.gen_eval.generator
```

Test environments (BALROG, etc.) are included as submodules under .gitmodules. Documentation for using these environments is included within each environment repo.
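Since the test environments are git submodules, they are not fetched by a plain clone; the standard git commands retrieve them:

```bash
# Fetch all submodules (e.g., BALROG) listed in .gitmodules
git submodule update --init --recursive
```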