AutoLibra is designed to facilitate the evaluation of agents through metrics derived from human feedback. This document outlines the steps for contributors to prepare data, annotate it, and run experiments.
Install Git LFS if you haven't already. It is required to download the large files in the dataset repo.
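A minimal setup, assuming a Debian-based system (see https://git-lfs.com for other platforms):

```bash
# Install the git-lfs package (Debian/Ubuntu)
sudo apt-get install git-lfs
# Enable Git LFS for your user account (one-time setup)
git lfs install
```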
For contributors, it is best to use our shared data repo on Hugging Face: open-social-world/autolibra. Upload new datasets to this shared repo.
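Note that pushing to a Hugging Face dataset repo requires write access and authentication; one common setup (assuming you have an access token with write permission) is:

```bash
# Log in with a Hugging Face access token so git push to the repo is authorized
huggingface-cli login
```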
To prepare a new dataset:

```bash
# Download and preprocess <dataset>
uv run python -m autolibra_core.datasets.<dataset>
```

Then add it to the shared data repo:

```bash
git clone https://huggingface.co/datasets/open-social-world/autolibra .data
# cd into .data
# git add your data
# git commit -m "Add <dataset>"
git push
```

To annotate a dataset in the terminal (here, webarena):

```bash
uv run python src/tty/tty_annotation.py .data/webarena .data/annotations/webarena --annotator-id <your name>
```

Or annotate through the Streamlit web interface (here, sotopia):

```bash
uv run streamlit run src/tty/tty_annotation.py .data/sotopia .data/annotations/sotopia -- --annotator-id <your name> --use-streamlit
```

To view collected annotations:

```bash
streamlit run src/tty/view_annotations.py -- .data/annotations/sotopia/annotations
```

To run the evaluation generator:

```bash
uv run python -m autolibra_core.gen_eval.generator
```

Test environments (BALROG, etc.) are included as submodules under .gitmodules. Documentation for using these environments is included within each environment repo.
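Since the test environments are git submodules, they are not fetched by a plain clone; the standard git commands retrieve them:

```bash
# Fetch all submodules (e.g., BALROG) listed in .gitmodules
git submodule update --init --recursive
```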