This is the official implementation of "BioDataLab: Benchmarking LLM Agents on Real-World Biological Database Curation for Data-Driven Scientific Discovery". If you encounter any issues, please reach out to jiaxianyan@mail.ustc.edu.cn.
We introduce BioDataLab, a rigorous benchmark comprising 100 tasks meticulously derived from 57 high-impact database publications, covering 9 biological domains and 7 data modalities.
BioDataLab evaluates the capability of autonomous agents to transform raw, heterogeneous biological resources into structured, analysis-ready databases.
Tasks are classified by their primary intent into 4 types: open-world data retrieval, structured data extraction, functional feature annotation, and data refinement and integration.
An overall summary of all 100 tasks is provided in BioDataLab.csv.
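The summary table can also be inspected programmatically. A minimal sketch using only the Python standard library (no specific column names in BioDataLab.csv are assumed here; rows are returned keyed by whatever the header line contains):

```python
import csv

def load_task_summary(path):
    """Read the benchmark summary CSV into a list of row dicts.

    Illustrative helper, not part of the repo: rows are keyed by the
    CSV header, so no particular column names are assumed.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Example (run from the BioDataLab root):
# tasks = load_task_summary("BioDataLab.csv")
# print(len(tasks), "tasks")
```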
First, download the necessary input data from Hugging Face and unzip it as ./benchmark/datasets.
Then, the whole benchmark directory structure should look like this:
|-- BioDataLab/
|---- BioDataLab.csv
|---- benchmark/
|------ datasets/ # subdirectory for input data
|-------- ...
|------ verifiers/ # subdirectory for success rate verifiers
|-------- ...
|------ verifiers_valid/ # subdirectory for valid rate verifiers
|-------- ...
|------ tasks/ # subdirectory for detailed task description yaml files
|-------- ...
|------ gold_programs/ # subdirectory for programs used to generate ground-truth results, if applicable
|-------- ...
|------ gold_results/ # subdirectory for ground-truth results
|-------- ...
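The verifiers score an agent's output against the matching file in gold_results/. As a rough illustration only (this is a hedged sketch, not the repo's actual verifier code; the exact-row-match rule and CSV format are assumptions, and the real verifiers in benchmark/verifiers/ define task-specific rules), a success-rate check over two CSV files might look like:

```python
import csv

def success_rate(pred_csv, gold_csv):
    """Fraction of ground-truth rows reproduced exactly in the prediction.

    Illustrative sketch: treats each CSV row as an exact-match tuple,
    which is much cruder than the task-specific verifiers in the repo.
    """
    def rows(path):
        with open(path, newline="", encoding="utf-8") as f:
            return {tuple(r) for r in csv.reader(f)}

    gold = rows(gold_csv)
    pred = rows(pred_csv)
    return len(gold & pred) / len(gold) if gold else 0.0
```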
conda env create -f environment.yml
Set the LLM API_KEY and BASE_URL in assistant/llm.py:
API_KEY = ""
BASE_URL = ""
To evaluate a single task, for example fusionneoantigen_annotate_2, run:
conda activate biomni_e1
python3 run_evaluate_case_biomni.py --task_yaml=benchmark/tasks/fusionneoantigen_annotate_2.yaml
To evaluate all tasks, we also provide batch scripts in the evaluate_bash_scripts directory. For example, to evaluate the gemini-3-flash-preview model:
conda activate biomni_e1
bash evaluate_bash_scripts/run_evaluate_batch_biomni_gemini-3-flash-preview.sh
Student Contributors: Jiaxian Yan, Xi Fang, Chenmin Wu, Jintao Zhu, Yuhang Yang, Zaixi Zhang, Meijing Fang, and Chenxi Du
Supervisors: Qi Liu, Kai Zhang
Affiliation: State Key Laboratory of Cognitive Intelligence, USTC; Peking University; Princeton University; Zhejiang University; Tsinghua University
We welcome all forms of feedback! Please raise an issue for bugs, questions, or suggestions. This helps our team address common problems efficiently and builds a more productive community.
This project is licensed under the terms of the MIT license. See LICENSE for additional details.
If you find our work helpful, please kindly cite:
@article{Yan2026biodatalab,
author = {Yan, Jiaxian and Fang, Xi and Zhu, Jintao and Wu, Chenmin and Yang, Yuhang and Fang, Meijing and Du, Chenxi and Zhang, Kai and Zhang, Zaixi and Liu, Qi},
title = {Benchmarking LLM Agents on Real-World Biological Database Curation for Data-Driven Scientific Discovery},
year = {2026},
journal = {under review}
}


