BioDataLab

This is the official implementation of "BioDataLab: Benchmarking LLM Agents on Real-World Biological Database Curation for Data-Driven Scientific Discovery". If you encounter any issues, please reach out to jiaxianyan@mail.ustc.edu.cn.

Introduction


Benchmark Overview

We introduce BioDataLab, a rigorous benchmark comprising 100 tasks meticulously derived from 57 high-impact database publications, covering 9 biological domains and 7 data modalities.

BioDataLab evaluates the capability of autonomous agents to transform raw, heterogeneous biological resources into structured, analysis-ready databases.

Tasks are classified by their primary intention into 4 types: open-world data retrieval, structured data extraction, functional feature annotation, and data refinement and integration.


Benchmark Access

An overall summary of all 100 tasks is provided in BioDataLab.csv.

First, download the necessary input data from Hugging Face and unzip it as ./benchmark/datasets.

The benchmark directory structure should then look like this:

|-- BioDataLab/
|---- BioDataLab.csv
|---- benchmark/
|------ datasets/           # subdirectory for input data
|-------- ...
|------ verifiers/          # subdirectory for success rate verifiers 
|-------- ...
|------ verifiers_valid/    # subdirectory for valid rate verifiers 
|-------- ...
|------ tasks/              # subdirectory for detailed task description yaml files 
|-------- ...
|------ gold_programs/      # subdirectory for programs used to generate groundtruth results if applicable 
|-------- ...
|------ gold_results/       # subdirectory for groundtruth
|-------- ...
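As a sanity check after unzipping, you can verify that the expected subdirectories are all present. The helper below is a minimal sketch (not part of the repository); the directory names are taken from the tree listing above.

```python
# Hypothetical helper to confirm the benchmark/ layout described above.
from pathlib import Path

EXPECTED_SUBDIRS = [
    "datasets",
    "verifiers",
    "verifiers_valid",
    "tasks",
    "gold_programs",
    "gold_results",
]

def missing_benchmark_dirs(root) -> list:
    """Return the expected benchmark/ subdirectories that are absent under root."""
    benchmark = Path(root) / "benchmark"
    return [name for name in EXPECTED_SUBDIRS
            if not (benchmark / name).is_dir()]

if __name__ == "__main__":
    missing = missing_benchmark_dirs(".")
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        print("Benchmark layout looks complete.")
```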

Quick Start

Install the environment:

conda env create -f environment.yml

LLMs API

Set the LLM API_KEY and BASE_URL in assistant/llm.py:

API_KEY = ""
BASE_URL = ""
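To avoid hardcoding credentials in the source file, one option is to read them from environment variables instead. The sketch below is an assumption, not a repository convention; the variable names LLM_API_KEY and LLM_BASE_URL are hypothetical.

```python
# Hypothetical alternative: load credentials from the environment rather than
# editing assistant/llm.py directly. Variable names are an assumption.
import os

def load_llm_config():
    """Return (api_key, base_url) from environment variables, empty if unset."""
    api_key = os.environ.get("LLM_API_KEY", "")
    base_url = os.environ.get("LLM_BASE_URL", "")
    return api_key, base_url

API_KEY, BASE_URL = load_llm_config()
```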

Basic Usage of BioDataLab

If you want to evaluate on one task, for example fusionneoantigen_annotate_2, you can run:

conda activate biomni_e1
python3 run_evaluate_case_biomni.py --task_yaml=benchmark/tasks/fusionneoantigen_annotate_2.yaml

To evaluate all tasks, we also provide batch scripts in the evaluate_bash_scripts directory. For example, to evaluate the gemini-3-flash-preview model:

conda activate biomni_e1
bash evaluate_bash_scripts/run_evaluate_batch_biomni_gemini-3-flash-preview.sh

Evaluation Results

Contributors

Student Contributors: Jiaxian Yan, Xi Fang, Chenmin Wu, Jintao Zhu, Yuhang Yang, Zaixi Zhang, Meijing Fang, and Chenxi Du

Supervisors: Qi Liu, Kai Zhang

Affiliation: State Key Laboratory of Cognitive Intelligence, USTC; Peking University; Princeton University; Zhejiang University; Tsinghua University

Contact

We welcome all forms of feedback! Please open an issue for bugs, questions, or suggestions; this helps our team address common problems efficiently and builds a more productive community. You can also reach us directly at jiaxianyan@mail.ustc.edu.cn.

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.

Citation

If you find our work helpful, please kindly cite:

@article{Yan2026biodatalab,
	author = {Yan, Jiaxian and Fang, Xi and Zhu, Jintao and Wu, Chenmin and Yang, Yuhang and Fang, Meijing and Du, Chenxi and Zhang, Kai and Zhang, Zaixi and Liu, Qi},
	title = {Benchmarking LLM Agents on Real-World Biological Database Curation for Data-Driven Scientific Discovery},
	year = {2026},
	journal = {under review}
}
