This is the official implementation of "BioDataLab: Benchmarking LLM Agents on Real-World Biological Database Curation for Data-Driven Scientific Discovery". If you encounter any issues, please reach out to jiaxianyan@mail.ustc.edu.cn.
We introduce BioDataLab, a rigorous benchmark comprising 100 tasks meticulously derived from 57 high-impact database publications, covering 9 biological domains and 7 data modalities.
BioDataLab evaluates the capability of autonomous agents to transform raw, heterogeneous biological resources into structured, analysis-ready databases.
Tasks are classified by their primary intent into 4 types: open-world data retrieval, structured data extraction, functional feature annotation, and data refinement and integration.
An overall summary of all 100 tasks is provided in BioDataLab.csv.
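The summary table can also be inspected programmatically. A minimal sketch using only the Python standard library (no specific column names in BioDataLab.csv are assumed here; rows are returned keyed by whatever the header line contains):

```python
import csv

def load_task_summary(path):
    """Read the benchmark summary CSV into a list of row dicts.

    Illustrative helper, not part of the repo: rows are keyed by the
    CSV header, so no particular column names are assumed.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Example (run from the BioDataLab root):
# tasks = load_task_summary("BioDataLab.csv")
# print(len(tasks), "tasks")
```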
First, download the necessary input data from Hugging Face and unzip it as ./benchmark/datasets.
Then, the whole benchmark directory structure should look like this:
|-- BioDataLab/
|---- BioDataLab.csv
|---- benchmark/
|------ datasets/ # subdirectory for input data
|-------- ...
|------ verifiers/ # subdirectory for success rate verifiers
|-------- ...
|------ verifiers_valid/ # subdirectory for valid rate verifiers
|-------- ...
|------ tasks/ # subdirectory for detailed task description yaml files
|-------- ...
|------ gold_programs/ # subdirectory for programs used to generate ground-truth results, if applicable
|-------- ...
|------ gold_results/ # subdirectory for ground-truth results
|-------- ...
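The verifiers score an agent's output against the matching file in gold_results/. As a rough illustration only (this is a hedged sketch, not the repo's actual verifier code; the exact-row-match rule and CSV format are assumptions, and the real verifiers in benchmark/verifiers/ define task-specific rules), a success-rate check over two CSV files might look like:

```python
import csv

def success_rate(pred_csv, gold_csv):
    """Fraction of ground-truth rows reproduced exactly in the prediction.

    Illustrative sketch: treats each CSV row as an exact-match tuple,
    which is much cruder than the task-specific verifiers in the repo.
    """
    def rows(path):
        with open(path, newline="", encoding="utf-8") as f:
            return {tuple(r) for r in csv.reader(f)}

    gold = rows(gold_csv)
    pred = rows(pred_csv)
    return len(gold & pred) / len(gold) if gold else 0.0
```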
conda env create -f environment.yml
Set the LLM API_KEY and BASE_URL in assistant/llm.py:
API_KEY = ""
BASE_URL = ""
To evaluate a single task, for example fusionneoantigen_annotate_2, run:
conda activate biomni_e1
python3 run_evaluate_case_biomni.py --task_yaml=benchmark/tasks/fusionneoantigen_annotate_2.yaml
To evaluate all tasks, we also provide batch scripts in the evaluate_bash_scripts directory. For example, to evaluate the gemini-3-flash-preview model:
conda activate biomni_e1
bash evaluate_bash_scripts/run_evaluate_batch_biomni_gemini-3-flash-preview.sh
Student Contributors: Jiaxian Yan, Xi Fang, Chenmin Wu, Jintao Zhu, Yuhang Yang, Zaixi Zhang, Meijing Fang, and Chenxi Du
Supervisors: Qi Liu, Kai Zhang
Affiliation: State Key Laboratory of Cognitive Intelligence, USTC; Peking University; Princeton University; Zhejiang University; Tsinghua University
We welcome all forms of feedback! Please raise an issue for bugs, questions, or suggestions. This helps our team address common problems efficiently and builds a more productive community.
This project is licensed under the terms of the MIT license. See LICENSE for additional details.
If you find our work helpful, please kindly cite:
@article{Yan2026biodatalab,
author = {Yan, Jiaxian and Fang, Xi and Zhu, Jintao and Wu, Chenmin and Yang, Yuhang and Fang, Meijing and Du, Chenxi and Zhang, Kai and Zhang, Zaixi and Liu, Qi},
title = {Benchmarking LLM Agents on Real-World Biological Database Curation for Data-Driven Scientific Discovery},
year = {2026},
journal = {under review}
}


