Official repository for the accepted CHIL 2025 paper.

This GitHub repo accompanies our upcoming publication:
Zhang et al. CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports.
To appear in the Proceedings of the Conference on Health, Inference, and Learning (CHIL 2025), PMLR.
The official PMLR citation and link will be added upon publication.
CaseReportBench is the first benchmark designed for dense information extraction from clinical case reports, focused on rare diseases, especially Inborn Errors of Metabolism (IEMs). This benchmark evaluates how well large language models (LLMs) can extract structured, clinically relevant data across 14 system-level categories, such as Neurology, History, Lab/Imaging, and Musculoskeletal (MSK).
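To make the extraction target concrete, one record can be pictured as a category-keyed mapping like the sketch below. The four category names come from the list above; the findings are invented placeholders, and the released dataset's exact schema may differ.

```python
# Hypothetical shape of one dense-extraction record. The category names follow
# the paper's system-level scheme; the findings are invented placeholders.
extraction = {
    "Neurology": ["developmental delay", "seizures"],
    "History": ["consanguineous parents"],
    "Lab/Imaging": ["elevated plasma ammonia"],
    "MSK": [],  # a category with no reported findings stays empty
    # ...remaining system-level categories (14 in total)
}
```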
Key Contributions:
- A curated dataset of 138 expert-annotated case reports.
- Dense extractions across 14 predefined diagnostic categories.
- Evaluation of LLMs including Qwen2, Qwen2.5, LLaMA3, and GPT-4o.
- Novel prompting strategies: Filtered Category-Specific Prompting (FCSP), Uniform Category-Specific Prompting (UCP), and Unified Global Prompting (UGP) (sketched below).
- Expert clinical assessment of model outputs.
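The three strategies differ in how the 14 categories are presented to the model. The sketch below reflects one plausible reading of the names, not the repository's actual code: UGP issues a single prompt covering all categories, UCP issues one prompt per category for every category, and FCSP issues category prompts only for categories pre-filtered as relevant. The prompt wording and the `query_llm` and `is_relevant` helpers are hypothetical placeholders.

```python
# Hypothetical sketch of the three prompting strategies. The prompt text,
# query_llm(), and is_relevant() are illustrative placeholders, not the
# repository's actual implementation.
CATEGORIES = ["Neurology", "History", "Lab/Imaging", "MSK"]  # 4 of the 14

def query_llm(prompt: str) -> str:
    """Stand-in for a real model call (API request or local inference)."""
    raise NotImplementedError

def is_relevant(report: str, category: str) -> bool:
    """Stand-in for the filtering step that FCSP applies before prompting."""
    raise NotImplementedError

def ugp(report: str) -> str:
    # Unified Global Prompting: a single prompt covering every category.
    return query_llm(f"Extract findings for {CATEGORIES} from:\n{report}")

def ucp(report: str) -> dict:
    # Uniform Category-Specific Prompting: one prompt per category, always.
    return {c: query_llm(f"Extract {c} findings from:\n{report}") for c in CATEGORIES}

def fcsp(report: str) -> dict:
    # Filtered Category-Specific Prompting: category prompts only where relevant.
    return {c: query_llm(f"Extract {c} findings from:\n{report}")
            for c in CATEGORIES if is_relevant(report, c)}
```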
The `src/` folder contains all key components for dataset construction, prompting logic, and LLM evaluation:
| Folder | Description |
|---|---|
| `dataset_construction/` | Scripts to process PMC-OA case reports, filter IEM cases, and structure the data into prompt-ready JSON. Includes code for merging expert annotations and TSR-based filtering. |
| `benchmarking_llms/` | Evaluates LLM dense information extractions against gold expert-crafted annotations and computes all metrics (TSR, EM, hallucination, etc.). |
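As a minimal sketch of how two of the listed metrics could be computed, the snippet below assumes EM means normalized exact match and TSR means the token-set-ratio similarity score as implemented in the `rapidfuzz` package; the repository's actual metric code may define both differently.

```python
# Minimal scoring sketch, assuming EM is normalized exact match and TSR is
# rapidfuzz's token_set_ratio; the repository's metric definitions may differ.
from rapidfuzz import fuzz

def exact_match(pred: str, gold: str) -> bool:
    # EM: strings must match exactly after whitespace/case normalization.
    return pred.strip().lower() == gold.strip().lower()

def token_set_ratio(pred: str, gold: str) -> float:
    # TSR: order-insensitive token-overlap similarity in [0, 100].
    return fuzz.token_set_ratio(pred, gold)

print(exact_match("seizures", "Seizures"))  # True
print(token_set_ratio("MRI basal ganglia lesions",
                      "basal ganglia lesions on MRI"))  # 100.0 (order ignored)
```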
This dataset includes the following supplementary files:
- `65_Excluded_Subheadings_Casefilter.json`: Subheading-level case-filtering metadata.
- `65_Subheading_Category_Mapping.json`: Mapping of subheadings to clinical categories.
- `65_Excluded_Title_Manual_Review.txt`: Manually reviewed titles excluded from the dataset.
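The JSON files above can be inspected directly with the standard library; the snippet assumes only that they are plain JSON, as their internal structure is not specified here.

```python
import json

# Load the subheading-to-category mapping shipped with the dataset; only
# standard JSON is assumed, the internal structure is not specified here.
with open("65_Subheading_Category_Mapping.json") as f:
    subheading_to_category = json.load(f)
```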
These files support the CHIL 2025 submission and are referenced in the accompanying arXiv paper.
To set up the environment using Conda:
```bash
conda env create -f environment.yaml
conda activate CaseReportBench
```

The dataset is available on the Hugging Face Hub:
👉 https://huggingface.co/datasets/cxyzhang/caseReportBench_ClinicalDenseExtraction_Benchmark
To load it in Python:
```python
from datasets import load_dataset

dataset = load_dataset("cxyzhang/consolidated_expert_validated_denseExtractionDataset")
```
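Continuing from the load above, individual records can be inspected like ordinary dictionaries; the snippet assumes a default `train` split, but the actual split names and columns are defined by the Hub dataset.

```python
# Illustrative only: split and column names are defined by the Hub dataset.
example = dataset["train"][0]
print(example.keys())  # inspect the available columns
```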
License:
- Code: MIT License (LICENSE.txt)
- Dataset: CC BY-NC 4.0 (DATA_LICENSE.txt)
The dataset is derived from the PubMed Central Open Access Subset and is for non-commercial academic use only.
If you use this work in your research, please cite:
```bibtex
@inproceedings{zhang2025casereportbench,
  title={CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports},
  author={Zhang, Xiao Yu Cindy and Ferreira, Carlos R. and Rossignol, Francis and Ng, Raymond T. and Wasserman, Wyeth and Zhu, Jian},
  booktitle={Proceedings of the Sixth Conference on Health, Inference, and Learning},
  series={Proceedings of Machine Learning Research},
  volume={287},
  pages={527--542},
  year={2025},
  publisher={PMLR}
}
```