CaseReportBench: Benchmarking LLMs for Dense Information Extraction in Clinical Case Reports

Official repository for the accepted CHIL 2025 paper.


🔔 Note

This GitHub repository accompanies our publication:

Zhang et al. CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports.
To appear in the Proceedings of the Conference on Health, Inference, and Learning (CHIL 2025), PMLR.

The official PMLR citation and link will be added upon publication.


📘 Overview

CaseReportBench is the first benchmark designed for dense information extraction from clinical case reports, focused on rare diseases, especially Inborn Errors of Metabolism (IEMs). This benchmark evaluates how well large language models (LLMs) can extract structured, clinically relevant data across 14 system-level categories, such as Neurology, History, Lab/Imaging, and Musculoskeletal (MSK).

Key Contributions:

  • A curated dataset of 138 expert-annotated case reports.
  • Dense extractions across 14 predefined diagnostic categories.
  • Evaluation of LLMs including Qwen2, Qwen2.5, LLaMA3, and GPT-4o.
  • Novel prompting strategies: Filtered Category-Specific Prompting (FCSP), Uniform Category-Specific Prompting (UCP), and Unified Global Prompting (UGP); a minimal sketch follows after this list.
  • Expert clinical assessment of model outputs.
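
A minimal sketch of how the three prompting strategies could be assembled. The prompt wording, the category subset, and the relevance filter below are illustrative assumptions, not the exact templates or filtering criterion used in the paper:

CATEGORIES = ["Neurology", "History", "Lab/Imaging", "MSK"]  # subset of the 14 system-level categories

def category_specific_prompt(case_text: str, category: str) -> str:
    # One prompt per category; UCP sends this for every category,
    # FCSP only for categories that pass a relevance filter.
    return (
        f"Extract all findings relevant to the '{category}' category from the "
        f"case report below. Answer 'Not mentioned' if nothing applies.\n\n{case_text}"
    )

def unified_global_prompt(case_text: str, categories: list[str]) -> str:
    # UGP: a single prompt covering all categories at once, asking for structured output.
    keys = ", ".join(categories)
    return (
        f"Extract structured findings for each of these categories: {keys}. "
        f"Return a JSON object keyed by category.\n\n{case_text}"
    )

def filtered_categories(case_text: str, categories: list[str]) -> list[str]:
    # Placeholder relevance filter for FCSP; the paper's actual filter is different.
    return [c for c in categories if c.split("/")[0].lower() in case_text.lower()]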

🧩 Source Code Overview (src/)

The src/ folder contains all key components for dataset construction, prompting logic, and LLM evaluation:

  • dataset_construction/: Scripts to process PMC-OA case reports, filter IEM cases, and structure data into prompt-ready JSON. Includes code for merging expert annotations and TSR filtering.
  • benchmarking_llms/: Evaluates LLM dense information extractions against gold, expert-crafted annotations and computes all metrics (TSR, EM, hallucination, etc.); a rough scoring sketch follows below.
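
A rough sketch of per-field scoring, assuming TSR stands for token set ratio (computed here with rapidfuzz) and assuming "Not mentioned" as the empty-field marker; the definitions in benchmarking_llms/ are authoritative and may differ:

from rapidfuzz import fuzz

def exact_match(pred: str, gold: str) -> bool:
    # Case- and whitespace-insensitive exact match (EM)
    return pred.strip().lower() == gold.strip().lower()

def token_set_ratio_score(pred: str, gold: str) -> float:
    # Order-insensitive fuzzy overlap between prediction and gold, in [0, 100]
    return fuzz.token_set_ratio(pred, gold)

def is_hallucination(pred: str, gold: str, empty: str = "Not mentioned") -> bool:
    # Model asserts a finding where the gold annotation records none (illustrative definition)
    return gold.strip() == empty and pred.strip() != empty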

📎 Supplementary Materials (supplemengary_material/)

This folder contains the following supplementary files:

  • 65_Excluded_Subheadings_Casefilter.json: Subheading-level case filtering metadata.
  • 65_Subheading_Category_Mapping.json: Mapping of subheadings to clinical categories.
  • 65_Excluded_Title_Manual_Review.txt: Manually reviewed titles excluded from the dataset.

These files support the CHIL 2025 submission and are referenced in the accompanying arXiv paper.
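
As a minimal usage example, the subheading-to-category mapping can be read as ordinary JSON; the dictionary shape shown in the comment is an assumption based on the file name:

import json

with open("supplemengary_material/65_Subheading_Category_Mapping.json") as f:
    subheading_to_category = json.load(f)  # assumed shape: {"<subheading>": "<clinical category>", ...}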

🧪 Setup Instructions

To set up the environment using Conda:

conda env create -f environment.yaml
conda activate CaseReportBench

📦 Dataset Access

The dataset is available on the Hugging Face Hub:
👉 https://huggingface.co/datasets/cxyzhang/caseReportBench_ClinicalDenseExtraction_Benchmark

To load it in Python:

from datasets import load_dataset

dataset = load_dataset("cxyzhang/consolidated_expert_validated_denseExtractionDataset")
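
To inspect what was loaded (the "train" split and printed fields are shown only as an example; the actual splits and features are defined on the dataset card):

print(dataset)              # available splits and features
print(dataset["train"][0])  # first record, assuming a "train" split exists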

🔓 License

The dataset is derived from the PubMed Central Open Access Subset and is for non-commercial academic use only.

📝 Citation

If you use this work in your research, please cite:

@inproceedings{zhang2025casereportbench,
  title={CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports},
  author={Zhang, Xiao Yu Cindy and Ferreira, Carlos R. and Rossignol, Francis and Ng, Raymond T. and Wasserman, Wyeth and Zhu, Jian},
  booktitle={Proceedings of the Sixth Conference on Health, Inference, and Learning},
  series={Proceedings of Machine Learning Research},
  volume={287},
  pages={527--542},
  year={2025},
  publisher={PMLR}
}
