The Detectability Paradox: Bilingual Medical Report Generation with Open-Weight Models and the Limits of Human Oversight
This project investigates the quality and safety risks of using large language models (LLMs) to automate medical report generation in English and French. We evaluate medical reports generated by several multilingual LLMs using automated metrics and a medical expert panel, demonstrating high-quality output while highlighting the need for automated tools to detect machine-generated content.
This repository contains the complete pipeline for:
- Data preprocessing - Process raw medical data
- EHR simulation - Generate synthetic electronic health records
- Report generation - Generate medical reports (zero-shot & few-shot)
- Authorship classification - Detect machine-generated vs human-written medical reports
Languages: English & French
```
├── data/
│   ├── raw/                      # MTSamples URLs, PubMed French PMIDs
│   └── processed/
│       ├── dev/                  # Development set (for few-shot prompting)
│       └── test/                 # Test set (for evaluation)
│
├── src/
│   ├── preprocessing/            # Data preprocessing scripts
│   │   ├── case_report_extractor.py      # Extract the French case reports
│   │   ├── preprocessing_pmc_patients.py # Extract the English case reports
│   │   └── medical_transcript_scraper.py # Extract the English medical transcripts
│   │
│   ├── llm_generation/
│   │   ├── ehr_simulation/       # EHR simulation
│   │   │   ├── generate_ehr.py
│   │   │   ├── config.py
│   │   │   ├── prompts.py
│   │   │   └── utils.py
│   │   │
│   │   └── report_generation/    # Medical report generation
│   │       ├── generate_report.py
│   │       ├── config.py
│   │       ├── prompts.py
│   │       └── utils.py
│   │
│   ├── evaluation/               # Automatic evaluation
│   │   ├── bertscore_evaluator.py
│   │   └── rouge_evaluator.py
│   │
│   ├── expert_annotation/        # Expert evaluation setup
│   │   └── randomize_data.py     # Randomize samples for expert panel
│   │
│   └── authorship_classifier/    # Machine vs human text detection
│       ├── training/
│       │   ├── train.py          # Main training script
│       │   ├── config.py         # Configuration
│       │   ├── dataset.py        # PyTorch Dataset
│       │   ├── evaluation.py     # Evaluation metrics
│       │   ├── inference.py      # Inference and predictions
│       │   ├── trainer.py        # Training logic
│       │   └── utils.py          # Utilities
│       │
│       └── ig_scores/            # Integrated Gradients analysis
│           └── compute_ig.py     # Attribution scores
│
├── README.md                     # This file
└── requirements.txt              # Python dependencies
```

Requirements:
- Python 3.11+
- CUDA-capable GPU (for vLLM)
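Automatic evaluation relies on ROUGE and BERTScore (see `src/evaluation/`). As a rough illustration of what `rouge_evaluator.py` measures, ROUGE-1 F1 is the unigram-overlap F-score between a generated report and its reference; the sketch below is a self-contained toy version, not the repository's implementation (which presumably uses an established library):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference text."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the patient was discharged home",
                "the patient was sent home"))  # → 0.8
```

With 4 of 5 unigrams shared, precision and recall are both 0.8, so F1 is 0.8.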
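`compute_ig.py` attributes the classifier's decisions to its inputs with Integrated Gradients. The repository presumably applies it to the trained transformer's embeddings, but the method itself is just a path integral of gradients between a baseline and the input, approximated by a Riemann sum. A toy NumPy sketch on a linear model, where the attribution has the closed form (x − baseline) · w:

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients:
    (x - baseline) * mean_alpha grad f(baseline + alpha * (x - baseline))."""
    alphas = (np.arange(steps) + 0.5) / steps
    diff = x - baseline
    grads = np.stack([f_grad(baseline + a * diff) for a in alphas])
    return diff * grads.mean(axis=0)

# Toy "model": f(x) = w . x, whose gradient is the constant w,
# so IG reduces exactly to (x - baseline) * w.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
attr = integrated_gradients(lambda z: w, x, baseline)
print(attr)  # equals (x - baseline) * w
```

A useful sanity check is the completeness property: the attributions sum to f(x) − f(baseline).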
```bash
# Clone the repository
git clone https://github.com/ds4dh/medical_report_generation
cd medical_report_generation

# Install dependencies
pip install -r requirements.txt
```

Scrape medical transcripts from MTSamples.com:
```bash
cd src/preprocessing

# Scrape medical transcripts from MTSamples
python medical_transcript_scraper.py \
    --input_dir ../../data/raw/mtsamples_urls.csv \
    --output_dir ../../data/raw/english_medical_transcripts.csv
```

Download each paper using its PMC ID and extract the case report section:
```bash
cd src/preprocessing
python case_report_extractor.py ../../data/raw/french_case_reports_pmc_ids.txt
```

To run the next script, you first need to download the source dataset from https://github.com/pmc-patients/pmc-patients:
```bash
cd src/preprocessing
python preprocessing_pmc_patients.py \
    --input_dir /path/to/PMC-Patients.json \
    --output_dir english_case_reports.csv
```

Generate synthetic EHRs from the processed data:

```bash
cd src/llm_generation/ehr_simulation

python generate_ehr.py \
    --task case_report \
    --language english \
    --input_file ../../../data/processed/test/case_reports.csv

python generate_ehr.py \
    --task transcript \
    --language french \
    --input_file ../../../data/processed/test/medical_transcripts_test.csv
```

Generate medical reports (zero-shot and few-shot):

```bash
cd src/llm_generation/report_generation
```
```bash
# Zero-shot English case reports
python generate_report.py \
    --task case_report \
    --approach zeroshot \
    --language english \
    --input_file ../../../data/processed/test/case_reports.csv

# Few-shot French transcripts
python generate_report.py \
    --task transcript \
    --approach fewshot \
    --language french \
    --num_shots 3 \
    --input_file ../../../data/processed/test/transcripts.csv \
    --dev_file ../../../data/processed/dev/transcripts.csv
```

Train the authorship classifier:

```bash
cd src/authorship_classifier/training
```
```bash
# Train with default settings
python train.py --data_folder /path/to/data

# Train with a custom model
python train.py \
    --data_folder /path/to/data \
    --model_name bert-base-multilingual-cased \
    --num_epochs 5 \
    --batch_size 16
```

Place these files in your data folder:
- `train.csv`: training data
- `dev.csv`: development data
- `test.csv`: test data

Required columns:
- `text`: text content to classify
- `label`: label (0 = machine, 1 = human)

Example:

```csv
text,label
"This is machine-generated text...",0
"This is human-written text...",1
```
For questions or inquiries, please contact us at hossein.rouhizadeh@unige.ch.