ds4dh/medical_report_generation

Medical report generation

The Detectability Paradox: Bilingual Medical Report Generation with Open-Weight Models and the Limits of Human Oversight

Introduction

This project investigates the quality and safety risks of using large language models (LLMs) to automate medical report generation in English and French. We evaluate medical reports generated by several multilingual LLMs using automated metrics and a medical expert panel, demonstrating high-quality output while highlighting the need for automated tools to detect machine-generated content.

Overview

This repository contains the complete pipeline for:

  1. Data preprocessing - Process raw medical data
  2. EHR simulation - Generate synthetic electronic health records
  3. Report generation - Generate medical reports (zero-shot & few-shot)
  4. Authorship classification - Detect machine-generated vs human-written medical reports

Languages: English & French

Project structure

├── data/
│   ├── raw/                         # mtsamples urls, PubMed French PMIDs
│   └── processed/
│       ├── dev/                     # Development set (for few-shot prompting)
│       └── test/                    # Test set (for evaluation)
│
├── src/
│   ├── preprocessing/               # Data preprocessing scripts
│   │   ├── case_report_extractor.py         # Extract the French case reports    
│   │   ├── preprocessing_pmc_patients.py    # Extract the English case reports
│   │   └── medical_transcript_scraper.py    # Extract the English medical transcript
│   │
│   ├── llm_generation/
│   │   ├── ehr_simulation/          # EHR simulation
│   │   │   ├── generate_ehr.py
│   │   │   ├── config.py
│   │   │   ├── prompts.py
│   │   │   └── utils.py
│   │   │
│   │   └── report_generation/       # Medical report generation
│   │       ├── generate_report.py
│   │       ├── config.py
│   │       ├── prompts.py
│   │       └── utils.py
│   │
│   ├── evaluation/                  # Automatic evaluation
│   │   ├── bertscore_evaluator.py
│   │   └── rouge_evaluator.py      
│   │
│   ├── expert_annotation/           # Expert evaluation setup
│   │   └── randomize_data.py        # Randomize samples for expert panel
│   │
│   └── authorship_classifier/        # Machine vs Human text detection
│       ├── training/
│       │   ├── train.py             # Main training script
│       │   ├── config.py            # Configuration
│       │   ├── dataset.py           # PyTorch Dataset
│       │   ├── evaluation.py        # Evaluation metrics
│       │   ├── inference.py         # Inference and predictions
│       │   ├── trainer.py           # Training logic
│       │   └── utils.py             # Utilities
│       │
│       └── ig_scores/               # Integrated Gradients analysis
│           └── compute_ig.py        # Attribution scores      
│       
│ 
├── README.md                        # This file
└── requirements.txt                 # Python dependencies

Installation

Prerequisites

  • Python 3.11+
  • CUDA-capable GPU (for vLLM)

Setup

# Clone repository
git clone https://github.com/ds4dh/medical_report_generation
cd medical_report_generation

# Install dependencies
pip install -r requirements.txt

Quick start

1. Preprocess data

Step 1: Extract medical transcripts

Scrape medical transcripts from MTSamples.com:

cd src/preprocessing

# Scrape medical transcripts from MTSamples
python medical_transcript_scraper.py \
    --input_dir ../../data/raw/mtsamples_urls.csv \
    --output_dir ../../data/raw/english_medical_transcripts.csv

Step 2: Extract French case reports from PubMed

Download each paper using its PMC ID and extract the case report section.

cd src/preprocessing

python case_report_extractor.py ../../data/raw/french_case_reports_pmc_ids.txt

Step 3: Extract English case reports from the PMC-Patients dataset

To run this script, you need to download the source dataset from: https://github.com/pmc-patients/pmc-patients

cd src/preprocessing
python preprocessing_pmc_patients.py \
    --input_dir /path/to/PMC-Patients.json \
    --output_dir english_case_reports.csv

2. Simulate EHR

cd src/llm_generation/ehr_simulation

python generate_ehr.py \
    --task case_report \
    --language english \
    --input_file ../../../data/processed/test/case_reports.csv


python generate_ehr.py \
    --task transcript \
    --language french \
    --input_file ../../../data/processed/test/medical_transcripts_test.csv

3. Generate reports

cd src/llm_generation/report_generation

# Zero-shot English case reports
python generate_report.py \
    --task case_report \
    --approach zeroshot \
    --language english \
    --input_file ../../../data/processed/test/case_reports.csv

# Few-shot French transcripts
python generate_report.py \
    --task transcript \
    --approach fewshot \
    --language french \
    --num_shots 3 \
    --input_file ../../../data/processed/test/transcripts.csv \
    --dev_file ../../../data/processed/dev/transcripts.csv
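
As a rough illustration of what the few-shot setting involves, the sketch below samples a fixed number of examples from a development set and prepends them to the query. It is not the repository's actual prompt logic: the helper name build_fewshot_prompt, the (EHR, report) pairing, and the prompt wording are all assumptions made for this example.

```python
import random


def build_fewshot_prompt(dev_examples, query_ehr, num_shots=3, seed=0):
    """Assemble a few-shot prompt (hypothetical helper, not the repo's code).

    dev_examples: list of (ehr_text, report_text) pairs from the dev set.
    query_ehr:    the EHR text for which a report should be generated.
    """
    # Sample without replacement; a fixed seed keeps the choice reproducible.
    rng = random.Random(seed)
    shots = rng.sample(dev_examples, k=min(num_shots, len(dev_examples)))

    parts = []
    for i, (ehr, report) in enumerate(shots, start=1):
        parts.append(f"Example {i}\nEHR:\n{ehr}\nReport:\n{report}")

    # The query goes last, with the report left open for the model to complete.
    parts.append(f"EHR:\n{query_ehr}\nReport:")
    return "\n\n".join(parts)
```

With num_shots set to 0 the same function reduces to a zero-shot prompt containing only the query.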

4. Train authorship classifier

cd src/authorship_classifier/training

# Train with default settings
python train.py --data_folder /path/to/data

# Train with custom model
python train.py \
    --data_folder /path/to/data \
    --model_name bert-base-multilingual-cased \
    --num_epochs 5 \
    --batch_size 16

Required files

Place these files in your data folder:

  • train.csv - training data
  • dev.csv - development data
  • test.csv - test data

CSV format

Required columns:

  • text: text content to classify
  • label: class label (0 = machine-generated, 1 = human-written)

Example:

text,label
"This is machine-generated text...",0
"This is human-written text...",1
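
Before training, it can help to check that each file follows this format. The following is a minimal sketch using only the Python standard library; the function name validate_rows is hypothetical and is not part of this repository.

```python
import csv
import io

REQUIRED_COLUMNS = {"text", "label"}
VALID_LABELS = {"0", "1"}  # 0 = machine, 1 = human


def validate_rows(handle):
    """Return a list of problems found in a classifier CSV (hypothetical helper).

    handle: an open text file or any file-like object yielding CSV lines.
    """
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    for record_no, row in enumerate(reader, start=1):
        if not (row["text"] or "").strip():
            problems.append(f"record {record_no}: empty text")
        if (row["label"] or "").strip() not in VALID_LABELS:
            problems.append(
                f"record {record_no}: label must be 0 or 1, got {row['label']!r}"
            )
    return problems


# Demo on an in-memory sample; for a real file, use
# open("train.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO('text,label\n"Some report text...",0\n"Another report...",1\n')
print(validate_rows(sample))
```

An empty list means no problems were found; running the check on train.csv, dev.csv, and test.csv catches missing columns or stray labels before a training run starts.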

Contact

For questions or inquiries, please contact us at hossein.rouhizadeh@unige.ch.
