monajm36/ohca-classifier-v11

OHCA Classifier: Automated Out-of-Hospital Cardiac Arrest Identification

Hugging Face | Python 3.8+ | License: MIT

A transformer-based deep learning system that automatically identifies Out-of-Hospital Cardiac Arrest (OHCA) cases from clinical notes, with a specific focus on reducing false positives caused by in-hospital cardiac arrest (IHCA) cases.

🎯 Key Results

| Metric | V9 (Baseline) | V10 (+ Location) | V11 (+ Temporal) |
|---|---|---|---|
| Sensitivity | 96.1% | 84.2% | 92.1% |
| Specificity | 69.6% | 89.6% | 89.4% |
| F1-Score | 0.732 | 0.814 | 0.856 |
| AUC-ROC | 0.932 | 0.938 | 0.956 |
| IHCA False Positives | 88 (64.2%) | 39 (28.5%) | 39 (28.5%) |

Validation: 647 manually annotated clinical notes from the UChicago C19 dataset
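
The percentages in the IHCA False Positives row appear to use the 137 IHCA notes in the validation set as the denominator. That is an inference from the counts reported in the Data section, not something the table states. A quick check of the arithmetic, under that assumption:

```python
# Assumption: the "IHCA False Positives" percentages are computed over the
# 137 IHCA notes in the validation set (count taken from the Data section).
IHCA_NOTES = 137

ihca_false_positives = {"V9": 88, "V10": 39, "V11": 39}


def ihca_fp_rate(count, total=IHCA_NOTES):
    """Share of IHCA notes that a model labelled as OHCA, in percent."""
    return 100.0 * count / total


for version, count in ihca_false_positives.items():
    print(f"{version}: {count}/{IHCA_NOTES} = {ihca_fp_rate(count):.1f}%")

# Relative reduction in IHCA false positives from V9 to V10.
reduction = 100.0 * (88 - 39) / 88
print(f"V9 to V10 reduction: {reduction:.1f}%")
```

The same counts reproduce the 55.7% reduction quoted in the Methodology section.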

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/monajm36/ohca-classifier.git
cd ohca-classifier

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Using the Pre-trained Model

from predict_v11 import predict_ohca

# Example clinical note
note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived and initiated CPR. ROSC achieved in field.
Transported to ED.
"""

# Predict
result = predict_ohca(note, threshold=0.14)

print(f"Prediction: {result['prediction']}")  # 'OHCA' or 'Non-OHCA'
print(f"Probability: {result['probability']:.2%}")
print(f"Features: {result['features']}")

Download Pre-trained Model

# Model is available on Hugging Face
# https://huggingface.co/monajm36/ohca-classifier-v11

# Or use the download script
python scripts/download_model.py

📊 Model Architecture

V11: Temporal + Location-Aware OHCA Classifier

Input Clinical Note
       ↓
┌──────────────────────┐
│  Text Processing     │
│  - Extract sections  │
│  - Tokenize (512)    │
└──────────────────────┘
       ↓
┌──────────────────────────────────────┐
│  Feature Extraction                  │
├──────────────────────────────────────┤
│  1. BERT Embeddings (768)            │
│  2. Location Features (2)            │
│     • OHCA indicators (22 phrases)   │
│     • IHCA indicators (25 phrases)   │
│  3. Temporal Features (5)            │
│     • Arrest timing score            │
│     • First location (in/out)        │
│     • Movement patterns              │
└──────────────────────────────────────┘
       ↓
┌──────────────────────┐
│  MLP Classifier      │
│  775 → 512 → 256 → 2 │
└──────────────────────┘
       ↓
  OHCA Probability

Base Model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
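
The layer sizes in the diagram imply an input of 768 + 2 + 5 = 775 features. The sketch below is a dependency-free illustration of that forward pass, not the repository's implementation: the weights are random placeholders, and the ReLU activations, softmax output, and class ordering are all assumptions.

```python
import math
import random

# Feature sizes from the diagram: 768 (BERT) + 2 (location) + 5 (temporal).
BERT_DIM, LOCATION_DIM, TEMPORAL_DIM = 768, 2, 5
LAYER_SIZES = [BERT_DIM + LOCATION_DIM + TEMPORAL_DIM, 512, 256, 2]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a trained model would load real ones."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, biases):
    """Dense layer: one dot product per output unit, plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ohca_probability(embedding, location, temporal, layers):
    """Concatenate the three feature groups and run the MLP head."""
    x = list(embedding) + list(location) + list(temporal)
    for i, (weights, biases) in enumerate(layers):
        x = linear(x, weights, biases)
        if i < len(layers) - 1:  # assumed: ReLU after hidden layers only
            x = [max(0.0, v) for v in x]
    return softmax(x)[1]  # assumed: index 1 is the OHCA class


rng = random.Random(0)
layers = [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
embedding = [rng.gauss(0.0, 1.0) for _ in range(BERT_DIM)]  # stand-in for a BERT embedding
prob = ohca_probability(embedding, [2, 0], [1, 1, 0, 1, 0], layers)
print(f"input size: {LAYER_SIZES[0]}, OHCA probability (untrained): {prob:.3f}")
```

Because the weights are random, the printed probability is meaningless; the point is only the shape of the computation.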

πŸ“ Repository Structure

ohca-classifier/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
├── setup.py                          # Package installation
│
├── models/                           # Model code
│   ├── __init__.py
│   ├── v9_bert_classifier.py        # V9: Baseline BERT
│   ├── v10_location_aware.py        # V10: + Location features
│   └── v11_temporal_location.py     # V11: + Temporal features
│
├── training/                         # Training scripts
│   ├── train_v9.py
│   ├── train_v10.py
│   └── train_v11.py
│
├── prediction/                       # Prediction scripts
│   ├── predict_v9.py
│   ├── predict_v10.py
│   └── predict_v11.py
│
├── features/                         # Feature extraction
│   ├── __init__.py
│   ├── text_processing.py           # Section extraction
│   ├── location_features.py         # Location indicators
│   └── temporal_features.py         # Temporal features
│
├── evaluation/                       # Evaluation scripts
│   ├── compare_models.py            # V9 vs V10 vs V11
│   ├── threshold_optimization.py   # Find optimal thresholds
│   └── error_analysis.py           # Analyze false positives
│
├── scripts/                          # Utility scripts
│   ├── download_model.py            # Download from HF
│   └── prepare_data.py              # Data preprocessing
│
├── notebooks/                        # Jupyter notebooks
│   ├── model_comparison.ipynb
│   ├── feature_analysis.ipynb
│   └── demo.ipynb
│
├── docs/                            # Documentation
│   ├── COMPREHENSIVE_REPORT.pdf     # Full technical report
│   ├── model_architecture.md
│   ├── feature_engineering.md
│   └── training_guide.md
│
└── tests/                           # Unit tests
    ├── test_features.py
    ├── test_models.py
    └── test_predictions.py

🔬 Methodology

Progressive Feature Engineering

Our approach built three successive models, each addressing specific challenges:

V9: Baseline BERT Classifier

  • Pure semantic understanding
  • Issue: 65% of false positives were IHCA cases
  • Learning: Model confused arrest terminology regardless of location

V10: Location-Aware Classifier

  • Added 2 location features (22 OHCA + 25 IHCA indicators)
  • Reduced IHCA false positives by 55.7%
  • Issue: Lost sensitivity (96.1% → 84.2%)
  • Learning: Location helps but temporal context missing

V11: Temporal + Location-Aware Classifier

  • Added 5 temporal features (timing, movement patterns)
  • Recovered sensitivity (84.2% → 92.1%)
  • Maintained high specificity (89.4%)
  • Learning: Temporal sequence crucial for disambiguation

Feature Categories

Location Features (2):

  1. OHCA indicator count: home, EMS, scene, field, bystander, ambulance...
  2. IHCA indicator count: floor, ICU, ward, room, bed, code blue...
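
As a rough illustration of the two counts, the sketch below matches indicator phrases as whole words, case-insensitively. It uses only the indicators named above; the full lists (22 OHCA and 25 IHCA phrases) are not reproduced here, and the matching rule itself is an assumption rather than the logic in `features/location_features.py`.

```python
import re

# Only the indicators named in this README; the full phrase lists are longer.
OHCA_INDICATORS = ["home", "EMS", "scene", "field", "bystander", "ambulance"]
IHCA_INDICATORS = ["floor", "ICU", "ward", "room", "bed", "code blue"]


def count_indicators(text, phrases):
    """Count whole-word, case-insensitive matches of each phrase in the text."""
    total = 0
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def location_features(text):
    """Return [OHCA indicator count, IHCA indicator count]."""
    return [
        count_indicators(text, OHCA_INDICATORS),
        count_indicators(text, IHCA_INDICATORS),
    ]


ohca_note = "Patient found unresponsive at home. EMS arrived, ROSC in field. Transported to ED."
ihca_note = "Patient admitted to medical floor. Found unresponsive in bed. Code blue called."
print(location_features(ohca_note))  # [3, 0]
print(location_features(ihca_note))  # [0, 3]
```

Whole-word matching avoids counting substrings such as "ward" inside "toward", which a plain substring search would pick up.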

Temporal Features (5):

  1. Arrest timing score: "before arrival" vs "during hospitalization" phrases
  2. First location outside: Binary indicator that the first location mentioned is outside the hospital
  3. First location inside: Binary indicator that the first location mentioned is inside the hospital
  4. Movement outside→inside: Count of transition patterns
  5. Movement inside→inside: Count of transition patterns
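
One way these five values could be computed is sketched below. The phrase and location lists are hypothetical stand-ins, the signed timing score is an assumed design, and none of this is taken from `features/temporal_features.py`.

```python
import re

# Hypothetical phrase lists for illustration; the repository's lists are not shown here.
PRE_ARRIVAL_PHRASES = ["before arrival", "prior to arrival", "in the field", "at home"]
IN_HOSPITAL_PHRASES = ["during hospitalization", "hospital day", "admitted"]
OUTSIDE_TERMS = ["home", "scene", "field", "ambulance"]
INSIDE_TERMS = ["ED", "ICU", "floor", "ward"]


def _pattern(phrase):
    return r"\b" + re.escape(phrase) + r"\b"


def _count(text, phrases):
    return sum(len(re.findall(_pattern(p), text, re.IGNORECASE)) for p in phrases)


def location_sequence(text):
    """Return 'out'/'in' tags for location terms, in order of appearance."""
    hits = []
    for tag, terms in (("out", OUTSIDE_TERMS), ("in", INSIDE_TERMS)):
        for term in terms:
            for match in re.finditer(_pattern(term), text, re.IGNORECASE):
                hits.append((match.start(), tag))
    return [tag for _, tag in sorted(hits)]


def temporal_features(text):
    """Return the five temporal features in the order listed above."""
    seq = location_sequence(text)
    pairs = list(zip(seq, seq[1:]))  # consecutive location mentions
    return [
        _count(text, PRE_ARRIVAL_PHRASES) - _count(text, IN_HOSPITAL_PHRASES),  # timing score
        int(bool(seq) and seq[0] == "out"),  # first location outside
        int(bool(seq) and seq[0] == "in"),   # first location inside
        pairs.count(("out", "in")),          # movement outside→inside
        pairs.count(("in", "in")),           # movement inside→inside
    ]


ohca_note = ("Patient found unresponsive at home. EMS started CPR, ROSC in the field. "
             "Transported to ED, then admitted to ICU.")
ihca_note = ("Patient admitted to medical floor for pneumonia. "
             "On hospital day 3, found unresponsive in bed. CPR initiated on floor.")
print(temporal_features(ohca_note))  # [1, 1, 0, 1, 1]
print(temporal_features(ihca_note))  # [-2, 0, 1, 0, 1]
```

In this sketch a positive timing score leans toward an arrest before arrival and a negative one toward an arrest during hospitalization.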

📈 Training

Data

  • Training Set: MIMIC-III clinical notes
    • 330 notes total (47 OHCA, 283 Non-OHCA)
    • Split: 70% train / 15% validation / 15% test
    • Average note length: 13,042 characters
  • Validation Set: UChicago C19 dataset
    • 647 manually annotated cases
    • 203 OHCA (31.4%)
    • 137 IHCA (21.2%)
    • 307 Non-arrest (47.4%)
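
With only 47 OHCA notes among 330, a purely random 70/15/15 split could leave very few positives in the validation or test portion. The sketch below shows a stratified split over the reported class counts. Whether the training scripts stratify is not stated here, so treat the stratification, the rounding rule, and the seed as assumptions.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split indices into train/val/test while keeping class ratios similar."""
    rng = random.Random(seed)
    splits = ([], [], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])  # remainder goes to test
    return splits


# Class counts from the training set description: 47 OHCA, 283 Non-OHCA.
labels = [1] * 47 + [0] * 283
train, val, test = stratified_split(labels)
for name, part in (("train", train), ("val", val), ("test", test)):
    positives = sum(labels[i] for i in part)
    print(f"{name}: {len(part)} notes, {positives} OHCA")
```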

Train Your Own Model

# Train V11 model
python training/train_v11.py \
    --data_path /path/to/mimic_labelled_binary.csv \
    --output_dir ./models/v11_output \
    --batch_size 4 \
    --learning_rate 2e-5 \
    --num_epochs 5

# Train with custom hyperparameters
python training/train_v11.py \
    --config configs/custom_config.json

🎯 Threshold Selection

V11 offers flexible threshold tuning for different clinical scenarios:

| Use Case | Threshold | Sensitivity | Specificity | F1 | When to Use |
|---|---|---|---|---|---|
| Screening | 0.14 | 92.1% | 89.4% | 0.856 | Maximize recall |
| Balanced | 0.74 | 82.3% | 93.2% | 0.831 | General use |
| Research | 0.85 | 75.4% | 95.0% | 0.810 | High precision needed |

# Use different thresholds
result_screening = predict_ohca(note, threshold=0.14)  # High sensitivity
result_balanced = predict_ohca(note, threshold=0.74)   # Balanced
result_research = predict_ohca(note, threshold=0.85)   # High specificity
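
To make the trade-off concrete, the sketch below applies the three thresholds to a handful of invented probabilities and reports sensitivity, specificity, and F1. The data are toy values, and the `>=` comparison is an assumption about how `predict_ohca` applies its threshold.

```python
def classify(probability, threshold):
    """Assumed rule: label OHCA when the probability meets the threshold."""
    return "OHCA" if probability >= threshold else "Non-OHCA"


def metrics_at_threshold(probabilities, labels, threshold):
    """Sensitivity, specificity, and F1 for binary labels (1 = OHCA)."""
    preds = [int(p >= threshold) for p in probabilities]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return sensitivity, specificity, f1


# Toy probabilities and labels, invented for illustration only.
probs = [0.95, 0.80, 0.50, 0.20, 0.10, 0.05, 0.30, 0.78]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for name, threshold in (("Screening", 0.14), ("Balanced", 0.74), ("Research", 0.85)):
    sens, spec, f1 = metrics_at_threshold(probs, labels, threshold)
    print(f"{name:<9} t={threshold:.2f}  sens={sens:.2f}  spec={spec:.2f}  F1={f1:.2f}")
```

On the toy data, raising the threshold trades sensitivity for specificity, which is the same pattern the table reports for V11.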

📊 Evaluation

Compare All Models

# Run comprehensive comparison
python evaluation/compare_models.py \
    --data_path /path/to/c19_validation.csv \
    --output_dir ./results

# Generate comparison plots
python evaluation/compare_models.py --plot

Error Analysis

# Analyze false positives and false negatives
python evaluation/error_analysis.py \
    --model_path models/v11_output/final_model \
    --data_path /path/to/validation.csv

πŸ” Example Predictions

True OHCA Case

Patient found unresponsive at home. Family called 911.
EMS arrived, started CPR. ROSC in field.
Transported to ED.

→ Prediction: OHCA (98.5%)
→ Key features:
   - Location: home (OHCA +1), EMS (+1)
   - Temporal: "found at home", "ROSC in field"
   - Movement: outside→inside

True IHCA Case

Patient admitted to medical floor for pneumonia.
On hospital day 3, found unresponsive in bed.
Code blue called. CPR initiated on floor.

→ Prediction: Non-OHCA (2.3% OHCA probability)
→ Key features:
   - Location: floor (IHCA +1), bed (+1)
   - Temporal: "hospital day 3", "admitted"
   - Movement: inside→inside

Challenging Case (Mixed Signals)

Patient arrested at home. EMS called, ROSC achieved.
Admitted to ICU. On hospital day 2, arrested again.

→ V9: OHCA (85%) - sees "home" and "EMS"
→ V10: Uncertain (52%) - conflicting locations
→ V11: OHCA (78%) - temporal features indicate primary arrest was OHCA

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Areas for improvement:

  • Additional temporal features
  • Multi-institution validation
  • Support for non-English notes
  • Real-time deployment pipeline
  • Explainability visualizations

πŸ“ Citation

If you use this code or model in your research, please cite:

@misc{moukaddem2025ohca,
  author = {Moukaddem, Mona},
  title = {OHCA Classifier: Automated Out-of-Hospital Cardiac Arrest Identification 
           using Temporal and Location-Aware Deep Learning},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/monajm36/ohca-classifier}},
  note = {Model available at \url{https://huggingface.co/monajm36/ohca-classifier-v11}}
}

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

πŸ™ Acknowledgments

  • MIMIC-III Database: Johnson et al., Scientific Data (2016)
  • UChicago C19 Dataset: Validation data source
  • BiomedNLP-PubMedBERT: Microsoft Research
  • Hugging Face: Model hosting and transformers library

📧 Contact

Mona Moukaddem

🔗 Links


Built with ❤️ for improving cardiac arrest research and patient care
