A transformer-based deep learning system for automatically identifying Out-of-Hospital Cardiac Arrest (OHCA) cases from clinical notes, with specific focus on reducing false positives from in-hospital cardiac arrest (IHCA) cases.
| Metric | V9 (Baseline) | V10 (+ Location) | V11 (+ Temporal) |
|---|---|---|---|
| Sensitivity | 96.1% | 84.2% | 92.1% |
| Specificity | 69.6% | 89.6% | 89.4% |
| F1-Score | 0.732 | 0.814 | 0.856 |
| AUC-ROC | 0.932 | 0.938 | 0.956 |
| IHCA False Positives (of 137 IHCA notes) | 88 (64.2%) | 39 (28.5%) | 39 (28.5%) |
Validation: 647 manually annotated clinical notes from UChicago C19 dataset
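
The metrics in the table follow the standard confusion-matrix definitions. The sketch below is illustrative only: `binary_metrics` is a hypothetical helper, and the counts passed to it are made up rather than taken from the validation run. The last two lines reproduce the IHCA percentages using only figures reported in this README (88 and 39 false positives among 137 IHCA notes).

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity, precision, and F1 from confusion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, for illustration only
metrics = binary_metrics(tp=90, fp=20, tn=180, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})

# Figures reported above: IHCA notes misclassified as OHCA, out of 137 IHCA notes
ihca_fp_rate_v9 = 88 / 137            # share of IHCA notes flagged by V9
relative_reduction = (88 - 39) / 88   # drop in IHCA false positives from V9 to V10
print(f"{ihca_fp_rate_v9:.1%}, {relative_reduction:.1%}")
```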
# Clone repository
git clone https://github.com/monajm36/ohca-classifier.git
cd ohca-classifier
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

from predict_v11 import predict_ohca
# Example clinical note
note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived and initiated CPR. ROSC achieved in field.
Transported to ED.
"""
# Predict
result = predict_ohca(note, threshold=0.14)
print(f"Prediction: {result['prediction']}") # 'OHCA' or 'Non-OHCA'
print(f"Probability: {result['probability']:.2%}")
print(f"Features: {result['features']}")

# Model is available on Hugging Face
# https://huggingface.co/monajm36/ohca-classifier-v11
# Or use the download script
python scripts/download_model.py

V11: Temporal + Location-Aware OHCA Classifier
Input Clinical Note
        ↓
┌──────────────────────┐
│ Text Processing      │
│ - Extract sections   │
│ - Tokenize (512)     │
└──────────────────────┘
        ↓
┌──────────────────────────────────────┐
│ Feature Extraction                   │
├──────────────────────────────────────┤
│ 1. BERT Embeddings (768)             │
│ 2. Location Features (2)             │
│    • OHCA indicators (22 phrases)    │
│    • IHCA indicators (25 phrases)    │
│ 3. Temporal Features (5)             │
│    • Arrest timing score             │
│    • First location (in/out)         │
│    • Movement patterns               │
└──────────────────────────────────────┘
        ↓
┌──────────────────────┐
│ MLP Classifier       │
│ 775 → 512 → 256 → 2  │
└──────────────────────┘
        ↓
OHCA Probability
Base Model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
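
The dimensions in the diagram can be checked with a small dependency-free sketch. It only shows how the three feature groups concatenate into the 775-dimensional classifier input and how many parameters the 775 → 512 → 256 → 2 stack would hold. `build_input` and `count_parameters` are hypothetical names, and the sketch assumes plain fully connected layers with biases; the activations, dropout, and other details of the real model are not covered here.

```python
BERT_DIM, LOCATION_DIM, TEMPORAL_DIM = 768, 2, 5

# 768 + 2 + 5 = 775 inputs, then the hidden and output sizes from the diagram
LAYER_SIZES = [BERT_DIM + LOCATION_DIM + TEMPORAL_DIM, 512, 256, 2]


def build_input(bert_embedding, location_features, temporal_features):
    """Concatenate the three feature groups into one classifier input vector."""
    if len(bert_embedding) != BERT_DIM:
        raise ValueError(f"expected {BERT_DIM} embedding values")
    if len(location_features) != LOCATION_DIM:
        raise ValueError(f"expected {LOCATION_DIM} location features")
    if len(temporal_features) != TEMPORAL_DIM:
        raise ValueError(f"expected {TEMPORAL_DIM} temporal features")
    return list(bert_embedding) + list(location_features) + list(temporal_features)


def count_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers (assumed layout)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Placeholder values; a real embedding would come from the BERT encoder
x = build_input([0.0] * 768, [3, 0], [1, 1, 0, 1, 0])
print(len(x))                         # 775
print(count_parameters(LAYER_SIZES))  # 529154
```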
ohca-classifier/
├── README.md                        # This file
├── requirements.txt                 # Python dependencies
├── setup.py                         # Package installation
│
├── models/                          # Model code
│   ├── __init__.py
│   ├── v9_bert_classifier.py        # V9: Baseline BERT
│   ├── v10_location_aware.py        # V10: + Location features
│   └── v11_temporal_location.py     # V11: + Temporal features
│
├── training/                        # Training scripts
│   ├── train_v9.py
│   ├── train_v10.py
│   └── train_v11.py
│
├── prediction/                      # Prediction scripts
│   ├── predict_v9.py
│   ├── predict_v10.py
│   └── predict_v11.py
│
├── features/                        # Feature extraction
│   ├── __init__.py
│   ├── text_processing.py           # Section extraction
│   ├── location_features.py         # Location indicators
│   └── temporal_features.py         # Temporal features
│
├── evaluation/                      # Evaluation scripts
│   ├── compare_models.py            # V9 vs V10 vs V11
│   ├── threshold_optimization.py    # Find optimal thresholds
│   └── error_analysis.py            # Analyze false positives
│
├── scripts/                         # Utility scripts
│   ├── download_model.py            # Download from HF
│   └── prepare_data.py              # Data preprocessing
│
├── notebooks/                       # Jupyter notebooks
│   ├── model_comparison.ipynb
│   ├── feature_analysis.ipynb
│   └── demo.ipynb
│
├── docs/                            # Documentation
│   ├── COMPREHENSIVE_REPORT.pdf     # Full technical report
│   ├── model_architecture.md
│   ├── feature_engineering.md
│   └── training_guide.md
│
└── tests/                           # Unit tests
    ├── test_features.py
    ├── test_models.py
    └── test_predictions.py
Our approach built three successive models, each addressing specific challenges:
V9: Baseline BERT Classifier
- Pure semantic understanding
- Issue: 65% of false positives were IHCA cases
- Learning: Model confused arrest terminology regardless of location
V10: Location-Aware Classifier
- Added 2 location features (22 OHCA + 25 IHCA indicators)
- Reduced IHCA false positives by 55.7%
- Issue: Lost sensitivity (96.1% → 84.2%)
- Learning: Location helps but temporal context missing
V11: Temporal + Location-Aware Classifier
- Added 5 temporal features (timing, movement patterns)
- Recovered sensitivity (84.2% → 92.1%)
- Maintained high specificity (89.4%)
- Learning: Temporal sequence crucial for disambiguation
Location Features (2):
- OHCA indicator count: home, EMS, scene, field, bystander, ambulance...
- IHCA indicator count: floor, ICU, ward, room, bed, code blue...
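
A minimal sketch of how the two location counts could be computed. It uses only the example indicators listed above, not the full 22- and 25-phrase lists, and assumes case-insensitive whole-word matching that counts every occurrence. `count_indicators` and `location_features` are hypothetical names, not necessarily those used in `features/location_features.py`.

```python
import re

# Only the indicators named in this README; the full phrase lists are longer
OHCA_INDICATORS = ["home", "EMS", "scene", "field", "bystander", "ambulance"]
IHCA_INDICATORS = ["floor", "ICU", "ward", "room", "bed", "code blue"]


def count_indicators(text, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase."""
    total = 0
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def location_features(note):
    """Return (OHCA indicator count, IHCA indicator count) for a note."""
    return (
        count_indicators(note, OHCA_INDICATORS),
        count_indicators(note, IHCA_INDICATORS),
    )


ohca_note = (
    "Patient found unresponsive at home by family. 911 called. "
    "EMS arrived and initiated CPR. ROSC achieved in field. Transported to ED."
)
ihca_note = (
    "Patient admitted to medical floor for pneumonia. "
    "On hospital day 3, found unresponsive in bed. "
    "Code blue called. CPR initiated on floor."
)
print(location_features(ohca_note))  # (3, 0)
print(location_features(ihca_note))  # (0, 4)
```

Whole-word matching keeps "bed" from matching inside longer words, but a production list would still need care with ambiguous terms such as "room" or "field".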
Temporal Features (5):
- Arrest timing score: "before arrival" vs "during hospitalization" phrases
- First location outside: Binary indicator, 1 if the first location mentioned is outside the hospital
- First location inside: Binary indicator, 1 if the first location mentioned is inside the hospital
- Movement outside→inside: Count of outside-to-inside transition patterns
- Movement inside→inside: Count of inside-to-inside transition patterns
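
The five temporal features above can be sketched roughly as follows. Everything here is an assumption made for illustration: the term and phrase lists are small hypothetical stand-ins, the timing score is taken to be pre-arrival phrase hits minus in-hospital phrase hits, and transitions are counted between consecutive location mentions. The actual logic in `features/temporal_features.py` may differ.

```python
import re

# Hypothetical, abbreviated vocabularies for illustration
OUTSIDE_TERMS = ["home", "scene", "field", "ambulance"]
INSIDE_TERMS = ["floor", "icu", "ward", "bed", "ed"]
PRE_ARRIVAL_PHRASES = ["before arrival", "prior to arrival", "in field"]
IN_HOSPITAL_PHRASES = ["during hospitalization", "hospital day"]


def _count(text, phrases):
    """Count whole-word occurrences of the phrases in lowercased text."""
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text)) for p in phrases)


def location_sequence(text):
    """Return 'out'/'in' labels for location mentions, in order of appearance."""
    hits = []
    for label, terms in (("out", OUTSIDE_TERMS), ("in", INSIDE_TERMS)):
        for term in terms:
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
                hits.append((match.start(), label))
    return [label for _, label in sorted(hits)]


def temporal_features(note):
    """Compute the five temporal features for one note (assumed definitions)."""
    text = note.lower()
    seq = location_sequence(text)
    pairs = list(zip(seq, seq[1:]))  # consecutive location mentions
    return {
        "arrest_timing_score": _count(text, PRE_ARRIVAL_PHRASES)
        - _count(text, IN_HOSPITAL_PHRASES),
        "first_location_outside": int(bool(seq) and seq[0] == "out"),
        "first_location_inside": int(bool(seq) and seq[0] == "in"),
        "movement_outside_to_inside": pairs.count(("out", "in")),
        "movement_inside_to_inside": pairs.count(("in", "in")),
    }


mixed_note = (
    "Patient arrested at home. EMS called, ROSC achieved. "
    "Admitted to ICU. On hospital day 2, arrested again."
)
print(temporal_features(mixed_note))
```

For the mixed note, the first location is outside and there is one outside-to-inside transition, which is the kind of signal that lets the classifier treat the primary arrest as out-of-hospital.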
Training Set: MIMIC-III clinical notes
- 330 notes total (47 OHCA, 283 Non-OHCA)
- Split: 70% train / 15% validation / 15% test
- Average note length: 13,042 characters
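
With only 47 OHCA notes out of 330, how the 70/15/15 split is drawn matters. The sketch below shows one way to do it with stratification, so each subset keeps roughly the same class ratio. This README does not state whether the training scripts stratify; `stratified_split` and the seed are hypothetical.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split indices into train/val/test while preserving class proportions."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        splits["train"] += idx[:n_train]
        splits["val"] += idx[n_train:n_train + n_val]
        splits["test"] += idx[n_train + n_val:]  # remainder goes to test
    return splits


# Class counts from the training set description: 47 OHCA, 283 Non-OHCA
labels = [1] * 47 + [0] * 283
splits = stratified_split(labels)
for name, idx in splits.items():
    n_ohca = sum(labels[i] for i in idx)
    print(f"{name}: {len(idx)} notes, {n_ohca} OHCA")
```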
Validation Set: UChicago C19 dataset
- 647 manually annotated cases
- 203 OHCA (31.4%)
- 137 IHCA (21.2%)
- 307 Non-arrest (47.4%)
# Train V11 model
python training/train_v11.py \
--data_path /path/to/mimic_labelled_binary.csv \
--output_dir ./models/v11_output \
--batch_size 4 \
--learning_rate 2e-5 \
--num_epochs 5
# Train with custom hyperparameters
python training/train_v11.py \
  --config configs/custom_config.json

V11 offers flexible threshold tuning for different clinical scenarios:
| Use Case | Threshold | Sensitivity | Specificity | F1 | When to Use |
|---|---|---|---|---|---|
| Screening | 0.14 | 92.1% | 89.4% | 0.856 | Maximize recall |
| Balanced | 0.74 | 82.3% | 93.2% | 0.831 | General use |
| Research | 0.85 | 75.4% | 95.0% | 0.810 | High precision needed |
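
The operating points in the table amount to comparing the model's OHCA probability against a chosen cutoff. A minimal sketch, assuming the label is assigned when the probability is greater than or equal to the threshold; `THRESHOLDS` and `classify` are hypothetical names, and the comparison actually used inside `predict_ohca` may differ.

```python
# Thresholds taken from the table above
THRESHOLDS = {"screening": 0.14, "balanced": 0.74, "research": 0.85}


def classify(probability, use_case="screening"):
    """Map an OHCA probability to a label for the chosen operating point."""
    threshold = THRESHOLDS[use_case]
    return "OHCA" if probability >= threshold else "Non-OHCA"


# The same borderline probability is flagged only at the screening threshold
for use_case in THRESHOLDS:
    print(use_case, classify(0.52, use_case))
```

A lower threshold flags more borderline notes for review, which is why the screening setting trades specificity for sensitivity.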
# Use different thresholds
result_screening = predict_ohca(note, threshold=0.14) # High sensitivity
result_balanced = predict_ohca(note, threshold=0.74) # Balanced
result_research = predict_ohca(note, threshold=0.85) # High specificity

# Run comprehensive comparison
python evaluation/compare_models.py \
--data_path /path/to/c19_validation.csv \
--output_dir ./results
# Generate comparison plots
python evaluation/compare_models.py --plot

# Analyze false positives and false negatives
python evaluation/error_analysis.py \
--model_path models/v11_output/final_model \
  --data_path /path/to/validation.csv

Patient found unresponsive at home. Family called 911.
EMS arrived, started CPR. ROSC in field.
Transported to ED.
→ Prediction: OHCA (98.5%)
→ Key features:
- Location: home (OHCA +1), EMS (+1)
- Temporal: "found at home", "ROSC in field"
- Movement: outside→inside
Patient admitted to medical floor for pneumonia.
On hospital day 3, found unresponsive in bed.
Code blue called. CPR initiated on floor.
→ Prediction: Non-OHCA (2.3% OHCA probability)
→ Key features:
- Location: floor (IHCA +1), bed (+1)
- Temporal: "hospital day 3", "admitted"
- Movement: inside→inside
Patient arrested at home. EMS called, ROSC achieved.
Admitted to ICU. On hospital day 2, arrested again.
→ V9: OHCA (85%) - sees "home" and "EMS"
→ V10: Uncertain (52%) - conflicting locations
→ V11: OHCA (78%) - temporal features indicate primary arrest was OHCA
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Areas for improvement:
- Additional temporal features
- Multi-institution validation
- Support for non-English notes
- Real-time deployment pipeline
- Explainability visualizations
If you use this code or model in your research, please cite:
@misc{moukaddem2025ohca,
author = {Moukaddem, Mona},
title = {OHCA Classifier: Automated Out-of-Hospital Cardiac Arrest Identification
using Temporal and Location-Aware Deep Learning},
year = {2025},
publisher = {GitHub},
howpublished = {\url{https://github.com/monajm36/ohca-classifier}},
note = {Model available at \url{https://huggingface.co/monajm36/ohca-classifier-v11}}
}

This project is licensed under the MIT License - see LICENSE file for details.
- MIMIC-III Database: Johnson et al., Scientific Data (2016)
- UChicago C19 Dataset: Validation data source
- BiomedNLP-PubMedBERT: Microsoft Research
- Hugging Face: Model hosting and transformers library
Mona Moukaddem
- GitHub: @monajm36
- Hugging Face: @monajm36
- Email: [your-email@example.com]
- Pre-trained Model (Hugging Face)
- Interactive Demo (coming soon)
- Technical Report
- Blog Post (coming soon)
Built with ❤️ for improving cardiac arrest research and patient care