Production-ready Emergency Medicine scribe with built-in safety guardrails.
Designed for ED teams and NVIDIA healthcare AI startups that need a curated, bleeding-edge reference stack.
This repository now doubles as a living showcase of NVIDIA’s latest open healthcare AI tooling. The H100 playbook demonstrated here spans clinical text, imaging, multimodal, and deployment workflows:
- CUDA 13.0 Update 2 with driver
580.105.08(Blackwell-ready) and Nsight Compute/Systems 2025.4. - CUTLASS 4.3.0 kernels + FlashAttention 3.0.1 for high-throughput Hopper compute.
- NVIDIA Nemotron Nano / Super Models served through NIM (text-only
llama‑3.1‑nemotron‑nano‑8b‑v1+ VLllama‑3.1‑nemotron‑nano‑vl‑8b‑v1examples). - NVIDIA NeMo 25.09.1 + ModelOpt 0.21 for speech/Large Language Model fine-tuning.
- MONAI 1.5.1 (weekly) pipelines for medical imaging triage and FHIR export glue.
- Clara Holoscan 3.0.1 operators for edge inference + DALI (
cuda130) preprocessing. - TensorRT‑LLM 0.17.1 & Triton 3.2.1 for low-latency deployment.
- Presidio 2.2.360 + GOATnote guardrails for HIPAA compliance.
📁 See
docs/NVIDIA-Healthcare-AI-Showcase.mdfor setup commands, architecture diagrams, and links to each toolkit.
Safety guardrails for high-stakes clinical documentation:
- ✅ Vital signs validation: HR, BP, RR, Temp, SpO2, GCS within valid ranges
- ✅ Medication limits: 11 common ED medications with max safe doses
- ✅ Protocol warnings: ACLS, ATLS, sepsis, stroke, STEMI protocols
- ✅ ED documentation structure: 12-section format with Medical Decision Making
- ✅ High-risk awareness: Prompts for critical rule-outs
- ✅ Medical-legal compliance: Return precautions, informed consent
Built-in Guardrails (cannot be disabled in production):
- ✅ Vital Signs Validation: HR, BP, RR, Temp, SpO2, GCS within valid ranges
- ✅ Medication Limits: 11 common ED medications with max safe doses
- ✅ Protocol Warnings: ACLS, ATLS, sepsis, stroke, STEMI protocols
- ✅ ED Note Structure: All 12 required sections enforced
Example: Catches "HR 350 BPM" →
Example: Catches "Morphine 50mg IV" →
- Guardrails validated:
./deploy/verify.shand CLI tests flag unsafe vitals and medication doses (e.g.morphine 50 mg→ critical violation). - HIPAA PHI detection: Microsoft Presidio 2.2.360 removes identifiers; smoke tests surface redaction counts in CLI output.
- FHIR R4 export:
FHIRExporterintegration for GCP Healthcare API (see Python API example). - Measured MONAI speedup:
scripts/monai_h100_benchmark.pyon H100 (CUDA 13.0.2) recorded 3.27 ms baseline vs 1.57 ms optimized (≈2.1×). - Reproducible profiling: Nsight Compute artefacts live in
docs/benchmarks/ncu/latest/for transparent kernel analysis.
- Model:
nvidia/llama-3.1-nemotron-nano-8b-v1served via NVIDIA NIM (public model as of Nov 2025) - PHI Detection: Microsoft Presidio 2.2.360 (HIPAA 18-identifier coverage)
- FHIR: GCP Healthcare API (R4 dataset/store helpers)
- Infrastructure: H100 PCIe, CUDA 13.0.2, CUTLASS 4.3.0, FlashAttention 3.0.1
For startups evaluating NVIDIA's healthcare AI stack:
# One-command deploy to H100
git clone https://github.com/GOATnote-Inc/scribegoat.git
cd scribegoat
export NGC_API_KEY="nvapi-YOUR-KEY-HERE"
./deploy/quick_start.shWhat you get:
- ✅ H100 deployment validated via
./deploy/h100_auto_deploy.sh(Nov 12 2025 smoke test) - ✅ Safety guardrails (vitals, meds, protocols)
- ✅ HIPAA-compliant PHI detection
- ✅ FHIR R4 export ready
- ✅ Gradio UI on port 7860
Full deployment guide: deploy/DEPLOY.md
See NVIDIA tech in action:
# Run Nsight profiling (exports to docs/benchmarks/ncu/)
./deploy/profile.shTechnologies showcased:
- CUDA 13.0 (Blackwell support, tile programming)
- CUTLASS 4.3 (high-performance primitives)
- FlashAttention-3 (H100-optimized attention)
- cuDNN 9.15, TensorRT 10.14
Profiling guide: deploy/PROFILE.md
- Report:
docs/benchmarks/ncu/latest/ncu_monai_opt_latest.ncu-rep(MONAI UNet,torch.compile, AMP) - Summary CSV + plots:
docs/benchmarks/ncu/latest/ - Top kernels:
vol2col(39% time) +vol2im(19% time) dominate; CUTLASS GEMM/GEMV kernels peak around 53% SM occupancy in this run. - Full-metric replay inflates wall-clock time; for quick spot checks run
sudo ncu --set roofline -c 1. - Recreate sanitized artefacts with
python scripts/process_ncu_report.py --raw /tmp/ncu_raw.csv --outdir docs/benchmarks/ncu/<tag>.
python app.py
# Visit: http://localhost:7860
# Or get Brev public URL: brev urls# Generate ED note
python -m goatnote_scribe.cli "35M with chest pain, 2h duration..."
# From file
cat encounter.txt | python -m goatnote_scribe.cli
# With FHIR export
python -m goatnote_scribe.cli --export-fhir "Patient presents..."from goatnote_scribe import GOATScribe
scribe = GOATScribe()
result = scribe(
"45M with fever, cough, dyspnea. BP 128/82, HR 92, RR 18, SpO2 96%."
)
print(result['note']) # Complete ED note (12 sections)
print(result['guardrail_safe']) # Safety validation
print(result['fhir_bundle']) # FHIR R4 bundle
scribe.wipe() # HIPAA cleanupfrom goatnote_scribe import GOATScribe, FHIRExporter
scribe = GOATScribe()
exporter = FHIRExporter()
# Generate note
result = scribe("Patient encounter text...")
# Upload to GCP Healthcare API
response = exporter.upload_bundle(result['fhir_bundle'])
print(f"Uploaded: {response['id']}")# Required
NGC_API_KEY=nvapi-YOUR-KEY-HERE
# Optional (defaults shown)
NIM_URL=https://integrate.api.nvidia.com/v1
MODEL_NAME=nvidia/llama-3.1-nemotron-nano-8b-v1
TEMPERATURE=0.1
MAX_TOKENS=512
# GCP FHIR (optional)
GCP_PROJECT_ID=your-project-id
GCP_LOCATION=us-central1
GCP_DATASET=scribe-dataset
GCP_FHIR_STORE=scribe-storefrom goatnote_scribe import GOATScribe, ScribeConfig
config = ScribeConfig(
nim_api_key="nvapi-...",
model_name="nvidia/nemotron-nano-9b-v2",
temperature=0.1,
max_tokens=512,
enable_reasoning=True # Use /think for audit trails
)
scribe = GOATScribe(config=config)Clinical Text
↓
PHI Detection (Presidio)
↓
De-identified Text
↓
Draft Generation (/think enabled)
↓
Self-Critique Pass
↓
Final SOAP Note
↓
FHIR R4 Bundle
↓
GCP Healthcare API (optional)
Model Choice: Nemotron Nano 9B v2
- 6x throughput vs Nemotron Super 49B
- 128K context vs 32K (full encounter support)
- Toggleable reasoning for HIPAA audit transparency
- Well-documented API, proven production use
PHI Detection: Microsoft Presidio
- 18 HIPAA identifier detection
- Anonymization before model inference
- No PHI sent to API endpoints
Memory Safety
- Zero-residue GPU cleanup via
wipe() - Ephemeral processing (no disk persistence)
- CUDA cache clearing after each session
✅ 18 HIPAA Identifiers Detected:
- Names
- Geographic subdivisions (ZIP, city, state)
- Dates (birth, admission, discharge, death)
- Phone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers
- Device identifiers
- URLs
- IP addresses
- Biometric identifiers
- Full-face photos
- Any unique identifying numbers
- Toggleable Reasoning:
/thinktoken preserves model's reasoning process - Redaction Map: Returns
[(entity_type, start, end)]for each PHI detection - FHIR Bundles: Structured, auditable output format
- Memory Wiping: Zero-residue cleanup via
wipe()method
# Always wipe after processing
try:
result = scribe(clinical_text)
process_result(result)
finally:
scribe.wipe() # Ensures cleanup even if error occurs| Model | Tokens/sec | First Token | Context |
|---|---|---|---|
| Nemotron Super 49B | ~250 | 150ms | 32K |
| Nemotron Nano 9B v2 | ~1500 | <50ms | 128K |
6x throughput improvement enables real-time clinical documentation
- Draft Generation: <2 seconds
- Self-Critique: <2 seconds
- Total Pipeline: <5 seconds (including PHI detection)
# Install dev dependencies
pip install -e ".[dev]"
# Optional: NVIDIA healthcare add-ons (requires https://pypi.nvidia.com)
pip install --upgrade -r requirements-nvidia-healthcare.txt \
--extra-index-url https://pypi.nvidia.com
# Run tests
pytest
# With coverage
pytest --cov=src/goatnote_scribe --cov-report=html# Linting
ruff check src/
black --check src/
# Type checking
mypy src/# Install pre-commit
pip install pre-commit
# Setup hooks
pre-commit install
# Run manually
pre-commit run --all-filesclass GOATScribe:
def __init__(self, config: Optional[ScribeConfig] = None)
def __call__(
self,
prompt: str,
audio: Optional[bytes] = None,
patient_id: str = "anon-001"
) -> Dict[str, any]
def wipe(self) -> Noneclass FHIRExporter:
def __init__(self, config: Optional[ScribeConfig] = None)
def upload_bundle(self, bundle: Dict) -> Dict
def get_patient(self, patient_id: str) -> Dict
def search_documents(
self,
patient_id: Optional[str] = None,
limit: int = 10
) -> Dict- ✅ Nemotron Nano 9B v2 integration
- ✅ Emergency Medicine guardrails (vitals, meds, protocols)
- ✅ HIPAA-compliant PHI detection (18 identifiers)
- ✅ FHIR R4 export to GCP Healthcare API
- ✅ One-command deployment (< 5 min)
- ✅ H100 profiling with NCU/Nsight
- ✅ Gradio UI and CLI
- NCU CI/CD integration (performance regression detection)
- Comprehensive test suite (pytest + guardrails)
- Example notebooks (chest pain, stroke, trauma)
- Audio transcription (Whisper large-v3-turbo)
- LoRA fine-tuning for ED specialty (MIMIC-IV-ED)
- Batch processing API
- Multi-language PHI detection (Spanish, Chinese)
- FHIR R5 support
- Clara integration for medical imaging
See CONTRIBUTING.md for development guidelines.
- Security vulnerabilities: Email b@thegoatnote.com (do not open public issues)
- Bugs: Open an issue with reproduction steps
- Feature requests: Open an issue with use case description
Apache License 2.0 - see LICENSE for details.
@software{goat_scribe_2025,
title = {GOAT Scribe: H100-Optimized HIPAA-Compliant Medical Scribe},
author = {Dent, Brandon},
year = {2025},
url = {https://github.com/GOATnote-Inc/scribegoat},
note = {NVIDIA Nemotron Nano 9B v2}
}Email: b@thegoatnote.com
Organization: GOATnote Inc.
Repository: https://github.com/GOATnote-Inc/scribegoat
- NVIDIA: Nemotron Nano 9B v2 model and NIM platform
- Microsoft: Presidio PHI detection framework
- Google Cloud: Healthcare API and FHIR infrastructure
- HL7: FHIR R4 healthcare interoperability standards