A Production-Grade Deep Learning System for Zero-Shot Electrocardiogram Analysis
As a Senior AI Engineer specializing in medical AI and foundation models, I architected and developed the CardioMorph AI Platform, a deep learning framework built for zero-shot generalization across heterogeneous ECG datasets. The system draws on neural architecture design, signal processing, state-space modeling, and production-grade ML engineering.
The platform addresses a critical challenge in medical AI: domain generalization. Traditional ECG classifiers fail when deployed on data from different hospitals, devices, or patient populations. CardioMorph AI solves this through a novel morphology-rhythm disentanglement architecture combined with long-range sequence modeling, enabling reliable out-of-the-box performance on unseen datasets without fine-tuning.
Detailed Architecture Explanation:
The architecture diagram above illustrates the core innovation of CardioMorph AI: explicit separation of morphological and rhythmical information followed by intelligent fusion. The system processes 12-lead ECG signals (5000 timepoints at 500Hz) through three parallel streams:
- Morphology Stream (Left): Uses MiniRocket, a deterministic, parameter-free convolution kernel system that extracts morphological features (P-waves, QRS complexes, T-waves) in a distribution-agnostic manner. This ensures consistent feature extraction regardless of training data characteristics, preventing shortcut learning.
- Rhythm Stream (Center): Computes Heart Rate Variability (HRV) descriptors including RMSSD, SDNN, and Poincaré plot metrics. These global statistics capture long-term autonomic nervous system dynamics and rhythm patterns that are independent of waveform shapes.
- Contextual Modeling (Right): Employs Bi-Directional Mamba (State Space Model) to model long-range dependencies across the entire 10-second ECG recording. Unlike Transformers with O(N²) complexity, Mamba achieves O(N) linear scaling, enabling efficient processing of long sequences while capturing subtle temporal patterns.
The Cross-Attention Fusion Module (center) re-integrates these disentangled representations, allowing the model to learn non-linear interactions between morphology and rhythm. This is critical for detecting complex arrhythmias such as paroxysmal atrial fibrillation, where both waveform shape and timing irregularities matter.
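
The fusion step described above can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention, in which one rhythm embedding queries a set of morphology tokens. This is an illustration under stated assumptions, not the platform's actual module: the function names, the toy vectors, and the absence of learned projection matrices are all simplifications introduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    query:  list of d floats (assumed here: a rhythm embedding)
    keys:   list of d-dimensional vectors (assumed here: morphology tokens)
    values: list of vectors, one per key
    Returns (attention weights, weighted sum of the values).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, fused


# Toy example (hypothetical values): one rhythm query attends over three
# morphology tokens, which also serve as the values.
rhythm_query = [1.0, 0.0]
morph_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, fused = cross_attention(rhythm_query, morph_tokens, morph_tokens)
```

The token most aligned with the query receives the largest weight, which is how a rhythm descriptor can select the waveform segments most relevant to it. A real fusion block would add learned query, key, and value projections and multiple heads.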
- Novel Neural Architecture: Designed and implemented a custom disentangled architecture that explicitly separates morphological and rhythmical features, a departure from traditional end-to-end CNNs that entangle these aspects.
- State-Space Models (SSM): Integrated Mamba/SSM technology for efficient long-range sequence modeling, achieving linear computational complexity for 5000-timepoint signals.
- Attention Mechanisms: Implemented Cross-Attention Fusion and Spatial Lead Attention to enable multi-modal feature integration across 12 ECG leads.
- MiniRocket Integration: Leveraged deterministic convolution kernels for distribution-agnostic morphological feature extraction, ensuring robustness across different ECG acquisition settings.
- HRV Analysis: Implemented comprehensive Heart Rate Variability feature engineering (RMSSD, SDNN, Poincaré plots) to capture autonomic nervous system dynamics.
- Multi-Scale Tokenization: Developed adaptive tokenization strategies for handling variable-length ECG segments while maintaining temporal resolution.
- Zero-Shot Generalization: Achieved state-of-the-art performance on CPSC-2021 and PTB-XL datasets without test-time adaptation or dataset-specific tuning.
- Robust Evaluation Protocols: Implemented strict subject-aware cross-validation to prevent identity leakage and ensure clinical validity.
- Model Optimization: Designed Power Mean Pooling (Q=3) operator for numerically stable aggregation that emphasizes high-evidence segments without brittleness.
- Backend Engineering: Built high-performance FastAPI inference server with sub-second latency for real-time ECG analysis.
- Frontend Development: Developed modern React-based clinical dashboard with medical-grade visualization (12-lead rendering, digital calipers, PDF reporting).
- DevOps & Deployment: Configured production deployment pipelines with GPU acceleration, model versioning, and scalable serving infrastructure.
Problem: Traditional CNNs implicitly entangle waveform shapes (morphology) with timing patterns (rhythm), leading to dataset-specific shortcuts that fail to generalize.
Solution: CardioMorph AI explicitly separates these aspects:
- Morphology Stream: MiniRocket extracts shape-based features (P-wave amplitude, QRS width, ST-segment elevation) deterministically, independent of rhythm.
- Rhythm Stream: HRV descriptors capture timing dynamics (RR interval variability, heart rate trends) independent of waveform morphology.
Impact: This separation enables the model to learn generalizable patterns that transfer across different hospitals, devices, and patient populations.
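
To make the rhythm stream concrete, here is a minimal sketch of the HRV descriptors named above, computed from a list of RR intervals with the standard definitions of SDNN, RMSSD, and the Poincaré SD1/SD2 axes. It assumes R-peak detection has already produced the RR intervals; the function names are hypothetical, and the sketch deliberately uses plain Python rather than the project's own feature code.

```python
import math


def _sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def hrv_descriptors(rr_ms):
    """Compute SDNN, RMSSD, and Poincaré SD1/SD2 from RR intervals in ms."""
    if len(rr_ms) < 3:
        raise ValueError("need at least 3 RR intervals")
    pairs = list(zip(rr_ms, rr_ms[1:]))  # Poincaré points (RR_n, RR_n+1)
    diffs = [b - a for a, b in pairs]
    # SDNN: spread of all RR intervals around their mean.
    sdnn = _sample_std(rr_ms)
    # RMSSD: root mean square of successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # SD1 / SD2: spread across and along the Poincaré identity line.
    sd1 = _sample_std([(b - a) / math.sqrt(2) for a, b in pairs])
    sd2 = _sample_std([(b + a) / math.sqrt(2) for a, b in pairs])
    return {"sdnn": sdnn, "rmssd": rmssd, "sd1": sd1, "sd2": sd2}


# Hypothetical RR series in milliseconds.
example = hrv_descriptors([800, 810, 790, 805, 795])
```

None of these values depend on the shape of individual beats, only on their timing, which is the property the rhythm stream relies on.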
Problem: Transformers have O(N²) complexity, making them computationally expensive for long ECG sequences (5000 timepoints). CNNs have limited receptive fields, missing long-range dependencies.
Solution: State Space Models (Mamba) provide:
- Linear Complexity O(N): Efficient processing of full 10-second ECG recordings.
- Long-Range Modeling: Captures dependencies across entire signal, critical for detecting transient abnormalities.
- Bi-Directional Processing: Processes signals forward and backward to capture both causal and anti-causal patterns.
Impact: Enables real-time inference on long sequences while maintaining high accuracy for rare, transient arrhythmias.
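
The linear-time, bi-directional idea can be sketched with a toy scalar state-space recurrence. This is not Mamba: Mamba's parameters are input-dependent (selective) and its scan is hardware-optimized, whereas the sketch below assumes fixed, hypothetical scalars `a`, `b`, and `c` purely to show why one pass over the sequence costs O(N) and what the backward pass adds.

```python
def linear_scan(x, a=0.9, b=0.1, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and emit y_t = c * h_t in one O(N) pass."""
    h = 0.0
    out = []
    for value in x:
        h = a * h + b * value
        out.append(c * h)
    return out


def bidirectional_scan(x, a=0.9, b=0.1, c=1.0):
    """Pair a forward (past context) scan with a backward (future context) scan."""
    fwd = linear_scan(x, a, b, c)
    bwd = linear_scan(x[::-1], a, b, c)[::-1]
    return list(zip(fwd, bwd))


# A single impulse in the middle of a short toy sequence.
signal = [0.0, 0.0, 1.0, 0.0, 0.0]
states = bidirectional_scan(signal)
```

At the first timestep the forward output is still zero because the impulse lies in the future, while the backward output is already non-zero. The mirror image holds at the last timestep, so every position ends up with context from both directions.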
Problem: Standard pooling operators have limitations:
- Max Pooling: Brittle to noise, misses subtle patterns.
- Average Pooling: Dilutes important signals, reduces sensitivity to transient abnormalities.
Solution: Power Mean Pooling with Q=3:
- Numerically Stable: Avoids overflow/underflow issues.
- Selective Emphasis: Emphasizes high-evidence segments without complete reliance on single peaks.
- Robust to Noise: More stable than max pooling while more sensitive than average pooling.
Impact: Improved detection of paroxysmal arrhythmias (e.g., Paroxysmal AF) that appear only briefly in recordings.
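
As a rough illustration of the pooling behavior described above, the following sketch implements a generalized (power) mean over per-segment scores with exponent 3. It assumes the scores are non-negative, for example probabilities in [0, 1]; the function name and example values are hypothetical, and the project's actual operator may differ in detail.

```python
def power_mean_pool(scores, q=3.0):
    """Generalized mean: (mean(s_i ** q)) ** (1 / q) for non-negative scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    return (sum(s ** q for s in scores) / len(scores)) ** (1.0 / q)


# Hypothetical recording: nine quiet segments and one high-evidence segment.
segment_scores = [0.1] * 9 + [0.9]
pooled = power_mean_pool(segment_scores, q=3.0)
average = sum(segment_scores) / len(segment_scores)
peak = max(segment_scores)
```

For any q greater than 1 the pooled value lies between the arithmetic mean and the maximum, so a brief high-evidence segment raises the recording-level score more than averaging would, without letting a single sample decide the outcome. Setting q to 1 recovers average pooling, and increasing q moves the result toward max pooling.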
Problem: Most ECG models require fine-tuning on target datasets, limiting clinical deployment flexibility.
Solution: CardioMorph AI achieves zero-shot transfer through:
- Fixed Architecture: No test-time adaptation required.
- Universal Threshold (τ=0.5): No dataset-specific calibration needed.
- Subject-Aware Evaluation: Strict protocols preventing identity leakage.
Impact: Enables immediate deployment on new datasets without retraining, critical for clinical applications.
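
A fixed threshold means evaluation on a new dataset reduces to comparing probabilities against τ=0.5 and scoring the result. The sketch below shows that step for a binary task using the standard F1 definition. The function name, the `>=` comparison at the boundary, and the toy labels are assumptions made for illustration.

```python
def f1_at_threshold(y_true, y_prob, tau=0.5):
    """Binarize probabilities at a fixed threshold and return the F1 score."""
    y_pred = [1 if p >= tau else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and model outputs for five recordings.
labels = [1, 1, 0, 0, 1]
probs = [0.9, 0.4, 0.6, 0.1, 0.7]
score = f1_at_threshold(labels, probs)
```

Because `tau` is never tuned on the target data, the same call can be applied unchanged to every evaluation dataset.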
Detailed Interface Explanation:
The clinical dashboard screenshot demonstrates a production-ready web application designed for real-world medical use. The interface showcases several key capabilities:
Left Panel - 12-Lead ECG Visualization:
- Medical-Grade Rendering: High-fidelity display of all 12 ECG leads (I, II, III, aVR, aVL, aVF, V1-V6) with proper scaling and medical grid overlay (5mm/1mm standard).
- Interactive Analysis: Physicians can zoom, pan, and focus on specific leads for detailed waveform inspection.
- Real-Time Display: Signals rendered at native 500Hz sampling rate with smooth, responsive interaction.
Right Panel - AI Analysis Results:
- Multi-Class Classification: The system provides probability distributions over diagnostic categories (Normal, Atrial Fibrillation, General Supraventricular Tachycardia, Sinus Bradycardia).
- Confidence Scoring: Each prediction includes confidence metrics, enabling clinicians to assess AI reliability.
- Explainable AI Integration: Grad-CAM attention maps can overlay on waveforms, showing which segments the model focuses on for diagnosis.
Bottom Section - Clinical Tools:
- Digital Calipers: Precision measurement tools for analyzing wave intervals (ΔT in milliseconds) and amplitudes (ΔV in millivolts), matching traditional ECG analysis workflows.
- PDF Report Generation: One-click export of clinical-grade reports containing patient information, ECG traces, AI findings, and measurement annotations.
- Patient Queue Management: Drag-and-drop file upload supporting multiple formats (.mat, .csv, .json) with history tracking for workflow efficiency.
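
The caliper arithmetic is simple enough to sketch: given two sample indices on one lead, the time difference follows from the 500 Hz sampling rate and the voltage difference from the sample values. The function below is a hypothetical helper that assumes the signal is already in millivolts; it is not taken from the dashboard code.

```python
def caliper_measure(signal_mv, start_idx, end_idx, fs_hz=500.0):
    """Return (delta_t_ms, delta_v_mv) between two sample indices on one lead."""
    delta_t_ms = (end_idx - start_idx) * 1000.0 / fs_hz
    delta_v_mv = signal_mv[end_idx] - signal_mv[start_idx]
    return delta_t_ms, delta_v_mv


# Hypothetical lead: flat baseline with a 1.2 mV deflection at sample 60.
lead = [0.0] * 100
lead[60] = 1.2
delta_t, delta_v = caliper_measure(lead, 10, 60)
```

At 500 Hz each sample spans 2 ms, so the 50-sample gap in this example corresponds to 100 ms.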
Technical Implementation:
- Frontend: React 18 with Vite, Tailwind CSS for responsive, modern UI.
- Backend: FastAPI with async processing, GPU-accelerated inference using optimized Mamba2 backend.
- Real-Time Performance: Sub-second inference latency enabling interactive clinical workflows.
CardioMorph AI achieves state-of-the-art performance on standard ECG benchmarks:
- CPSC-2021 (Atrial Fibrillation Detection): Zero-shot F1-score exceeding 0.85 without any training on CPSC data.
- PTB-XL (Multi-Label Classification): Competitive performance across 5 diagnostic classes with zero-shot transfer.
- Chapman-Shaoxing (Large-Scale Validation): Robust cross-validation performance on 45,000+ 12-lead ECG records.
- No Test-Time Adaptation: Works out-of-the-box on new datasets.
- Fixed Decision Threshold: Universal τ=0.5 threshold eliminates dataset-specific calibration.
- Subject-Aware Evaluation: Strict protocols prevent patient identity leakage, ensuring clinical validity.
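
Subject-aware evaluation can be sketched as a fold assignment made per patient rather than per recording, so that no patient appears in more than one fold. The helper below is a minimal illustration with hypothetical names and identifiers; the project's actual protocol may stratify or balance folds differently.

```python
import random


def subject_aware_folds(record_subjects, n_folds=5, seed=0):
    """Return one fold index per record; all records of a subject share a fold."""
    subjects = sorted(set(record_subjects))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Deal the shuffled subjects into folds round-robin.
    fold_of_subject = {s: i % n_folds for i, s in enumerate(subjects)}
    return [fold_of_subject[s] for s in record_subjects]


# Hypothetical record-to-patient mapping with repeated patients.
records = ["p1", "p1", "p2", "p3", "p3", "p3", "p4"]
folds = subject_aware_folds(records, n_folds=2)
```

Splitting by record instead would let recordings from the same patient land in both training and test folds, which is the identity leakage this protocol is meant to prevent.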
- PyTorch: Deep learning framework for model development and training.
- Mamba-SSM: State Space Models for efficient long-range sequence modeling.
- MiniRocket: Deterministic convolution kernels for morphological feature extraction.
- NeuroKit2: Signal processing library for HRV analysis and ECG preprocessing.
- FastAPI: High-performance async web framework for inference serving.
- React + Vite: Modern frontend framework for clinical dashboard.
- CUDA 11.8+: GPU acceleration for real-time inference.
- Docker: Containerization for reproducible deployments.
- NumPy/SciPy: Scientific computing for signal processing.
- Pandas: Data manipulation and preprocessing pipelines.
- Scikit-learn: Feature engineering and evaluation metrics.

```
CardioMorph-AI/
├── configs/            # Centralized configuration management
├── data/               # Dataset storage and preprocessing
├── models/             # Pre-trained model checkpoints
├── notebooks/          # Exploratory analysis and demos
├── reports/            # Experimental results and visualizations
├── scripts/            # Training and evaluation pipelines
├── src/                # Core source code
│   ├── model.py        # Main CardioMorph architecture
│   ├── layers.py       # BiMamba, Cross-Attention, Fusion blocks
│   ├── features.py     # MiniRocket, HRV extraction
│   ├── data_loader.py  # Data pipeline and preprocessing
│   └── utils.py        # Metrics, losses, training utilities
└── web_app/            # Production web application
    ├── backend/        # FastAPI inference server
    └── frontend/       # React clinical dashboard
```

```bash
# Install dependencies
pip install -r requirements.txt
```

| Component | Requirement |
|---|---|
| Python | 3.10+ |
| CUDA | 11.8+ (for Mamba-SSM acceleration) |
| GPU VRAM | 10GB+ (20GB recommended for training) |

```bash
# Run zero-shot evaluation
python scripts/eval_zeroshot.py --ckpt models/fold1_best.pt
```

```bash
# Start backend server
cd web_app/backend
uvicorn main:app --reload

# Start frontend (separate terminal)
cd web_app/frontend
npm install
npm run dev
```

Traditional ECG classifiers learn entangled representations where waveform shapes and timing patterns are mixed. CardioMorph AI explicitly separates these:
- Morphology: Shape-based features (wave amplitudes, durations, slopes) extracted via MiniRocket.
- Rhythm: Timing-based features (RR intervals, heart rate variability) computed via HRV analysis.
This separation enables better generalization because the model learns independent, transferable representations of each aspect.
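
The morphology side of this separation can be illustrated with a heavily simplified, MiniRocket-style feature: a fixed length-9 kernel whose weights are -1 and 2 and sum to zero, applied as a dilated convolution and summarized by the proportion of positive values (PPV). This sketch uses a single hand-picked kernel, no padding, and a fixed bias, whereas the real MiniRocket transform uses a full kernel set, multiple dilations, and data-derived biases. All names here are hypothetical.

```python
def dilated_conv(x, kernel, dilation=1):
    """Valid (no padding) dilated 1-D cross-correlation of x with kernel."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * x[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(len(x) - span)
    ]


def ppv(x, kernel, dilation=1, bias=0.0):
    """Proportion of convolution outputs that exceed the bias."""
    out = dilated_conv(x, kernel, dilation)
    return sum(1 for v in out if v - bias > 0) / len(out)


# Fixed kernel: six weights of -1 and three of 2, so the weights sum to zero.
KERNEL = [-1, -1, 2, -1, -1, 2, -1, -1, 2]

# Hypothetical signals: a flat baseline and a baseline with one sharp spike.
flat = [1.0] * 50
spiky = [0.0] * 20
spiky[10] = 1.0
flat_feature = ppv(flat, KERNEL)
spike_feature = ppv(spiky, KERNEL)
```

Because the kernel weights sum to zero, a constant offset in the signal produces no response, so the feature reacts to shape rather than to baseline level. No weights are learned, which is why such features do not adapt to the quirks of a particular training set.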
State Space Models provide an alternative to Transformers for sequence modeling:
- Linear Complexity: O(N) vs O(N²) for Transformers.
- Selective State Spaces: Dynamically focus on relevant information.
- Long-Range Dependencies: Efficiently model relationships across entire sequences.
Mamba is particularly suited for ECG analysis where long-range temporal patterns (e.g., transient arrhythmias) are critical.
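
To put the complexity difference in numbers for this input size, the back-of-the-envelope sketch below counts token-pair interactions for full self-attention against recurrence steps for a bi-directional scan at N = 5000. It ignores hidden dimensions, constant factors, and hardware effects, so it indicates scaling only and is not a runtime measurement. The function names are hypothetical.

```python
def attention_pair_count(n):
    """Full self-attention scores every token against every token: N * N."""
    return n * n


def scan_step_count(n, bidirectional=True):
    """A recurrent scan visits each token once per direction: N or 2 * N."""
    return 2 * n if bidirectional else n


N = 5000  # samples in a 10-second recording at 500 Hz
pairs = attention_pair_count(N)
steps = scan_step_count(N)
ratio = pairs // steps
```

Doubling the recording length doubles the scan cost but quadruples the attention cost, which is the practical meaning of O(N) versus O(N²) here.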
Zero-shot learning means the model performs well on new datasets without fine-tuning:
- No Test-Time Adaptation: Model weights remain fixed.
- Universal Thresholds: Same decision threshold (τ=0.5) across all datasets.
- Distribution Robustness: Handles different acquisition settings, devices, and patient populations.
This capability is essential for clinical deployment where retraining on every new hospital's data is impractical.
- Disentangled Multi-Stream Design: First ECG model to explicitly separate morphology and rhythm streams with learned fusion.
- Mamba Integration for ECG: Pioneering application of State Space Models to long-sequence ECG analysis.
- Power Mean Pooling: Novel aggregation operator optimized for transient abnormality detection.
- Production-Ready Codebase: Clean, modular architecture with comprehensive error handling.
- Scalable Inference: Optimized for both batch processing and real-time single-record analysis.
- Clinical Integration: Full-stack web application enabling seamless workflow integration.
This project is licensed under the MIT License.
This system builds upon foundational research in ECG analysis, deep learning, and state-space modeling. The architecture incorporates insights from the medical AI community and advances in foundation model development.
Developed by a Senior AI Engineer specializing in Medical AI, Foundation Models, and Production ML Systems

