ECGdashboard

Hello and welcome to the repository for my Data Engineering project during my fellowship at Insight Data Science in New York 2018.

Business Problem

Aims

Reduce staffing costs by providing a centralized monitoring dashboard to view multiple patient's ECG signals at once.
Index and store raw physiological signals for downstream analysis.

Description

Hospitals collect and process a lot of signals and are now moving towards storing them. My project aims to process and display ECG signals in near real time from patients. These ECG signals are processed every minute to analyze the beat-to-beat heart rate (HR) of each patient in the hospital. These signals are also indexed and stored for use as inputs when developing machine learning models to predict adverse physiological events to improve patient care.

Solution

My pipeline loads in ECG timeseries data from an S3 bucket which contains separate files for each patient. The files are ingested line by line simulating sampling of the ECG signals by my kafka brokers. The brokers produce messages into a topic which is subscribed to by two spark streaming clusters. The first spark streaming cluster has a 2-second mini-batch interval which groups the signals from each patient and saves them to my PostgreSQL database. The second cluster has a 60-second mini-batch interval to measure the number of beats over the batch period to calculate heart rate per minute over time and save them to my database. Botht the time series ECG samples and the calculated heart rate's are displayed on my dash front end.

Front-End Views

Architecture

Installation

Requirements

peg

Setup

To setup clusters run setup/setup_cluster.sh.
To setup database follow setup/db_setup.txt.
To setup the airflow job follow setup/run_airflow.txt
To start the pipeline follow the instructions in src/README.md

Testing

Unittests can be run using the run_unittests.sh file in the ./test folder. Results are stored in ./test/results.txt.

Cluster Configuration

four m4.large EC2 instances for Kafka producers
four m4.large EC2 instances for Spark Streaming (2s mini-batch) which writes to postgres database
four m4.large EC2 instances for Spark Streaming (60s mini-batch) which calculates heart rate over 60s period
one m4.large EC2 instance for TimescaleDB/PostgreSQL database
one t2.micro EC2 instance for Dash app

Name		Name	Last commit message	Last commit date
Latest commit History 325 Commits
docs		docs
setup		setup
src		src
test		test
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ECGdashboard

Table of Contents

Business Problem

Aims

Description

Solution

Front-End Views

Architecture

Installation

Requirements

Setup

Testing

Cluster Configuration

Links

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ECGdashboard

Table of Contents

Business Problem

Aims

Description

Solution

Front-End Views

Architecture

Installation

Requirements

Setup

Testing

Cluster Configuration

Links

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages