Big data analytics for IoT, based on:
- A tool generating streaming data, simulating many electricity/water meters
- Data collection: using Kafka (a minimal producer sketch is shown after this list)
- Data analytics: (1) filtering, sampling, integration; (2) your analytics goals, using Spark
- Visualization: using Tableau/Kibana
- Functions and performance evaluation
- MIMIC-IV dataset; more details in `./docs/Dataset_Notes.md`
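As a rough illustration of the data-generation and collection stages, here is a minimal sketch of a producer that simulates meter readings and publishes them to Kafka. It assumes the `kafka-python` package, a broker at `localhost:9092`, and a topic named `meter-readings`; none of these are fixed by the project, so adjust them to your setup.

```python
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

# Assumed broker address; adjust to your deployment.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Simulate electricity/water meters emitting one reading per second.
while True:
    reading = {
        "meter_id": random.randint(1, 1000),   # hypothetical meter fleet
        "timestamp": time.time(),
        "kwh": round(random.uniform(0.0, 5.0), 3),
    }
    producer.send("meter-readings", value=reading)  # assumed topic name
    time.sleep(1)
```

Run this from a second shell while the Kafka container is up, and point your Spark job at the same topic.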
Project structure:

```
.
|-- configs
|-- data
|   |-- mimic-iv-3.1
|   |   |-- hosp
|   |   `-- icu
|   `-- mimic-iv-ed-2.2
|       `-- ed
|-- docker
|   |-- base
|   |-- kafka
|   `-- spark
|-- docs
|-- kafka
|-- KafkaProducer
|-- misc
|-- scripts
|-- spark
`-- Tableau
```

It is recommended to run and develop the project within an Ubuntu environment using Docker. The current setup already covers the initialization of a Docker Ubuntu container that serves as the primary dev environment. Below is a simple guide to setting up the dev environment.
For the first run, in the main project repo, run:

```
docker compose up -d --build
docker exec -it <container_name> /bin/bash
```

- Replace `<container_name>` with the actual container name.
- You can change the container name in `./docker-compose.yml`. By default, it is `skibidi`.
- Python 3.12.x is available in the container (`python3`, `python3-venv`).
- Create a virtual environment: `python3 -m venv .venv`
- Activate it: `source ./.venv/bin/activate`
- Install dependencies: `pip install -r ./docker/base/requirements.txt`
- Update the requirements file when needed: `pip freeze > ./docker/base/requirements.txt`
- The data used in this project is MIMIC-IV v3.1, specifically the modules `hosp`, `icu`, and `ed`. The data itself requires credentialed access.
- Once you have access to the data, there are two ways to feed it to the project.
- The first and simplest way is to download and extract the dataset onto your machine, placing it under the `./data` directory. However, the MIMIC-IV dataset is huge (fully extracted, it can reach ~100 GB).
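If you go this route, the filtering/sampling step from the analytics goals above can read the files directly from that layout. Below is a hedged PySpark sketch; the file name `patients.csv.gz` and the `anchor_age` column are assumptions based on the public MIMIC-IV `hosp` schema, so verify them against your copy of the dataset.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumed local path; matches the ./data layout shown earlier.
PATIENTS_CSV = "./data/mimic-iv-3.1/hosp/patients.csv.gz"

spark = SparkSession.builder.appName("mimic-iv-sample").getOrCreate()

# Spark reads gzipped CSVs directly; schema inference is fine for a sketch,
# but an explicit schema is faster on the full dataset.
patients = spark.read.csv(PATIENTS_CSV, header=True, inferSchema=True)

# (1) Filtering: keep adult patients (anchor_age is assumed from the
# public MIMIC-IV schema).
adults = patients.filter(F.col("anchor_age") >= 18)

# (2) Sampling: take a 1% sample so downstream steps stay fast in development.
sample = adults.sample(fraction=0.01, seed=42)

sample.show(10)
spark.stop()
```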