Big data analytics for IoT, based on:
- A tool generating streaming data, simulating many electricity/water meters
- Data collection: using Kafka (a minimal producer sketch is shown after this list)
- Data analytics: (1) filtering, sampling, integration; (2) your analytics goals, using Spark
- Visualization: using Tableau/Kibana
- Functions and performance evaluation
- MIMIC-IV dataset; more details in `./docs/Dataset_Notes.md`
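As a rough illustration of the data-generation and collection stages, here is a minimal sketch of a producer that simulates meter readings and publishes them to Kafka. It assumes the `kafka-python` package, a broker at `localhost:9092`, and a topic named `meter-readings`; none of these are fixed by the project, so adjust them to your setup.

```python
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

# Assumed broker address; adjust to your deployment.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Simulate electricity/water meters emitting one reading per second.
while True:
    reading = {
        "meter_id": random.randint(1, 1000),   # hypothetical meter fleet
        "timestamp": time.time(),
        "kwh": round(random.uniform(0.0, 5.0), 3),
    }
    producer.send("meter-readings", value=reading)  # assumed topic name
    time.sleep(1)
```

Run this from a second shell while the Kafka container is up, and point your Spark job at the same topic.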
Project structure:

```
.
|-- configs
|-- data
|   |-- mimic-iv-3.1
|   |   |-- hosp
|   |   `-- icu
|   `-- mimic-iv-ed-2.2
|       `-- ed
|-- docker
|   |-- base
|   |-- kafka
|   `-- spark
|-- docs
|-- kafka
|-- KafkaProducer
|-- misc
|-- scripts
|-- spark
`-- Tableau
```

It is recommended to run and develop the project within an Ubuntu environment using Docker. The current setup already covers the initialization of a Docker Ubuntu container that serves as the primary dev environment. Below is a simple guide to setting up the dev environment.
For the first run, in the main project repo, run:

```
docker compose up -d --build
docker exec -it <container_name> /bin/bash
```

- Replace `<container_name>` with the actual container name.
- You can change the container name in `./docker-compose.yml`. By default, it is `skibidi`.
- Python 3.12.x is available in the container (`python3`, `python3-venv`).
- Create a virtual environment: `python3 -m venv .venv`
- Activate it: `source ./.venv/bin/activate`
- Install dependencies: `pip install -r ./docker/base/requirements.txt`
- Update the requirements file when needed: `pip freeze > ./docker/base/requirements.txt`
- The data used in this project is MIMIC-IV v3.1, specifically the modules `hosp`, `icu`, and `ed`. The data itself requires credentialed access.
- Once you have access to the data, there are two ways to feed it to the project.
- The first and simplest way is to download and extract the dataset onto your machine, placing it under the `./data` directory. However, the MIMIC-IV dataset is huge (fully extracted, it can reach ~100 GB).
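If you go this route, the filtering/sampling step from the analytics goals above can read the files directly from that layout. Below is a hedged PySpark sketch; the file name `patients.csv.gz` and the `anchor_age` column are assumptions based on the public MIMIC-IV `hosp` schema, so verify them against your copy of the dataset.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumed local path; matches the ./data layout shown earlier.
PATIENTS_CSV = "./data/mimic-iv-3.1/hosp/patients.csv.gz"

spark = SparkSession.builder.appName("mimic-iv-sample").getOrCreate()

# Spark reads gzipped CSVs directly; schema inference is fine for a sketch,
# but an explicit schema is faster on the full dataset.
patients = spark.read.csv(PATIENTS_CSV, header=True, inferSchema=True)

# (1) Filtering: keep adult patients (anchor_age is assumed from the
# public MIMIC-IV schema).
adults = patients.filter(F.col("anchor_age") >= 18)

# (2) Sampling: take a 1% sample so downstream steps stay fast in development.
sample = adults.sample(fraction=0.01, seed=42)

sample.show(10)
spark.stop()
```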