Official repository for the Transformer for Multivariate Irregular Time Series (T-MITS).
T-MITS

This repository is the official implementation for Multivariable Serum Creatinine Forecasting for Acute Kidney Injury Detection Using an Explainable Transformer-based Model, by Cyprien Gille, Galaad Altares, Benjamin Colette, Karim Zouaoui Boudjeltia, Matei Mancas and Virginie Vandenbulcke (EMBC 2025).

Diagram of the T-MITS architecture

Reference

If the code in this repository has been useful to you, please cite the original article using the "Cite this repository" button (located at the top right of the GitHub page, above Releases).

You can also cite the article directly using the reference below.

@inproceedings{gilleMultivariableSerumCreatinine2025,
  title = {Multivariable {{Serum Creatinine Forecasting}} for {{Acute Kidney Injury Detection Using}} an {{Explainable Transformer-based Model}}},
  booktitle = {2025 47th {{Annual International Conference}} of the {{IEEE Engineering}} in {{Medicine}} and {{Biology Society}} ({{EMBC}})},
  author = {Gille, Cyprien and Altares, Galaad and Colette, Benjamin and Boudjeltia, Karim Zouaoui and Mancas, Matei and Vandenbulcke, Virginie},
  year = 2025,
  month = jul,
  pages = {1--7},
  issn = {2694-0604},
  doi = {10.1109/EMBC58623.2025.11251723},
  keywords = {Accuracy,Forecasting,Injuries,Kidney,Mortality,Predictive models,Prognostics and health management,Time series analysis,Transformers,Usability}
}

Repository Contents

All relevant files include a docstring at the top and extensive comments explaining what they do, but you can find an overview of the contents of this repository below.

| File | Description | Executable? |
| --- | --- | --- |
| culling_reg.py | Task-aware post-processing script that removes unusable measures and stays after preprocessing | Yes |
| eval_kfold_TMITS.py | Computes metrics for a trained T-MITS model | Yes |
| kfold_TMITS.py | Trains a T-MITS model, with optional cross-validation | Yes |
| preprocess_eicu.py | Task-agnostic eICU preprocessing | Yes |
| preprocess_mimic.py | Task-agnostic MIMIC-IV preprocessing using pre-selected variables | Yes |
| .gitignore | Keeps generated files out of version control | No |
| config.py | Dataclasses controlling the configuration of the main scripts | No |
| pyproject.toml | Description of the Python project and its dependencies (see the Installing section below) | No |
| uv.lock | uv lockfile pinning a working set of packages for this repository, for reproducibility | No |
| dataset_classes/dataset_base.py | Base ICU torch Dataset class | No |
| dataset_classes/dataset_regression.py | ICU torch Dataset class intended for regression | No |
| models/attention.py | Basic attention torch module | No |
| models/loss.py | Quantile Loss torch module | No |
| models/tmits.py | T-MITS torch module | No |
| models/transformer.py | Transformer wrapper torch module | No |
| models/UD.py | Up-dimensional embedding torch module | No |
| utility_functions/eval_utils.py | Utility functions used for evaluation | No |
| utility_functions/preprocessing_utils.py | Utility functions used for preprocessing (includes the pre-selected variables) | No |
| utility_functions/utils.py | Various utility functions | No |
| results/* | Trained model checkpoints and evaluation metrics | No |

Installing

All dependencies for this project are specified in the pyproject.toml file, following PEP 621.

With uv (recommended)

You can install dependencies using the uv Python package manager, a modern alternative covering all of conda's features (and more). To do so, simply install uv and run the following command in this repository:

uv sync

This will create a virtual environment for the project and install its dependencies. This is the recommended and maintained way to set up this repository.

With pip

Since pyproject.toml is a standard format, you can also install this project by simply running the following command, preferably in a virtual environment. Note that this method is not recommended because, among other things, it might not install the best version of PyTorch for your hardware (PyTorch maintains its own package indexes).

pip install .

Pipeline

1. Data Preprocessing

Datasets and directory structure

All results presented in our paper were obtained on either the MIMIC-IV dataset or the eICU-CRD dataset, both of which can be obtained freely after completing a short training course. Instructions can be found at the bottom of the two links above.

Once obtained, the datasets should be placed in the same root directory as this repository, like so:

<top-directory>/
├── mimic-iv-2.2/
│   ├── icu/
│   ├── hosp/
│   ├── ...
├── eicu-crd-2.0/
│   ├── lab.csv
│   ├── ...
├── T-MITS/
│   ├── dataset_classes/
│   ├── ...

If you wish to place them elsewhere, you will have to modify the paths at the start of the preprocessing scripts.

Files to decompress

To save space, you do not need to decompress every file in each dataset; decompressing the following files is enough:

  • In the mimic-iv-2.2/ directory:
    • hosp/admissions.csv
    • hosp/patients.csv
    • icu/chartevents.csv
  • In the eicu-crd-2.0/ directory:
    • lab.csv
    • vitalPeriodic.csv
    • patient.csv

Running preprocessing scripts

All preprocessing scripts produce two .csv files: the dataset itself and a key mapping the original variable labels (such as "Heart Rate") to their reindexed integer ids. The key is mainly used to tell scripts (culling, training) which variable you are interested in without having to know its internal id.
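For instance, the key file can be loaded to resolve a human-readable label to its integer id. The sketch below is illustrative only: the column names ("label", "id") and values are assumptions, not the exact schema written by the preprocessing scripts.

```python
import csv
import io

# Hypothetical key file contents; the real column names and ids may differ.
key_csv = io.StringIO("label,id\nHeart Rate,0\nCreatinine,1\n")

# Build a label -> integer id lookup from the key.
label_to_id = {row["label"]: int(row["id"]) for row in csv.DictReader(key_csv)}
creatinine_id = label_to_id["Creatinine"]
```

With a lookup like this, downstream scripts can be pointed at "Creatinine" without hard-coding its internal id.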

To get the bottom-up (29 variables) or top-down (206 variables) versions of the processed MIMIC-IV dataset, set the top_down flag at the top of preprocess_mimic.py and run it:

uv run preprocess_mimic.py

To get the processed eICU-CRD dataset, run:

uv run preprocess_eicu.py

2. Data post-processing (culling)

Preprocessing is task-agnostic, so you still need to remove from the preprocessed dataset all stays and measures that are unusable for your task. For example, this includes stays with no regression target.

The culling script is also used to create cohorts based on several criteria, such as the first value of the target variable in a stay, which variable should be maskable during training without creating empty stays, or the maximum length of a stay. Note that by default, cohort-defining parameters that are changed from their defaults will be added to the output filename for clarity (see attr_in_paths in CullingConfig in config.py).
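The filename-suffix behaviour could look something like the sketch below. The field names here are invented for illustration; the real ones live in CullingConfig in config.py.

```python
from dataclasses import dataclass, fields


# Hypothetical subset of a culling config; these field names are NOT the
# real CullingConfig attributes, just stand-ins to illustrate the idea.
@dataclass
class DemoCullingConfig:
    max_stay_length: int = 200
    first_target_stage: int = 0


def filename_suffix(cfg: DemoCullingConfig) -> str:
    """Join every parameter that differs from its default into a suffix."""
    changed = [
        f"{f.name}={getattr(cfg, f.name)}"
        for f in fields(cfg)
        if getattr(cfg, f.name) != f.default
    ]
    return "_".join(changed)


suffix = filename_suffix(DemoCullingConfig(max_stay_length=100))
```

Here only max_stay_length was changed, so the suffix would read "max_stay_length=100" while an all-default config yields an empty suffix.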

To cull a dataset, adjust the CullingConfig at the top of culling_reg.py and run it:

uv run culling_reg.py

This will produce a ready-to-use .csv. The script can also split each stay into a separate .csv file (which can be faster than filtering the dataset during training) and save the classification label of each stay (useful for stratified splitting) in a .json dictionary.
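The label dictionary can then be loaded with the standard library, e.g. to inspect class balance before splitting. The structure below (stay id mapped to class) is an assumption based on the description above, with made-up ids:

```python
import json
from collections import Counter

# Hypothetical contents of the labels .json (stay id -> classification label).
labels = json.loads('{"100": 0, "101": 1, "102": 0, "103": 1}')

# Class balance, e.g. to sanity-check a cohort before stratified splitting.
class_counts = Counter(labels.values())
```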

3. Training

As always, adjust the TrainingConfig at the top of the script and run it:

uv run kfold_TMITS.py

This will produce:

  • A record of the training config (.json)
  • A logfile of the training process (.log)

For each cross-validation fold, this will produce:

  • Best model checkpoints (.pth)
  • A record of the training and testing indices as numpy arrays (.npy)
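The per-fold index arrays can be read back with numpy. The save/load round-trip below uses a filename pattern mirroring the one in the results folder, but it is written here purely for illustration:

```python
import tempfile
from pathlib import Path

import numpy as np

# Round-trip a per-fold index array; the filename is illustrative.
with tempfile.TemporaryDirectory() as d:
    train_idx = np.array([0, 2, 3, 7])
    np.save(Path(d) / "T_MITS_train_idx_0.npy", train_idx)
    loaded = np.load(Path(d) / "T_MITS_train_idx_0.npy")

round_trip_ok = np.array_equal(loaded, train_idx)
```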

4. Evaluating

As always, adjust the EvalConfig at the top of the script and run it:

uv run eval_kfold_TMITS.py

This will reuse the saved training config to evaluate the trained model.

If all config booleans are set to True, this will produce (for the train set and the test set):

  • Regression and classification metrics (.csv),
  • Normalized confusion matrices,
  • Ground truth and Predicted value arrays, both for regression and for classification (.npy) (aligned with the stay indexes saved during training right after splitting),
  • An .xlsx sheet with columns for the stay id, the true and predicted values, and the true and predicted classes.
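Because the ground-truth and predicted arrays are aligned, additional metrics can be computed from them directly. A minimal RMSE sketch over illustrative values (the real arrays would be loaded from the saved .npy files):

```python
import math

# Illustrative ground-truth and predicted values, NOT real model outputs.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]

# Root mean squared error over the aligned pairs.
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```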

Decoupled Evaluation Data

You can also call this script's main function with a path to a culled cohort as the data_override argument: this allows a model to be evaluated on a different cohort than the one it was trained on (for example, train on stays that started in stage 0, then evaluate on stays that ended in stages 1, 2, or 3). Note, however, that if none of the test indexes used during training are present in the overriding cohort, the script will raise an error.
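That overlap requirement can be checked ahead of time with a simple set intersection. The ids below are made up for illustration:

```python
# Saved test indices from training, and stay ids present in the overriding cohort.
test_idx = {5, 9, 12}
cohort_ids = {9, 12, 40}

# Evaluation would fail if this intersection were empty.
usable = test_idx & cohort_ids
```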

Using Trained Model Checkpoints

To promote ease of use and reproducibility, we provide the full outputs of our training and evaluation pipelines in the results folder. As such, you will find for each experiment (i.e. in each subfolder) the following files:

  • T_MITS_[X].pth files: best model checkpoint from training fold X.
  • T_MITS_config.json: serialized configuration object used for training this experiment.
  • T_MITS_[train/test]_idx_[X].npy files: stay indices used for the train/test split for fold X, saved in binary using numpy.save.
  • confusion_test.png: Confusion matrix on the test set of the first fold.
  • metrics_test.csv: Evaluation metrics on the test sets of all folds.
  • true_pred_values_test_[X].xlsx files: Ground truths and predicted values and classes for each stay in the test set of fold X.
  • arrays/: Numpy arrays of all values (ground truths and predicted) for the test sets of all folds.
  • arrays_classif/: Numpy arrays of all classes (ground truths and predicted) for the test sets of all folds.

Note: the order of the stays (as dictated by the indexes (idx files) and as reported in the .xlsx files) is consistent throughout all numpy arrays.

License

The code in this repository is available under a GPLv3 license.
