This repository is the official implementation of *Multivariable Serum Creatinine Forecasting for Acute Kidney Injury Detection Using an Explainable Transformer-based Model*, by Cyprien Gille, Galaad Altares, Benjamin Colette, Karim Zouaoui Boudjeltia, Matei Mancas and Virginie Vandenbulcke (EMBC 2025).
If the code in this repository has been useful to you, please cite the original article using the *Cite this repository* button (located in the top right of the GitHub page, above *Releases*).
You can also cite the article directly using the reference below.
```bibtex
@inproceedings{gilleMultivariableSerumCreatinine2025,
  title = {Multivariable {{Serum Creatinine Forecasting}} for {{Acute Kidney Injury Detection Using}} an {{Explainable Transformer-based Model}}},
  booktitle = {2025 47th {{Annual International Conference}} of the {{IEEE Engineering}} in {{Medicine}} and {{Biology Society}} ({{EMBC}})},
  author = {Gille, Cyprien and Altares, Galaad and Colette, Benjamin and Boudjeltia, Karim Zouaoui and Mancas, Matei and Vandenbulcke, Virginie},
  year = {2025},
  month = jul,
  pages = {1--7},
  issn = {2694-0604},
  doi = {10.1109/EMBC58623.2025.11251723},
  keywords = {Accuracy,Forecasting,Injuries,Kidney,Mortality,Predictive models,Prognostics and health management,Time series analysis,Transformers,Usability}
}
```
All relevant files should have a docstring at the top and extensive comments to tell you what they do, but you can find an overview of the contents of this repository below.
| File | Description | Executable? |
|---|---|---|
| culling_reg.py | Task-aware postprocessing script that removes unusable stays and measures after preprocessing | Yes |
| eval_kfold_TMITS.py | Computes metrics for a trained T-MITS model | Yes |
| kfold_TMITS.py | Trains a T-MITS model, optionally with cross-validation | Yes |
| preprocess_eicu.py | Task-agnostic eICU preprocessing | Yes |
| preprocess_mimic.py | Task-agnostic MIMIC-IV preprocessing using pre-selected variables | Yes |
| .gitignore | Excludes generated files from version control | No |
| config.py | Dataclasses controlling the configuration of the main scripts | No |
| pyproject.toml | Description of the Python project and its dependencies (see the Installing section below) | No |
| uv.lock | uv lockfile indicating a working set of packages for this repository, for reproducibility | No |
| dataset_classes/dataset_base.py | Base ICU torch Dataset class | No |
| dataset_classes/dataset_regression.py | ICU torch Dataset class intended for regression | No |
| models/attention.py | Basic attention torch module | No |
| models/loss.py | Quantile Loss torch module | No |
| models/tmits.py | T-MITS torch module | No |
| models/transformer.py | Transformer wrapper torch module | No |
| models/UD.py | Up-dimensional embedding torch module | No |
| utility_functions/eval_utils.py | Utility functions used for evaluation | No |
| utility_functions/preprocessing_utils.py | Utility functions used for preprocessing (includes the pre-selected variables for preprocessing) | No |
| utility_functions/utils.py | Various utility functions | No |
| results/* | Trained model checkpoints and evaluation metrics | No |
All dependencies for this project are specified in the `pyproject.toml` file, following PEP 621.

You can install them using the uv Python package manager, a modern replacement for all of conda's features (and more). To do so, simply install uv and run the following command in this repository:
```
uv sync
```

This will create a virtual environment for this project and install its dependencies. This is the recommended and maintained way to set up this repository.
Since `pyproject.toml` is a standard format, you can also install this project by simply running the following command (we advise doing so in a virtual environment).

Note that this method is not recommended, because it might not install the best build of PyTorch for your hardware (PyTorch has its own package indexes), among other things.
```
pip install .
```

All results presented in our paper were obtained on either the MIMIC-IV dataset or the eICU-CRD dataset, both of which can be obtained freely after completing a short training course. Instructions can be found at the bottom of the two previous links.
Once obtained, the datasets should be placed in the same root directory as this repository, as such:

```
<top-directory>/
├── mimic-iv-2.2/
│   ├── icu/
│   ├── hosp/
│   ├── ...
├── eicu-crd-2.0/
│   ├── lab.csv
│   ├── ...
├── T-MITS/
│   ├── dataset_classes/
│   ├── ...
```
If you wish to place them elsewhere, you will have to modify the paths at the start of the preprocessing scripts.
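If you do move them, that is a small edit at the top of each preprocessing script; for illustration, something like the sketch below (the actual variable names in the scripts may differ):

```python
# Hypothetical path variables at the top of preprocess_mimic.py /
# preprocess_eicu.py; the real variable names may differ.
MIMIC_DIR = "/data/datasets/mimic-iv-2.2/"
EICU_DIR = "/data/datasets/eicu-crd-2.0/"
```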
If you don't want to decompress every file in each dataset, you can save space by decompressing only the following files:

- In the `mimic-iv-2.2/` directory: `hosp/admissions.csv`, `hosp/patients.csv`, and `icu/chartevents.csv`
- In the `eicu-crd-2.0/` directory: `lab.csv`, `vitalPeriodic.csv`, and `patient.csv`
All preprocessing scripts produce two .csv files: the dataset and a key mapping the original variable labels (such as "Heart Rate") to their reindexed integer ids. This is mainly used to tell scripts (culling, training) which variable interests you without having to know its internal id.
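For example, you could look up a variable's integer id from the key file like this (the key's filename and column names here are assumptions, not the actual ones):

```python
import pandas as pd

# Load the key produced by preprocessing (filename is illustrative).
key = pd.read_csv("key_mimic.csv")

# Find the reindexed integer id of a variable by its original label
# (the "label" and "id" column names are assumptions).
creatinine_id = key.loc[key["label"] == "Creatinine (serum)", "id"].item()
print(f"Creatinine (serum) -> id {creatinine_id}")
```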
To get the bottom-up (29 variables) or top-down (206 variables) version of the processed MIMIC-IV dataset, set the `top_down` flag at the top of `preprocess_mimic.py` and run it:

```
uv run preprocess_mimic.py
```

To get the processed eICU-CRD dataset, run:

```
uv run preprocess_eicu.py
```

Preprocessing is task-agnostic, so you still need to remove from the preprocessed dataset all stays and measures that are unusable for your task. For example, this includes stays with no regression target.
The culling script is also used to create cohorts based on several criteria, such as the first value of the target variable in a stay, which variable should be maskable during training without creating empty stays, or the maximum length of a stay. Note that, by default, any cohort-defining parameter changed from its default value is appended to the output filename for clarity (see `attr_in_paths` in `CullingConfig` in `config.py`).
To cull a dataset, adjust the `CullingConfig` at the top of `culling_reg.py` and run it:

```
uv run culling_reg.py
```

This will produce a ready-to-use `.csv`. It can also split each stay into a separate `.csv` file (which can be faster than filtering the dataset during training), and save the classification label of each stay (useful for stratified splitting) in a `.json` dictionary.
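For illustration, a cohort-defining adjustment could look like the sketch below; every field name here is hypothetical, so check the actual `CullingConfig` definition in `config.py` before editing:

```python
# Hypothetical adjustment of the culling configuration at the top of
# culling_reg.py; all field names below are illustrative, not the real ones.
from config import CullingConfig

config = CullingConfig(
    target_variable="Creatinine (serum)",  # hypothetical: regression target
    max_stay_length=200,                   # hypothetical: cap on stay length
    split_stays=True,                      # hypothetical: one .csv per stay
)
```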
As always, adjust the `TrainingConfig` at the top of the script and run it:

```
uv run kfold_TMITS.py
```

This will produce:

- A record of the training config (`.json`)
- A logfile of the training process (`.log`)

For each cross-validation fold, this will produce:

- Best model checkpoints (`.pth`)
- A record of the training and testing indices as numpy arrays (`.npy`)
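If you want to inspect these artifacts afterwards, here is a minimal sketch, assuming an experiment subfolder named `my_experiment` (the file names follow the patterns listed in the results section below):

```python
import numpy as np
import torch

# The "my_experiment" subfolder name is illustrative.
fold = 0
train_idx = np.load(f"results/my_experiment/T_MITS_train_idx_{fold}.npy")
test_idx = np.load(f"results/my_experiment/T_MITS_test_idx_{fold}.npy")

# Load the best checkpoint of this fold onto the CPU.
checkpoint = torch.load(f"results/my_experiment/T_MITS_{fold}.pth", map_location="cpu")
print(len(train_idx), len(test_idx))
```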
As always, adjust the `EvalConfig` at the top of the script and run it:

```
uv run eval_kfold_TMITS.py
```

This will reuse the saved training config to evaluate the trained model.

If all config booleans are set to `True`, this will produce (for both the train set and the test set):

- Regression and classification metrics (`.csv`)
- Normalized confusion matrices
- Ground truth and predicted value arrays, both for regression and for classification (`.npy`), aligned with the stay indexes saved during training right after splitting
- An `.xlsx` sheet with columns for the stay id, the true and predicted values, and the true and predicted classes
You can also call this script's main function with a path to a culled cohort as the `data_override` argument: this allows you to evaluate a model on a different cohort than the one it was trained on (for example, train on stays that started in stage 0 and evaluate on stays that ended in stages 1, 2, or 3). Note, however, that if none of the test indexes used during training are present in the overriding cohort, the script will raise an error.
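A minimal sketch of such an override, assuming the script's entry point is named `main` (check `eval_kfold_TMITS.py` for the actual name and signature):

```python
# Hypothetical invocation; the entry-point name and signature may differ.
from eval_kfold_TMITS import main

# Evaluate the trained model on a different culled cohort
# (the path below is illustrative).
main(data_override="culled_cohorts/other_cohort.csv")
```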
To promote ease of use and reproducibility, we provide the full outputs of our training and evaluation pipelines in the results folder. As such, you will find for each experiment (i.e. in each subfolder) the following files:
- `T_MITS_[X].pth` files: best model checkpoint from training fold `X`.
- `T_MITS_config.json`: serialized configuration object used for training this experiment.
- `T_MITS_[train/test]_idx_[X].npy` files: stay indices used for the train/test split for fold `X`, saved in binary using `numpy.save`.
- `confusion_test.png`: confusion matrix on the test set of the first fold.
- `metrics_test.csv`: evaluation metrics on the test sets of all folds.
- `true_pred_values_test_[X].xlsx` files: ground truths and predicted values and classes for each stay in the test set of fold `X`.
- `arrays/`: numpy arrays of all values (ground truths and predicted) for the test sets of all folds.
- `arrays_classif/`: numpy arrays of all classes (ground truths and predicted) for the test sets of all folds.
Note: the order of the stays, as dictated by the `idx` files and as reported in the `.xlsx` files, is consistent throughout all numpy arrays.
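For instance, a sketch of how these aligned arrays might be consumed (the file names inside `arrays/` are assumptions; only the `idx` naming pattern is documented above):

```python
import numpy as np

# The idx file follows the naming pattern documented above; the array
# file names inside arrays/ are assumptions for illustration.
test_idx = np.load("results/my_experiment/T_MITS_test_idx_0.npy")
y_true = np.load("results/my_experiment/arrays/true_test_0.npy")
y_pred = np.load("results/my_experiment/arrays/pred_test_0.npy")

# Position i refers to the same stay in all three arrays.
assert len(test_idx) == len(y_true) == len(y_pred)
print(f"Fold 0 test MAE: {np.mean(np.abs(y_true - y_pred)):.3f}")
```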
The code in this repository is available under a GPLv3 license.
