This repository contains the official implementation of CoSTI. CoSTI introduces a novel adaptation of Consistency Models (CMs) to the domain of Multivariate Time Series Imputation (MTSI), reducing inference time by up to 98% while maintaining competitive imputation accuracy.
Multivariate Time Series Imputation (MTSI) is a critical task in various domains like healthcare and traffic management, where incomplete data can compromise decision-making. Existing state-of-the-art methods, such as Denoising Diffusion Probabilistic Models (DDPMs), offer high imputation accuracy but suffer from high computational costs. CoSTI leverages Consistency Training to:
- Achieve comparable imputation quality to DDPMs.
- Drastically reduce inference times (up to 98% faster).
- Enable scalability for real-time applications.
For further details, please refer to our paper.
CoSTI combines concepts from Consistency Models and Multivariate Time Series Imputation to construct a framework optimized for speed and accuracy. The method includes:
- Spatio-Temporal Feature Extraction Modules (STFEMs): Extract spatio-temporal dependencies using transformers and Mamba blocks.
- Noise Estimation Modules (NEMs): Adapted for consistency models to predict Gaussian noise efficiently.
- Deterministic Imputation: Ensures robust and reproducible results by aggregating multiple imputations (see the sketch after this list).
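To make the aggregation idea concrete, here is a minimal sketch of deterministic imputation by combining several stochastic draws. This is an illustration only, not the repository's actual API: `sample_fn`, its signature, and the median aggregation are assumptions.

```python
import numpy as np

def deterministic_impute(sample_fn, x, mask, n_samples=5):
    """Aggregate several stochastic imputations into one deterministic result.

    `sample_fn` is a hypothetical callable that returns one imputed copy of
    `x`; entries where mask == 0 are missing and get imputed.
    """
    draws = np.stack([sample_fn(x, mask) for _ in range(n_samples)])
    consensus = np.median(draws, axis=0)  # median is robust to outlier draws
    # Keep observed values untouched; only fill in the missing positions.
    return np.where(mask.astype(bool), x, consensus)
```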
The following datasets are used for experiments:
- AQI-36: Air quality dataset with 36 sensors.
- METR-LA and PEMS-BAY: Traffic datasets covering Los Angeles and the San Francisco Bay Area.
- PhysioNet Challenge 2019: Clinical dataset for ICU patient monitoring.
The datasets employed in this study are publicly available and free to use. Specifically:
- The Torch SpatioTemporal library (Cini et al., 2022) provides tools to download and preprocess the AQI-36, METR-LA, and PEMS-BAY datasets.
- The PhysioNet Challenge 2019 dataset is available at https://physionet.org/content/challenge-2019/1.0.0/; alternatively, it can be downloaded with the script described below.
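As an illustration, the first three datasets can be loaded through tsl's dataset classes. This is a minimal sketch assuming tsl is installed (`pip install torch-spatiotemporal`); check the version you have against its documentation:

```python
from tsl.datasets import AirQuality, MetrLA, PemsBay

# AQI-36 is the reduced 36-sensor variant of the air quality dataset.
aqi36 = AirQuality(small=True)
metr_la = MetrLA()
pems_bay = PemsBay()

# Each dataset exposes the raw observations and a validity mask.
print(aqi36.dataframe().shape)  # (time steps, channels)
print(aqi36.mask.shape)         # 1 where a value was actually observed
```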
You can set up the required environment in two ways:
- Using the `requirements.txt` file, install the required packages directly:

  pip install -r requirements.txt

- Using `setup.sh`, build a Docker image and container for the project:

  sudo chmod +x setup.sh
  ./setup.sh

  This method creates a reproducible environment using Docker, simplifying dependency management.
If you want to download the PhysioNet Challenge 2019 dataset or obtain the pre-trained weights, you can run the following script:
chmod +x download_data.sh
./download_data.sh

Each experiment can be replicated using the provided configuration files. To execute a specific experiment, use the following command:
python ./scripts/<experiment_script>.py --config-name <experiment_file>

To replicate the training of CoSTI across five runs and obtain the average results, run the following script with each dataset configuration:
python ./scripts/run_average_experiment.py --config-name aqi36
python ./scripts/run_average_experiment.py --config-name metr-la_point
python ./scripts/run_average_experiment.py --config-name metr-la_block
python ./scripts/run_average_experiment.py --config-name pems-bay_point
python ./scripts/run_average_experiment.py --config-name pems-bay_block
python ./scripts/run_average_experiment.py --config-name mimic-challenge

We provide pre-trained weights for each dataset to enable testing and result replication using 2-step sampling. For example, for the AQI-36 dataset, you can run:
python ./scripts/run_k_test_experiment.py --config-name aqi36 test_sigmas=[80]

python ./scripts/run_k_test_experiment.py --config-name aqi36

You can modify the noise levels in ./config/k_test/aqi36.yaml or via the test_sigmas parameter on the command line, as in the first command above.
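For intuition, the following sketch shows the generic multistep consistency sampling loop (Song et al., 2023) that 2-step sampling instantiates. `f_theta` and its signature are assumptions for illustration, not the repository's actual interface; the sigma schedule plays the role of test_sigmas above:

```python
import torch

@torch.no_grad()
def consistency_sample(f_theta, x_obs, mask, sigmas=(80.0, 2.0), sigma_min=0.002):
    """Multistep consistency sampling, specialized here to two steps.

    `f_theta(x_noisy, sigma, x_obs, mask)` is a hypothetical consistency
    model that jumps from any noise level directly to a clean estimate.
    """
    x = torch.randn_like(x_obs) * sigmas[0]                # start from pure noise at sigma_max
    x0 = f_theta(x, sigmas[0], x_obs, mask)                # first jump to a clean estimate
    for sigma in sigmas[1:]:                               # optional refinement steps
        noise = torch.randn_like(x0)
        x = x0 + (sigma**2 - sigma_min**2) ** 0.5 * noise  # re-noise to a lower level
        x0 = f_theta(x, sigma, x_obs, mask)                # refine with another consistency step
    return torch.where(mask.bool(), x_obs, x0)             # splice estimates into missing slots
```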
To replicate the sensitivity analysis, execute the following:
python ./scripts/run_sensitivity_experiment.py --config-name metr-la_point

To perform imputation using the provided weights:
python ./scripts/impute_data.py --config-name aqi36
python ./scripts/impute_data.py --config-name metr-la_point
python ./scripts/impute_data.py --config-name metr-la_block
python ./scripts/impute_data.py --config-name pems-bay_point
python ./scripts/impute_data.py --config-name pems-bay_block
python ./scripts/impute_data.py --config-name mimic-challenge

If you use CoSTI in your research, please cite:

@article{SOLISGARCIA2025114117,
title = {CoSTI: Consistency models for (a faster) spatio-temporal imputation},
journal = {Knowledge-Based Systems},
volume = {327},
pages = {114117},
year = {2025},
issn = {0950-7051},
doi = {10.1016/j.knosys.2025.114117},
url = {https://www.sciencedirect.com/science/article/pii/S095070512501158X},
author = {Javier Solís-García and Belén Vega-Márquez and Juan A. Nepomuceno and Isabel A. Nepomuceno-Chamorro},
keywords = {Multivariate time series imputation, Consistency models, Consistency training, Generative models, Spatio-temporal data},
abstract = {Multivariate Time Series Imputation (MTSI) is crucial for many applications, such as healthcare monitoring and traffic management, where incomplete data can compromise decision-making. Existing state-of-the-art methods, like Denoising Diffusion Probabilistic Models (DDPMs), achieve high imputation accuracy; however, they suffer from significant computational costs and are notably time-consuming due to their iterative nature. In this work, we propose CoSTI, an innovative adaptation of Consistency Models (CMs) for the MTSI domain. CoSTI employs Consistency Training to achieve comparable imputation quality to DDPMs while drastically reducing inference times, making it more suitable for real-time applications. We evaluate CoSTI across multiple datasets and missing data scenarios, demonstrating up to a 98 % reduction in imputation time with performance on par with diffusion-based models. This work bridges the gap between efficiency and accuracy in generative imputation tasks, providing a scalable solution for handling missing data in critical spatio-temporal systems. The code for this project can be found here: https://github.com/javiersgjavi/CoSTI.}
}
