CSTS - Correlation Structures in Time Series

Overview

This repository contains the code for generating, validating, and evaluating the CSTS (Correlation Structures in Time Series) benchmark dataset. CSTS is a comprehensive synthetic benchmark for evaluating the discovery of correlation structures in multivariate time series data.

Key features of CSTS:

Synthetic time series data with 23 distinct correlation structures
Systematic variation of data conditions (distribution shifts, sparsification, downsampling)
Ground truth segmentation and clustering labels
Controlled degraded clustering results for validation method evaluation
Extensible data generation framework
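
To make the idea of a correlation structure with ground truth segments concrete, here is a minimal sketch in plain Python. It is not the repository's generation code: the helper names (`pearson`, `make_segment`, `segment_correlations`), the three-variate layout, the segment lengths, and the two example structures are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def make_segment(rng, length, coupled):
    """Generate three variates; `coupled` decides whether a and b move together."""
    rows = []
    for _ in range(length):
        a = rng.gauss(0.0, 1.0)
        # b follows a closely when coupled, otherwise it is independent noise
        b = (a + rng.gauss(0.0, 0.1)) if coupled else rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, 1.0)  # c is always independent of a and b
        rows.append((a, b, c))
    return rows


def segment_correlations(rows):
    """Return the pairwise correlations (a-b, a-c, b-c) of one segment."""
    a, b, c = zip(*rows)
    return pearson(a, b), pearson(a, c), pearson(b, c)


rng = random.Random(0)
# Hypothetical ground truth: two segments with different correlation structures
series = make_segment(rng, 2000, coupled=True) + make_segment(rng, 2000, coupled=False)
segments = [(0, 2000), (2000, 4000)]  # (start, end) index pairs, end exclusive

structures = [segment_correlations(series[start:end]) for start, end in segments]
for (start, end), corr in zip(segments, structures):
    print(f"segment {start}-{end}: " + ", ".join(f"{r:+.2f}" for r in corr))
```

In the first segment only the a-b pair is strongly correlated, while in the second segment all three pairs are close to zero. A clustering algorithm evaluated on such data is expected to recover both the segment boundary and the change in structure.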

For quick access to the dataset without setting up this repository, use our Hugging Face dataset and Google Colab notebook.

Repository Contents

This repository provides:

Data Generation: Code for generating synthetic datasets with known correlation structures
Data Validation: Tools for validating the preservation of correlation structures
Evaluation Framework: Methods for assessing clustering algorithms and validation indices
Case Study Implementation: Code for reproducing the TICC algorithm evaluation

Directory Structure

corrclust-validation/
├── csts/                                  # Data directory (not in git; see "Clone the Hugging Face Data")
│   ├── exploratory/                       # Training/exploration subjects
│   │   ├── irregular_p30/                 # Partial data variant (70% observations)
│   │   ├── irregular_p90/                 # Sparse data variant (10% observations)
│   │   ├── raw/                           # Complete raw data variant
│   │   ├── normal/                        # Complete correlated data variant
│   │   ├── non_normal/                    # Complete non-normal data variant
│   │   └── downsampled_1min/              # Complete downsampled data variant
│   └── confirmatory/                      # Validation data
│       └── ...                            # Same structure as exploratory
├── src/                    
│   ├── data_generation/                   # Synthetic data generation
│   ├── evaluation/                        # Evaluation Methods
│   ├── experiments/                       # Run scripts for experiments
│   ├── use_case/                          # TICC example case study
│   ├── visualisation/                     # Results visualisation
│   └── utils/                             # Helper functions
├── tests/                                 # Unit and Integration Tests
├── conda-exact.yml                        # Conda environment exact versions for CSTS
├── private-yaml-template.yaml             # Template file to create private.yaml for WANDB config
└── README.md                              # This file

Getting Started

Accessing the Data

The complete dataset is available on Hugging Face. This is the easiest way to access the data if you just want to evaluate your algorithms.

from datasets import load_dataset

# Load data for the exploratory split, complete correlated variant
data = load_dataset("idegen/csts", name="correlated_complete_data", split="exploratory")

# Load corresponding ground truth labels
labels = load_dataset("idegen/csts", name="correlated_complete_labels", split="exploratory")

Use this Repository (For Evaluation/Development/Extension)

1. Clone the Code Repository

# Clone via SSH
git clone git@github.com:isabelladegen/corrclust-validation.git
cd corrclust-validation

2. Clone the Hugging Face Data

# Make sure Git LFS is installed
git lfs install

# Clone the dataset into the corrclust-validation directory
git clone https://huggingface.co/datasets/idegen/csts

3. Create the Conda Environment

conda env create -f conda-exact.yml
conda activate corr-24

Key Applications

This codebase supports several research applications:

Evaluating Clustering Algorithms: Test how well algorithms discover correlation structures across data variants (see the Algorithm Evaluation Guide)
Assessing Validation Methods: Evaluate internal and external validation indices for correlation-based clustering
Analyzing Preprocessing Effects: Investigate how techniques like downsampling affect correlation structures
Extending the Benchmark: Generate custom data variants with different properties (see the Data Generation Guide)
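
As an illustration of the preprocessing question above, the following sketch shows one way downsampling can change a measured correlation. It uses only the Python standard library and toy data rather than CSTS itself; the window size of 60, the noise levels, and the helper names `pearson` and `downsample_mean` are assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def downsample_mean(values, window):
    """Average consecutive, non-overlapping windows of `window` observations."""
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, window)
    ]


rng = random.Random(1)
window = 60      # assumed aggregation window, for illustration only
n_windows = 200

x, y = [], []
for _ in range(n_windows):
    shared = rng.gauss(0.0, 1.0)  # slow signal common to both variates
    for _ in range(window):
        # independent fast noise masks the shared signal at full resolution
        x.append(shared + rng.gauss(0.0, 2.0))
        y.append(shared + rng.gauss(0.0, 2.0))

corr_raw = pearson(x, y)
corr_down = pearson(downsample_mean(x, window), downsample_mean(y, window))
print(f"full resolution: {corr_raw:.2f}, downsampled: {corr_down:.2f}")
```

In this toy setup, averaging removes most of the independent noise, so the downsampled series appear far more strongly correlated than the full-resolution ones. Whether and how much this happens on real data depends on the data itself, which is the kind of effect the downsampled data variants let you study.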

Citation

If you use this code, the CSTS dataset, or our benchmark findings in your research, please cite our paper. The arXiv preprint below describes the benchmark; check back for updates:

@misc{degen2025csts,
      title={CSTS: A Benchmark for the Discovery of Correlation Structures in Time Series Clustering}, 
      author={Isabella Degen and Zahraa S Abdallah and Henry W J Reeve and Kate Robson Brown},
      year={2025},
      eprint={2505.14596},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.14596}, 
}

If you use our validation method or findings, please cite our paper. The arXiv preprint below provides the validation thresholds; check back for updates:

@misc{degen2025canonical,
    title={Canonical Correlation Patterns for Validating Clustering of Multivariate Time Series},
    author={Isabella Degen and Zahraa S Abdallah and Kate Robson Brown and Henry W J Reeve},
    year={2025},
    eprint={2507.16497},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2507.16497}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
