This repository contains the code and configurations for a study submitted as a scientific research article, which introduces two new approaches for zero-inflated time series generation.
Zero-inflated time series (ZITS), characterized by an excess of zeros relative to standard distributions, pose fundamental challenges for synthetic data generation: a generator must simultaneously preserve the temporal dependencies, the excess-zero structure, and the heavy-tailed distribution of the non-zero values. Current generative models, including variational autoencoders (VAEs) and generative adversarial networks (GANs), lack dedicated mechanisms for handling zero-inflated data and consequently fail to generate data with similar characteristics.
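One quick way to see what "excess zeros relative to standard distributions" means is to compare the observed fraction of zeros with the fraction a Poisson model of the same mean would predict, $P(X=0)=e^{-\lambda}$. The sketch below is illustrative only: the Poisson baseline, the function names, and the toy sales sequence are assumptions, not part of the study's pipeline.

```python
import math
from statistics import mean


def zero_ratio(series):
    """Fraction of time steps whose value is exactly zero."""
    return sum(1 for x in series if x == 0) / len(series)


def poisson_zero_probability(series):
    """P(X = 0) under a Poisson model with the same mean as the series."""
    return math.exp(-mean(series))


def zero_inflation_gap(series):
    """Observed zero ratio minus the Poisson-implied one; > 0 suggests excess zeros."""
    return zero_ratio(series) - poisson_zero_probability(series)


# Hypothetical daily sales of one item: mostly zeros with occasional bursts.
sales = [0, 0, 0, 5, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0]
print(f"observed zero ratio: {zero_ratio(sales):.2f}")
print(f"Poisson-implied P(0): {poisson_zero_probability(sales):.3f}")
print(f"gap: {zero_inflation_gap(sales):.3f}")
```

A large positive gap indicates that a single count distribution cannot explain both the zeros and the bursts, which is the motivation for modeling the two separately.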
This work proposes two architectures, ZITS-GAN and ZITS-VAE, designed for zero-inflated time series generation through a two-head decoder structure that separately models the zero pattern and the non-zero magnitude. The proposed architectures use dilated convolutional layers to capture long-range temporal dependencies and stabilize training through
- TimeGAN,
- TimeVAE,
- ChronoGAN,
- TransFusion,
- FIDE.
- Name: M5
- Description: the M5 dataset, provided on Kaggle, contains daily unit sales collected by Walmart and distributed as several CSV files. The raw release comprises a wide table of bottom-level series, in which each row corresponds to a single stock keeping unit (SKU) at a single store, together with a separate calendar file that maps each recorded day index to its calendar date and auxiliary fields such as weekday, month, and special events. This publicly available dataset comprises 30,490 bottom-level series over 1,941 days. The M5 data focuses on realistic retail demand and exhibits intermittency (many days with zero sales).
- Access: Available at 🔗 M5 Dataset Page
- Name: Household IoT Devices
- Description: consists of time series capturing the daily operating duration of household IoT devices characterized by discrete operating cycles, recorded over one year. The raw dataset contains the duration, in seconds, of each individual running cycle per device; to ensure statistical relevance and data quality, we retained only those devices exhibiting a sufficient number of running cycles.
- Access: not publicly available.
- TimeGAN
- TimeVAE
- ChronoGAN
- TransFusion
- FIDE
- Proposed models: ZITS-GAN and ZITS-VAE
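Both proposed models rely on the two-head decoder described above, with one head for the zero pattern and one for the non-zero magnitude. The repository's actual layers are not reproduced here; the following is a minimal, framework-free sketch, assuming the zero head emits one logit per time step (probability of a non-zero value after a sigmoid) and the magnitude head emits a raw value mapped to a positive number by a softplus. The names `combine_heads` and `two_head_loss`, and the choice of binary cross-entropy plus a masked squared error, are hypothetical illustrations, not the paper's formulation.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    """Stable log(1 + exp(x)); keeps predicted magnitudes strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def combine_heads(zero_logits, magnitude_raw, rng=None, threshold=0.5):
    """Merge zero-head and magnitude-head outputs into one sequence.

    Without rng, a step is non-zero when P(non-zero) exceeds threshold;
    with rng (a random.Random), each step is sampled as Bernoulli(P(non-zero)).
    """
    out = []
    for z, m in zip(zero_logits, magnitude_raw):
        p_nonzero = sigmoid(z)
        if rng is None:
            is_nonzero = p_nonzero > threshold
        else:
            is_nonzero = rng.random() < p_nonzero
        out.append(softplus(m) if is_nonzero else 0.0)
    return out


def two_head_loss(zero_logits, magnitude_raw, target, eps=1e-12):
    """BCE on the zero pattern plus MSE on the magnitudes of non-zero targets only."""
    bce, sq_err, n_nonzero = 0.0, 0.0, 0
    for z, m, y in zip(zero_logits, magnitude_raw, target):
        p = sigmoid(z)
        label = 1.0 if y != 0 else 0.0
        bce -= label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps)
        if label:
            sq_err += (softplus(m) - y) ** 2
            n_nonzero += 1
    mse = sq_err / n_nonzero if n_nonzero else 0.0
    return bce / len(target) + mse


# Toy head outputs for five time steps (made-up numbers).
zero_logits = [-3.0, -2.5, 2.0, -4.0, 3.0]
magnitude_raw = [0.1, 0.3, 4.0, 0.2, 6.5]
print(combine_heads(zero_logits, magnitude_raw))
print(combine_heads(zero_logits, magnitude_raw, rng=random.Random(0)))
print(two_head_loss(zero_logits, magnitude_raw, target=[0, 0, 4, 0, 7]))
```

Masking the magnitude loss to non-zero targets keeps the abundant zeros from pulling the magnitude head toward small values, which is the main reason for separating the two heads.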
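The proposed models use dilated convolutions to capture long-range dependencies. For stacked stride-1 dilated 1-D convolutions, each layer with kernel size $k$ and dilation $d$ widens the receptive field by $(k-1)\,d$ steps. The sketch below assumes a kernel size of 3 and a doubling dilation schedule (1, 2, 4, ...); both are common choices used for illustration, not the repository's configuration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (time steps) of stacked stride-1 dilated 1-D convolutions.

    A layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def doubling_dilations(kernel_size, seq_len):
    """Shortest dilation schedule 1, 2, 4, ... whose receptive field covers seq_len."""
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2 for the field to grow")
    dilations = []
    while receptive_field(kernel_size, dilations) < seq_len:
        dilations.append(2 ** len(dilations))
    return dilations


print(receptive_field(3, [1, 2, 4, 8]))  # 1 + 2 * (1 + 2 + 4 + 8) = 31
print(doubling_dilations(3, 365))        # enough layers for one year of daily data
```

Because the field grows exponentially with depth, a handful of layers covers a full year of daily observations, whereas undilated convolutions would need a number of layers linear in the sequence length.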
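The M5 description above mentions a wide sales table plus a calendar file that maps day indices to dates. A minimal sketch of turning that layout into per-series sequences with the standard library follows. The column names `id`, `d`, and `date` and the `d_1`, `d_2`, ... day columns follow the public Kaggle release as we understand it and should be checked against your download; the in-memory CSV contents are toy values, not real M5 data.

```python
import csv
import io


def load_calendar(calendar_file):
    """Map each day index column name (e.g. 'd_1') to its calendar date."""
    return {row["d"]: row["date"] for row in csv.DictReader(calendar_file)}


def load_sales(sales_file, day_to_date):
    """Turn the wide table into {series_id: [(date, units), ...]}."""
    reader = csv.DictReader(sales_file)
    day_cols = [c for c in reader.fieldnames if c.startswith("d_")]
    return {
        row["id"]: [(day_to_date[d], int(row[d])) for d in day_cols]
        for row in reader
    }


# Tiny in-memory stand-ins for the calendar and sales files (toy values).
calendar_csv = io.StringIO(
    "date,weekday,d\n"
    "2011-01-29,Saturday,d_1\n"
    "2011-01-30,Sunday,d_2\n"
    "2011-01-31,Monday,d_3\n"
)
sales_csv = io.StringIO(
    "id,item_id,store_id,d_1,d_2,d_3\n"
    "FOODS_1_001_CA_1_evaluation,FOODS_1_001,CA_1,0,2,0\n"
    "HOBBIES_1_001_CA_1_evaluation,HOBBIES_1_001,CA_1,0,0,1\n"
)

series = load_sales(sales_csv, load_calendar(calendar_csv))
for series_id, values in series.items():
    zeros = sum(1 for _, units in values if units == 0)
    print(series_id, f"zero ratio = {zeros / len(values):.2f}")
```

With the real files, pass open file handles instead of `io.StringIO` objects; the logic is unchanged.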
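The Household IoT Devices dataset is not public, but the preprocessing described above (per-cycle durations aggregated into daily operating time, keeping only devices with enough cycles) can be sketched as follows. The record layout `(device_id, day, duration_seconds)`, the `MIN_CYCLES` threshold, and the rule that a cycle counts toward the day it starts are hypothetical; the study's actual threshold is not stated here.

```python
from collections import defaultdict
from datetime import date, timedelta

MIN_CYCLES = 50  # hypothetical threshold, not the study's value


def daily_durations(cycles, start, n_days, min_cycles=MIN_CYCLES):
    """Aggregate per-cycle durations (seconds) into daily totals per device.

    cycles: iterable of (device_id, day, duration_seconds), day a datetime.date
    (assumed to be the day the cycle starts). Devices with fewer than
    min_cycles cycles are dropped; days without any cycle are filled with 0.
    """
    counts = defaultdict(int)
    totals = defaultdict(lambda: defaultdict(float))
    for device, day, seconds in cycles:
        counts[device] += 1
        totals[device][day] += seconds
    result = {}
    for device, per_day in totals.items():
        if counts[device] < min_cycles:
            continue
        result[device] = [
            per_day.get(start + timedelta(days=i), 0.0) for i in range(n_days)
        ]
    return result


# Hypothetical records for two devices.
cycles = [
    ("washer_1", date(2023, 1, 1), 3600),
    ("washer_1", date(2023, 1, 1), 1800),
    ("washer_1", date(2023, 1, 3), 2700),
    ("dryer_7", date(2023, 1, 2), 4000),
]
print(daily_durations(cycles, start=date(2023, 1, 1), n_days=4, min_cycles=2))
```

Filling idle days with zero is what makes the resulting daily series zero-inflated: most devices run on only a fraction of the days in the year.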
The table below summarizes the evaluation results: the difference in zero ratio between real and generated data (Δ 0 ratio), the Long Sequence Predictive Score (LPS), and the Long Sequence Discriminative Score (LDS).
| Model | Dataset | Δ 0 ratio | LPS | LDS |
|---|---|---|---|---|
| TimeVAE | M5 | 0.144 | 1.344 | 0.389 |
| ZITS-GAN | M5 | 0.012 | 1.448 | 0.213 |
| ZITS-VAE | M5 | 0.014 | 1.408 | 0.126 |
| TimeVAE | Household IoT Devices | 0.259 | 2847 | 0.5 |
| ZITS-GAN | Household IoT Devices | 0.038 | 2153 | 0 |
| ZITS-VAE | Household IoT Devices | 0.033 | 2153 | 0.248 |
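This README does not define the three metrics. A common reading, following the discriminative and predictive scores popularized by TimeGAN, is: Δ 0 ratio as the absolute difference between pooled zero ratios of real and synthetic data; LDS as |accuracy − 0.5| of a post-hoc classifier separating real from synthetic sequences (0 means indistinguishable, 0.5 perfectly separable, which is consistent with the range of the values reported above); and LPS as the mean absolute error of a forecaster trained on synthetic data and tested on real data. The sketch below shows only this final scoring step under those assumptions; the classifier, the forecaster, and the exact long-sequence protocol from the paper are omitted.

```python
def pooled_zero_ratio(series_list):
    """Fraction of zeros over all values in a collection of sequences."""
    values = [x for s in series_list for x in s]
    return sum(1 for x in values if x == 0) / len(values)


def delta_zero_ratio(real, synthetic):
    """Absolute gap between real and synthetic zero ratios (assumed definition)."""
    return abs(pooled_zero_ratio(real) - pooled_zero_ratio(synthetic))


def discriminative_score(y_true, y_pred):
    """|accuracy - 0.5| of a real-vs-synthetic classifier on held-out data."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return abs(correct / len(y_true) - 0.5)


def predictive_score(y_true, y_pred):
    """MAE of a forecaster trained on synthetic and evaluated on real data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


real = [[0, 0, 3, 0], [0, 1, 0, 0]]
synthetic = [[0, 2, 0, 0], [1, 0, 0, 4]]
print(delta_zero_ratio(real, synthetic))
print(discriminative_score([1, 1, 0, 0], [1, 0, 0, 1]))
print(predictive_score([3.0, 0.0, 5.0], [2.0, 0.0, 7.0]))
```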
For questions, please contact: 📧 ardeleaneugenrichard@gmail.com