HIRE: Hierarchical Residual Encoding for Multiresolution Time Series Compression

Welcome to the repository accompanying our SIGMOD '23 paper Hierarchical Residual Encoding for Multiresolution Time Series Compression. We propose a new type of compression algorithm called HIRE that produces an encoding for time series data that can be decoded at multiple error bounds (thresholds). A sequence of pooling and spline operations is constructed, and these operations are applied recursively to an error vector. This repository can be used to reproduce the main results of the paper, or to explore how HIRE could work in your own system.

Dataset files

Download from the following Google Drive link: https://drive.google.com/drive/folders/1gxU9GskX9f60meHUnwcaqs0FvE75tuEN?usp=sharing

Requirements:

Create a new Python virtual environment
Install the dependencies via pip install -r requirements.txt
We used TRC as our downstream compression as explained in the paper. It can be installed through the link: https://github.com/powturbo/Turbo-Range-Coder

Our method: HIRE

Located in hier.py
HierarchicalSketch class implements univariate HIRE
MultivariateHierarchical class implements multivariate HIRE

Detailed comments explaining the correspondence beteween parts of the code and the paper can be found throughout.

Main experiment file

Navigate to experiments.py and choose baselines you would like to run in initialize(...)
Change data path to the dataset location in run(..., DATA_DIRECTORY = "your_path", FILENAME = "data_file")
Change results path for desired metrics in the with open(...) statements
Run file

LFZip, Buff, and optimal split experiments

cd revision_HIRE_experiments/
Place the datasets in the datasets/ directory (from the below Google Drive link)

To run LFZip, create a conda virtual environment and conda install lfzip. Then run python3 experiments_LFZip.py
To run Buff, first install the rust programming language on your local machine: https://www.rust-lang.org/tools/install. Then cd buff-master/database and python3 buff_experiments.py
To run the optimal splitting experiment python3 optimal_splitting_experiments.py For the ablation study of the optimal split, run python3 optimal_splitting_experiments_ablation.py
To run the experiments comparing the different error functions at various levels of the hierarchy, run python3 experiments_other_errs.py

Citation

If you use HIRE in your paper, please use the following citation:

@article{Barbarioli2023Hierarchical, author = {Barbarioli, Bruno and Mersy, Gabriel and Sintos, Stavros and Krishnan, Sanjay}, title = {Hierarchical Residual Encoding for Multiresolution Time Series Compression}, year = {2023}, issue_date = {May 2023}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {1}, number = {1}, url = {https://doi.org/10.1145/3588953}, doi = {10.1145/3588953}, abstract = {Data compression is a key technique for reducing the cost of data transfer from storage to compute nodes. Increasingly, modern data scales necessitate lossy compression techniques, where exactness is sacrificed for a smaller compressed representation. One challenge in lossy compression is that different applications may have different accuracy demands. Today's compression techniques struggle in this setting either forcing the user to compress at the strictest accuracy demand, or to re-encode the data at multiple resolutions. This paper proposes a simple, but effective multiresolution compression algorithm for time series data, where a single encoding can effectively be decompressed at multiple output resolutions. There are a number of benefits over current state-of-the-art techniques for time series compression. (1) The storage footprint of this encoding is smaller than re-encoding the data at multiple resolutions. (2) Similarly, the compression latency is generally smaller than re-encoding at multiple resolutions. (3) Finally, the decompression latency of our encoding is significantly faster than single encodings at the strictest accuracy demand.}, journal = {Proc. ACM Manag. Data}, month = {may}, articleno = {99}, numpages = {26}, keywords = {compression, storage, temporal databases} }

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
revision_HIRE_experiments		revision_HIRE_experiments
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
apca.py		apca.py
core.py		core.py
datasets_timeseries.ipynb		datasets_timeseries.ipynb
delta.py		delta.py
experiments.py		experiments.py
final_experiments_hier.ipynb		final_experiments_hier.ipynb
hier.py		hier.py
identity.py		identity.py
itcompress.py		itcompress.py
quantize.py		quantize.py
requirements.txt		requirements.txt
spartan.py		spartan.py
squish.py		squish.py
xor.py		xor.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HIRE: Hierarchical Residual Encoding for Multiresolution Time Series Compression

Dataset files

Requirements:

Our method: HIRE

Main experiment file

LFZip, Buff, and optimal split experiments

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

gmersy/HIRE

Folders and files

Latest commit

History

Repository files navigation

HIRE: Hierarchical Residual Encoding for Multiresolution Time Series Compression

Dataset files

Requirements:

Our method: HIRE

Main experiment file

LFZip, Buff, and optimal split experiments

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages