Hi -
Our team would like to build on and continue the work of "Benchmarking distributed data warehouse solutions for storing genomic variant information", with a particular focus on cloud environments (e.g., using object stores such as S3 as opposed to Hadoop-based solutions).
To use your work as a starting point and initial point of comparison, it would really help us to begin with your test data; we probably wouldn't start up a Hadoop cluster with Hive just to run the data generation pipeline described on the wiki: https://github.com/ZSI-Bio/variantsdwh/wiki/Data-generator .
The link provided for test data here: https://github.com/ZSI-Bio/variantsdwh/wiki/Test-data (http://zsibio.ii.pw.edu.pl/dwh_benchmark/) 404s; is that data still available somewhere?
Many thanks.