The readers and writers currently only handle files stored in the specified format, uncompressed. For large datasets this is inefficient, even on a cluster, because it significantly increases network transfer. It is cheaper to spend a little CPU decompressing the file. It should be possible to inject a decompressor class into the reader classes and a compressor class into the writer classes.
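A minimal sketch of what the injection could look like. It assumes Python, and every name in it (`Codec`, `IdentityCodec`, `GzipCodec`, `LineWriter`, `LineReader`) is hypothetical rather than the project's actual API. gzip from the standard library stands in for whatever compressor is eventually chosen. The codec wraps the raw file stream, so the readers and writers keep their format logic and only the bytes on disk change. `IdentityCodec` is the default, so existing callers keep today's uncompressed behaviour.

```python
import gzip
import os
import tempfile
from typing import BinaryIO, List, Optional, Protocol


class Codec(Protocol):
    """Hypothetical interface: wraps a raw binary stream in (de)compression."""

    def wrap_writer(self, raw: BinaryIO) -> BinaryIO: ...

    def wrap_reader(self, raw: BinaryIO) -> BinaryIO: ...


class IdentityCodec:
    """No compression: passes the raw stream through unchanged."""

    def wrap_writer(self, raw: BinaryIO) -> BinaryIO:
        return raw

    def wrap_reader(self, raw: BinaryIO) -> BinaryIO:
        return raw


class GzipCodec:
    """Example codec backed by the standard-library gzip module."""

    def __init__(self, level: int = 6) -> None:
        self.level = level

    def wrap_writer(self, raw: BinaryIO) -> BinaryIO:
        return gzip.GzipFile(fileobj=raw, mode="wb", compresslevel=self.level)

    def wrap_reader(self, raw: BinaryIO) -> BinaryIO:
        return gzip.GzipFile(fileobj=raw, mode="rb")


class LineWriter:
    """Hypothetical writer: one record per line, compressor injected."""

    def __init__(self, path: str, codec: Optional[Codec] = None) -> None:
        self.path = path
        self.codec = codec if codec is not None else IdentityCodec()

    def write(self, records: List[str]) -> None:
        with open(self.path, "wb") as raw:
            stream = self.codec.wrap_writer(raw)
            for record in records:
                stream.write(record.encode("utf-8") + b"\n")
            # Closing a GzipFile writes the gzip trailer but leaves `raw` open;
            # for IdentityCodec it closes `raw`, which the `with` then ignores.
            stream.close()


class LineReader:
    """Hypothetical reader: decompressor injected, format parsing unchanged."""

    def __init__(self, path: str, codec: Optional[Codec] = None) -> None:
        self.path = path
        self.codec = codec if codec is not None else IdentityCodec()

    def read(self) -> List[str]:
        with open(self.path, "rb") as raw:
            stream = self.codec.wrap_reader(raw)
            data = stream.read()
            stream.close()
        return data.decode("utf-8").splitlines()


# Usage: round-trip a few records through a gzip-compressed temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.txt.gz")
    LineWriter(path, codec=GzipCodec()).write(["a,1", "b,2"])
    print(LineReader(path, codec=GzipCodec()).read())  # ['a,1', 'b,2']
```

Putting the codec at the stream level, rather than giving each reader and writer its own compression flag, means adding another format (for example bzip2 or zstd) only requires a new codec class.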