Currently Symbolicator tries to download multiple parts of a file concurrently using range requests. An HTTP server that implements this incorrectly, e.g. by yielding `Content-Encoding: gzip` for ranges of the original compressed file, will fail the download (S3 may do that). But even if the parts were correctly re-compressed chunks, the download would still break, because Symbolicator needs to know the exact file size and chunk size to assemble the final file.
#1890 disables transparent decompression on the client level for S3, but Symbolicator can do better: always disable transparent decompression (for all clients), manually add `Accept-Encoding` headers, and deal with decompression only after the file is fully downloaded.
For the most part `fetch_file` already does that, and presumably the "maybe decompress" logic was added exactly because of S3. We can do better and use the actual `Content-Encoding` header given to us instead of guessing (guessing may still be a good fallback though).
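The header-first-with-fallback idea could look roughly like this; the names and the set of supported encodings are illustrative assumptions, not Symbolicator's actual code:

```rust
/// Decompression strategy for a fully downloaded file.
#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Zstd,
    None,
}

/// Prefer the server's `Content-Encoding` header when it names an
/// encoding we understand.
fn from_content_encoding(header: Option<&str>) -> Option<Compression> {
    match header?.trim().to_ascii_lowercase().as_str() {
        "gzip" | "x-gzip" => Some(Compression::Gzip),
        "zstd" => Some(Compression::Zstd),
        "identity" => Some(Compression::None),
        _ => None,
    }
}

/// Fallback: guess from the file's magic bytes, as the current
/// "maybe decompress" logic effectively does.
fn guess_from_magic(bytes: &[u8]) -> Compression {
    match bytes {
        [0x1f, 0x8b, ..] => Compression::Gzip,
        [0x28, 0xb5, 0x2f, 0xfd, ..] => Compression::Zstd,
        _ => Compression::None,
    }
}

/// Header wins; magic-byte guessing only kicks in when the header is
/// absent or unrecognized.
fn detect(header: Option<&str>, bytes: &[u8]) -> Compression {
    from_content_encoding(header).unwrap_or_else(|| guess_from_magic(bytes))
}
```
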
The only problem is other code paths that don't go through `fetch_file`, but we can turn `maybe_decompress_file` into a `Read` or `Stream` implementation that handles decompression transparently again.