Conversation

@williamhbaker (Contributor) commented Jan 2, 2026

Each active gzip writer introduces a small but significant amount of memory overhead, on the order of several hundred KB. When many journals are being written concurrently, this overhead adds up to a substantial amount of total memory usage.
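
For illustration only (this is not code from the PR), a minimal Go sketch of how that per-writer overhead can be observed: it allocates many live gzip writers, forces each one's flate compressor to initialize with a first write, and compares heap usage before and after. The journal count and the exact per-writer number are assumptions; the result lands in the several-hundred-KB range described above on recent Go versions.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"runtime"
)

// heapAlloc reports live heap bytes after a forced GC.
func heapAlloc() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	const n = 1000 // stand-in for 1000 journals, each holding a live gzip writer.

	before := heapAlloc()

	writers := make([]*gzip.Writer, n)
	for i := range writers {
		writers[i] = gzip.NewWriter(io.Discard)
		// The underlying flate compressor allocates its state lazily,
		// on the first write.
		writers[i].Write([]byte("x"))
	}

	after := heapAlloc()
	fmt.Printf("~%d KB per live gzip writer\n", (after-before)/uint64(n)/1024)

	runtime.KeepAlive(writers)
}
```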

This change adds a threshold so that incremental compression occurs only once there is at least 1 MB of data to compress, and introduces a new gzip writing mechanism that can close one gzip member and start another, with the members concatenated into the same output file. The spool logic uses this mechanism to write a new gzip member for every batch of incremental compression, eliminating the need to hold a gzip writer in memory for the entire lifetime of the fragment file.
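
A minimal sketch of the idea, not the actual spool implementation (the writeMember helper, the batching, and the threshold constant here are simplified stand-ins): each batch is compressed as its own gzip member and the writer is closed immediately, so no flate state survives between batches, while the concatenated members still read back as one continuous stream.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// writeMember compresses one batch as a self-contained gzip member and
// appends it to out. The writer is closed right away, so no gzip/flate
// state is held between batches.
func writeMember(out *bytes.Buffer, batch []byte) error {
	zw := gzip.NewWriter(out)
	if _, err := zw.Write(batch); err != nil {
		return err
	}
	return zw.Close()
}

func main() {
	const flushThreshold = 1 << 20 // illustrative 1 MB threshold for incremental compression.

	var file bytes.Buffer // stands in for the fragment file.
	var pending []byte    // uncompressed spool content awaiting compression.

	for _, chunk := range [][]byte{
		bytes.Repeat([]byte("a"), 1<<20),
		bytes.Repeat([]byte("b"), 1<<20),
	} {
		pending = append(pending, chunk...)
		if len(pending) >= flushThreshold {
			if err := writeMember(&file, pending); err != nil {
				log.Fatal(err)
			}
			pending = pending[:0]
		}
	}

	// The concatenated members decompress as one continuous stream.
	zr, err := gzip.NewReader(&file)
	if err != nil {
		log.Fatal(err)
	}
	n, err := io.Copy(io.Discard, zr)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("decompressed bytes:", n) // 2097152
}
```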

This change only applies to standard gzip compression with client-side decompression. If decompression offloading is used, gzip files will continue to be written as a single stream, since some object stores truncate multi-member gzip content after the first member.
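
The distinction matters because a decoder that is not multistream-aware stops at the end of the first gzip member. Go's compress/gzip reader handles concatenated members by default, which is why client-side decompression is unaffected. A small sketch of both behaviors (the member helper is a stand-in, not from this PR):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"log"
)

// member returns a single self-contained gzip member for s.
func member(s string) []byte {
	var b bytes.Buffer
	zw := gzip.NewWriter(&b)
	zw.Write([]byte(s))
	zw.Close()
	return b.Bytes()
}

func main() {
	// Two gzip members concatenated into one byte stream.
	data := append(member("first "), member("second")...)

	// A multistream-aware reader (Go's default) decodes all members.
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	var all bytes.Buffer
	if _, err := all.ReadFrom(zr); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%q\n", all.String()) // "first second"

	// A decoder that treats the input as a single member stops early,
	// analogous to the truncation behavior described above.
	zr2, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	zr2.Multistream(false)
	var one bytes.Buffer
	if _, err := one.ReadFrom(zr2); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%q\n", one.String()) // "first "
}
```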

Manual Testing:

  • For 1,000 journals being actively written, observed a ~1 GiB drop in RSS memory usage post-change; profiling confirmed the reduction came from flate memory overhead.
  • Basic writes and reads of journals covering no compression, gzip, and snappy, against file / S3 / GCS / Azure stores.
  • E2E testing with a Flow local stack using AWS, GCS, and Azure storage mappings.
  • Quick throughput testing: writing to a modest number of journals as fast as possible on my laptop showed no difference in attainable throughput, and CPU usage looked about the same. Theoretically I'd expect some increased CPU from re-initializing gzip writers, but this crude test showed nothing major.

@williamhbaker marked this pull request as draft January 2, 2026 22:26
@williamhbaker force-pushed the wb/compression branch 6 times, most recently from 6754c4f to 019b816 on January 7, 2026 17:14
@williamhbaker marked this pull request as ready for review January 7, 2026 18:10
@jgraettinger (Contributor) left a comment


LGTM
