[FEA] Zero Conf - test/prototype chunked compress/encrypt #12509

@revans2

Description

Is your feature request related to a problem? Please describe.
We want to improve our CPU utilization for shuffle with compression/encryption on shuffle partitions larger than 200. That was the main motivation for the multi-threaded shuffle: doing the CPU-heavy processing in a thread pool instead of in a single thread. I think we can do even better still if we can chunk the input data at arbitrary points and then encrypt/compress the chunks in a thread pool before the result is written out to a file/etc.

The goal of this is to see if we can build a thread-pool implementation that takes one large buffer of data to be output, splits it into decently large chunks, optionally encrypts/compresses each chunk, and then splices the results back together again as they are written out.
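To make the idea concrete, here is a minimal sketch of the write side under some stated assumptions: `ChunkedCompressor` and the 1 MiB chunk size are hypothetical names, `java.util.zip.Deflater` stands in for whatever codec/cipher is actually chosen, and each compressed chunk is framed with a simple 4-byte length prefix so a reader can find the boundaries. The key property is that chunks are compressed concurrently but spliced back in their original order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

public class ChunkedCompressor {
    // Hypothetical chunk size; real tuning would be needed.
    static final int CHUNK_SIZE = 1 << 20;

    // Compress one slice of the input independently of all others.
    static byte[] compressChunk(byte[] input, int off, int len) {
        Deflater d = new Deflater();
        d.setInput(input, off, len);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!d.finished()) {
            int n = d.deflate(buf);
            out.write(buf, 0, n);
        }
        d.end();
        return out.toByteArray();
    }

    // Chunk the input, compress chunks in the pool, then splice the
    // results back together in order as they are "written out".
    public static byte[] compressParallel(byte[] input, ExecutorService pool)
            throws InterruptedException, ExecutionException, IOException {
        List<Future<byte[]>> futures = new ArrayList<>();
        for (int off = 0; off < input.length; off += CHUNK_SIZE) {
            final int o = off;
            final int len = Math.min(CHUNK_SIZE, input.length - off);
            futures.add(pool.submit(() -> compressChunk(input, o, len)));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Future<byte[]> f : futures) {
            byte[] chunk = f.get();
            // 4-byte length prefix so the reader can split the stream.
            out.write(ByteBuffer.allocate(4).putInt(chunk.length).array());
            out.write(chunk);
        }
        return out.toByteArray();
    }
}
```

Because each chunk is an independent deflate stream, the compression ratio will be slightly worse than one stream over the whole buffer; that is the trade-off for being able to parallelize.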

Along with this, have the reader do the opposite: as it pulls in the data, it detects the separate chunks of information so they can be farmed out to a thread pool for decryption/decompression.
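The read side might then look like the sketch below, under the same assumptions as above (a hypothetical 4-byte length prefix per chunk, `java.util.zip.Inflater` standing in for the real codec/cipher). Splitting on the length prefixes is cheap and stays on the reader thread; only the CPU-heavy inflate work is farmed out to the pool, and the futures are drained in order so the output comes back in the right sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class ChunkedDecompressor {
    // Decompress one self-contained chunk.
    static byte[] decompressChunk(byte[] chunk) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(chunk);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            if (n == 0 && inf.needsInput()) {
                break; // guard against a truncated chunk
            }
            out.write(buf, 0, n);
        }
        inf.end();
        return out.toByteArray();
    }

    // Split on the length prefixes (cheap, single-threaded), then farm the
    // per-chunk decompression out to the pool and reassemble in order.
    public static byte[] decompressParallel(byte[] packed, ExecutorService pool)
            throws Exception {
        ByteBuffer bb = ByteBuffer.wrap(packed);
        List<Future<byte[]>> futures = new ArrayList<>();
        while (bb.hasRemaining()) {
            byte[] chunk = new byte[bb.getInt()];
            bb.get(chunk);
            futures.add(pool.submit(() -> decompressChunk(chunk)));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Future<byte[]> f : futures) {
            out.write(f.get());
        }
        return out.toByteArray();
    }
}
```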

We want to make sure that we can prototype this for an internal shuffle, where we don't care about the file format the data is stored in, but also for the case where we need to write the data out in a format compatible with the external shuffle file format. I think we should be able to make it work in both cases, but we might need a bit of extra inline metadata to know how the chunks are split up.
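As a strawman for that inline metadata, here is one hypothetical per-chunk header layout (the name `ChunkHeader`, the flag values, and the field choices are all illustrative, not an existing format): a flags byte saying whether the payload is compressed and/or encrypted, the stored payload length for splitting the stream, and the uncompressed length so the reader can pre-size output buffers.

```java
import java.nio.ByteBuffer;

// Hypothetical inline per-chunk header. The lengths let a reader split the
// stream without decoding payloads; the flags say how to decode each one.
public class ChunkHeader {
    public static final int BYTES = 9;
    public static final byte FLAG_COMPRESSED = 1;
    public static final byte FLAG_ENCRYPTED = 2;

    public final byte flags;
    public final int payloadLen;      // bytes as stored on disk/wire
    public final int uncompressedLen; // bytes after decrypt/decompress

    public ChunkHeader(byte flags, int payloadLen, int uncompressedLen) {
        this.flags = flags;
        this.payloadLen = payloadLen;
        this.uncompressedLen = uncompressedLen;
    }

    public void writeTo(ByteBuffer bb) {
        bb.put(flags).putInt(payloadLen).putInt(uncompressedLen);
    }

    public static ChunkHeader readFrom(ByteBuffer bb) {
        return new ChunkHeader(bb.get(), bb.getInt(), bb.getInt());
    }
}
```

For the external-shuffle-compatible case, a header like this would have to live inside the per-partition payload (so the surrounding file format is unchanged); for the internal case it could be part of the stream framing directly.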

Metadata

Labels

performance (a performance related task/issue)
