As part of RFC|BB|L203A of the Beyond Bitswap project we are exploring the use of compression within Bitswap. In our initial explorations we have implemented two main strategies:
- `BlockCompression`: compresses the `RawData` of blocks using GZip.
- `FullCompression`: compresses the full Bitswap message using GZip and adds it to the new `CompressedPayload` field of a new Bitswap message.
Thus, two additional fields have been added to Bitswap's protobuf message: `CompressedPayload`, which carries the compressed payload of the original Bitswap message when FullCompression is enabled, and a `CompressionType` flag to signal the compression strategy used.
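As an illustration, the extended message could look roughly like the sketch below. The field names come from this issue; the field numbers and enum values are assumptions for illustration, not the actual go-bitswap schema:

```protobuf
// Hypothetical sketch of the extended Bitswap message.
// Field numbers and enum values are illustrative only.
message Message {
  // ... existing Bitswap fields (wantlist, blocks, payload, ...) ...

  enum CompressionType {
    NONE = 0;              // no compression (backward-compatible default)
    BLOCK_COMPRESSION = 1; // each block's RawData is GZip-compressed
    FULL_COMPRESSION = 2;  // the whole serialized message is GZip-compressed
  }

  CompressionType compressionType = 100; // illustrative field number
  bytes compressedPayload = 101;         // set only for FULL_COMPRESSION
}
```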
Initial tests show roughly a 10x overhead from the use of compression compared to Bitswap without compression. We compared our GZip compression approach to existing GZip HTTP handlers in Go; the main difference is that the HTTP handlers pipe the compression writer into the HTTP response writer, streaming the compressed bytes directly to the client. In our case, we can't use stream compression, for several reasons:
- The most limiting one is that we use length prefixes to determine the size of messages in our transmission channel. To stream the compressed message to the libp2p stream writer, we would need to know the size of the compressed message in advance, and we can't know it beforehand.
- We use a `CompressionType` field in the protobuf so we can stay backward compatible. This is not a blocker, because we could easily use a multicodec to signal that the stream is compressed.
- In the current implementation of go-bitswap, where the libp2p protocol stream is referenced and written in message.go, there is no easy way to pipe the compressed stream into the protocol stream writer (this is probably fixable once we figure out how to avoid the required size prefix).
- Bitswap transmissions are done "block by block", so we can only compress "on the fly" the blocks requested in a wantlist and sent in a message. We can't compress all the blocks of a DAG in advance so that they are ready for future transmissions (although, from the blocks already requested, we could have a good sense of "what comes next"). One idea off the top of my head: we could combine JIT compression (compressing blocks "on the fly") with optimized compression (predicting the next block and compressing it in advance), using an approach similar to V8's tiered compilation.
As a result of these evaluations, we want to open a discussion on the following topics:
- In order to leverage stream handlers, instead of using a prefix to signal the size of sent messages, we could use a multicodec prefix and `KeepReading`/`EndOfStream` signals in protocol streams, so we don't have to know message sizes beforehand and can pipe streams (such as what we were looking to do for stream compression).
- Consider porting Bitswap from a message protocol to a stream protocol similar to graphsync, where there is first a discovery phase and then a stream is opened between the requesting peer and the holder of the content. RFCBBL1201 goes in this direction.
- Consider storing blocks' `RawData` compressed in the datastore, removing the need for "on-the-fly" compression.