Hi SpiralDB, I've had a great experience using the fsst library and Vortex. Thank you for building them!
I'm trying to make fsst even faster. Currently, fsst compresses a Vortex VarBin array by iterating over its strings and compressing each one individually. This works well for long strings but less well for short ones, especially strings shorter than 8 bytes: for those, every compression call falls back to the slow path.
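To illustrate why short strings miss the fast path, here is a minimal sketch assuming an FSST-style compressor that consumes its input as 8-byte words. `load_word` and `count_fast_loads` are hypothetical helpers, not the fsst-rs API, and the count conservatively assumes one load per byte position (as if every emitted symbol were one byte long).

```rust
/// Hypothetical helper (not the fsst-rs API): read the word starting at `pos`
/// as a little-endian u64. Returns the word and whether the fast path applied.
fn load_word(input: &[u8], pos: usize) -> (u64, bool) {
    let rest = &input[pos..];
    let mut bytes = [0u8; 8];
    if rest.len() >= 8 {
        // Fast path: a full 8-byte word is available
        // (a real compressor would do a single unaligned load here).
        bytes.copy_from_slice(&rest[..8]);
        (u64::from_le_bytes(bytes), true)
    } else {
        // Slow path: copy the short tail into a zeroed scratch word so we
        // never read past the end of the string.
        bytes[..rest.len()].copy_from_slice(rest);
        (u64::from_le_bytes(bytes), false)
    }
}

/// Count how many byte positions can use the fast path.
fn count_fast_loads(input: &[u8]) -> usize {
    (0..input.len()).filter(|&pos| load_word(input, pos).1).count()
}

fn main() {
    let short: &[u8] = b"hello"; // 5 bytes, shorter than one word
    println!(
        "one short string: {} of {} loads on the fast path",
        count_fast_loads(short),
        short.len()
    );

    // Ten short strings copied back to back, each followed by a separator byte.
    let mut buf = Vec::new();
    for _ in 0..10 {
        buf.extend_from_slice(short);
        buf.push(0xFF);
    }
    println!(
        "packed buffer: {} of {} loads on the fast path",
        count_fast_loads(&buf),
        buf.len()
    );
}
```

Under these assumptions, a single 5-byte string never gets a full word (0 of 5 positions), while ten such strings packed into one 60-byte buffer get a full word at 53 of 60 positions.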
The original paper suggests copying the short strings into a new buffer with a terminator between them (Section 5.2). With this longer buffer, we can compress faster even with scalar code (and potentially with auto-vectorized code). The longer buffer also allows us to compress with AVX-512.
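The packing step could look roughly like this, assuming strings are copied back to back with one terminator byte after each and that each string's start offset is recorded. `pack_with_terminator` and `unpack` are hypothetical names; compressing the packed buffer and splitting the compressed output per string are not shown.

```rust
/// Hypothetical helper (not part of fsst-rs or Vortex): copy strings into one
/// contiguous buffer, appending `terminator` after each, and record where each
/// string starts so per-string results can be recovered later.
fn pack_with_terminator(strings: &[&[u8]], terminator: u8) -> (Vec<u8>, Vec<usize>) {
    let total: usize = strings.iter().map(|s| s.len() + 1).sum();
    let mut buf = Vec::with_capacity(total);
    let mut starts = Vec::with_capacity(strings.len());
    for s in strings {
        starts.push(buf.len());
        buf.extend_from_slice(s);
        buf.push(terminator);
    }
    (buf, starts)
}

/// Recover the i-th original string from the packed buffer.
fn unpack<'a>(buf: &'a [u8], starts: &[usize], i: usize) -> &'a [u8] {
    let start = starts[i];
    // Each string ends one byte (its terminator) before the next string starts.
    let end = if i + 1 < starts.len() { starts[i + 1] - 1 } else { buf.len() - 1 };
    &buf[start..end]
}

fn main() {
    let strings: [&[u8]; 3] = [b"id", b"name", b"email"];
    let (buf, starts) = pack_with_terminator(&strings, 0x00);
    println!("packed {} strings into {} bytes, starts = {:?}", strings.len(), buf.len(), starts);
    for i in 0..strings.len() {
        assert_eq!(unpack(&buf, &starts, i), strings[i]);
    }
}
```

Recording start offsets keeps the mapping back to individual array elements cheap. How the compressor treats the terminator byte itself (for example, making sure no symbol spans two strings) is deliberately left out of this sketch.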
I think the first step is to add a terminator byte to the symbol table, as the C++ implementation does.
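For choosing the terminator byte, one option (an assumption here, worth checking against the C++ implementation) is to pick the byte value that occurs least often in the training sample, so that reserving it costs as little as possible. `pick_terminator` is a hypothetical helper, not an existing fsst-rs function.

```rust
/// Hypothetical helper: return the byte value that occurs least often in the
/// training sample, breaking ties toward the smallest byte value.
fn pick_terminator(sample: &[&[u8]]) -> u8 {
    // Byte histogram over the whole sample.
    let mut histo = [0u64; 256];
    for s in sample {
        for &b in s.iter() {
            histo[b as usize] += 1;
        }
    }
    // Strict `<` keeps the earliest (smallest) byte on ties.
    let mut best = 0u8;
    for b in 1..=255u8 {
        if histo[b as usize] < histo[best as usize] {
            best = b;
        }
    }
    best
}

fn main() {
    // 0x00 and 0x01 each appear once; 0x02 never appears.
    let sample: [&[u8]; 3] = [b"hello", b"world", b"\x00\x01"];
    let t = pick_terminator(&sample);
    println!("terminator byte: {:#04x}", t); // prints "terminator byte: 0x02"
}
```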
Do you have any plans to implement this, or any thoughts on the approach? I'd love to hear them.