
Faster short string compression with terminator byte and AVX512 #48

@XiangpengHao


Hi SpiralDB, I've had a great experience using the fsst library and Vortex. Thank you for building them!

I'm trying to make fsst even faster. Currently, fsst compresses a Vortex VarBin array by iterating over its strings and compressing each one individually. This works well for long strings but not for short ones, especially strings shorter than 8 bytes: every such string falls back to the slow compression path.
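To make the short-string problem concrete, here is a back-of-the-envelope model. It assumes a hypothetical compressor whose fast path does one 8-byte (u64) load per step, so it can only run while at least 8 input bytes remain, and it pessimistically assumes every step consumes one byte. `split_fast_slow` is a name made up for this sketch; it is not part of the fsst crate.

```rust
/// Hypothetical cost model (not the fsst crate's code): the fast path reads
/// 8 bytes at a time, so it can only run at positions `p` with `p + 8 <= len`.
/// Assumes each step consumes exactly one input byte.
/// Returns (bytes handled on the fast path, bytes handled on the slow path).
fn split_fast_slow(len: usize) -> (usize, usize) {
    let fast = len.saturating_sub(7);
    (fast, len - fast)
}

fn main() {
    let n_strings = 1000;
    let str_len = 5;

    // Per-string compression: each 5-byte string is shorter than 8 bytes.
    let (fast, slow) = split_fast_slow(str_len);
    println!("per-string: fast={} slow={}", fast * n_strings, slow * n_strings);

    // Batched: one buffer holding all strings plus one terminator byte each.
    let (fast, slow) = split_fast_slow(n_strings * (str_len + 1));
    println!("batched:    fast={} slow={}", fast, slow);
}
```

Under this model, compressing the 1,000 strings one by one puts all 5,000 bytes on the slow path, while a single 6,000-byte buffer (strings plus terminators) leaves only its last 7 bytes for it.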

The original paper (Section 5.2) suggests copying the short strings into a new buffer with a terminator byte between them. With one long buffer, we can compress faster even with scalar code (and potentially with auto-vectorized code). The long buffer also makes it possible to compress with AVX512.
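The terminator idea can be sketched with a toy compressor. Everything here is hypothetical and much simpler than the fsst crate: `ToyTable` greedily matches up to 255 symbols of 1 to 8 bytes, uses code 255 as an escape followed by a literal byte, and assumes the terminator byte never occurs in the input strings or inside any symbol. Because no symbol contains the terminator, no match can cross a string boundary, and the terminator itself is always emitted as an escape, so the single compressed stream can be split back into exactly the per-string outputs.

```rust
/// Escape code: the following output byte is a literal (toy convention).
const ESCAPE: u8 = 255;

/// Hypothetical toy symbol table; not the fsst crate's API.
struct ToyTable {
    symbols: Vec<Vec<u8>>, // code i <-> symbols[i]
    terminator: u8,
}

impl ToyTable {
    fn new(symbols: Vec<Vec<u8>>, terminator: u8) -> Self {
        assert!(symbols.len() <= ESCAPE as usize, "codes 0..=254 only");
        // No symbol may contain the terminator, so matches never span two strings.
        assert!(symbols
            .iter()
            .all(|s| (1..=8).contains(&s.len()) && !s.contains(&terminator)));
        ToyTable { symbols, terminator }
    }

    /// Greedy longest-match compression of `input`, appended to `out`.
    fn compress_into(&self, input: &[u8], out: &mut Vec<u8>) {
        let mut pos = 0;
        while pos < input.len() {
            let best = self
                .symbols
                .iter()
                .enumerate()
                .filter(|(_, s)| input[pos..].starts_with(s.as_slice()))
                .max_by_key(|(_, s)| s.len());
            match best {
                Some((code, sym)) => {
                    out.push(code as u8);
                    pos += sym.len();
                }
                None => {
                    // No symbol matches: escape the raw byte.
                    out.push(ESCAPE);
                    out.push(input[pos]);
                    pos += 1;
                }
            }
        }
    }
}

/// Concatenate strings with a terminator, compress once, then split the
/// output at every escaped terminator, i.e. the code pair (ESCAPE, terminator).
fn compress_batch(table: &ToyTable, strings: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut buf = Vec::new();
    for s in strings {
        assert!(!s.contains(&table.terminator), "input contains the terminator");
        buf.extend_from_slice(s);
        buf.push(table.terminator);
    }
    let mut compressed = Vec::new();
    table.compress_into(&buf, &mut compressed);

    // Walk code by code (an escape spans two bytes); cut at escaped terminators.
    let mut result = Vec::with_capacity(strings.len());
    let (mut start, mut i) = (0, 0);
    while i < compressed.len() {
        if compressed[i] == ESCAPE {
            if compressed[i + 1] == table.terminator {
                result.push(compressed[start..i].to_vec());
                start = i + 2;
            }
            i += 2;
        } else {
            i += 1;
        }
    }
    result
}

fn main() {
    let table = ToyTable::new(vec![b"ab".to_vec(), b"abc".to_vec(), b"x".to_vec()], 0);
    let strings: [&[u8]; 4] = [b"abcx", b"ab", b"zz", b""];
    let batched = compress_batch(&table, &strings);
    // Batched output matches compressing each string on its own.
    for (s, c) in strings.iter().zip(&batched) {
        let mut single = Vec::new();
        table.compress_into(s, &mut single);
        assert_eq!(&single, c);
    }
    println!("{:?}", batched); // prints [[1, 2], [0], [255, 122, 255, 122], []]
}
```

Note that the splitter has to walk codes rather than search for the terminator byte: in this example, code 0 (the symbol "ab") is numerically equal to the terminator byte 0.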

I guess the first step is to add a terminator byte to the symbol table, as the C++ implementation does.
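For choosing the terminator byte itself, one simple option (an assumption for this sketch; the exact rule in the C++ implementation should be checked there) is the least frequent byte in the sample the symbol table is trained on, so that it rarely or never occurs in the data. `pick_terminator` is a hypothetical helper, not an existing fsst function.

```rust
/// Hypothetical helper: return the least frequent byte in `sample`
/// (ties go to the smallest byte value).
fn pick_terminator(sample: &[&[u8]]) -> u8 {
    // Byte histogram over all sample strings.
    let mut histo = [0usize; 256];
    for s in sample {
        for &b in s.iter() {
            histo[b as usize] += 1;
        }
    }
    // min_by_key returns the first minimum, i.e. the smallest byte value on ties.
    (0..=255u8).min_by_key(|&b| histo[b as usize]).unwrap()
}

fn main() {
    let sample: [&[u8]; 2] = [b"hello", b"world"];
    // Byte 0 never occurs in this sample, so it becomes the terminator.
    println!("terminator = {}", pick_terminator(&sample)); // prints "terminator = 0"
}
```

A byte that is absent from the sample can still appear in data outside it, so a real implementation still has to handle strings that contain the terminator.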

Do you have plans to implement this, or any thoughts on the approach? I'd love to hear them.
