Add FSST support as an active proposal #532
cc @julienledem @emkornfield could you please review? This just adds FSST as an active proposal. Thanks.
|-----|--------------|---------|
| [github issue] | adding this new encoding | POC |
| [github issue] | add Variant type | Implementation |
| [Issue #531](https://github.com/apache/parquet-format/issues/531) | FSST support for Parquet format | Implementation |
I don't think we are at implementation yet. I think we still want to work out:
- New page layout
- Sharing dictionaries
The issue also doesn't have links to any of the docs produced so far?
+1, let's set the status to "Draft/POC".
And please add the link to your doc to the issue.
Thanks!
Hi @emkornfield, I'll raise a draft PR for this. We would not need shared dictionaries, since the symbol table empirically works best at a per-page level and is only a few hundred bytes. The existing decoder holds the symbol table as a prefix, similar to the DELTA_BINARY_PACKED encoder. We can discuss more on the PR, thanks.
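To make the "symbol table as a page prefix" idea concrete, here is a rough sketch of what a self-contained page could look like. This is not the layout from the proposal or the PoC: the one-byte counts, the function names `write_page`, `read_page`, and `decode`, and the exact field order are all assumptions for illustration. The only parts taken from FSST-8 itself are the 8-bit codes, symbols of 1 to 8 bytes, and code 255 acting as an escape for a literal byte.

```python
ESCAPE = 255  # FSST-8 reserves code 255 to mean "next byte is a literal"


def write_page(symbols: list[bytes], codes: bytes) -> bytes:
    """Hypothetical layout: [symbol count][len, bytes]... then the encoded data."""
    if len(symbols) > 255:
        raise ValueError("FSST-8 allows at most 255 symbols")
    page = bytearray([len(symbols)])
    for sym in symbols:
        if not 1 <= len(sym) <= 8:
            raise ValueError("symbols must be 1 to 8 bytes long")
        page.append(len(sym))
        page += sym
    return bytes(page) + codes


def read_page(page: bytes) -> tuple[list[bytes], bytes]:
    """Split a page back into its symbol table prefix and the encoded data."""
    count, pos = page[0], 1
    symbols = []
    for _ in range(count):
        length = page[pos]
        symbols.append(page[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return symbols, page[pos:]


def decode(symbols: list[bytes], codes: bytes) -> bytes:
    """Expand each code into its symbol; an escape copies one literal byte."""
    out = bytearray()
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == ESCAPE:
            out.append(codes[i + 1])
            i += 2
        else:
            out += symbols[code]
            i += 1
    return bytes(out)
```

Because the table travels with the page, a reader can decode any page on its own without a shared dictionary, at the cost of repeating the table per page.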
Does this include FSST-12 or just FSST-8? (We don't need to discuss it here; looking forward to seeing the numbers.)
d524a17 to 83b296d
83b296d to 1df9c2d
Thank you!
LGTM, merging. Thank you @ArnavBalyan
Rationale for this change
What changes are included in this PR?
Do these changes have PoC implementations?
Closes #531