
Add FSST support as an active proposal #532

Merged

emkornfield merged 1 commit into apache:master from ArnavBalyan:arnavb/fsst-proposal
Nov 23, 2025

Conversation

@ArnavBalyan
Member

Rationale for this change

  • Added FSST support under active proposals

What changes are included in this PR?

  • Updated proposals

Do these changes have PoC implementations?

  • No

Closes #531

@ArnavBalyan
Member Author

cc @julienledem @emkornfield, could you please review? This just adds FSST as an active proposal. Thanks!

Comment thread on proposals/README.md (outdated)
|-----|--------------|---------|
| [github issue] | adding this new encoding | POC |
| [github issue] | add Variant type | Implementation |
| [Issue #531](https://github.com/apache/parquet-format/issues/531) | FSST support for Parquet format | Implementation |
Contributor

I don't think we are at implementation yet. I think we still want to work out:

  1. New page layout
  2. Sharing dictionaries

The issue also doesn't have links to any of the docs produced so far?

Member

+1, let's set the status to "Draft/POC".
And please add the link to your doc in the issue.
Thanks!

Member Author

Updated!

Member Author

Hi @emkornfield, I'll raise a draft PR for this. We would not need shared dictionaries: empirically, the symbol table works best at the per-page level, and it is only a few hundred bytes. The existing decoder holds the symbol table as a prefix, similar to the DELTA_BINARY_PACKED encoding. Happy to discuss more on the PR, thanks!
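The per-page layout described in the comment above (a symbol table stored as a prefix ahead of the encoded bytes) can be illustrated with a minimal sketch. It assumes an FSST-8-style scheme: up to 255 symbols of 1 to 8 bytes each, codes 0 to 254 naming symbols, and code 255 reserved as an escape before a literal byte. The byte layout, the function names (`build_page`, `read_page`, `encode`, `decode`), and the hand-picked symbol table are hypothetical and are not the format in the proposal. Real FSST builds its table from a sample of the data and uses a hash-based lookup rather than the linear longest-match scan used here, and a real page would also need per-value boundaries, which this sketch omits.

```python
ESCAPE = 255  # FSST-8 style: codes 0..254 name symbols, 255 escapes a literal byte


def encode(symbols: list[bytes], data: bytes) -> bytes:
    """Greedy longest-match encoding against a fixed symbol table (linear scan)."""
    codes = bytearray()
    i = 0
    while i < len(data):
        best = None
        for code, sym in enumerate(symbols):
            if data.startswith(sym, i) and (best is None or len(sym) > len(symbols[best])):
                best = code
        if best is None:
            codes += bytes([ESCAPE, data[i]])  # no symbol matches: escape + literal byte
            i += 1
        else:
            codes.append(best)
            i += len(symbols[best])
    return bytes(codes)


def decode(symbols: list[bytes], codes: bytes) -> bytes:
    """Expand each code back into its symbol; escaped bytes are copied as-is."""
    out = bytearray()
    i = 0
    while i < len(codes):
        if codes[i] == ESCAPE:
            out.append(codes[i + 1])  # the literal byte follows the escape code
            i += 2
        else:
            out += symbols[codes[i]]
            i += 1
    return bytes(out)


def build_page(symbols: list[bytes], data: bytes) -> bytes:
    """Hypothetical layout: [count] then [len][bytes] per symbol, then the codes."""
    if len(symbols) > 255 or any(not 1 <= len(s) <= 8 for s in symbols):
        raise ValueError("FSST-8 allows at most 255 symbols of 1 to 8 bytes")
    out = bytearray([len(symbols)])
    for sym in symbols:
        out.append(len(sym))
        out += sym
    out += encode(symbols, data)
    return bytes(out)


def read_page(page: bytes) -> bytes:
    """Parse the symbol table prefix, then decode the remaining bytes."""
    count, pos = page[0], 1
    symbols = []
    for _ in range(count):
        length = page[pos]
        symbols.append(page[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return decode(symbols, page[pos:])


symbols = [b"http://", b"www.", b".com", b"apache"]
data = b"http://www.apache.org"
page = build_page(symbols, data)
assert read_page(page) == data
```

In this toy example the 21-byte input encodes to 11 bytes of codes (three symbol codes plus four escaped bytes for `.org`), while the symbol table prefix alone is 26 bytes. That only illustrates the layout: as noted in the comment, a real table is a few hundred bytes and is amortized over an entire page of values, which is the argument for keeping it per page rather than shared.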

Contributor

Does this include FSST-12 or just FSST-8? (No need to discuss it here; looking forward to seeing the numbers.)

Member

@julienledem left a comment

LGTM! Thank you

@ArnavBalyan
Member Author

Thank you!

@emkornfield
Contributor

LGTM, merging. Thank you @ArnavBalyan

@emkornfield merged commit 3ab52ff into apache:master on Nov 23, 2025
4 checks passed

Development

Successfully merging this pull request may close these issues.

Introduce FSST Support for Parquet

3 participants