Use docs from parquet-format as source of truth #142

Merged
emkornfield merged 8 commits into apache:production from emkornfield:use_parquet_format
Dec 9, 2025

Contributor

@emkornfield emkornfield commented Dec 3, 2025

Previously, some Markdown docs were duplicated between parquet-site and parquet-format. This adds parquet-format as a submodule and a rule that can import from the submodule.

Note: staging seems quite out of date, but if desired we can try this out there first (I tested locally with Docker and this seems to produce reasonable results).
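
As a rough illustration of the idea (not the rule this PR adds), the sketch below prepends Hugo-style front matter to Markdown text read from a submodule checkout, so the same source file can be rendered as a site page. The helper name `wrap_with_front_matter` is hypothetical; the `linkTitle` and `weight` keys mirror the front matter visible in the diff context further down, and the assumption that the site only needs those two keys is mine.

```python
def wrap_with_front_matter(body: str, link_title: str, weight: int) -> str:
    """Return `body` prefixed with a minimal YAML front matter block.

    Hypothetical helper for illustration only: the actual site may import
    submodule content through a different mechanism. `link_title` is assumed
    not to contain double quotes.
    """
    front_matter = "\n".join([
        "---",
        f'linkTitle: "{link_title}"',
        f"weight: {weight}",
        "---",
    ])
    # Drop leading blank lines so the body starts right after the front matter.
    return f"{front_matter}\n{body.lstrip()}"
```

Calling `wrap_with_front_matter(text, "Compression", 1)` on the contents of a Markdown file from the submodule would yield a page whose body stays identical to the upstream document, which is the point of keeping parquet-format as the single source.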

@emkornfield emkornfield requested review from alamb and wgtmac December 3, 2025 00:17
@emkornfield
Contributor Author

This looks like it might have broken the Variant type links; I'll see if I can fix them.

@emkornfield
Contributor Author

Variant types should now be fixed.

Collaborator

@alamb alamb left a comment

Thank you @emkornfield -- this is a great step forward

I found only one broken image -- otherwise this is a great step forward

linkTitle: "Compression"
weight: 1
---
## Overview
Collaborator

I rendered this locally and it looks good to me

Screenshot 2025-12-03 at 3 18 18 PM

@@ -0,0 +1,7 @@
---
Collaborator

It is really nice to have Variant on the webpage now.

Screenshot 2025-12-03 at 3 21 03 PM

Comment thread hugo.toml Outdated


[module]

Collaborator

Nit: the changes in this file seem unnecessary.

@@ -0,0 +1,15 @@
<!DOCTYPE html>
Collaborator

This is pretty clever. It might be worth adding some comments explaining what it does.

Contributor Author

Added a comment.

cost for reading them if it is not doing selective scans. The index structures'
location and length are stored in ColumnChunk.

![Page Index Layout](/images/PageIndexLayout.png)
Collaborator

The image from https://github.com/apache/parquet-format/blob/master/PageIndex.md#technical-approach doesn't seem to be visible anymore:

Screenshot 2025-12-03 at 3 34 57 PM

The image appears to be present at public/images/PageIndexLayout.png, but the rendered link points to doc/images/PageIndexLayout.png.

Screenshot 2025-12-03 at 3 38 22 PM

Member

Does it mean that all embedded images from apache/parquet-format will be invisible now?

Contributor Author

Yeah, nice catch, let me see what I can do to fix these things up automatically.

Contributor Author

OK, I think this is fixed (it at least seems to be working when I render locally).
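
For readers following along, one way such an automatic fix-up could work is sketched below. This is not the code from this PR: it assumes the upstream Markdown references images with a repo-relative `doc/images/` prefix (the prefix seen in the rendered link reported above) and that the site serves them from `/images/` (the target shown in the diff context). The function name `rewrite_image_links` and its `site_prefix` parameter are hypothetical.

```python
import re

# Matches Markdown image syntax whose target starts with "doc/images/",
# optionally preceded by "./". Ordinary (non-image) links are left alone
# because the pattern requires the leading "!".
_IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()(?:\./)?doc/images/([^)\s]+)(\))")


def rewrite_image_links(markdown: str, site_prefix: str = "/images/") -> str:
    """Rewrite repo-relative image targets to site-absolute ones.

    Assumption: images are published under `site_prefix` with the same
    file names they have under doc/images/ in the source repository.
    """
    return _IMAGE_LINK.sub(
        lambda m: f"{m.group(1)}{site_prefix}{m.group(2)}{m.group(3)}",
        markdown,
    )
```

Applied to a line such as `![Page Index Layout](doc/images/PageIndexLayout.png)`, this produces the `/images/PageIndexLayout.png` form shown in the diff context above, while leaving links that are not images untouched.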

@emkornfield
Contributor Author

@alamb @wgtmac thanks for the reviews. I'll send a heads-up to the mailing list in case others notice anything that breaks, and then merge if there aren't objections.

@alamb
Collaborator

alamb commented Dec 8, 2025

> @alamb @wgtmac thanks for the reviews. I'll send a heads-up to the mailing list in case others notice anything that breaks, and then merge if there aren't objections.

Sounds great -- thank you

@alamb
Collaborator

alamb commented Dec 9, 2025

(for anyone following along, relevant mailing list note is here: https://lists.apache.org/thread/qp0ob5z4lvthk94w08d0k9k02ql52fzs)

@emkornfield emkornfield merged commit b588d89 into apache:production Dec 9, 2025
1 check passed

@alamb
Collaborator

alamb commented Feb 9, 2026

I just made a PR to update the docs to reflect the latest parquet-format changes.

TLDR is that this new system works great 👌 👨‍🍳 -- thanks again @emkornfield
