
Conversation

@ZJONSSON
Contributor

Splits each column into pages as determined by `pageSize`. Encoding into a buffer happens as soon as enough rows have accumulated for a page, even if the rowBuffer is far from full. This should reduce memory usage, since encoded pages are substantially smaller than the shredded data. Pages also make it fast to scan through a column, skipping pages of no interest, and help locate individual records without reading the whole column.
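A minimal sketch of the eager-encoding idea described above. The names here (`ColumnPager`, `encodeValues`) are illustrative, not parquetjs's actual API: a page is encoded and the raw row buffer released as soon as `pageSize` rows accumulate, so memory is bounded by one page of shredded values rather than a whole row group.

```javascript
// Hypothetical sketch of per-page eager encoding; class and callback
// names are illustrative, not the library's real interfaces.
class ColumnPager {
  constructor(pageSize, encodeValues) {
    this.pageSize = pageSize;         // rows per page
    this.encodeValues = encodeValues; // value encoder (e.g. RLE/bitpack)
    this.rowBuffer = [];              // un-encoded (shredded) values
    this.pages = [];                  // already-encoded page buffers
  }

  push(value) {
    this.rowBuffer.push(value);
    // Encode eagerly: the encoded page is much smaller than the raw
    // values, so the row buffer never grows past one page.
    if (this.rowBuffer.length >= this.pageSize) this.flush();
  }

  flush() {
    if (this.rowBuffer.length === 0) return;
    this.pages.push(this.encodeValues(this.rowBuffer));
    this.rowBuffer = [];
  }
}
```

With `pageSize = 2`, pushing five values produces two full pages immediately and a final partial page on the closing `flush()`.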

* Bitpacking now works for any data length, not just multiples of 8 (the last packed group is padded when it holds fewer than 8 values).

* Improved run estimation: a new run is started only at a mod-8 boundary (index % 8 === 0); otherwise bitpacking is used.
  This moves data into the encoded buffer as soon as possible, reducing memory requirements for the whole rowGroup.
into ParquetReader so they can be called without arguments.
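The padding change in the first bullet can be sketched as follows. This is an assumed helper, not the library's actual packer: values are packed low-bit-first in groups of 8, and an input whose length is not a multiple of 8 is zero-padded to complete the last group.

```javascript
// Hypothetical bit-packing sketch: accepts any input length by
// zero-padding the final group of 8 values (illustrative only).
function bitPack(values, bitWidth) {
  const padded = values.slice();
  while (padded.length % 8 !== 0) padded.push(0); // pad last group to 8
  const out = Buffer.alloc((padded.length * bitWidth) / 8);
  let bitPos = 0;
  for (const v of padded) {
    // Emit each value's bits, least significant first.
    for (let b = 0; b < bitWidth; b++) {
      if ((v >> b) & 1) out[bitPos >> 3] |= 1 << (bitPos & 7);
      bitPos++;
    }
  }
  return out;
}
```

For example, packing three 1-bit values `[1, 1, 1]` pads to eight values and yields a single byte with the low three bits set.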