@jamiekb jamiekb commented Mar 9, 2021

Thanks for maintaining this package.
When a value of UINT_32 logical type is converted to or from an INT32 primitive, nothing adjusts its range. As a result, if a value >= 2^31 is encoded as UINT_32, the encoder fails because the value is out of range for a signed INT32. Likewise, a UINT_32 value read from a file is returned as a signed number, so values >= 2^31 come back negative.
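
The range adjustment described above can be sketched as a pair of bit-pattern-preserving conversions. This is a minimal illustration, not the code in this pull request: `uint32ToInt32` and `int32ToUint32` are hypothetical helper names and are not part of the parquetjs API.

```typescript
// Hypothetical helpers for illustration; not part of parquetjs.
const UINT32_MAX = 0xffffffff;

// Write path: reinterpret an unsigned 32-bit value as the signed INT32
// that has the same bit pattern, so it fits the INT32 primitive.
function uint32ToInt32(value: number): number {
  if (!Number.isInteger(value) || value < 0 || value > UINT32_MAX) {
    throw new RangeError(`invalid UINT_32 value: ${value}`);
  }
  return value | 0; // bitwise OR coerces to a signed 32-bit integer
}

// Read path: reinterpret a signed INT32 as an unsigned 32-bit value.
function int32ToUint32(value: number): number {
  return value >>> 0; // unsigned right shift coerces to an unsigned 32-bit integer
}
```

With these helpers, a value such as 2^31 is stored as the most negative INT32 and converted back to 2^31 on read, so the round trip is lossless for every value in the UINT_32 range.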

jim-lake pushed a commit to jim-lake/parquetjs that referenced this pull request Oct 15, 2025
Problem
=======
A `typeLength` (and potentially `precision`) with the value "null" causes
the primitive type to be detected incorrectly.

Solution
========
We should handle null values such that when the `typeLength` or
`precision` field has the value "null", the primitive type is detected
as "INT64".
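
A minimal sketch of the null handling described above, under stated assumptions: `FieldMeta` and `detectPrimitiveType` are hypothetical names, and the `FIXED_LEN_BYTE_ARRAY` fallback is assumed for illustration. The library's actual detection logic may differ.

```typescript
// Hypothetical types and function for illustration; not the parquetjs source.
type PrimitiveType = "INT64" | "FIXED_LEN_BYTE_ARRAY";

interface FieldMeta {
  typeLength?: number | null;
  precision?: number | null;
}

function detectPrimitiveType(meta: FieldMeta): PrimitiveType {
  // `== null` is true for both null and undefined, so a field that is
  // explicitly null is handled the same way as a field that is absent.
  if (meta.typeLength == null || meta.precision == null) {
    return "INT64";
  }
  // Assumed fallback when both fields carry real values.
  return "FIXED_LEN_BYTE_ARRAY";
}
```

The point of the sketch is the loose `== null` comparison: a strict check against `undefined` alone would let a null `typeLength` fall through to the fixed-length branch.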

Steps to Verify:
The bug reproduces when the parquet file contains a Dictionary_Page
with an INT64 field whose typeLength is null upon read. Unfortunately, I
don't have such a test file for now. My debugging was based on a piece
of privately shared data from our customer.

When the bug reproduces, the primitive type parsed from the schema
(Fixed_Length_Byte_Array) doesn't match the primitive type discovered from
the column data (Int64). Because of a discrepancy in how the library decodes
pages, when the data is in a Dictionary_Page, the decoding logic
hits the check for `typeLength` and fails. For Data_Page and
Data_Page_V2, decoding ignores the schema and prefers the primitive
type inferred from the column data. For Dictionary_Page, however,
decoding uses the primitive type specified in the schema.

decodeDataPageV2

https://github.com/LibertyDSNP/parquetjs/blob/91fc71f262c699fdb5be50df2e0b18da8acf8e19/lib/reader.ts#L1104

decodeDictionaryPage

https://github.com/LibertyDSNP/parquetjs/blob/91fc71f262c699fdb5be50df2e0b18da8acf8e19/lib/reader.ts#L947

Notice that one uses "opts.type" while the other uses
"opts.column.primitiveType".
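
The mismatch can be sketched as follows. This is a simplified model rather than the reader's real code: `DecodeOpts`, `dataPageType`, `dictionaryPageType`, and `typesDisagree` are hypothetical names. The sketch also assumes that `opts.type` carries the type inferred from the column data and `opts.column.primitiveType` the type from the schema, which the description above implies but does not state.

```typescript
// Hypothetical, simplified model of the decode options; not the parquetjs source.
interface DecodeOpts {
  type: string;                      // assumed: type inferred from the column data
  column: { primitiveType: string }; // assumed: type parsed from the schema
}

// Data pages are described as preferring the type inferred from the column data.
function dataPageType(opts: DecodeOpts): string {
  return opts.type;
}

// Dictionary pages are described as using the type specified in the schema.
function dictionaryPageType(opts: DecodeOpts): string {
  return opts.column.primitiveType;
}

// True when the two decode paths would pick different primitive types.
function typesDisagree(opts: DecodeOpts): boolean {
  return dataPageType(opts) !== dictionaryPageType(opts);
}
```

For the failing case described above, `typesDisagree({ type: "INT64", column: { primitiveType: "FIXED_LEN_BYTE_ARRAY" } })` evaluates to `true`, which is the situation in which only the dictionary page path reaches the `typeLength` check.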