Column with List(Struct) causes failed to decode level data for struct array (regression in 56) #8404

@valkum

Describe the bug

I have a file generated with polars that polars, DuckDB, and other tools read fine. arrow-rs fails to read it with a "Failed to decode level data for struct array" error.

To Reproduce

Download the following file and try to read it: https://storage.googleapis.com/ids-next/arrow-bug-dremel-encoding.parquet, e.g. via

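use std::fs::File;

use parquet::arrow::arrow_reader::{
    ArrowReaderMetadata, ArrowReaderOptions, ParquetRecordBatchReaderBuilder,
};

pub(crate) fn bug() {
    // Path to a local copy of
    // https://storage.googleapis.com/ids-next/arrow-bug-dremel-encoding.parquet
    // (File::open cannot read from a URL; the local file name here is an assumption).
    let src = "arrow-bug-dremel-encoding.parquet";
    let reader = File::open(src).unwrap();

    // Load the Parquet metadata once and reuse it when building the reader.
    let metadata = ArrowReaderMetadata::load(&reader, ArrowReaderOptions::default()).unwrap();

    let market_reader = ParquetRecordBatchReaderBuilder::new_with_metadata(reader, metadata)
        .build()
        .unwrap();

    let mut count = 0;

    // Iterate over all record batches and count the rows.
    for batch in market_reader {
        let batch = batch.unwrap();
        count += batch.num_rows();
    }

    println!("Read {count} rows");
}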

Expected behavior

The file should be read successfully, or arrow-rs should return a clearer error if the file contains something unsupported.

Additional context

I tried to come up with a unit test for struct_array.rs but ran out of time.
I tried this with a struct of only ints, but that didn't trigger the bug; thus, I assume the dict might be part of the cause.
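
For anyone picking up the unit test, here is a minimal sketch of what the "level data" in the error message refers to. It is plain Rust with no arrow-rs calls, and it is only an illustration: the `Row` and `Elem` types, the `list_struct_levels` function, and the schema (an optional list of optional structs with a single optional int field `a`, using the standard three-level Parquet list layout) are all assumptions and not the schema of the attached file. It shows how definition and repetition levels would be derived for such a column, which may help when constructing test inputs by hand. It does not diagnose the bug.

```rust
/// Hypothetical struct element with one optional leaf field.
pub struct Elem {
    pub a: Option<i32>,
}

/// Hypothetical row of a `List(Struct { a })` column: the list itself and
/// every struct element inside it may be null.
pub type Row = Option<Vec<Option<Elem>>>;

/// Computes (definition levels, repetition levels, non-null values) for leaf `a`.
///
/// Assumed definition levels (max 4):
///   0 = list is null, 1 = list is empty, 2 = struct element is null,
///   3 = struct present but `a` is null, 4 = `a` is present.
/// Assumed repetition levels (max 1):
///   0 = first entry of a row, 1 = further entry within the same list.
pub fn list_struct_levels(rows: &[Row]) -> (Vec<i16>, Vec<i16>, Vec<i32>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    let mut values = Vec::new();

    for row in rows {
        match row {
            // A null list still occupies one level slot.
            None => {
                def.push(0);
                rep.push(0);
            }
            // So does an empty list.
            Some(items) if items.is_empty() => {
                def.push(1);
                rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    rep.push(if i == 0 { 0 } else { 1 });
                    match item {
                        None => def.push(2),
                        Some(Elem { a: None }) => def.push(3),
                        Some(Elem { a: Some(v) }) => {
                            def.push(4);
                            values.push(*v);
                        }
                    }
                }
            }
        }
    }

    (def, rep, values)
}
```

Under these assumptions, the four rows `[{a: 1}, {a: null}]`, `null`, `[]`, `[null, {a: 7}]` give definition levels `[4, 3, 0, 1, 2, 4]`, repetition levels `[0, 1, 0, 0, 0, 1]`, and values `[1, 7]`. A struct array reader has to reconstruct struct validity from levels like these, so a test case that mixes null lists, empty lists, and null structs is probably a reasonable starting point.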
