fix(parquet): converting parquet schema with backward compatible repeated struct/primitive with provided arrow schema#8496

Merged
alamb merged 27 commits into apache:main from rluvaton:fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema
Apr 6, 2026

@rluvaton
Member

@rluvaton rluvaton commented Sep 29, 2025

Which issue does this PR close?

Rationale for this change

Fix reading old parquet files

What changes are included in this PR?

tests and the fix, but mostly tests.

Are these changes tested?

yes

Are there any user-facing changes?

No

…ated struct/primitive with provided arrow schema

closes:
- apache#8495
@github-actions github-actions bot added the "parquet" (Changes to the parquet crate) label Sep 29, 2025
@rluvaton rluvaton marked this pull request as ready for review September 29, 2025 18:33
Comment thread parquet/src/arrow/schema/complex.rs Outdated
Contributor

@alamb alamb left a comment

Thank you @rluvaton -- I did a quick review of this PR and the code looks reasonable to me, but I don't understand the legacy inferring logic / problem, so I can't really review this PR fully yet.

Can someone help me out with a link / document that describes what the legacy inferring is? Is it https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/LogicalTypes.md?plain=1#L718-L723

Poking around that file, it looks like @etseidl may know something about this as he authored several commits, for example

Comment thread parquet/src/arrow/schema/complex.rs Outdated
/// Converts `self` into an arrow list, with its current type as the field type
/// accept an optional `list_data_type` to specify the type of list to create
///
/// This is used to convert deprecated repeated columns (not in a list), into their arrow representation
Contributor

Is there any reference we can add a link to? (I am not familiar with the "deprecated repeated columns" you are referencing here.)

Member Author

@rluvaton rluvaton Oct 8, 2025

exactly, updated

Comment thread parquet/src/arrow/schema/complex.rs Outdated
| Some(DataType::LargeList(field_hint))
| Some(DataType::FixedSizeList(field_hint, _)) => Some(field_hint.as_ref()),
Some(_) => unreachable!(
"should be validated earlier that list_data_type is only a type of list"
Contributor

Even though this should be impossible, I worry about panicking here, because if there is a bug, a panic is more severe than "just an error".

Also, while this may be true at the moment, I can imagine that some future refactor messes it up, in which case this may become reachable and the compiler won't complain.

I would prefer returning a general_err! with some sort of "Internal error: should be validated..." type message
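
A minimal sketch of the suggestion above, using simplified stand-in types rather than the real `arrow` / `parquet` ones. `DataType`, `SchemaError`, and `list_field_hint` are hypothetical names for illustration only; the actual change would presumably use the crate's `general_err!` macro. The point is only that the non-list arm returns an error instead of panicking.

```rust
/// Hypothetical, simplified stand-in for arrow's `DataType` (illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<DataType>),
    LargeList(Box<DataType>),
    FixedSizeList(Box<DataType>, i32),
}

/// Hypothetical error type standing in for the crate's general error.
#[derive(Debug, PartialEq)]
struct SchemaError(String);

/// Returns the element type carried by an optional list hint.
///
/// A non-list hint is reported as an internal error rather than a panic,
/// so a future refactor that makes this arm reachable fails gracefully.
fn list_field_hint(
    list_data_type: Option<&DataType>,
) -> Result<Option<&DataType>, SchemaError> {
    match list_data_type {
        None => Ok(None),
        Some(DataType::List(field_hint))
        | Some(DataType::LargeList(field_hint))
        | Some(DataType::FixedSizeList(field_hint, _)) => Ok(Some(field_hint.as_ref())),
        Some(other) => Err(SchemaError(format!(
            "Internal error: should be validated earlier that list_data_type \
             is only a type of list, got {other:?}"
        ))),
    }
}
```

Under these assumptions, calling `list_field_hint` with a `LargeList` hint yields the inner element type, while a non-list hint such as `Int32` yields an `Err` that the caller can propagate with `?`.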

Member Author

updated

@alamb
Contributor

alamb commented Oct 8, 2025

I will try and find time tomorrow to review this

@rluvaton
Member Author

@alamb Ping :)

@alamb
Contributor

alamb commented Oct 28, 2025

I see this one, thank you - unfortunately I have many other PRs ahead of it in the review queue.

As I am not super familiar with this part of the spec, it will likely take me longer to review as well - maybe someone else who is more familiar can help out.

Comment thread parquet/src/arrow/schema/complex.rs
_ => Field::new(name, data_type, nullable),
};

Ok(field.with_metadata(hint.metadata().clone()))
Member

Should the extension type be added here too, as is done for None below?
I.e. something like:

Suggested change
let merged = field.with_metadata(hint.metadata().clone());
try_add_extension_type(merged, parquet_type)

Member Author

I think we should, but not as part of this PR.

@rluvaton
Member Author

rluvaton commented Feb 4, 2026

> I see this one, thank you - unfortunately I have many other PRs ahead of it in the review queue.
>
> As I am not super familiar with this part of the spec, it will likely take me longer to review as well - maybe someone else who is more familiar can help out.

@alamb so can you please assign the relevant people, so we can merge this 4-month-old PR?

@alamb
Contributor

alamb commented Feb 11, 2026

> @alamb so can you please assign the relevant people, so we can merge this 4-month-old PR?

I can't assign anyone as I have no actual authority over anyone's time -- All I can do is to beg and/or cajole others into helping do so.

One thing that would help me review this PR (and perhaps others as well) is a more complete description / documentation of what this code does, with examples, as I am not all that familiar with the current schema representation of lists (and not at all familiar with the older one). It would help to provide context (ideally in comments) about what this PR is doing.

For example, for me to understand comments like this, I need to go into the referenced URL and then try to match the terminology used there to what is used in this repo and PR. While I can do that, it will take time, and time is the thing I seem to have the least of.

    /// This is used to convert [deprecated repeated columns] (not in a list), into their arrow representation

Maybe it could give an example so that the code is more self-contained. For example, an example of the current List representation and the old (deprecated) representation?
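
To make the request above concrete, here is a small sketch (not the crate's implementation) of how the two encodings relate. It follows the nested-type rules in the Parquet `LogicalTypes.md`: an unannotated `repeated` field that sits outside any `LIST`/`MAP` group is read as a required list of required elements. `Repetition`, `ArrowType`, and `interpret_unannotated_int32` are hypothetical names used only for illustration, and the two schema strings are made-up examples.

```rust
/// Current three-level LIST encoding: a nullable list of nullable int32 values.
const CURRENT_LIST: &str = "
message schema {
    optional group values (LIST) {
        repeated group list {
            optional int32 element;
        }
    }
}";

/// Legacy encoding: a bare repeated field with no LIST annotation.
const LEGACY_REPEATED: &str = "
message schema {
    repeated int32 values;
}";

/// Hypothetical stand-in for a Parquet field repetition.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

/// Hypothetical, simplified stand-in for an arrow data type.
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int32,
    List { element: Box<ArrowType>, element_nullable: bool },
}

/// Maps an unannotated int32 field to `(arrow type, field is nullable)`.
///
/// A repeated field becomes a non-nullable list of non-nullable elements;
/// required and optional fields stay plain int32 columns.
fn interpret_unannotated_int32(repetition: Repetition) -> (ArrowType, bool) {
    match repetition {
        Repetition::Required => (ArrowType::Int32, false),
        Repetition::Optional => (ArrowType::Int32, true),
        Repetition::Repeated => (
            ArrowType::List {
                element: Box::new(ArrowType::Int32),
                element_nullable: false,
            },
            false,
        ),
    }
}
```

In this simplified model, `LEGACY_REPEATED` would map to a non-nullable list of non-nullable `Int32`, whereas `CURRENT_LIST` states the nullability of both the list and its elements explicitly. As the tests in this PR indicate, a provided arrow schema can additionally choose the list flavor (for example `LargeList` instead of `List`) for such a legacy field.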

Comment thread parquet/src/arrow/schema/complex.rs
@etseidl
Contributor

etseidl commented Feb 12, 2026

I'm trying to come up to speed on this, but an early observation is that one test (backward_compat_list_struct_with_nested_repeated_primitive_respects_arrow_hint) appears to violate this line in the spec:

For all fields in the schema, implementations should use either LIST and MAP annotations or unannotated repeated fields, but not both. When using the annotations, no unannotated repeated types are allowed.

From the test in question:

        // This is a backward-compatible LIST (rule 4) where the struct element contains
        // a repeated primitive. The arrow hint specifies that the inner repeated primitive
        // should be LargeList<Int32>.
        let message_type = "
            message schema {
                optional group my_list (LIST) {
                    repeated group my_list_tuple {
                        required binary str (STRING);
                        repeated int32 values;
                    }
                }
            }
        ";

That said, plugging the tests from this PR into main without the fix yields

failures:
    arrow::schema::complex::tests::backward_compat_list_struct_with_nested_repeated_primitive_respects_arrow_hint
    arrow::schema::complex::tests::convert_schema_with_nested_repeated_struct_and_primitives
    arrow::schema::complex::tests::convert_schema_with_repeated_primitive_should_use_inferred_schema
    arrow::schema::complex::tests::convert_schema_with_repeated_primitive_should_use_inferred_schema_for_list_as_well
    arrow::schema::complex::tests::convert_schema_with_repeated_struct_and_inferred_schema
    arrow::schema::complex::tests::convert_schema_with_repeated_struct_and_inferred_schema_and_field_id

test result: FAILED. 24 passed; 6 failed; 0 ignored; 0 measured; 918 filtered out; finished in 0.64s

Other than the test mentioned above, the other failing tests seem like they should succeed.

@rluvaton
Member Author

@etseidl updated, thanks for catching that!

Contributor

@alamb alamb left a comment

Looks good to me -- thank you @rluvaton @etseidl and @martin-g

@alamb
Contributor

alamb commented Apr 1, 2026

run benchmarks arrow_reader arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4169205369-643-gh852 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema (090b00e) to 61b5763 (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4169205369-644-9729b 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema (090b00e) to 61b5763 (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                             fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema    main
-----                                             --------------------------------------------------------------------------    ----
arrow_reader_clickbench/async/Q1                  1.00   1082.9±4.14µs        ? ?/sec                                           1.01  1099.0±25.67µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.4±0.04ms        ? ?/sec                                           1.01      6.5±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.4±0.06ms        ? ?/sec                                           1.02      7.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.2±0.07ms        ? ?/sec                                           1.00     14.3±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     16.8±0.07ms        ? ?/sec                                           1.01     17.0±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.7±0.05ms        ? ?/sec                                           1.01     15.9±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.00      3.1±0.04ms        ? ?/sec                                           1.01      3.1±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.06     94.1±1.98ms        ? ?/sec                                           1.00     88.9±8.01ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.28    104.7±6.80ms        ? ?/sec                                           1.00     81.6±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    117.8±2.79ms        ? ?/sec                                           1.03    120.9±2.76ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.00    238.2±0.53ms        ? ?/sec                                           1.03    246.4±0.71ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.00     19.1±0.08ms        ? ?/sec                                           1.00     19.2±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.00     56.7±0.15ms        ? ?/sec                                           1.02     57.6±0.39ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     56.9±0.12ms        ? ?/sec                                           1.02     57.9±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.6±1.40ms        ? ?/sec                                           1.00     18.4±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.00     14.8±0.13ms        ? ?/sec                                           1.02     15.2±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.02ms        ? ?/sec                                           1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.00     13.1±0.22ms        ? ?/sec                                           1.04     13.6±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.00     24.0±0.27ms        ? ?/sec                                           1.04     25.1±0.44ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.8±0.04ms        ? ?/sec                                           1.01      5.8±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      5.0±0.05ms        ? ?/sec                                           1.01      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.6±0.02ms        ? ?/sec                                           1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1056.6±7.44µs        ? ?/sec                                           1.02   1073.1±3.41µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.3±0.05ms        ? ?/sec                                           1.01      6.3±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.2±0.04ms        ? ?/sec                                           1.01      7.3±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.2±0.08ms        ? ?/sec                                           1.00     14.2±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.01     16.9±0.13ms        ? ?/sec                                           1.00     16.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.01     15.8±0.10ms        ? ?/sec                                           1.00     15.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.00      3.0±0.02ms        ? ?/sec                                           1.01      3.0±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.00     71.0±0.26ms        ? ?/sec                                           1.01     71.6±0.55ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.00     79.6±0.25ms        ? ?/sec                                           1.02     81.0±3.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.00     96.3±0.21ms        ? ?/sec                                           1.03     99.5±0.98ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    217.3±0.99ms        ? ?/sec                                           1.23    267.2±5.38ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     19.1±0.18ms        ? ?/sec                                           1.01     19.4±0.17ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.00     56.2±0.26ms        ? ?/sec                                           1.02     57.3±0.53ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     56.4±0.20ms        ? ?/sec                                           1.02     57.5±0.42ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.1±0.05ms        ? ?/sec                                           1.01     18.3±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.00     14.3±0.15ms        ? ?/sec                                           1.03     14.8±0.29ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.3±0.02ms        ? ?/sec                                           1.00      5.3±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     12.9±0.23ms        ? ?/sec                                           1.02     13.1±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.00     23.2±0.21ms        ? ?/sec                                           1.04     24.2±0.63ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.5±0.02ms        ? ?/sec                                           1.05      5.7±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.8±0.02ms        ? ?/sec                                           1.02      4.9±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.02ms        ? ?/sec                                           1.01      3.5±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    865.0±1.45µs        ? ?/sec                                           1.01    877.5±2.31µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.0±0.05ms        ? ?/sec                                           1.01      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      5.9±0.04ms        ? ?/sec                                           1.01      6.0±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.01     21.6±0.09ms        ? ?/sec                                           1.00     21.5±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     28.3±0.78ms        ? ?/sec                                           1.05     29.8±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     22.9±0.09ms        ? ?/sec                                           1.00     22.9±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.7±0.03ms        ? ?/sec                                           1.00      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.00    119.9±0.19ms        ? ?/sec                                           1.02    122.5±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.00     91.3±0.12ms        ? ?/sec                                           1.02     93.1±0.36ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.02    141.4±1.02ms        ? ?/sec                                           1.00    138.6±1.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   273.3±13.29ms        ? ?/sec                                           1.02   278.7±14.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.09     29.4±0.12ms        ? ?/sec                                           1.00     26.9±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.00    105.7±0.15ms        ? ?/sec                                           1.03    108.7±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.00    103.2±0.12ms        ? ?/sec                                           1.03    106.5±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.00     18.6±0.07ms        ? ?/sec                                           1.01     18.9±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     21.9±0.06ms        ? ?/sec                                           1.01     22.1±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.8±0.01ms        ? ?/sec                                           1.00      6.8±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.00     11.2±0.04ms        ? ?/sec                                           1.01     11.3±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.00     20.6±0.10ms        ? ?/sec                                           1.01     20.9±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.2±0.02ms        ? ?/sec                                           1.01      5.2±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.04ms        ? ?/sec                                           1.01      5.6±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.3±0.03ms        ? ?/sec                                           1.01      4.4±0.02ms        ? ?/sec

Resource Usage

Metric         base (merge-base)    branch
------         -----------------    ------
Wall time      783.9s               776.9s
Peak memory    3.1 GiB              3.3 GiB
Avg memory     3.0 GiB              3.1 GiB
CPU user       705.1s               709.3s
CPU sys        77.7s                67.7s
Disk read      0 B                  0 B
Disk write     2.0 GiB              171.3 MiB

@rluvaton
Member Author

rluvaton commented Apr 6, 2026

can we merge this PR?

@alamb
Contributor

alamb commented Apr 6, 2026

Yes, merged!

@alamb alamb merged commit 871c6d2 into apache:main Apr 6, 2026
16 checks passed
@rluvaton rluvaton deleted the fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema branch April 9, 2026 13:36

Labels

parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

providing the same schema that read from backward compatible parquet fails: incompatible arrow schema, expected struct got List

5 participants