fix(parquet): converting parquet schema with backward compatible repeated struct/primitive with provided arrow schema by rluvaton · Pull Request #8496 · apache/arrow-rs

rluvaton · 2025-09-29T16:13:31Z

Which issue does this PR close?

Closes #8495 (providing the same schema that read from backward compatible parquet fails: incompatible arrow schema, expected struct got List).

Rationale for this change

Fix reading old parquet files

What changes are included in this PR?

tests and the fix, but mostly tests.

Are these changes tested?

yes

Are there any user-facing changes?

No

alamb

Thank you @rluvaton -- I took a quick look at this PR and the code looks reasonable to me, but I don't understand the legacy inferring logic / problem, so I can't really review this PR fully yet.

Can someone help me out with a link / document that describes what the legacy inferring is? Is it https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/LogicalTypes.md?plain=1#L718-L723

Poking around that file, it looks like @etseidl may know something about this as he authored several commits, for example

apache/parquet-format#466

alamb · 2025-09-30T20:42:26Z

+    /// Converts `self` into an arrow list, with its current type as the field type
+    /// accept an optional `list_data_type` to specify the type of list to create
+    ///
+    /// This is used to convert deprecated repeated columns (not in a list), into their arrow representation


Is there any reference we can add a link to? (I am not familiar with the "deprecated repeated columns" you are referencing here.)

Maybe it is https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/LogicalTypes.md?plain=1#L718-L723 ?

I believe it's https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/LogicalTypes.md?plain=1#L640-L643, and specifically https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/LogicalTypes.md?plain=1#L649-L650.

exactly, updated

alamb · 2025-09-30T20:48:19Z

+            | Some(DataType::LargeList(field_hint))
+            | Some(DataType::FixedSizeList(field_hint, _)) => Some(field_hint.as_ref()),
+            Some(_) => unreachable!(
+                "should be validated earlier that list_data_type is only a type of list"


Even though this should be impossible, I worry about panicking here, because if there is a bug, a panic is more severe than "just an error".

Also, while this may be true at the moment, I can imagine that some future refactor messes it up, in which case this may become reachable and the compiler won't complain.

I would prefer returning a general_err! with some sort of "Internal error: should be validated..." type message
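
A minimal sketch of that suggestion, using stand-in types rather than the real arrow/parquet ones: `HintType`, `SchemaError`, and `element_hint` are hypothetical names, and in the crate itself the error would presumably be built with `general_err!`. The only point illustrated is that the catch-all arm returns an `Err` instead of panicking.

```rust
/// Hypothetical stand-in for the few `DataType` variants that matter here.
#[derive(Debug, Clone, PartialEq)]
enum HintType {
    List(String),
    LargeList(String),
    FixedSizeList(String, i32),
    Int32,
}

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
struct SchemaError(String);

/// Returns the element field named by a list hint, if a hint was given.
/// A non-list hint is reported as an internal error rather than a panic,
/// so a future refactor that makes this arm reachable fails gracefully.
fn element_hint(list_data_type: Option<&HintType>) -> Result<Option<&str>, SchemaError> {
    match list_data_type {
        None => Ok(None),
        Some(HintType::List(field))
        | Some(HintType::LargeList(field))
        | Some(HintType::FixedSizeList(field, _)) => Ok(Some(field.as_str())),
        Some(other) => Err(SchemaError(format!(
            "Internal error: expected a list type hint, got {other:?}"
        ))),
    }
}
```

A caller can then propagate the failure with `?` like any other schema conversion error.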

alamb · 2025-10-08T21:10:48Z

I will try and find time tomorrow to review this

rluvaton · 2025-10-26T10:48:07Z

@alamb Ping :)

alamb · 2025-10-28T20:27:08Z

I see this one, thank you - unfortunately I have many other PRs ahead of it in the review queue.

As I am not super familiar with this part of the spec it will likely take me longer to review as well. Maybe someone else who is more familiar can help out.

martin-g · 2025-10-28T21:35:55Z

                _ => Field::new(name, data_type, nullable),
            };

            Ok(field.with_metadata(hint.metadata().clone()))


Should the extension type be added here too, as for `None` below?
I.e. something like:

Suggested change

let merged = field.with_metadata(hint.metadata().clone());

try_add_extension_type(merged, parquet_type)

I think we should but not part of this PR

rluvaton · 2026-02-04T12:24:55Z

> I see this one, thank you - unfortunately I have many other PRs ahead of it in the review queue.
>
> As I am not super familiar with this part of the spec it will likely take me longer to review as well - maybe someone else who is more familiar can help out

@alamb so can you please assign the relevant people, so we can merge this 4 months old PR

alamb · 2026-02-11T20:20:08Z

> @alamb so can you please assign the relevant people, so we can merge this 4 months old PR

I can't assign anyone as I have no actual authority over anyone's time -- All I can do is to beg and/or cajole others into helping do so.

One thing that would help me (and perhaps others) review this PR is a more complete description of what this code does, with examples, ideally in comments so they provide context for what this PR is doing. I am not all that familiar with the current schema representation of lists, and not at all familiar with the older one.

For example, for me to understand comments like this, I need to go into the referenced URL and then try and match the terminology used there to what is used in this repo and PR. While I can do that it will take time and time is the thing I seem to have the least of

    /// This is used to convert [deprecated repeated columns] (not in a list), into their arrow representation

Maybe it could give an example so that the code was more self contained. For example, an example of the current List representation and the old (deprecated) representation?
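
As a rough illustration of the two shapes in question (taken from the list rules in the Parquet `LogicalTypes.md`, not from this PR's code): the current representation wraps elements in a three-level `LIST`-annotated group, while the older form is a bare `repeated` field with no annotation, which the spec says to interpret as a required list of required elements. The Rust below is only a sketch; `ListShape`, `annotated_to_list`, and `legacy_repeated_to_list` are hypothetical names that model nothing more than the resulting nullability.

```rust
/// Current representation: a three-level, LIST-annotated group.
const ANNOTATED_LIST: &str = "
message schema {
    optional group values (LIST) {
        repeated group list {
            optional int32 element;
        }
    }
}";

/// Older representation: a bare repeated field with no LIST annotation.
const LEGACY_REPEATED: &str = "
message schema {
    repeated int32 values;
}";

/// Hypothetical summary of the list field a reader would produce.
#[derive(Debug, PartialEq)]
struct ListShape {
    name: String,
    list_nullable: bool,
    element_nullable: bool,
}

/// Annotated form: nullability follows the outer group and the element field.
fn annotated_to_list(name: &str, outer_optional: bool, element_optional: bool) -> ListShape {
    ListShape {
        name: name.to_string(),
        list_nullable: outer_optional,
        element_nullable: element_optional,
    }
}

/// Legacy form: the spec reads it as a required list of required elements.
fn legacy_repeated_to_list(name: &str) -> ListShape {
    ListShape {
        name: name.to_string(),
        list_nullable: false,
        element_nullable: false,
    }
}
```

Both schemas describe a list of `int32` called `values`; they differ in how the list is spelled in Parquet and in which nullability a reader can infer.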

etseidl · 2026-02-12T00:56:44Z

I'm trying to come up to speed on this, but an early observation is that one test (backward_compat_list_struct_with_nested_repeated_primitive_respects_arrow_hint) appears to violate this line in the spec:

For all fields in the schema, implementations should use either LIST and MAP annotations or unannotated repeated fields, but not both. When using the annotations, no unannotated repeated types are allowed.

From the test in question:

        // This is a backward-compatible LIST (rule 4) where the struct element contains
        // a repeated primitive. The arrow hint specifies that the inner repeated primitive
        // should be LargeList<Int32>.
        let message_type = "
            message schema {
                optional group my_list (LIST) {
                    repeated group my_list_tuple {
                        required binary str (STRING);
                        repeated int32 values;
                    }
                }
            }
        ";

That said, plugging the tests from this PR into main without the fix yields

failures:
    arrow::schema::complex::tests::backward_compat_list_struct_with_nested_repeated_primitive_respects_arrow_hint
    arrow::schema::complex::tests::convert_schema_with_nested_repeated_struct_and_primitives
    arrow::schema::complex::tests::convert_schema_with_repeated_primitive_should_use_inferred_schema
    arrow::schema::complex::tests::convert_schema_with_repeated_primitive_should_use_inferred_schema_for_list_as_well
    arrow::schema::complex::tests::convert_schema_with_repeated_struct_and_inferred_schema
    arrow::schema::complex::tests::convert_schema_with_repeated_struct_and_inferred_schema_and_field_id

test result: FAILED. 24 passed; 6 failed; 0 ignored; 0 measured; 918 filtered out; finished in 0.64s

Other than the test mentioned above, the other failing tests seem like they should succeed.
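
The rule quoted above can be pictured with a small check over a simplified schema tree. This is a sketch under stated assumptions: `Node`, `leaf`, `group`, and `mixes_list_styles` are hypothetical and are not the parquet crate's types, and a repeated child directly under a `LIST`/`MAP`-annotated group is treated as part of the annotated structure rather than as an unannotated repeated field.

```rust
/// Hypothetical, simplified schema node; not the parquet crate's `Type`.
struct Node {
    repeated: bool,
    /// True when the node carries a LIST or MAP annotation.
    annotated: bool,
    children: Vec<Node>,
}

/// Builds a primitive field.
fn leaf(repeated: bool) -> Node {
    Node { repeated, annotated: false, children: Vec::new() }
}

/// Builds a group field.
fn group(repeated: bool, annotated: bool, children: Vec<Node>) -> Node {
    Node { repeated, annotated, children }
}

/// Records (saw an annotation, saw an unannotated repeated field).
fn scan(node: &Node, parent_annotated: bool, seen: &mut (bool, bool)) {
    if node.annotated {
        seen.0 = true;
    } else if node.repeated && !parent_annotated {
        seen.1 = true;
    }
    for child in &node.children {
        scan(child, node.annotated, seen);
    }
}

/// Returns true when a schema uses both styles, which the rule disallows.
fn mixes_list_styles(root: &Node) -> bool {
    let mut seen = (false, false);
    scan(root, false, &mut seen);
    seen.0 && seen.1
}
```

Modelling the schema from the test in question as `group(false, false, vec![group(false, true, vec![group(true, false, vec![leaf(false), leaf(true)])])])` makes `mixes_list_styles` return `true`, because `repeated int32 values` sits inside a non-annotated struct beneath a `LIST`-annotated group.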

rluvaton · 2026-03-29T22:25:18Z

@etseidl updated, thanks for catching that!

alamb

Looks good to me -- thank you @rluvaton @etseidl and @martin-g

alamb · 2026-04-01T10:50:17Z

run benchmarks arrow_reader arrow_reader_clickbench

adriangbot · 2026-04-01T10:53:03Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4169205369-643-gh852 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema (090b00e) to 61b5763 (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-04-01T10:53:20Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4169205369-644-9729b 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema (090b00e) to 61b5763 (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-04-01T11:19:36Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                             fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema    main
-----                                             --------------------------------------------------------------------------    ----
arrow_reader_clickbench/async/Q1                  1.00   1082.9±4.14µs        ? ?/sec                                           1.01  1099.0±25.67µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.4±0.04ms        ? ?/sec                                           1.01      6.5±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.4±0.06ms        ? ?/sec                                           1.02      7.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.2±0.07ms        ? ?/sec                                           1.00     14.3±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     16.8±0.07ms        ? ?/sec                                           1.01     17.0±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.7±0.05ms        ? ?/sec                                           1.01     15.9±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.00      3.1±0.04ms        ? ?/sec                                           1.01      3.1±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.06     94.1±1.98ms        ? ?/sec                                           1.00     88.9±8.01ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.28    104.7±6.80ms        ? ?/sec                                           1.00     81.6±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    117.8±2.79ms        ? ?/sec                                           1.03    120.9±2.76ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.00    238.2±0.53ms        ? ?/sec                                           1.03    246.4±0.71ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.00     19.1±0.08ms        ? ?/sec                                           1.00     19.2±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.00     56.7±0.15ms        ? ?/sec                                           1.02     57.6±0.39ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     56.9±0.12ms        ? ?/sec                                           1.02     57.9±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.6±1.40ms        ? ?/sec                                           1.00     18.4±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.00     14.8±0.13ms        ? ?/sec                                           1.02     15.2±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.02ms        ? ?/sec                                           1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.00     13.1±0.22ms        ? ?/sec                                           1.04     13.6±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.00     24.0±0.27ms        ? ?/sec                                           1.04     25.1±0.44ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.8±0.04ms        ? ?/sec                                           1.01      5.8±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      5.0±0.05ms        ? ?/sec                                           1.01      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.6±0.02ms        ? ?/sec                                           1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1056.6±7.44µs        ? ?/sec                                           1.02   1073.1±3.41µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.3±0.05ms        ? ?/sec                                           1.01      6.3±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.2±0.04ms        ? ?/sec                                           1.01      7.3±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.2±0.08ms        ? ?/sec                                           1.00     14.2±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.01     16.9±0.13ms        ? ?/sec                                           1.00     16.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.01     15.8±0.10ms        ? ?/sec                                           1.00     15.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.00      3.0±0.02ms        ? ?/sec                                           1.01      3.0±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.00     71.0±0.26ms        ? ?/sec                                           1.01     71.6±0.55ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.00     79.6±0.25ms        ? ?/sec                                           1.02     81.0±3.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.00     96.3±0.21ms        ? ?/sec                                           1.03     99.5±0.98ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    217.3±0.99ms        ? ?/sec                                           1.23    267.2±5.38ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     19.1±0.18ms        ? ?/sec                                           1.01     19.4±0.17ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.00     56.2±0.26ms        ? ?/sec                                           1.02     57.3±0.53ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     56.4±0.20ms        ? ?/sec                                           1.02     57.5±0.42ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.1±0.05ms        ? ?/sec                                           1.01     18.3±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.00     14.3±0.15ms        ? ?/sec                                           1.03     14.8±0.29ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.3±0.02ms        ? ?/sec                                           1.00      5.3±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     12.9±0.23ms        ? ?/sec                                           1.02     13.1±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.00     23.2±0.21ms        ? ?/sec                                           1.04     24.2±0.63ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.5±0.02ms        ? ?/sec                                           1.05      5.7±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.8±0.02ms        ? ?/sec                                           1.02      4.9±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.02ms        ? ?/sec                                           1.01      3.5±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    865.0±1.45µs        ? ?/sec                                           1.01    877.5±2.31µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.0±0.05ms        ? ?/sec                                           1.01      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      5.9±0.04ms        ? ?/sec                                           1.01      6.0±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.01     21.6±0.09ms        ? ?/sec                                           1.00     21.5±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     28.3±0.78ms        ? ?/sec                                           1.05     29.8±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     22.9±0.09ms        ? ?/sec                                           1.00     22.9±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.7±0.03ms        ? ?/sec                                           1.00      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.00    119.9±0.19ms        ? ?/sec                                           1.02    122.5±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.00     91.3±0.12ms        ? ?/sec                                           1.02     93.1±0.36ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.02    141.4±1.02ms        ? ?/sec                                           1.00    138.6±1.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   273.3±13.29ms        ? ?/sec                                           1.02   278.7±14.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.09     29.4±0.12ms        ? ?/sec                                           1.00     26.9±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.00    105.7±0.15ms        ? ?/sec                                           1.03    108.7±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.00    103.2±0.12ms        ? ?/sec                                           1.03    106.5±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.00     18.6±0.07ms        ? ?/sec                                           1.01     18.9±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     21.9±0.06ms        ? ?/sec                                           1.01     22.1±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.8±0.01ms        ? ?/sec                                           1.00      6.8±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.00     11.2±0.04ms        ? ?/sec                                           1.01     11.3±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.00     20.6±0.10ms        ? ?/sec                                           1.01     20.9±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.2±0.02ms        ? ?/sec                                           1.01      5.2±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.04ms        ? ?/sec                                           1.01      5.6±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.3±0.03ms        ? ?/sec                                           1.01      4.4±0.02ms        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	783.9s
Peak memory	3.1 GiB
Avg memory	3.0 GiB
CPU user	705.1s
CPU sys	77.7s
Disk read	0 B
Disk write	2.0 GiB

branch

Metric	Value
Wall time	776.9s
Peak memory	3.3 GiB
Avg memory	3.1 GiB
CPU user	709.3s
CPU sys	67.7s
Disk read	0 B
Disk write	171.3 MiB

rluvaton · 2026-04-06T11:04:28Z

can we merge this PR?

alamb · 2026-04-06T19:07:47Z

Yes, merged!

fix(parquet): converting parquet schema with backward compatible repe…

1bcea51

…ated struct/primitive with provided arrow schema closes: - apache#8495

github-actions bot added the parquet (Changes to the parquet crate) label Sep 29, 2025

rluvaton added 2 commits September 29, 2025 19:25

add tests with inferring list types as well

cc5ab0d

add more tests and fix bug

fb2579a

rluvaton marked this pull request as ready for review September 29, 2025 18:33

rluvaton added 7 commits September 29, 2025 21:34

format

297cf65

added more tests as I know I have a bug

85d84b6

format

52eceed

add more tests

d299ca9

fix

1aa573f

Merge branch 'main' into fix-reading-backward-compat-repeated-struct-…

65a0bd6

…primitive-with-inferred-schema # Conflicts: # parquet/src/arrow/schema/complex.rs

align with main

ab54f02

rluvaton commented Sep 29, 2025

Comment thread parquet/src/arrow/schema/complex.rs Outdated

rluvaton added 7 commits September 30, 2025 15:11

set map align

f0716dd

add more tests from spark codebase for more coverage

470acbb

rename

03dfa54

format

301c7a7

remove tests from ignore and added more comments

39b9263

Merge branch 'main' into fix-reading-backward-compat-repeated-struct-…

d8d9f64

…primitive-with-inferred-schema

remove or

59303fd

alamb mentioned this pull request Sep 30, 2025

Column with List(Struct) causes failed to decode level data for struct array (regression in 56) #8404

Closed

alamb reviewed Sep 30, 2025

add link to deprecated repeated columns and replace panic with error

4d7485a

rluvaton mentioned this pull request Oct 28, 2025

Release DataFusion 51.0.0 (Nov 2025) apache/datafusion#17558

Closed

37 tasks

martin-g reviewed Oct 28, 2025

alamb mentioned this pull request Nov 9, 2025

Release DataFusion 52.0.0 (Dec 2025 / Jan 2026) apache/datafusion#18566

Closed

41 tasks

alamb mentioned this pull request Jan 7, 2026

Release DataFusion 53.0.0 (Feb 2026 / Mar 2026) apache/datafusion#19692

Closed

26 tasks

rluvaton added 3 commits February 4, 2026 11:58

Merge branch 'main' into fix-reading-backward-compat-repeated-struct-…

c1ee664

…primitive-with-inferred-schema

add comment and test why changing it to inherit the value is wrong

3eaf7a7

format

9fb3a93

alamb reviewed Feb 11, 2026

Comment thread parquet/src/arrow/schema/complex.rs

rluvaton added 5 commits March 5, 2026 13:06

Merge branch 'main' into fix-reading-backward-compat-repeated-struct-…

ba0e1ef

…primitive-with-inferred-schema

remove bad test

ef7d611

update comment

e322de0

update comment

b3bc31c

update comment

6d9e68c

alamb mentioned this pull request Mar 20, 2026

Release DataFusion 54.0.0 (Apr 2026 / May 2026) apache/datafusion#21080

Open

26 tasks

alamb approved these changes Apr 1, 2026

Merge branch 'main' into fix-reading-backward-compat-repeated-struct-…

090b00e

…primitive-with-inferred-schema

alamb merged commit 871c6d2 into apache:main Apr 6, 2026
16 checks passed

rluvaton deleted the fix-reading-backward-compat-repeated-struct-primitive-with-inferred-schema branch April 9, 2026 13:36

