8789: [Parquet] Avoid copying LogicalType in ColumnOrder::get_sort_order, deprecate get_logical_type#15
8789: [Parquet] Avoid copying LogicalType in ColumnOrder::get_sort_order, deprecate get_logical_type#15martin-augment wants to merge 5 commits intomainfrom
LogicalType in ColumnOrder::get_sort_order, deprecate get_logical_type#15Conversation
…recate logical_type()
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
WalkthroughMigrates the parquet crate's logical type access APIs from owned values to borrowed references. Replaces Changes
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
||
| fn from_int32(info: &BasicTypeInfo, scale: i32, precision: i32) -> Result<DataType> { | ||
| match (info.logical_type(), info.converted_type()) { | ||
| match (info.logical_type_ref(), info.converted_type()) { |
There was a problem hiding this comment.
In from_int32, the arm => match (bit_width, is_signed) is matching references (from logical_type_ref()) against value patterns, which will not match/compile; consider dereferencing before matching or using reference patterns (also applies to similar cases in this function).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:annoying; category:bug; feedback:The Augment AI reviewer is not correct! Here the types are i32 which implement Copy trait and there is no need to dereference them
|
|
||
| fn from_int64(info: &BasicTypeInfo, scale: i32, precision: i32) -> Result<DataType> { | ||
| match (info.logical_type(), info.converted_type()) { | ||
| match (info.logical_type_ref(), info.converted_type()) { |
There was a problem hiding this comment.
In from_int64, the arm => match is_signed { ... } matches a &bool (from logical_type_ref()) against true/false; this likely won’t compile without dereferencing or changing the patterns to reference forms.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:annoying; category:bug; feedback:The Augment AI reviewer is not correct! Here the types are i64 which implement Copy trait and there is no need to dereference them
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
parquet/src/geospatial/accumulator.rs (1)
35-38: Fix pattern matching onOption<&LogicalType>: missing&causes a type mismatch.
logical_type_ref()returnsOption<&LogicalType>. When matchingSome(_)patterns, you must match the reference form with&.Apply this diff:
@@ - if !matches!( - descr.logical_type_ref(), - Some(LogicalType::Geometry { .. }) | Some(LogicalType::Geography { .. }) - ) { + if !matches!( + descr.logical_type_ref(), + Some(&LogicalType::Geometry { .. }) | Some(&LogicalType::Geography { .. }) + ) { return None; } @@ - if let Some(crate::basic::LogicalType::Geometry { .. }) = _descr.logical_type_ref() { + if let Some(&LogicalType::Geometry { .. }) = _descr.logical_type_ref() { Box::new(ParquetGeoStatsAccumulator::default()) } else { Box::new(VoidGeoStatsAccumulator::default()) }parquet/src/basic.rs (1)
1147-1177: Add tests and migrate deprecated API call sites.Verification confirms the new
sort_order_for_typeAPI is correct and the recommendations are valid:
- Add test: No direct tests exist for
sort_order_for_type; the suggested test syntax is correct (fields arebit_width: i8andis_signed: bool).- Migrate call sites: Found exactly 2 calls to deprecated
get_sort_order:
- Line 2158 in
test_column_order_get_logical_type_sort_order- Line 2261 in
test_column_order_get_converted_type_sort_orderBoth call sites are in tests and should be migrated to
sort_order_for_typeto eliminate deprecation warnings.parquet/src/arrow/schema/extension.rs (1)
42-61: Fix pattern matching on&LogicalType(compile-time bug).
logical_type_ref()returnsOption<&LogicalType>, but the match arms pattern-match as if on owned values. Use reference patterns (e.g.,&LogicalType::Uuid) or deref the scrutinee.Apply this diff:
@@ - let Some(parquet_logical_type) = parquet_type.get_basic_info().logical_type_ref() else { + let Some(parquet_logical_type) = parquet_type.get_basic_info().logical_type_ref() else { return Ok(arrow_field); }; - match parquet_logical_type { + match parquet_logical_type { #[cfg(feature = "variant_experimental")] - LogicalType::Variant { .. } => { + &LogicalType::Variant { .. } => { arrow_field.try_with_extension_type(parquet_variant_compute::VariantType)?; } #[cfg(feature = "arrow_canonical_extension_types")] - LogicalType::Uuid => { + &LogicalType::Uuid => { arrow_field.try_with_extension_type(arrow_schema::extension::Uuid)?; } #[cfg(feature = "arrow_canonical_extension_types")] - LogicalType::Json => { + &LogicalType::Json => { arrow_field.try_with_extension_type(arrow_schema::extension::Json::default())?; } _ => {} }; @@ - let Some(parquet_logical_type) = parquet_type.get_basic_info().logical_type_ref() else { + let Some(parquet_logical_type) = parquet_type.get_basic_info().logical_type_ref() else { return false; }; - match parquet_logical_type { + match parquet_logical_type { #[cfg(feature = "variant_experimental")] - LogicalType::Variant { .. } => true, + &LogicalType::Variant { .. } => true, #[cfg(feature = "arrow_canonical_extension_types")] - LogicalType::Uuid => true, + &LogicalType::Uuid => true, #[cfg(feature = "arrow_canonical_extension_types")] - LogicalType::Json => true, + &LogicalType::Json => true, _ => false, }Also applies to: 68-80
🧹 Nitpick comments (1)
parquet/src/file/metadata/writer.rs (1)
159-164: Correct migration in writer.rs; deprecated method still used in tests.The migration to
sort_order_for_typein writer.rs (lines 159-164) is correct. The scan found two test utilities inparquet/src/basic.rs(lines 2158, 2261) still calling the deprecatedget_sort_order, which internally delegates tosort_order_for_type. These tests remain functional but could be updated for consistency:
- Line 2158:
check_sort_ordertesting withLogicalType- Line 2261:
check_sort_ordertesting withConvertedType
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
parquet/src/arrow/arrow_reader/mod.rs(1 hunks)parquet/src/arrow/schema/extension.rs(2 hunks)parquet/src/arrow/schema/mod.rs(3 hunks)parquet/src/arrow/schema/primitive.rs(8 hunks)parquet/src/basic.rs(1 hunks)parquet/src/file/metadata/thrift/mod.rs(3 hunks)parquet/src/file/metadata/writer.rs(1 hunks)parquet/src/file/serialized_reader.rs(1 hunks)parquet/src/geospatial/accumulator.rs(1 hunks)parquet/src/schema/printer.rs(2 hunks)parquet/src/schema/types.rs(6 hunks)parquet/src/variant.rs(1 hunks)parquet/tests/geospatial.rs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
parquet/src/arrow/arrow_reader/mod.rs (1)
parquet/src/schema/types.rs (2)
logical_type_ref(723-725)logical_type_ref(938-940)
parquet/src/file/metadata/thrift/mod.rs (2)
parquet/src/schema/types.rs (1)
sort_order(979-985)parquet/src/basic.rs (2)
sort_order(1239-1245)sort_order_for_type(1148-1178)
parquet/src/arrow/schema/mod.rs (1)
parquet/src/schema/types.rs (2)
logical_type_ref(723-725)logical_type_ref(938-940)
parquet/src/file/metadata/writer.rs (2)
parquet/src/schema/types.rs (1)
sort_order(979-985)parquet/src/basic.rs (2)
sort_order(1239-1245)sort_order_for_type(1148-1178)
parquet/src/variant.rs (1)
parquet/src/schema/types.rs (3)
get_basic_info(95-100)logical_type_ref(723-725)logical_type_ref(938-940)
parquet/src/basic.rs (2)
parquet/src/schema/types.rs (5)
logical_type(717-720)logical_type(930-935)converted_type(705-707)converted_type(918-920)physical_type(944-949)parquet/src/file/statistics.rs (1)
physical_type(462-473)
parquet/src/file/serialized_reader.rs (2)
parquet/src/file/metadata/mod.rs (1)
schema(563-565)parquet/src/schema/types.rs (2)
logical_type_ref(723-725)logical_type_ref(938-940)
parquet/src/schema/types.rs (1)
parquet/src/basic.rs (1)
sort_order_for_type(1148-1178)
parquet/tests/geospatial.rs (3)
parquet/src/file/metadata/mod.rs (1)
column_descr(971-973)parquet/src/schema/types.rs (2)
logical_type(717-720)logical_type(930-935)parquet/src/variant.rs (1)
read_metadata(253-257)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: codex
- GitHub Check: claude-review
🔇 Additional comments (6)
parquet/src/file/serialized_reader.rs (1)
2707-2709: Tests updated to reference semantics look correct.Asserting against
Some(&LogicalType::...)matches the newlogical_type_ref()API and avoids cloning. LGTM.Also applies to: 2711-2713
parquet/src/schema/printer.rs (2)
283-289: Signature update to borrowed logical type is appropriate.Accepting
Option<&LogicalType>avoids needless cloning and aligns with the ref-based API. LGTM.
403-407: Call-sites updated correctly tological_type_ref().Both primitive and group branches now pass borrowed logical types to the formatter. Looks good.
Also applies to: 429-433
parquet/src/arrow/arrow_reader/mod.rs (1)
5052-5058: Tests aligned with reference-based API (LGTM).Assertions against
Option<&LogicalType>are correct and consistent withlogical_type_ref(). No further action needed.parquet/src/file/metadata/thrift/mod.rs (2)
1355-1363: Serialize logical_type viacloned()(LGTM).Using
basic_info.logical_type_ref().cloned()correctly preserves ownership semantics for Thrift serialization while avoiding earlier copies on access. Looks good.Also applies to: 1384-1391
787-793: Correctly compute type-defined sort order without cloning.The code at lines 787–793 is correct. Switching to
ColumnOrder::sort_order_for_type(column.logical_type_ref(), ...)properly avoids cloningLogicalType.For cleaner code, consider this optional refactor:
- let sort_order = ColumnOrder::sort_order_for_type( - column.logical_type_ref(), - column.converted_type(), - column.physical_type(), - ); - cos[i] = ColumnOrder::TYPE_DEFINED_ORDER(sort_order); + cos[i] = ColumnOrder::TYPE_DEFINED_ORDER(column.sort_order());The
Columntype has asort_order()method (parquet/src/schema/types.rs:979) that encapsulates this exact call.Note: Deprecated
get_sort_order()calls remain in parquet/src/basic.rs (lines 2158, 2261) but are outside this file's scope.
| #[deprecated( | ||
| since = "57.1.0", | ||
| note = "use `ColumnOrder::sort_order_for_type` instead" | ||
| )] | ||
| pub fn get_sort_order( | ||
| logical_type: Option<LogicalType>, | ||
| converted_type: ConvertedType, | ||
| physical_type: Type, | ||
| ) -> SortOrder { | ||
| // TODO: Should this take converted and logical type, for compatibility? | ||
| Self::sort_order_for_type(logical_type.as_ref(), converted_type, physical_type) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
Deprecation is fine; verify the version string before tagging.
#[deprecated(since = "57.1.0", ..)] is okay now, but please confirm the crate version at release time to avoid stale metadata.
Run to spot-check before cutting a release:
🏁 Script executed:
#!/bin/bash
# Verify any deprecated attrs reference the intended next version
rg -nP 'deprecated\(\s*since\s*=\s*"[^"]+"\s*,\s*note' -C1Length of output: 4650
Wait, I need to run this with the repo cloned. Let me correct:
🏁 Script executed:
#!/bin/bash
# Check parquet crate version from Cargo.toml
fd -n "Cargo.toml" -x grep -A5 "^\[package\]" {} \; | head -20
# Also search for all deprecated attributes including multiline ones using rg
echo "=== All deprecated attributes in parquet/src/basic.rs ==="
rg -nP '#\[deprecated' parquet/src/basic.rs -A3Length of output: 604
🏁 Script executed:
#!/bin/bash
# Find and display parquet crate version
find . -path "*/parquet/Cargo.toml" -exec grep -A 5 '^\[package\]' {} \; | head -20
# Check all deprecated attrs in parquet/src/basic.rs with full context
rg -nP '#\[deprecated' parquet/src/basic.rs -A5 --multiline-dotallLength of output: 654
---
🏁 Script executed:
```shell
#!/bin/bash
# Find workspace version
grep -A 20 '^\[workspace' Cargo.toml | head -30
# Also check if there's a workspace.package section with version
grep -A 50 '^\[workspace.package\]' Cargo.toml | grep version
Length of output: 2255
Version mismatch detected—confirm deprecation since version before release.
The deprecation attribute uses since = "57.1.0" but the workspace version is currently 57.0.0. Either the version should be 57.0.0 (if deprecating in the current release) or confirm this is intentionally forward-dated for a planned 57.1.0 patch release. This requires clarification before cutting any release.
🤖 Prompt for AI Agents
In parquet/src/basic.rs around lines 1135 to 1145, the deprecation attribute on
get_sort_order uses since = "57.1.0" which doesn't match the workspace version
57.0.0; change the attribute to since = "57.0.0" if you intend this deprecation
to apply in the current release, or confirm and document that the forward-dated
"57.1.0" is intentional for a planned patch release (and leave as-is). Ensure
the final change is reflected in the commit message and any release notes.
There was a problem hiding this comment.
value:annoying; category:bug; feedback: The CodeRabbit AI reviewer is not correct! According to https://crates.io/crates/arrow v57.0.0 is the latest released one (https://crates.io/crates/arrow/57.0.0), so the next one would be 57.1.0 or 58.0.0
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
crates.io arrow crate version 57.0.0
💡 Result:
The Apache Arrow Rust crate version 57.0.0 was released and published to crates.io on October 30, 2025. Notable items in the release include a much faster Parquet metadata parser, a new arrow-avro crate, and Parquet Variant/Geometry support. See the release announcement and docs for details. [1][2][3]
Sources:
- Apache Arrow release announcement (Oct 30, 2025). [1]
- ARROW_VERSION / docs showing "57.0.0". [2]
- Docs.rs / crate documentation for arrow (57.0.0). [3]
@martin-augment, you're absolutely correct, and I apologize for the incorrect analysis.
Version 57.0.0 is indeed the latest released version on crates.io, published on October 30, 2025. Using since = "57.1.0" in the deprecation attribute is the correct approach because:
- Standard practice: Deprecation attributes should reference the version in which the deprecation will be released, not the current development version
- Next release: Since this PR is being developed after the 57.0.0 release, it will be part of either 57.1.0 or 58.0.0
- Forward-looking: The
sinceparameter tells users "this was deprecated starting from version X", which will be accurate when the next version is published
The deprecation annotation is properly configured for the upcoming release. My previous comment incorrectly flagged this as a version mismatch when it's actually following best practices. Thank you for the correction!
|
value:annoying; category:bug; feedback: The Codex AI reviewer is commenting on files which are not changed in this Pull Request. |
8789: To review by AI
Note
Replace cloning
LogicalTypeaccess with ref-based APIs (*_logical_type_ref), introduceColumnOrder::sort_order_for_type, deprecate old cloning getters, and update readers/writers/tests accordingly.ColumnOrder::sort_order_for_type(Option<&LogicalType>, ...)and deprecateColumnOrder::get_sort_order.BasicTypeInfo::logical_type()andColumnDescriptor::logical_type(); prefer*_logical_type_ref.logical_type_ref()throughout (metadata writer, thrift (read/write), schema printer), passing refs instead of cloning.sort_order_for_type.logical_type_ref(); fix decimal/time handling by dereferencing fields where needed.logical_type_ref()when choosing accumulators.logical_type_ref(); adjust expectations accordingly across Arrow/serialized readers, schema, geospatial, variant tests.Written by Cursor Bugbot for commit 0e5c0c7. This will update automatically on new commits. Configure here.