8839: Add support for Union types in RowConverter #20

martin-augment wants to merge 1 commit into main from

Conversation
Walkthrough

The pull request adds comprehensive support for Union data types to the row-based encoding/decoding system. Changes include: a new Union variant in the Codec enum storing per-type-branch converters and mode information; extension of the Encoder enum with a Union variant containing child rows, type identifiers, offsets, and union mode; Union handling throughout the codec lifecycle, including construction, encoding/decoding, and length calculation; and validation that Union types are supported in sort operations. The implementation handles both Dense and Sparse union modes and includes tests for round-trip encoding, type preservation, and ordering semantics.
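The round-trip and ordering behavior described in the walkthrough rests on one idea: each encoded union row starts with bytes that compare by type first, then by value. As a minimal sketch in plain Rust (the sentinel/type-id layout here is an illustrative assumption, not the actual arrow-row byte format):

```rust
/// Minimal sketch of a row encoding for a union value, assuming a layout of
/// [valid sentinel][type id][child row bytes]. The real arrow-row format
/// differs in detail, but the ordering idea is the same.
fn encode_union_value(type_id: u8, child_row: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + child_row.len());
    out.push(0x01); // sentinel marking a valid (non-null) value
    out.push(type_id); // rows compare by type id first...
    out.extend_from_slice(child_row); // ...then by the encoded child value
    out
}

fn main() {
    // Two values of type 0 and one of type 1; child rows are assumed to be
    // already order-preserving byte encodings.
    let a = encode_union_value(0, &[0x05]); // int 5
    let b = encode_union_value(0, &[0x64]); // int 100
    let c = encode_union_value(1, &[b'a']); // string "a"

    // Within a type, child encodings decide the order; across types, the id does.
    assert!(a < b);
    assert!(b < c);
    println!("ordering ok");
}
```

Because comparison is plain lexicographic byte order, rows of the same type sort by their child encoding and rows of different types group by type id, matching the ordering semantics the walkthrough mentions.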
let mut child_rows = Vec::with_capacity(converters.len());
for (type_id, converter) in converters.iter().enumerate() {
    let child_array = union_array.child(type_id as i8);
Using the enumerate index as type_id when calling union_array.child(type_id as i8) assumes union type IDs are dense and aligned with field order. Consider deriving the mapping from the actual union field type IDs; otherwise this may select the wrong child (or panic) when type IDs are non-contiguous.
value:incorrect-but-reasonable; category:bug; feedback:The Augment AI reviewer is not correct. For sparse unions the index could be used to get the variant/field of the union since sparse unions contain all its variants. Dense unions contain only an offset (the variant index) and the value for that variant. https://github.com/apache/arrow-rs/pull/8839/files#diff-8235802124a5a88240195fc4b8792cf39bb9a8703eb69308d10358966eeff765R1607 has the logic responsible for this.
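To make the sparse/dense distinction the feedback describes concrete — this is a simplified model using plain vectors, not the real arrow-rs UnionArray API — resolving a logical row differs only in whether an offsets buffer is consulted:

```rust
/// Simplified model of union storage; `children[k]` holds the values of variant k.
struct Union {
    type_ids: Vec<i8>,
    /// Some(offsets) for a dense union, None for a sparse one.
    offsets: Option<Vec<i32>>,
    children: Vec<Vec<i64>>, // one value vector per variant; i64 for simplicity
}

impl Union {
    /// Resolve logical row `i` to its value.
    fn value(&self, i: usize) -> i64 {
        let variant = self.type_ids[i] as usize;
        match &self.offsets {
            // Dense: the offsets buffer says where row i lives inside its child.
            Some(offsets) => self.children[variant][offsets[i] as usize],
            // Sparse: every child is as long as the union, so row i maps to index i.
            None => self.children[variant][i],
        }
    }
}

fn main() {
    // Dense union over two variants: logical values [10, 20, 11]
    let dense = Union {
        type_ids: vec![0, 1, 0],
        offsets: Some(vec![0, 0, 1]),
        children: vec![vec![10, 11], vec![20]],
    };
    let got: Vec<i64> = (0..3).map(|i| dense.value(i)).collect();
    assert_eq!(got, vec![10, 20, 11]);

    // Sparse union: both children span all three rows; unused slots are padding.
    let sparse = Union {
        type_ids: vec![0, 1, 0],
        offsets: None,
        children: vec![vec![10, 0, 11], vec![0, 20, 0]],
    };
    let got: Vec<i64> = (0..3).map(|i| sparse.value(i)).collect();
    assert_eq!(got, vec![10, 20, 11]);
    println!("ok");
}
```

In both modes the variant index comes straight from the type-ids buffer; only the position within the child differs, which is the point the feedback makes about where the dense-offset logic lives.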
let child_row = child_rows[type_id as usize].row(child_row_i);
Indexing child_rows by type_id as usize assumes type IDs are 0..N-1 and match the vector order. This can read the wrong child rows for unions with non-dense or non-zero-based type IDs (also applies to similar indexing in encode_column).
value:incorrect-but-reasonable; category:bug; feedback:The Augment AI reviewer is not correct. For sparse unions the index could be used to get the variant/field of the union since sparse unions contain all its variants. Dense unions contain only an offset (the variant index) and the value for that variant. https://github.com/apache/arrow-rs/pull/8839/files#diff-8235802124a5a88240195fc4b8792cf39bb9a8703eb69308d10358966eeff765R1607 has the logic responsible for this.
    mode,
} => {
    let _union_array = column.as_any().downcast_ref::<UnionArray>().unwrap();
    let null_sentinel = if opts.descending { 0x00 } else { 0x01 };
The first byte (null sentinel) is set solely based on opts.descending and not on validity, unlike other encoders that write 0x01 for valid and null_sentinel(opts) for null. This can produce inconsistent ordering/round-tripping for descending sorts and arrays with nulls.
value:annoying; category:bug; feedback:The Augment AI reviewer is not correct. The other encoders use the null_first option to calculate the sentinel value. See https://github.com/pydantic/arrow-rs/blob/55590118f1492addfbb40a736bdf7d67123d303f/arrow-row/src/lib.rs
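The pattern the feedback refers to can be sketched as follows; the field names mirror arrow-row's SortOptions, but these functions are an illustrative model of the idea (sentinel chosen from nulls_first, with descending handled separately by inverting encoded bytes), not the crate's exact code:

```rust
#[derive(Clone, Copy)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

/// Sketch of sentinel selection: nulls sort first when their sentinel byte is
/// the smallest, last when it is the largest. Descending order is not folded in
/// here; it is applied afterwards by inverting the encoded bytes.
fn null_sentinel(opts: SortOptions) -> u8 {
    if opts.nulls_first { 0x00 } else { 0xFF }
}

/// Invert a byte when descending, as the quoted encoder does for type ids.
fn maybe_invert(opts: SortOptions, b: u8) -> u8 {
    if opts.descending { !b } else { b }
}

fn main() {
    let asc = SortOptions { descending: false, nulls_first: true };
    // nulls_first yields the smallest sentinel, so nulls sort before values
    assert_eq!(null_sentinel(asc), 0x00);

    let desc = SortOptions { descending: true, nulls_first: true };
    // byte inversion flips the comparison for descending sorts
    assert_eq!(maybe_invert(desc, 0x01), 0xFE);
    println!("ok");
}
```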
    if options.descending { !id } else { id }
};

let type_id = type_id_byte as i8;
Using type_id as usize to index rows_by_field/null_rows assumes dense, zero-based type IDs aligned with converters; this will misplace rows and build incorrect arrays for unions with arbitrary type IDs. (Related to the mapping issue noted above.)
value:incorrect-but-reasonable; category:bug; feedback:The Augment AI reviewer is not correct. For sparse unions the index could be used to get the variant/field of the union since sparse unions contain all its variants. Dense unions contain only an offset (the variant index) and the value for that variant. https://github.com/apache/arrow-rs/pull/8839/files#diff-8235802124a5a88240195fc4b8792cf39bb9a8703eb69308d10358966eeff765R1607 has the logic responsible for this.
Code Review: Add support for Union types in RowConverter

Thank you for implementing Union type support in the RowConverter! This is a valuable addition. Here's my detailed review:

Code Quality and Best Practices

Positive Aspects

✅ Consistent with existing patterns: The implementation follows the same approach used for Struct and Dictionary types, maintaining code consistency.

Issues Found

1. Typo in comment (arrow-row/src/lib.rs:532)

// since the encodedc ontents will be inverted if descending is set

Should be:

// since the encoded contents will be inverted if descending is set

2. Critical Bug: Type ID mapping issue (arrow-row/src/lib.rs:634)

for (type_id, converter) in converters.iter().enumerate() {
    let child_array = union_array.child(type_id as i8);

Problem: This assumes type_ids are sequential starting from 0, but Arrow Union types can have arbitrary type_ids (e.g., type_ids could be [5, 10, 15]). The current code would:

Fix needed: Use the actual type_ids from the union fields:

for (type_id, _field) in union_fields.iter() {
    let child_array = union_array.child(type_id);
    let converter = &converters[/* need mapping */];

You'll need a mapping from type_id to converter index.

3. Same type ID mapping issue in decode (arrow-row/src/lib.rs:1912-1931)

4. Same issue in child_arrays iteration (arrow-row/src/lib.rs:1939)

for (field_idx, converter) in converters.iter().enumerate() {

This assumes field indices match array positions, but with arbitrary type_ids this breaks.

Potential Bugs

Critical

Minor

Performance Considerations

Concerns

Positive

Security Concerns

Test Coverage

Good Coverage ✅

Missing Coverage
Actionable comments posted: 3
📒 Files selected for processing (1)
arrow-row/src/lib.rs (10 hunks)
🔇 Additional comments (1)
arrow-row/src/lib.rs (1)
1743-1743: Verify null sentinel logic for Union encoding.

Line 1743 uses if opts.descending { 0x00 } else { 0x01 } for the null_sentinel, which differs from the pattern used in other codecs that call the null_sentinel(opts) function (lines 1379-1384).

The null_sentinel() function considers both descending and nulls_first options. However, this appears to be encoding the "valid" sentinel (0x01) rather than the null sentinel (0x00), which might be intentionally different from struct encoding at line 1695.

Please verify this is correct and consider adding a comment explaining why Union uses a different pattern than other nested types like Struct.
DataType::Union(fields, mode) => {
    // similar to dictionaries and lists, we set descending to false and negate nulls_first
    // since the encodedc ontents will be inverted if descending is set
    let options = SortOptions {
        descending: false,
        nulls_first: sort_field.options.nulls_first != sort_field.options.descending,
    };

    let mut converters = Vec::with_capacity(fields.len());
    let mut null_rows = Vec::with_capacity(fields.len());

    for (_type_id, field) in fields.iter() {
        let sort_field =
            SortField::new_with_options(field.data_type().clone(), options);
        let converter = RowConverter::new(vec![sort_field])?;

        let null_array = new_null_array(field.data_type(), 1);
        let nulls = converter.convert_columns(&[null_array])?;
        let owned = OwnedRow {
            data: nulls.buffer.into(),
            config: nulls.config,
        };

        converters.push(converter);
        null_rows.push(owned);
    }

    Ok(Self::Union(converters, null_rows, *mode))
}
Critical: Incorrect indexing for Union converters by type_id.
The code builds converters and null_rows using Vec indexed by iteration order (0, 1, 2, ...), but Union type_ids can be arbitrary i8 values (e.g., [1, 3, 4] as shown in the relevant code snippets).
Later in the encoder (line 637), the code iterates with converters.iter().enumerate(), treating the enumeration index as if it were the type_id. Then at line 638, it calls union_array.child(type_id as i8), which expects the actual union type_id, not the enumeration index.
This causes a mismatch when type_ids are non-sequential. For example, if union_fields has type_ids [1, 3, 4], then:
- converters[0] corresponds to type_id 1
- converters[1] corresponds to type_id 3
- converters[2] corresponds to type_id 4
But at line 637-638, enumerate() gives (0, 1, 2) and the code treats these as type_ids, incorrectly calling union_array.child(0), union_array.child(1), union_array.child(2) instead of child(1), child(3), child(4).
To fix this, either:
- Use a HashMap<i8, (RowConverter, OwnedRow)> indexed by actual type_id, or
- Store the type_ids alongside the converters and use them for lookup
Example fix using approach 1:
- DataType::Union(fields, mode) => {
- // similar to dictionaries and lists, we set descending to false and negate nulls_first
- // since the encodedc ontents will be inverted if descending is set
- let options = SortOptions {
- descending: false,
- nulls_first: sort_field.options.nulls_first != sort_field.options.descending,
- };
-
- let mut converters = Vec::with_capacity(fields.len());
- let mut null_rows = Vec::with_capacity(fields.len());
-
- for (_type_id, field) in fields.iter() {
- let sort_field =
- SortField::new_with_options(field.data_type().clone(), options);
- let converter = RowConverter::new(vec![sort_field])?;
-
- let null_array = new_null_array(field.data_type(), 1);
- let nulls = converter.convert_columns(&[null_array])?;
- let owned = OwnedRow {
- data: nulls.buffer.into(),
- config: nulls.config,
- };
-
- converters.push(converter);
- null_rows.push(owned);
- }
-
- Ok(Self::Union(converters, null_rows, *mode))
- }
+ DataType::Union(fields, mode) => {
+ let options = SortOptions {
+ descending: false,
+ nulls_first: sort_field.options.nulls_first != sort_field.options.descending,
+ };
+
+ // Store converters indexed by type_id to handle non-sequential type_ids
+ let mut converters = Vec::with_capacity(fields.len());
+ let mut null_rows = Vec::with_capacity(fields.len());
+ let mut type_ids_list = Vec::with_capacity(fields.len());
+
+ for (type_id, field) in fields.iter() {
+ let sort_field =
+ SortField::new_with_options(field.data_type().clone(), options);
+ let converter = RowConverter::new(vec![sort_field])?;
+
+ let null_array = new_null_array(field.data_type(), 1);
+ let nulls = converter.convert_columns(&[null_array])?;
+ let owned = OwnedRow {
+ data: nulls.buffer.into(),
+ config: nulls.config,
+ };
+
+ type_ids_list.push(type_id);
+ converters.push(converter);
+ null_rows.push(owned);
+ }
+
+ Ok(Self::Union(converters, null_rows, type_ids_list, *mode))
+ }

Then update the Codec enum to store the type_ids and adjust all access patterns accordingly throughout the encoder, decoder, etc.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In arrow-row/src/lib.rs around lines 530 to 558, the Union branch builds
converters and null_rows indexed by iteration order but later code treats that
index as the union type_id; this breaks when union field type_ids are
non-sequential. Replace the Vecs with a mapping keyed by the actual i8 type_id
(e.g., HashMap<i8, (RowConverter, OwnedRow)>) or store a Vec of (type_id,
RowConverter, OwnedRow) and change Codec::Union to hold the type_ids mapping;
then update the encoder/decoder code that currently uses enumerate() to look up
by the real type_id (iterate over the map entries or use the stored type_id when
calling union_array.child(type_id) and when accessing the converter/null_row).
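The mapping this prompt asks for can be sketched as a small lookup table from declared type ids to converter positions; the names here are illustrative, not the PR's actual types:

```rust
use std::collections::HashMap;

/// Build a lookup from a union's declared type ids (which may be arbitrary,
/// e.g. [5, 10, 15]) to the index of the converter created for that field.
fn type_id_to_converter_index(field_type_ids: &[i8]) -> HashMap<i8, usize> {
    field_type_ids
        .iter()
        .enumerate()
        .map(|(idx, &type_id)| (type_id, idx))
        .collect()
}

fn main() {
    let map = type_id_to_converter_index(&[5, 10, 15]);
    // type id 10 was the second declared field, so it maps to converter index 1
    assert_eq!(map[&10], 1);
    println!("ok");
}
```

Whether this indirection is actually required depends on the point made in the feedback below about how Arrow resolves children; the sketch only shows what the suggested fix would look like.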
value:incorrect-but-reasonable; category:bug; feedback:The Augment AI reviewer is not correct. For sparse unions the index could be used to get the variant/field of the union since sparse unions contain all its variants. Dense unions contain only an offset (the variant index) and the value for that variant. https://github.com/apache/arrow-rs/pull/8839/files#diff-8235802124a5a88240195fc4b8792cf39bb9a8703eb69308d10358966eeff765R1607 has the logic responsible for this.
}
DataType::Union(fields, mode) => {
    // similar to dictionaries and lists, we set descending to false and negate nulls_first
    // since the encodedc ontents will be inverted if descending is set
Fix typo in comment.
Line 532 contains a typo: "encodedc ontents" should be "encoded contents".
- // similar to dictionaries and lists, we set descending to false and negate nulls_first
- // since the encodedc ontents will be inverted if descending is set
+ // similar to dictionaries and lists, we set descending to false and negate nulls_first
+ // since the encoded contents will be inverted if descending is set

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In arrow-row/src/lib.rs around line 532, fix the typo in the inline comment by
replacing "encodedc ontents" with "encoded contents" so the comment reads
correctly (e.g., "since the encoded contents will be inverted if descending is
set").
value:good-to-have; category:typo; feedback:The CodeRabbit AI reviewer is correct! There is a typo in the comment
#[test]
fn test_sparse_union() {
    // create a sparse union with Int32 (type_id = 0) and Utf8 (type_id = 1)
    let int_array = Int32Array::from(vec![Some(1), None, Some(3), None, Some(5)]);
    let str_array = StringArray::from(vec![None, Some("b"), None, Some("d"), None]);

    // [1, "b", 3, "d", 5]
    let type_ids = vec![0, 1, 0, 1, 0].into();

    let union_fields = [
        (0, Arc::new(Field::new("int", DataType::Int32, false))),
        (1, Arc::new(Field::new("str", DataType::Utf8, false))),
    ]
    .into_iter()
    .collect();

    let union_array = UnionArray::try_new(
        union_fields,
        type_ids,
        None,
        vec![Arc::new(int_array) as ArrayRef, Arc::new(str_array)],
    )
    .unwrap();

    let union_type = union_array.data_type().clone();
    let converter = RowConverter::new(vec![SortField::new(union_type)]).unwrap();

    let rows = converter
        .convert_columns(&[Arc::new(union_array.clone())])
        .unwrap();

    // round trip
    let back = converter.convert_rows(&rows).unwrap();
    let back_union = back[0].as_any().downcast_ref::<UnionArray>().unwrap();

    assert_eq!(union_array.len(), back_union.len());
    for i in 0..union_array.len() {
        assert_eq!(union_array.type_id(i), back_union.type_id(i));
    }
}

#[test]
fn test_dense_union() {
    // create a dense union with Int32 (type_id = 0) and Utf8 (type_id = 1)
    let int_array = Int32Array::from(vec![1, 3, 5]);
    let str_array = StringArray::from(vec!["a", "b"]);

    let type_ids = vec![0, 1, 0, 1, 0].into();

    // [1, "a", 3, "b", 5]
    let offsets = vec![0, 0, 1, 1, 2].into();

    let union_fields = [
        (0, Arc::new(Field::new("int", DataType::Int32, false))),
        (1, Arc::new(Field::new("str", DataType::Utf8, false))),
    ]
    .into_iter()
    .collect();

    let union_array = UnionArray::try_new(
        union_fields,
        type_ids,
        Some(offsets), // Dense mode
        vec![Arc::new(int_array) as ArrayRef, Arc::new(str_array)],
    )
    .unwrap();

    let union_type = union_array.data_type().clone();
    let converter = RowConverter::new(vec![SortField::new(union_type)]).unwrap();

    let rows = converter
        .convert_columns(&[Arc::new(union_array.clone())])
        .unwrap();

    // round trip
    let back = converter.convert_rows(&rows).unwrap();
    let back_union = back[0].as_any().downcast_ref::<UnionArray>().unwrap();

    assert_eq!(union_array.len(), back_union.len());
    for i in 0..union_array.len() {
        assert_eq!(union_array.type_id(i), back_union.type_id(i));
    }
}

#[test]
fn test_union_ordering() {
    let int_array = Int32Array::from(vec![100, 5, 20]);
    let str_array = StringArray::from(vec!["z", "a"]);

    // [100, "z", 5, "a", 20]
    let type_ids = vec![0, 1, 0, 1, 0].into();
    let offsets = vec![0, 0, 1, 1, 2].into();

    let union_fields = [
        (0, Arc::new(Field::new("int", DataType::Int32, false))),
        (1, Arc::new(Field::new("str", DataType::Utf8, false))),
    ]
    .into_iter()
    .collect();

    let union_array = UnionArray::try_new(
        union_fields,
        type_ids,
        Some(offsets),
        vec![Arc::new(int_array) as ArrayRef, Arc::new(str_array)],
    )
    .unwrap();

    let union_type = union_array.data_type().clone();
    let converter = RowConverter::new(vec![SortField::new(union_type)]).unwrap();

    let rows = converter.convert_columns(&[Arc::new(union_array)]).unwrap();

    /*
    expected ordering

    row 2: 5   - type_id 0
    row 4: 20  - type_id 0
    row 0: 100 - type_id 0
    row 3: "a" - type_id 1
    row 1: "z" - type_id 1
    */

    // 5 < "z"
    assert!(rows.row(2) < rows.row(1));

    // 100 < "a"
    assert!(rows.row(0) < rows.row(3));

    // among ints
    // 5 < 20
    assert!(rows.row(2) < rows.row(4));
    // 20 < 100
    assert!(rows.row(4) < rows.row(0));

    // among strings
    // "a" < "z"
    assert!(rows.row(3) < rows.row(1));
}
🛠️ Refactor suggestion | 🟠 Major
Add test coverage for non-sequential union type_ids.
The three union tests use sequential type_ids [0, 1], but Arrow Union types support arbitrary i8 type_ids such as [1, 3, 4] (as shown in the relevant code snippets from union_array.rs).
The current tests won't catch indexing bugs related to non-sequential type_ids. Consider adding a test case like:
#[test]
fn test_union_non_sequential_type_ids() {
let int_array = Int32Array::from(vec![1, 2, 3]);
let float_array = Float64Array::from(vec![4.0, 5.0]);
let str_array = StringArray::from(vec!["a"]);
// Use non-sequential type_ids: 1, 3, 4
let type_ids = vec![1, 3, 1, 4, 3].into();
let offsets = vec![0, 0, 1, 0, 1].into();
let union_fields = [
(1, Arc::new(Field::new("A", DataType::Int32, true))),
(3, Arc::new(Field::new("B", DataType::Float64, true))),
(4, Arc::new(Field::new("C", DataType::Utf8, true))),
]
.into_iter()
.collect();
let union_array = UnionArray::try_new(
union_fields,
type_ids,
Some(offsets),
        vec![Arc::new(int_array) as ArrayRef, Arc::new(float_array), Arc::new(str_array)],
)
.unwrap();
let converter = RowConverter::new(vec![SortField::new(union_array.data_type().clone())]).unwrap();
let rows = converter.convert_columns(&[Arc::new(union_array.clone())]).unwrap();
let back = converter.convert_rows(&rows).unwrap();
// Verify round-trip correctness
assert_eq!(back[0].as_ref(), &union_array as &dyn Array);
}
value:good-to-have; category:typo; feedback:The Claude AI reviewer is correct! There is a typo in the comment

value:incorrect-but-reasonable; category:bug; feedback:The Augment AI reviewer is not correct. For sparse unions the index could be used to get the variant/field of the union since sparse unions contain all its variants. Dense unions contain only an offset (the variant index) and the value for that variant. https://github.com/apache/arrow-rs/pull/8839/files#diff-8235802124a5a88240195fc4b8792cf39bb9a8703eb69308d10358966eeff765R1607 has the logic responsible for this.

value:incorrect-but-reasonable; category:bug; feedback:The Augment AI reviewer is not correct. For sparse unions the index could be used to get the variant/field of the union since sparse unions contain all its variants. Dense unions contain only an offset (the variant index) and the value for that variant. https://github.com/apache/arrow-rs/pull/8839/files#diff-8235802124a5a88240195fc4b8792cf39bb9a8703eb69308d10358966eeff765R1607 has the logic responsible for this.