Commit 28f66f9

Add Union encoding documentation (#9102)
# Which issue does this PR close?

- Closes #9084.

# What changes are included in this PR?

Documentation on union types encoding in https://arrow.apache.org/rust/arrow_row/struct.RowConverter.html.

# Are these changes tested?

Yes.

# Are there any user-facing changes?

Yes. https://arrow.apache.org/rust/arrow_row/struct.RowConverter.html will get updated.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent a8346be commit 28f66f9

1 file changed: +41 -0 lines changed


arrow-row/src/lib.rs

Lines changed: 41 additions & 0 deletions
@@ -415,6 +415,41 @@ mod variable;
 ///
 ///```
 ///
+/// ## Union Encoding
+///
+/// A union value is encoded as a single type-id byte followed by the row encoding of the selected child value.
+/// The type-id byte is always present; union arrays have no top-level null marker, so nulls are represented by the child encoding.
+///
+/// For example, given a union of Int32 (type_id = 0) and Utf8 (type_id = 1):
+///
+/// ```text
+///             ┌──┬──────────────┐
+/// 3           │00│01│80│00│00│03│
+///             └──┴──────────────┘
+///              │  └─ signed integer encoding (non-null)
+///              └──── type_id
+///
+///             ┌──┬────────────────────────────────┐
+/// "abc"       │01│02│'a'│'b'│'c'│00│00│00│00│00│03│
+///             └──┴────────────────────────────────┘
+///              │  └─ string encoding (non-null)
+///              └──── type_id
+///
+///             ┌──┬──────────────┐
+/// null Int32  │00│00│00│00│00│00│
+///             └──┴──────────────┘
+///              │  └─ signed integer encoding (null)
+///              └──── type_id
+///
+///             ┌──┬──┐
+/// null Utf8   │01│00│
+///             └──┴──┘
+///              │  └─ string encoding (null)
+///              └──── type_id
+/// ```
+///
+/// See [`UnionArray`] for more details on union types.
+///
 /// # Ordering
 ///
 /// ## Float Ordering
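
The byte layout in the Union Encoding diagram above can be reproduced with a small standard-library sketch. This is a simplified model, not the `arrow-row` implementation: `encode_i32` and `encode_union` are hypothetical helper names introduced only for illustration, and the null Utf8 child encoding is copied from the diagram rather than computed.

```rust
/// Row-encode an `Option<i32>` the way the diagram shows it: a one-byte
/// null marker (0 = null, 1 = valid) followed by four big-endian bytes
/// with the sign bit flipped. Hypothetical helper, not an arrow-row API.
fn encode_i32(value: Option<i32>) -> [u8; 5] {
    match value {
        // A null is the zero marker followed by zeroed value bytes.
        None => [0u8; 5],
        Some(v) => {
            let mut be = v.to_be_bytes();
            // Flipping the sign bit makes negative values sort first.
            be[0] ^= 0x80;
            let mut out = [0u8; 5];
            out[0] = 1;
            out[1..].copy_from_slice(&be);
            out
        }
    }
}

/// Prefix a child's row encoding with the union type id.
/// Hypothetical helper, not an arrow-row API.
fn encode_union(type_id: u8, child: &[u8]) -> Vec<u8> {
    let mut row = Vec::with_capacity(1 + child.len());
    row.push(type_id);
    row.extend_from_slice(child);
    row
}

fn main() {
    // Int32 child has type_id 0, Utf8 child has type_id 1, as in the docs.
    let three = encode_union(0, &encode_i32(Some(3)));
    let null_int = encode_union(0, &encode_i32(None));
    // Null Utf8 child encoding (a single 0x00 byte) taken from the diagram.
    let null_utf8 = encode_union(1, &[0x00]);

    assert_eq!(three, vec![0x00_u8, 0x01, 0x80, 0x00, 0x00, 0x03]);
    assert_eq!(null_int, vec![0x00_u8; 6]);
    assert_eq!(null_utf8, vec![0x01_u8, 0x00]);
    println!("{:02x?}", three);
}
```

Note that the type-id byte is written even for null values, which matches the statement that nulls are carried by the child encoding rather than by a top-level marker.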
@@ -431,6 +466,12 @@ mod variable;
 /// The encoding described above will order nulls first, this can be inverted by representing
 /// nulls as `0xFF_u8` instead of `0_u8`
 ///
+/// ## Union Ordering
+///
+/// Values of the same type are ordered according to the ordering of that type.
+/// Values of different types are ordered by their type id.
+/// The type_id is negated when descending order is specified.
+///
 /// ## Reverse Column Ordering
 ///
 /// The order of a given column can be reversed by negating the encoded bytes of non-null values
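
The Union Ordering rules added in this commit follow from comparing encoded rows byte by byte. The sketch below illustrates this with plain `Vec<u8>` comparisons. It is a simplified model under stated assumptions: `union_i32_row` and `descending_type_id` are hypothetical helpers, the `"abc"` row is copied from the diagram in the diff, and "negated" is assumed to mean bitwise NOT of the type-id byte, which the diff does not spell out.

```rust
/// Ascending row bytes for a non-null Int32 union member: type id, valid
/// marker, then big-endian bytes with the sign bit flipped.
/// Hypothetical helper, not an arrow-row API.
fn union_i32_row(type_id: u8, v: i32) -> Vec<u8> {
    let mut be = v.to_be_bytes();
    be[0] ^= 0x80;
    let mut row = vec![type_id, 1];
    row.extend_from_slice(&be);
    row
}

/// ASSUMPTION: "negated" is modelled as a bitwise NOT of the type-id byte.
fn descending_type_id(type_id: u8) -> u8 {
    !type_id
}

fn main() {
    let minus_five = union_i32_row(0, -5);
    let three = union_i32_row(0, 3);
    // Null Int32 row and "abc" Utf8 row copied from the documented diagram.
    let null_int: Vec<u8> = vec![0x00; 6];
    let abc: Vec<u8> = vec![0x01, 0x02, b'a', b'b', b'c', 0, 0, 0, 0, 0, 0x03];

    // Same type: the child encoding decides, and nulls sort first.
    assert!(null_int < minus_five);
    assert!(minus_five < three);

    // Different types: the leading type-id byte decides (0 before 1).
    assert!(three < abc);

    // Under the bitwise-NOT assumption, descending order reverses type ids.
    assert!(descending_type_id(0) > descending_type_id(1));
}
```

`Vec<u8>` compares lexicographically, so the first differing byte settles the order. That is why placing the type id first groups all values of one child type together.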
