489: feat!: Add a `UnionSchemaBuilder` by martin-augment · Pull Request #57 · martin-augment/avro-rs

martin-augment · 2026-02-26T11:38:32Z

489: To review by AI

This also fixes a issue with the original `new` implementation where it would insert named types in the `variant_index` and then `find_schema_with_known_schemata` would use the fast path without checking the schema. `find_schema_with_known_schemata` has also been simplified to use `known_schemata` directly instead of rebuilding it with the current schema, as this would cause duplicate schema errors after the incorrect fast path was removed. Another issue fixed is that logical types cannot be added if their base type is already in the union, and the same in reverse. The `UnionSchemaBuilder::variant_ignore_duplicates` and `UnionSchemaBuilder::contains` are needed for `avro_derive` to implement full support for enums.

coderabbitai · 2026-02-26T11:38:51Z

Walkthrough

This pull request introduces a builder pattern for union schema construction and refactors error handling for union duplicates. The UnionSchema type gains a public builder API via UnionSchemaBuilder with methods for adding variants and constructing unions. Error reporting is made more granular by splitting a single GetUnionDuplicate variant into three distinct variants for maps, arrays, and general duplicate types. Internal union representation changes include explicit indexing structures for tracking named and unnamed schemas. The Name struct adds PartialOrd and Ord derives. Supporting adjustments handle edge cases in Option<T> schema generation and simplify test code with error propagation.

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pr-489-2026-02-26-11-38-29

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-02-26T11:38:55Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new UnionSchemaBuilder to streamline the creation and validation of Avro union schemas. This builder provides a more structured approach to adding variants, preventing common errors like nested unions or duplicate types. The changes also refine the internal representation and resolution logic for union schemas, particularly improving how logical types and named schemas are handled during schema matching, leading to more accurate and robust schema validation.

Highlights

New UnionSchemaBuilder: Introduced a dedicated builder for UnionSchema to facilitate more robust and controlled construction of union schemas.
Improved Union Validation: Enhanced error handling for duplicate types within unions, specifically distinguishing between duplicate named schemas, and different map or array schemas.
Refined Schema Resolution: The UnionSchema::find_schema_with_known_schemata method was significantly refactored to better handle logical types and named schemas during value resolution.

Changelog

avro/src/error.rs
- Added GetUnionDuplicateMap and GetUnionDuplicateArray error variants to specifically report duplicate map or array schemas with differing subtypes within a union.
- Modified GetUnionDuplicate to include SchemaKind for more precise error reporting.
avro/src/schema/mod.rs
- Exported the new UnionSchemaBuilder from the union module.
avro/src/schema/name.rs
- Derived PartialOrd and Ord traits for the Name struct.
avro/src/schema/union.rs
- Introduced the UnionSchemaBuilder struct with methods for adding schema variants, checking for existing schemas, and building the final UnionSchema.
- Refactored UnionSchema::new to utilize the new UnionSchemaBuilder.
- Updated UnionSchema to include named_index for efficient lookup of named types.
- Modified UnionSchema::is_nullable to leverage the variant_index for improved performance.
- Completely re-implemented UnionSchema::find_schema_with_known_schemata to correctly resolve logical types and named schemas by converting values to base SchemaKinds.
- Added private helper functions value_to_base_schemakind and schema_to_base_schemakind to assist in logical type resolution.
- Included new test cases to validate the functionality of UnionSchemaBuilder with primitive, complex, and logical types, and to test map resolution.
avro/src/serde/derive.rs
- Adjusted the get_schema_in_ctxt implementation for Option<T> to correctly handle cases where T itself is Null, preventing redundant Null entries in the union.
avro/src/types.rs
- Simplified a test assertion in test_resolve_schemata_with_schemata by directly unwrapping the resolve_schemata result.

Activity

The pull request has been opened and is ready for review.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a UnionSchemaBuilder to construct UnionSchema instances and refactors validation logic for a cleaner API and improved error handling. However, the new resolution logic in UnionSchema::find_schema_with_known_schemata introduces critical regressions, breaking Avro specification compliance due to incorrect variant matching order and a lack of type promotion for unnamed types. The repeated deep cloning of untrusted input values during union resolution also poses a Denial of Service risk and is a general performance concern. Additionally, a minor bug exists where an incorrect error type is returned for duplicate array schemas.

gemini-code-assist · 2026-02-26T11:41:53Z

avro/src/schema/union.rs

    ) -> Option<(usize, &Schema)> {
-        let schema_kind = SchemaKind::from(value);
-        if let Some(&i) = self.variant_index.get(&schema_kind) {
-            // fast path
-            Some((i, &self.schemas[i]))
-        } else {
-            // slow path (required for matching logical or named types)
-
-            // first collect what schemas we already know
-            let mut collected_names: HashMap<Name, &Schema> = known_schemata
-                .map(|names| {
-                    names
-                        .iter()
-                        .map(|(name, schema)| (name.clone(), schema.borrow()))
-                        .collect()
+        let ValueSchemaKind { unnamed, named } = Self::value_to_base_schemakind(value);
+        // Unnamed schema types can be looked up directly using the variant_index
+        let unnamed = unnamed
+            .and_then(|kind| self.variant_index.get(&kind).copied())
+            .map(|index| (index, &self.schemas[index]))
+            .and_then(|(index, schema)| {
+                let kind = schema.discriminant();
+                // Maps and arrays need to be checked if they actually match the value
+                if kind == SchemaKind::Map || kind == SchemaKind::Array {
+                    let known_schemata_if_none = HashMap::new();
+                    let known_schemata = known_schemata.unwrap_or(&known_schemata_if_none);
+                    let namespace = if schema.namespace().is_some() {
+                        &schema.namespace()
+                    } else {
+                        enclosing_namespace
+                    };
+
+                    // TODO: Do this without the clone
+                    value
+                        .clone()
+                        .resolve_internal(schema, known_schemata, namespace, &None)
+                        .ok()
+                        .map(|_| (index, schema))
+                } else {
+                    Some((index, schema))
+                }
+            });
+        let named = named.and_then(|kind| {
+            // Every named type needs to be checked against a value until one matches
+
+            let known_schemata_if_none = HashMap::new();
+            let known_schemata = known_schemata.unwrap_or(&known_schemata_if_none);
+
+            self.named_index
+                .iter()
+                .copied()
+                .map(|i| (i, &self.schemas[i]))
+                .filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref)
+                .find(|(_i, schema)| {
+                    let namespace = if schema.namespace().is_some() {
+                        &schema.namespace()
+                    } else {
+                        enclosing_namespace
+                    };
+
+                    // TODO: Do this without the clone
+                    value
+                        .clone()
+                        .resolve_internal(schema, known_schemata, namespace, &None)
+                        .is_ok()
                })
-                .unwrap_or_default();
-
-            self.schemas.iter().enumerate().find(|(_, schema)| {
-                let resolved_schema = ResolvedSchema::new_with_known_schemata(
-                    vec![*schema],
-                    enclosing_namespace,
-                    &collected_names,
-                )
-                .expect("Schema didn't successfully parse");
-                let resolved_names = resolved_schema.names_ref;
-
-                // extend known schemas with just resolved names
-                collected_names.extend(resolved_names);
-                let namespace = &schema.namespace().or_else(|| enclosing_namespace.clone());
-
-                value
-                    .clone()
-                    .resolve_internal(schema, &collected_names, namespace, &None)
-                    .is_ok()
-            })
+        });
+
+        match (unnamed, named) {
+            (Some((u_i, _)), Some((n_i, _))) if u_i < n_i => unnamed,
+            (Some(_), Some(_)) => named,
+            (Some(_), None) => unnamed,
+            (None, Some(_)) => named,
+            (None, None) => None,
+        }
+    }


The find_schema_with_known_schemata implementation introduces critical logic flaws that break Avro specification compliance and pose performance/security risks.

Incorrect Resolution Order: The named_index is populated from a BTreeMap's values in UnionSchemaBuilder::build (line 344), which results in indices being sorted by the alphabetical order of the schema names. When find_schema_with_known_schemata iterates over named_index (line 128), it will match the alphabetically first record even if it appears later in the union definition. Avro requires that the first matching variant in the union's defined order be selected.

Missing Type Promotion: The "fast path" for unnamed types (lines 97-98) only checks for an exact SchemaKind match in variant_index. It fails to account for type promotion rules (e.g., an int value matching a long schema), which were previously handled by the slow path using resolve_internal. Since there is no fallback to a full resolution check for unnamed types that don't match exactly, valid Avro data may be incorrectly rejected.

Inefficient Cloning / DoS Risk: As noted in the TODO, cloning value (e.g., lines 112 and 141) can be inefficient, especially for large values, and contributes to a Denial of Service risk. To address this, consider refactoring resolve_internal (which currently consumes self) to introduce a new function, such as validate_internal(&self, ...) -> AvroResult<()>, which checks for validity without consuming or returning a modified Value. This would allow calling value.validate_internal(...) without needing a clone, improving performance and mitigating the DoS risk.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! 1) The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema. 2) For unnamed schemas there is no "slow path" resolving anymore and this breaks the "type promotion" support. Prevents regressions in the union schema support.

gemini-code-assist · 2026-02-26T11:41:53Z

avro/src/schema/union.rs

+                    return Err(
+                        Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
+                    );


There's a small copy-paste error here. When handling a duplicate Schema::Array, the error returned is Details::GetUnionDuplicateMap. This should be Details::GetUnionDuplicateArray to provide the correct error message.

Suggested change

return Err(

Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),

);

return Err(

Details::GetUnionDuplicateArray(self.schemas.remove(index), schema).into(),

);

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a copy/paste error in the returned error for duplicate Schema::Array's. Prevents confusing the developer by using a wrong error

gemini-code-assist · 2026-02-26T11:41:53Z

avro/src/schema/union.rs

+                    value
+                        .clone()
+                        .resolve_internal(schema, known_schemata, namespace, &None)
+                        .ok()
+                        .map(|_| (index, schema))
+                } else {
+                    Some((index, schema))
+                }
+            });
+        let named = named.and_then(|kind| {
+            // Every named type needs to be checked against a value until one matches
+
+            let known_schemata_if_none = HashMap::new();
+            let known_schemata = known_schemata.unwrap_or(&known_schemata_if_none);
+
+            self.named_index
+                .iter()
+                .copied()
+                .map(|i| (i, &self.schemas[i]))
+                .filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref)
+                .find(|(_i, schema)| {
+                    let namespace = if schema.namespace().is_some() {
+                        &schema.namespace()
+                    } else {
+                        enclosing_namespace
+                    };
+
+                    // TODO: Do this without the clone
+                    value
+                        .clone()
+                        .resolve_internal(schema, known_schemata, namespace, &None)
+                        .is_ok()
                })
-                .unwrap_or_default();
-
-            self.schemas.iter().enumerate().find(|(_, schema)| {
-                let resolved_schema = ResolvedSchema::new_with_known_schemata(
-                    vec![*schema],
-                    enclosing_namespace,
-                    &collected_names,
-                )
-                .expect("Schema didn't successfully parse");
-                let resolved_names = resolved_schema.names_ref;
-
-                // extend known schemas with just resolved names
-                collected_names.extend(resolved_names);
-                let namespace = &schema.namespace().or_else(|| enclosing_namespace.clone());
-
-                value
-                    .clone()
-                    .resolve_internal(schema, &collected_names, namespace, &None)
-                    .is_ok()
-            })
+        });
+
+        match (unnamed, named) {
+            (Some((u_i, _)), Some((n_i, _))) if u_i < n_i => unnamed,
+            (Some(_), Some(_)) => named,


The implementation of find_schema_with_known_schemata performs a deep clone of the value for every variant in the union during the resolution process (lines 114 and 142). If a union has many variants and the input Value is large or deeply nested, this can lead to excessive memory allocation and CPU consumption, potentially enabling a Denial of Service (DoS) attack.

As noted in the TODO comments, this should be implemented without cloning the value, for example by modifying resolve_internal to accept a reference.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The deep cloning could be avoided by changing the way the resolve() function is implemented. It would be good to use a reference instead.

augmentcode · 2026-02-26T11:42:48Z

🤖 Augment PR Summary

Summary: This PR introduces a new UnionSchemaBuilder and refactors union construction/matching logic to support incremental building and improved handling of logical/named types.

Changes:

Added UnionSchemaBuilder and refactored UnionSchema::new to build unions via the builder
Improved union variant indexing by separating unnamed (variant_index) and named (named_index) variant lookups
Updated union schema selection (find_schema_with_known_schemata) to fast-path unnamed kinds and validate map/array candidates via resolution
Expanded error reporting for union duplicates, including more specific duplicate map/array variants
Exported UnionSchemaBuilder from schema::mod and added ordering derives to Name to support ordered maps
Adjusted serde derive for Option<T> when T is null to avoid creating invalid duplicate-null unions
Added targeted tests covering builder behavior and union schema selection edge cases

Technical Notes: Logical types are reduced to their base representation for indexing, and map/array matches now verify item compatibility by attempting schema resolution against the candidate variant.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 3 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-02-26T11:42:48Z

avro/src/schema/union.rs

+            if let Some(index) = self.variant_index.get(&SchemaKind::Map).copied() {
+                if self.schemas[index] != schema {
+                    return Err(
+                        Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),


In variant_ignore_duplicates, building the error using self.schemas.remove(index) mutates the builder on the error path, which can invalidate indices stored in variant_index/names if the caller continues using the builder after handling the error.

Severity: medium

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The removal of the duplicate schema suggests that the caller could re-attempt to insert the schema after correcting it, but the indices (names and variant_index) should also be updated. Prevents leaving the UnionSchemaBuilder in inconsistent state.

augmentcode · 2026-02-26T11:42:48Z

avro/src/schema/union.rs

+            if let Some(index) = self.variant_index.get(&SchemaKind::Array).copied() {
+                if self.schemas[index] != schema {
+                    return Err(
+                        Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),


This array-duplicate branch returns Details::GetUnionDuplicateMap(...) (and message) even though it’s handling arrays; it looks like it should report GetUnionDuplicateArray instead.

Severity: medium

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is a copy/paste error in the returned error for duplicate Schema::Array's. Prevents confusing the developer by using a wrong error

augmentcode · 2026-02-26T11:42:49Z

avro/src/schema/union.rs

+        self.schemas.shrink_to_fit();
+        UnionSchema {
+            variant_index: self.variant_index,
+            named_index: self.names.into_values().collect(),


named_index is derived from self.names.into_values(), so its iteration order is by Name (BTreeMap key order) rather than union index order; since find_schema_with_known_schemata scans named_index with .find(), union precedence for named types may depend on Name ordering instead of the declared union order.

Severity: medium

Other Locations

avro/src/schema/union.rs:128

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

avro/src/serde/derive.rs (1)

640-650: Please add a regression test for the Option<Null> branch.

This new edge-case path is important and easy to regress; a focused test for generated schema behavior here would be valuable.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/derive.rs` around lines 640 - 650, Add a regression test that
exercises the Option<Null> branch: call T::get_schema_in_ctxt (or whatever
public helper you use in tests) for Option<Null> so the code path that checks if
let Schema::Null = schema is taken, then assert the returned Schema::Union (from
UnionSchema::new) contains exactly the single Schema::Null variant (no
nested/extra variants). Place the test alongside other serde derive tests, use
the same context setup used by derive tests, and reference the Option handling
code paths around T::get_schema_in_ctxt, Schema::Null and UnionSchema::new to
guide locating the correct behavior to verify.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro/src/schema/union.rs`:
- Around line 256-273: The variant_ignore_duplicates branch mutates builder
state by calling self.schemas.remove(index) before returning Err and also uses
the wrong error enum for arrays; update both the Map and Array branches in
variant_ignore_duplicates to avoid mutating self.schemas or self.variant_index
when encountering a duplicate (do not call remove), instead construct and return
the appropriate error using the existing values (clone or reference as needed)
so internal indices remain valid, and change the Array branch to return
Details::GetUnionDuplicateArray(...) while the Map branch returns
Details::GetUnionDuplicateMap(...).
- Around line 132-133: The current filter in union.rs uses s.discriminant() ==
kind (with SchemaKind::Ref) which incorrectly excludes named logical-type
variants; update the predicate to accept schemas whose effective/base kind
matches kind as well as SchemaKind::Ref — e.g., replace the s.discriminant()
check with a call that compares the schema's base/effective kind (or a helper
like is_logical_variant_of or base_kind) against kind so
duration/decimal/uuid-over-fixed and other logical variants are included when
matching.
- Around line 340-345: The named_index is being built from
self.names.into_values(), which yields BTreeMap key order instead of the union
declaration/schema order; change build() so named_index is constructed by
iterating schema-order (the order of self.schemas or using self.variant_index)
and for each schema index find and collect the corresponding name from
self.names (matching its stored index/value), so the first matching named branch
follows the original union declaration order; update the build() function to
produce UnionSchema { named_index: /* names collected in schema order */ , ... }
rather than using into_values().

---

Nitpick comments:
In `@avro/src/serde/derive.rs`:
- Around line 640-650: Add a regression test that exercises the Option<Null>
branch: call T::get_schema_in_ctxt (or whatever public helper you use in tests)
for Option<Null> so the code path that checks if let Schema::Null = schema is
taken, then assert the returned Schema::Union (from UnionSchema::new) contains
exactly the single Schema::Null variant (no nested/extra variants). Place the
test alongside other serde derive tests, use the same context setup used by
derive tests, and reference the Option handling code paths around
T::get_schema_in_ctxt, Schema::Null and UnionSchema::new to guide locating the
correct behavior to verify.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a3548a and 7205975.

📒 Files selected for processing (6)

avro/src/error.rs
avro/src/schema/mod.rs
avro/src/schema/name.rs
avro/src/schema/union.rs
avro/src/serde/derive.rs
avro/src/types.rs

coderabbitai · 2026-02-26T11:45:26Z

avro/src/schema/union.rs

+                .filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref)
+                .find(|(_i, schema)| {


⚠️ Potential issue | 🟠 Major

Named logical-type variants can be skipped during matching.

This filter compares raw discriminants against a base kind (Fixed, Bytes, etc.). For named logical schemas (for example duration/decimal/uuid-over-fixed), raw discriminants won’t match the base kind and valid variants can be excluded.

🔧 Proposed fix

- .filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref) + .filter(|(_i, s)| { + s.discriminant() == SchemaKind::Ref + || UnionSchemaBuilder::schema_to_base_schemakind(s) == kind + })

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@avro/src/schema/union.rs` around lines 132 - 133, The current filter in union.rs uses s.discriminant() == kind (with SchemaKind::Ref) which incorrectly excludes named logical-type variants; update the predicate to accept schemas whose effective/base kind matches kind as well as SchemaKind::Ref — e.g., replace the s.discriminant() check with a call that compares the schema's base/effective kind (or a helper like is_logical_variant_of or base_kind) against kind so duration/decimal/uuid-over-fixed and other logical variants are included when matching.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! Since the logical schemas are based on another type a direct comparison of their SchemaKind won't match. They should be compared by using their base kinds.

coderabbitai · 2026-02-26T11:45:26Z

avro/src/schema/union.rs

+        } else if let Schema::Map(_) = &schema {
+            if let Some(index) = self.variant_index.get(&SchemaKind::Map).copied() {
+                if self.schemas[index] != schema {
+                    return Err(
+                        Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
+                    );
+                }
+            } else {
+                self.variant_index
+                    .insert(SchemaKind::Map, self.schemas.len());
+                self.schemas.push(schema);
+            }
+        } else if let Schema::Array(_) = &schema {
+            if let Some(index) = self.variant_index.get(&SchemaKind::Array).copied() {
+                if self.schemas[index] != schema {
+                    return Err(
+                        Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
+                    );


⚠️ Potential issue | 🟠 Major

variant_ignore_duplicates corrupts builder state on error and uses the wrong error variant for arrays.

On mismatch, this path calls self.schemas.remove(index) before returning Err, which mutates internal state (and stale indices remain in maps). Also, the array branch returns GetUnionDuplicateMap instead of GetUnionDuplicateArray.

🔧 Proposed fix

} else if let Schema::Map(_) = &schema { if let Some(index) = self.variant_index.get(&SchemaKind::Map).copied() { if self.schemas[index] != schema { return Err( - Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(), + Details::GetUnionDuplicateMap(self.schemas[index].clone(), schema).into(), ); } } else { self.variant_index .insert(SchemaKind::Map, self.schemas.len()); self.schemas.push(schema); } } else if let Schema::Array(_) = &schema { if let Some(index) = self.variant_index.get(&SchemaKind::Array).copied() { if self.schemas[index] != schema { return Err( - Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(), + Details::GetUnionDuplicateArray(self.schemas[index].clone(), schema).into(), ); } } else { self.variant_index .insert(SchemaKind::Array, self.schemas.len()); self.schemas.push(schema); }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@avro/src/schema/union.rs` around lines 256 - 273, The variant_ignore_duplicates branch mutates builder state by calling self.schemas.remove(index) before returning Err and also uses the wrong error enum for arrays; update both the Map and Array branches in variant_ignore_duplicates to avoid mutating self.schemas or self.variant_index when encountering a duplicate (do not call remove), instead construct and return the appropriate error using the existing values (clone or reference as needed) so internal indices remain valid, and change the Array branch to return Details::GetUnionDuplicateArray(...) while the Map branch returns Details::GetUnionDuplicateMap(...).

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The removal of the duplicate schema suggests that the caller could re-attempt to insert the schema after correcting it, but the indices (names and variant_index) should also be updated. Prevents leaving the UnionSchemaBuilder in inconsistent state.

coderabbitai · 2026-02-26T11:45:27Z

avro/src/schema/union.rs

+    pub fn build(mut self) -> UnionSchema {
+        self.schemas.shrink_to_fit();
+        UnionSchema {
+            variant_index: self.variant_index,
+            named_index: self.names.into_values().collect(),
+            schemas: self.schemas,


⚠️ Potential issue | 🟠 Major

Preserve schema-order precedence when building named_index.

self.names is a BTreeMap, so into_values() is keyed-name order, not union declaration/index order. This can change which named branch is picked first when multiple candidates match.

🔧 Proposed fix

pub fn build(mut self) -> UnionSchema { self.schemas.shrink_to_fit(); + let mut named_index: Vec<_> = self.names.into_values().collect(); + named_index.sort_unstable(); UnionSchema { variant_index: self.variant_index, - named_index: self.names.into_values().collect(), + named_index, schemas: self.schemas, } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

pub fn build(mut self) -> UnionSchema {

self.schemas.shrink_to_fit();

UnionSchema {

variant_index: self.variant_index,

named_index: self.names.into_values().collect(),

schemas: self.schemas,

pub fn build(mut self) -> UnionSchema {

self.schemas.shrink_to_fit();

let mut named_index: Vec<_> = self.names.into_values().collect();

named_index.sort_unstable();

UnionSchema {

variant_index: self.variant_index,

named_index,

schemas: self.schemas,

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@avro/src/schema/union.rs` around lines 340 - 345, The named_index is being built from self.names.into_values(), which yields BTreeMap key order instead of the union declaration/schema order; change build() so named_index is constructed by iterating schema-order (the order of self.schemas or using self.variant_index) and for each schema index find and collect the corresponding name from self.names (matching its stored index/value), so the first matching named branch follows the original union declaration order; update the build() function to produce UnionSchema { named_index: /* names collected in schema order */ , ... } rather than using into_values().

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema.

claude · 2026-02-26T11:46:55Z

Code Review

This PR introduces a UnionSchemaBuilder to build unions incrementally, refactors find_schema_with_known_schemata to handle logical types properly, and adds several improvements. The overall approach is solid but there are a few bugs and design concerns worth addressing before merging.

Bug 1: Wrong error variant for duplicate arrays (high severity)

In variant_ignore_duplicates (union.rs line ~271), when a duplicate Schema::Array with a different element type is detected, the code emits GetUnionDuplicateMap instead of GetUnionDuplicateArray.

The error arm reads:

Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into()

but should use GetUnionDuplicateArray. This causes the error message to say "Found two different maps..." for an array conflict. Since GetUnionDuplicateArray was introduced in this PR specifically for this case, it should be used here.

Bug 2: Builder state corruption on Map/Array duplicate error (high severity)

In the same variant_ignore_duplicates method, self.schemas.remove(index) is called before returning Err:

return Err(Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into());

Vec::remove shifts all subsequent elements left, invalidating every index stored in variant_index and names. Because the method signature is -> Result<&mut Self, Error>, callers are allowed to catch the error and continue using the builder, which is now in an inconsistent state.

The fix is to clone instead of remove:

Details::GetUnionDuplicateArray(self.schemas[index].clone(), schema).into()

Bug 3: `named_index` is in alphabetical name order, not union order (medium severity)

build() does:

named_index: self.names.into_values().collect(),

self.names is a BTreeMap<Name, usize>, so into_values() yields values sorted by key (alphabetical by Name) rather than by insertion/union order.

In find_schema_with_known_schemata, tie-breaking between unnamed and named matches compares positions:

(Some((u_i, _)), Some((n_i, _))) if u_i < n_i => unnamed,

But if multiple named schemas could match a value, find on named_index returns the alphabetically-first name's schema position rather than the union-order-first position. The Avro spec requires the first matching type in a union to win, so this can produce incorrect results.

Fix: sort named_index by value before storing it:

let mut named_index: Vec<usize> = self.names.into_values().collect();
named_index.sort_unstable();

Design concern: `PartialOrd + Ord` derived on `Name` unnecessarily (low severity)

Ord was derived on Name solely to use it as a BTreeMap key in UnionSchemaBuilder. However, Name already implements Hash + Eq, and HashMap<Name, usize> is used elsewhere in the codebase (type Names = HashMap<Name, Schema>). Adding Ord to a widely-used type is an API change that could be avoided by using HashMap<Name, usize> for the builder's names field.

Missing test coverage

No test for the duplicate Array path in variant_ignore_duplicates -- this is how Bug 1 went undetected. A test should verify the error is GetUnionDuplicateArray, not GetUnionDuplicateMap.
No test for contains() -- the new public method has zero test coverage.
No test for the union-order priority scenario where a Value::Map could match both an unnamed Map schema and a named Record schema.

Minor nits

Attribute ordering: #[doc(hidden)] appears before the doc comment on variant_ignore_duplicates. Rust convention is doc comment first, then #[doc(hidden)] immediately before pub fn.
Asymmetric branch structure: variant_ignore_duplicates uses explicit if let Schema::Map / if let Schema::Array branches before the generic discriminant path, while variant() does not. The reason for the different treatment is not documented.
Two // TODO: Do this without the clone comments inside find_schema_with_known_schemata represent allocations on a hot resolution path -- worth tracking as issues.

Summary

Issue	Severity
Wrong error variant for array duplicates (`GetUnionDuplicateMap` used instead of `GetUnionDuplicateArray`)	High
Builder state corruption via `schemas.remove` on error path	High
`named_index` sorted by name, not union order	Medium
Missing tests (array duplicate path, `contains`, ordering)	Medium
`Ord` on `Name` avoidable with `HashMap`	Low

The overall direction is a genuine improvement -- the builder API and better logical-type handling in schema matching are both valuable additions. The two high-severity bugs are straightforward to fix before merging.

martin-augment · 2026-02-27T08:10:20Z

640-650: Please add a regression test for the Option<Null> branch.

This new edge-case path is important and easy to regress; a focused test for generated schema behavior here would be valuable.

value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! The fix for supporting derivation of AvroSchema for Option<()> should also include a unit test, otherwise there is a chance that the feature/fix may regress in the future without being noticed.

martin-augment · 2026-02-27T08:22:53Z

Bug 1: Wrong error variant for duplicate arrays (high severity)

In variant_ignore_duplicates (union.rs line ~271), when a duplicate Schema::Array with a different element type is detected, the code emits GetUnionDuplicateMap instead of GetUnionDuplicateArray.

The error arm reads:
Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into()
but should use GetUnionDuplicateArray. This causes the error message to say "Found two different maps..." for an array conflict. Since GetUnionDuplicateArray was introduced in this PR specifically for this case, it should be used here.

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There is a copy/paste error in the returned error for duplicate Schema::Array's. Prevents confusing the developer by using a wrong error

martin-augment · 2026-02-27T08:23:49Z

Bug 2: Builder state corruption on Map/Array duplicate error (high severity)

In the same variant_ignore_duplicates method, self.schemas.remove(index) is called before returning Err:
return Err(Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into());
Vec::remove shifts all subsequent elements left, invalidating every index stored in variant_index and names. Because the method signature is -> Result<&mut Self, Error>, callers are allowed to catch the error and continue using the builder, which is now in an inconsistent state.

The fix is to clone instead of remove:
Details::GetUnionDuplicateArray(self.schemas[index].clone(), schema).into()

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The removal of the duplicate schema suggests that the caller could re-attempt to insert the schema after correcting it, but the indices (names and variant_index) should also be updated. Prevents leaving the UnionSchemaBuilder in inconsistent state.

martin-augment · 2026-02-27T08:24:30Z

Bug 3: named_index is in alphabetical name order, not union order (medium severity)

build() does:
named_index: self.names.into_values().collect(),
self.names is a BTreeMap<Name, usize>, so into_values() yields values sorted by key (alphabetical by Name) rather than by insertion/union order.

In find_schema_with_known_schemata, tie-breaking between unnamed and named matches compares positions:
(Some((u_i, _)), Some((n_i, _))) if u_i < n_i => unnamed,
But if multiple named schemas could match a value, find on named_index returns the alphabetically-first name's schema position rather than the union-order-first position. The Avro spec requires the first matching type in a union to win, so this can produce incorrect results.

Fix: sort named_index by value before storing it:
let mut named_index: Vec<usize> = self.names.into_values().collect();
named_index.sort_unstable();

value:useful; category:bug; feedback: The Claude AI reviewer is correct! The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema.

gemini-code-assist bot reviewed Feb 26, 2026

View reviewed changes

augmentcode bot reviewed Feb 26, 2026

View reviewed changes

coderabbitai bot reviewed Feb 26, 2026

View reviewed changes

		.filter(\|(_i, s)\| s.discriminant() == kind \|\| s.discriminant() == SchemaKind::Ref)
		.find(\|(_i, schema)\| {

Conversation

martin-augment commented Feb 26, 2026

Uh oh!

coderabbitai bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

gemini-code-assist bot commented Feb 26, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

augmentcode bot commented Feb 26, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude bot commented Feb 26, 2026

Code Review

Bug 1: Wrong error variant for duplicate arrays (high severity)

Bug 2: Builder state corruption on Map/Array duplicate error (high severity)

Bug 3: named_index is in alphabetical name order, not union order (medium severity)

Design concern: PartialOrd + Ord derived on Name unnecessarily (low severity)

Missing test coverage

Minor nits

Summary

Uh oh!

martin-augment commented Feb 27, 2026

coderabbitai bot commented Feb 26, 2026 •

edited

Loading

Bug 3: `named_index` is in alphabetical name order, not union order (medium severity)

Design concern: `PartialOrd + Ord` derived on `Name` unnecessarily (low severity)

Bug 3: `named_index` is in alphabetical name order, not union order (medium severity)