This also fixes an issue with the original `new` implementation: it would insert named types into the `variant_index`, and `find_schema_with_known_schemata` would then take the fast path without checking the schema. `find_schema_with_known_schemata` has also been simplified to use `known_schemata` directly instead of rebuilding it with the current schema, because rebuilding would cause duplicate schema errors once the incorrect fast path was removed. Another issue fixed: a logical type now cannot be added if its base type is already in the union, and vice versa. `UnionSchemaBuilder::variant_ignore_duplicates` and `UnionSchemaBuilder::contains` are needed for `avro_derive` to implement full support for enums.
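To make the base-type rule concrete, here is a minimal sketch over a toy schema model. `Kind`, `ToySchema`, `base_kind`, and `ToyUnionBuilder` are hypothetical names for illustration, not the `apache_avro` API. Each logical type is reduced to the base type it is encoded as before the duplicate check, so a logical type is rejected after its base type, and a base type is rejected after one of its logical types.

```rust
use std::collections::HashMap;

/// Base (encoded) kinds; an unnamed union may hold each at most once.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Kind {
    Int,
    Long,
    String,
}

/// Toy schemas: three primitives plus logical types layered on them.
#[derive(Debug, Clone, PartialEq)]
enum ToySchema {
    Int,
    Long,
    String,
    Date,            // logical type over `int`
    TimestampMillis, // logical type over `long`
    Uuid,            // logical type over `string`
}

/// Reduce a (possibly logical) schema to the base kind it is encoded as.
fn base_kind(schema: &ToySchema) -> Kind {
    match schema {
        ToySchema::Int | ToySchema::Date => Kind::Int,
        ToySchema::Long | ToySchema::TimestampMillis => Kind::Long,
        ToySchema::String | ToySchema::Uuid => Kind::String,
    }
}

#[derive(Default)]
struct ToyUnionBuilder {
    schemas: Vec<ToySchema>,
    // Keyed by base kind, so a logical type and its base type collide.
    variant_index: HashMap<Kind, usize>,
}

impl ToyUnionBuilder {
    /// Add a variant, rejecting it if another variant already has the same base kind.
    fn variant(&mut self, schema: ToySchema) -> Result<(), String> {
        let kind = base_kind(&schema);
        if let Some(&i) = self.variant_index.get(&kind) {
            return Err(format!(
                "{:?} conflicts with {:?}: both are encoded as {:?}",
                schema, self.schemas[i], kind
            ));
        }
        self.variant_index.insert(kind, self.schemas.len());
        self.schemas.push(schema);
        Ok(())
    }
}
```

Keying the index by base kind is what lets a single lookup reject both directions.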
Walkthrough
This pull request introduces a builder pattern for union schema construction and refactors error handling for union duplicates. The …
Summary of Changes
This pull request introduces a new …
Code Review
This pull request introduces a UnionSchemaBuilder to construct UnionSchema instances and refactors validation logic for a cleaner API and improved error handling. However, the new resolution logic in UnionSchema::find_schema_with_known_schemata introduces critical regressions, breaking Avro specification compliance due to incorrect variant matching order and a lack of type promotion for unnamed types. The repeated deep cloning of untrusted input values during union resolution also poses a Denial of Service risk and is a general performance concern. Additionally, a minor bug exists where an incorrect error type is returned for duplicate array schemas.
```
) -> Option<(usize, &Schema)> {
let schema_kind = SchemaKind::from(value);
if let Some(&i) = self.variant_index.get(&schema_kind) {
// fast path
Some((i, &self.schemas[i]))
} else {
// slow path (required for matching logical or named types)

// first collect what schemas we already know
let mut collected_names: HashMap<Name, &Schema> = known_schemata
.map(|names| {
names
.iter()
.map(|(name, schema)| (name.clone(), schema.borrow()))
.collect()
let ValueSchemaKind { unnamed, named } = Self::value_to_base_schemakind(value);
// Unnamed schema types can be looked up directly using the variant_index
let unnamed = unnamed
.and_then(|kind| self.variant_index.get(&kind).copied())
.map(|index| (index, &self.schemas[index]))
.and_then(|(index, schema)| {
let kind = schema.discriminant();
// Maps and arrays need to be checked if they actually match the value
if kind == SchemaKind::Map || kind == SchemaKind::Array {
let known_schemata_if_none = HashMap::new();
let known_schemata = known_schemata.unwrap_or(&known_schemata_if_none);
let namespace = if schema.namespace().is_some() {
&schema.namespace()
} else {
enclosing_namespace
};

// TODO: Do this without the clone
value
.clone()
.resolve_internal(schema, known_schemata, namespace, &None)
.ok()
.map(|_| (index, schema))
} else {
Some((index, schema))
}
});
let named = named.and_then(|kind| {
// Every named type needs to be checked against a value until one matches

let known_schemata_if_none = HashMap::new();
let known_schemata = known_schemata.unwrap_or(&known_schemata_if_none);

self.named_index
.iter()
.copied()
.map(|i| (i, &self.schemas[i]))
.filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref)
.find(|(_i, schema)| {
let namespace = if schema.namespace().is_some() {
&schema.namespace()
} else {
enclosing_namespace
};

// TODO: Do this without the clone
value
.clone()
.resolve_internal(schema, known_schemata, namespace, &None)
.is_ok()
})
.unwrap_or_default();

self.schemas.iter().enumerate().find(|(_, schema)| {
let resolved_schema = ResolvedSchema::new_with_known_schemata(
vec![*schema],
enclosing_namespace,
&collected_names,
)
.expect("Schema didn't successfully parse");
let resolved_names = resolved_schema.names_ref;

// extend known schemas with just resolved names
collected_names.extend(resolved_names);
let namespace = &schema.namespace().or_else(|| enclosing_namespace.clone());

value
.clone()
.resolve_internal(schema, &collected_names, namespace, &None)
.is_ok()
})
});

match (unnamed, named) {
(Some((u_i, _)), Some((n_i, _))) if u_i < n_i => unnamed,
(Some(_), Some(_)) => named,
(Some(_), None) => unnamed,
(None, Some(_)) => named,
(None, None) => None,
}
}
```
The find_schema_with_known_schemata implementation introduces critical logic flaws that break Avro specification compliance and pose performance/security risks.
- Incorrect Resolution Order: The `named_index` is populated from a `BTreeMap`'s values in `UnionSchemaBuilder::build` (line 344), which results in indices being sorted by the alphabetical order of the schema names. When `find_schema_with_known_schemata` iterates over `named_index` (line 128), it will match the alphabetically first record even if it appears later in the union definition. Avro requires that the first matching variant in the union's defined order be selected.
- Missing Type Promotion: The "fast path" for unnamed types (lines 97-98) only checks for an exact `SchemaKind` match in `variant_index`. It fails to account for type promotion rules (e.g., an `int` value matching a `long` schema), which were previously handled by the slow path using `resolve_internal`. Since there is no fallback to a full resolution check for unnamed types that don't match exactly, valid Avro data may be incorrectly rejected.
- Inefficient Cloning / DoS Risk: As noted in the TODO, cloning `value` (e.g., lines 112 and 141) can be inefficient, especially for large values, and contributes to a Denial of Service risk. To address this, consider refactoring `resolve_internal` (which currently consumes `self`) to introduce a new function, such as `validate_internal(&self, ...) -> AvroResult<()>`, which checks for validity without consuming or returning a modified `Value`. This would allow calling `value.validate_internal(...)` without needing a clone, improving performance and mitigating the DoS risk.
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! 1) The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema. 2) For unnamed schemas there is no "slow path" resolving anymore and this breaks the "type promotion" support. Prevents regressions in the union schema support.
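The missing promotion step can be sketched with Avro's primitive promotion rules (`int` to `long`, `float`, or `double`; `long` to `float` or `double`; `float` to `double`; `string` and `bytes` to each other). This is a toy model built on a hypothetical `Prim` enum and `find_variant` helper, not the crate's `SchemaKind` or `resolve_internal`. The lookup policy (exact kind first, then the first promotable variant in declaration order) mirrors the shape of the old fast path plus slow path and is only one possible choice.

```rust
/// Toy primitive kinds for values and unnamed union variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Prim {
    Int,
    Long,
    Float,
    Double,
    Bytes,
    String,
}

/// Avro's primitive promotion rules: can a value of kind `from` be written as `to`?
fn promotable(from: Prim, to: Prim) -> bool {
    from == to
        || matches!(
            (from, to),
            (Prim::Int, Prim::Long | Prim::Float | Prim::Double)
                | (Prim::Long, Prim::Float | Prim::Double)
                | (Prim::Float, Prim::Double)
                | (Prim::String, Prim::Bytes)
                | (Prim::Bytes, Prim::String)
        )
}

/// Pick a union variant: an exact kind match wins; otherwise fall back to the
/// first variant (in declaration order) that the value can be promoted to.
fn find_variant(union: &[Prim], value: Prim) -> Option<usize> {
    union
        .iter()
        .position(|&v| v == value)
        .or_else(|| union.iter().position(|&v| promotable(value, v)))
}
```

With a union of `["string", "long", "double"]`, an `int` value lands on `long` and a `float` value on `double`; a `long` value written to a union that only has `int` finds nothing, since promotion never narrows.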
```rust
return Err(
    Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
);
```
There's a small copy-paste error here. When handling a duplicate Schema::Array, the error returned is Details::GetUnionDuplicateMap. This should be Details::GetUnionDuplicateArray to provide the correct error message.
```diff
-return Err(
-    Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
-);
+return Err(
+    Details::GetUnionDuplicateArray(self.schemas.remove(index), schema).into(),
+);
```
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a copy/paste error in the returned error for duplicate Schema::Array's. Prevents confusing the developer with the wrong error.
```
value
.clone()
.resolve_internal(schema, known_schemata, namespace, &None)
.ok()
.map(|_| (index, schema))
} else {
Some((index, schema))
}
});
let named = named.and_then(|kind| {
// Every named type needs to be checked against a value until one matches

let known_schemata_if_none = HashMap::new();
let known_schemata = known_schemata.unwrap_or(&known_schemata_if_none);

self.named_index
.iter()
.copied()
.map(|i| (i, &self.schemas[i]))
.filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref)
.find(|(_i, schema)| {
let namespace = if schema.namespace().is_some() {
&schema.namespace()
} else {
enclosing_namespace
};

// TODO: Do this without the clone
value
.clone()
.resolve_internal(schema, known_schemata, namespace, &None)
.is_ok()
})
.unwrap_or_default();

self.schemas.iter().enumerate().find(|(_, schema)| {
let resolved_schema = ResolvedSchema::new_with_known_schemata(
vec![*schema],
enclosing_namespace,
&collected_names,
)
.expect("Schema didn't successfully parse");
let resolved_names = resolved_schema.names_ref;

// extend known schemas with just resolved names
collected_names.extend(resolved_names);
let namespace = &schema.namespace().or_else(|| enclosing_namespace.clone());

value
.clone()
.resolve_internal(schema, &collected_names, namespace, &None)
.is_ok()
})
});

match (unnamed, named) {
(Some((u_i, _)), Some((n_i, _))) if u_i < n_i => unnamed,
(Some(_), Some(_)) => named,
```
The implementation of find_schema_with_known_schemata performs a deep clone of the value for every variant in the union during the resolution process (lines 114 and 142). If a union has many variants and the input Value is large or deeply nested, this can lead to excessive memory allocation and CPU consumption, potentially enabling a Denial of Service (DoS) attack.
As noted in the TODO comments, this should be implemented without cloning the value, for example by modifying resolve_internal to accept a reference.
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The deep cloning could be avoided by changing the way the resolve() function is implemented. It would be good to use a reference instead.
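A minimal sketch of the reference-based alternative discussed in this thread, using a toy `Value`/`Schema` pair (hypothetical types, not the crate's). Validation borrows the value, so probing each union variant needs no clone. In the crate this would correspond to something like the `validate_internal(&self, ...)` function the review proposes adding; it is not an existing API.

```rust
/// Toy values and schemas; only enough shape to show nested checking.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Long(i64),
    Str(String),
    Array(Vec<Value>),
}

#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Long,
    Str,
    Array(Box<Schema>),
}

impl Value {
    /// Check that the value conforms to `schema` without consuming or copying it.
    fn validate(&self, schema: &Schema) -> bool {
        match (self, schema) {
            (Value::Long(_), Schema::Long) => true,
            (Value::Str(_), Schema::Str) => true,
            // Arrays must match item by item, like the map/array re-check in the PR.
            (Value::Array(items), Schema::Array(item_schema)) => {
                items.iter().all(|item| item.validate(item_schema))
            }
            _ => false,
        }
    }
}

/// Return the first union variant the value conforms to, borrowing the value
/// instead of cloning it once per candidate.
fn find_variant<'a>(union: &'a [Schema], value: &Value) -> Option<(usize, &'a Schema)> {
    union.iter().enumerate().find(|(_, schema)| value.validate(schema))
}
```

Because `validate` only reads the value, a large nested array is walked at most once per candidate variant and never duplicated in memory.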
🤖 Augment PR Summary
Summary: This PR introduces a new …
Changes: …
Technical Notes: Logical types are reduced to their base representation for indexing, and map/array matches now verify item compatibility by attempting schema resolution against the candidate variant.
```rust
if let Some(index) = self.variant_index.get(&SchemaKind::Map).copied() {
    if self.schemas[index] != schema {
        return Err(
            Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
```
In variant_ignore_duplicates, building the error using self.schemas.remove(index) mutates the builder on the error path, which can invalidate indices stored in variant_index/names if the caller continues using the builder after handling the error.
Severity: medium
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The removal of the duplicate schema suggests that the caller could re-attempt to insert the schema after correcting it, but the indices (names and variant_index) should also be updated. Prevents leaving the UnionSchemaBuilder in inconsistent state.
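A sketch of a non-mutating error path, using a toy builder. `ToySchema`, `Kind`, and `DuplicateError` are hypothetical stand-ins for `Schema`, `SchemaKind`, and the `Details::GetUnionDuplicateMap`/`GetUnionDuplicateArray` variants. The existing variant is cloned rather than removed, and the error variant is chosen from the kind, which also rules out the Map/Array mix-up.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum ToySchema {
    Null,
    Map(Box<ToySchema>),
    Array(Box<ToySchema>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Kind {
    Null,
    Map,
    Array,
}

/// (schema already in the union, conflicting schema)
#[derive(Debug, PartialEq)]
enum DuplicateError {
    Map(ToySchema, ToySchema),
    Array(ToySchema, ToySchema),
}

#[derive(Default)]
struct ToyUnionBuilder {
    schemas: Vec<ToySchema>,
    variant_index: HashMap<Kind, usize>,
}

impl ToyUnionBuilder {
    /// Add `schema` unless an identical variant exists. A different schema of the
    /// same kind is an error, and the builder is left exactly as it was.
    fn variant_ignore_duplicates(&mut self, schema: ToySchema) -> Result<(), DuplicateError> {
        let kind = match &schema {
            ToySchema::Null => Kind::Null,
            ToySchema::Map(_) => Kind::Map,
            ToySchema::Array(_) => Kind::Array,
        };
        if let Some(&index) = self.variant_index.get(&kind) {
            if self.schemas[index] == schema {
                return Ok(()); // identical variant already present: ignore it
            }
            // Clone instead of `remove`, so `schemas` and `variant_index` stay in sync.
            let existing = self.schemas[index].clone();
            return Err(match kind {
                Kind::Map => DuplicateError::Map(existing, schema),
                Kind::Array => DuplicateError::Array(existing, schema),
                Kind::Null => unreachable!("`null` schemas always compare equal"),
            });
        }
        self.variant_index.insert(kind, self.schemas.len());
        self.schemas.push(schema);
        Ok(())
    }
}
```

After a rejected insert, every stored index still points at the schema it was recorded for, so the caller can keep using the builder.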
```rust
if let Some(index) = self.variant_index.get(&SchemaKind::Array).copied() {
    if self.schemas[index] != schema {
        return Err(
            Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
```
value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is a copy/paste error in the returned error for duplicate Schema::Array's. Prevents confusing the developer with the wrong error.
```rust
self.schemas.shrink_to_fit();
UnionSchema {
    variant_index: self.variant_index,
    named_index: self.names.into_values().collect(),
```
named_index is derived from self.names.into_values(), so its iteration order is by Name (BTreeMap key order) rather than union index order; since find_schema_with_known_schemata scans named_index with .find(), union precedence for named types may depend on Name ordering instead of the declared union order.
Severity: medium
Other Locations
avro/src/schema/union.rs:128
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema.
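The ordering effect is easy to reproduce with the standard library alone. In this sketch `String` stands in for `Name`, and `named_index_in_union_order` is a hypothetical helper; the fix is the one CodeRabbit proposes in this thread, sorting the collected positions.

```rust
use std::collections::BTreeMap;

/// Build `named_index` from a name -> union-position map.
/// `BTreeMap::into_values` yields values in key (name) order, so the positions
/// are sorted afterwards to restore the order in which variants were declared.
fn named_index_in_union_order(names: BTreeMap<String, usize>) -> Vec<usize> {
    let mut named_index: Vec<usize> = names.into_values().collect();
    named_index.sort_unstable();
    named_index
}
```

For a union declared as `["Zebra", "null", "Apple"]`, `into_values()` alone yields `[2, 0]` (Apple before Zebra), while the sorted version yields `[0, 2]`. Each position appears at most once, so `sort_unstable` gives the same result as a stable sort.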
Actionable comments posted: 3
🧹 Nitpick comments (1)
avro/src/serde/derive.rs (1)
640-650: Please add a regression test for the `Option<Null>` branch.
This new edge-case path is important and easy to regress; a focused test for generated schema behavior here would be valuable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro/src/serde/derive.rs` around lines 640 - 650, Add a regression test that exercises the Option<Null> branch: call T::get_schema_in_ctxt (or whatever public helper you use in tests) for Option<Null> so the code path that checks if let Schema::Null = schema is taken, then assert the returned Schema::Union (from UnionSchema::new) contains exactly the single Schema::Null variant (no nested/extra variants). Place the test alongside other serde derive tests, use the same context setup used by derive tests, and reference the Option handling code paths around T::get_schema_in_ctxt, Schema::Null and UnionSchema::new to guide locating the correct behavior to verify.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@avro/src/schema/union.rs`:
- Around line 256-273: The variant_ignore_duplicates branch mutates builder
state by calling self.schemas.remove(index) before returning Err and also uses
the wrong error enum for arrays; update both the Map and Array branches in
variant_ignore_duplicates to avoid mutating self.schemas or self.variant_index
when encountering a duplicate (do not call remove), instead construct and return
the appropriate error using the existing values (clone or reference as needed)
so internal indices remain valid, and change the Array branch to return
Details::GetUnionDuplicateArray(...) while the Map branch returns
Details::GetUnionDuplicateMap(...).
- Around line 132-133: The current filter in union.rs uses s.discriminant() ==
kind (with SchemaKind::Ref) which incorrectly excludes named logical-type
variants; update the predicate to accept schemas whose effective/base kind
matches kind as well as SchemaKind::Ref — e.g., replace the s.discriminant()
check with a call that compares the schema's base/effective kind (or a helper
like is_logical_variant_of or base_kind) against kind so
duration/decimal/uuid-over-fixed and other logical variants are included when
matching.
- Around line 340-345: The named_index is being built from
self.names.into_values(), which yields BTreeMap key order instead of the union
declaration/schema order; change build() so named_index is constructed by
iterating schema-order (the order of self.schemas or using self.variant_index)
and for each schema index find and collect the corresponding name from
self.names (matching its stored index/value), so the first matching named branch
follows the original union declaration order; update the build() function to
produce UnionSchema { named_index: /* names collected in schema order */ , ... }
rather than using into_values().
---
Nitpick comments:
In `@avro/src/serde/derive.rs`:
- Around line 640-650: Add a regression test that exercises the Option<Null>
branch: call T::get_schema_in_ctxt (or whatever public helper you use in tests)
for Option<Null> so the code path that checks if let Schema::Null = schema is
taken, then assert the returned Schema::Union (from UnionSchema::new) contains
exactly the single Schema::Null variant (no nested/extra variants). Place the
test alongside other serde derive tests, use the same context setup used by
derive tests, and reference the Option handling code paths around
T::get_schema_in_ctxt, Schema::Null and UnionSchema::new to guide locating the
correct behavior to verify.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
avro/src/error.rs
avro/src/schema/mod.rs
avro/src/schema/name.rs
avro/src/schema/union.rs
avro/src/serde/derive.rs
avro/src/types.rs
```rust
.filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref)
.find(|(_i, schema)| {
```
Named logical-type variants can be skipped during matching.
This filter compares raw discriminants against a base kind (Fixed, Bytes, etc.). For named logical schemas (for example duration/decimal/uuid-over-fixed), raw discriminants won’t match the base kind and valid variants can be excluded.
🔧 Proposed fix
```diff
-    .filter(|(_i, s)| s.discriminant() == kind || s.discriminant() == SchemaKind::Ref)
+    .filter(|(_i, s)| {
+        s.discriminant() == SchemaKind::Ref
+            || UnionSchemaBuilder::schema_to_base_schemakind(s) == kind
+    })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/union.rs` around lines 132 - 133, The current filter in
union.rs uses s.discriminant() == kind (with SchemaKind::Ref) which incorrectly
excludes named logical-type variants; update the predicate to accept schemas
whose effective/base kind matches kind as well as SchemaKind::Ref — e.g.,
replace the s.discriminant() check with a call that compares the schema's
base/effective kind (or a helper like is_logical_variant_of or base_kind)
against kind so duration/decimal/uuid-over-fixed and other logical variants are
included when matching.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! Since the logical schemas are based on another type a direct comparison of their SchemaKind won't match. They should be compared by using their base kinds.
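A toy version of the two predicates, assuming a hypothetical flat `Kind` enum in which `Duration` and `Decimal` are carried by a `fixed` (in Avro, `decimal` can also annotate `bytes`; that case is left out here). It shows a named logical variant being dropped by the raw-discriminant filter and kept by the base-kind filter. `base_kind` only plays the role of `UnionSchemaBuilder::schema_to_base_schemakind`; it is not that function.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Record,
    Fixed,
    Ref,
    Duration, // logical type carried by a `fixed` (toy model)
    Decimal,  // logical type carried by a `fixed` in this toy model
}

/// Reduce a discriminant to the kind it is encoded as.
fn base_kind(kind: Kind) -> Kind {
    match kind {
        Kind::Duration | Kind::Decimal => Kind::Fixed,
        other => other,
    }
}

/// Union positions worth trying, using the raw-discriminant predicate from the PR.
fn candidates_raw(variants: &[Kind], value_kind: Kind) -> Vec<usize> {
    variants
        .iter()
        .enumerate()
        .filter(|&(_, &k)| k == value_kind || k == Kind::Ref)
        .map(|(i, _)| i)
        .collect()
}

/// Same, but comparing base kinds as proposed in the review.
fn candidates_base(variants: &[Kind], value_kind: Kind) -> Vec<usize> {
    variants
        .iter()
        .enumerate()
        .filter(|&(_, &k)| k == Kind::Ref || base_kind(k) == value_kind)
        .map(|(i, _)| i)
        .collect()
}
```

For a union `[record, duration, ref]` and a value whose base kind is `fixed`, the raw filter only offers the `ref` (position 2), while the base-kind filter also offers the `duration` variant (positions 1 and 2).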
```rust
} else if let Schema::Map(_) = &schema {
    if let Some(index) = self.variant_index.get(&SchemaKind::Map).copied() {
        if self.schemas[index] != schema {
            return Err(
                Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
            );
        }
    } else {
        self.variant_index
            .insert(SchemaKind::Map, self.schemas.len());
        self.schemas.push(schema);
    }
} else if let Schema::Array(_) = &schema {
    if let Some(index) = self.variant_index.get(&SchemaKind::Array).copied() {
        if self.schemas[index] != schema {
            return Err(
                Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
            );
```
variant_ignore_duplicates corrupts builder state on error and uses the wrong error variant for arrays.
On mismatch, this path calls self.schemas.remove(index) before returning Err, which mutates internal state (and stale indices remain in maps). Also, the array branch returns GetUnionDuplicateMap instead of GetUnionDuplicateArray.
🔧 Proposed fix
```diff
 } else if let Schema::Map(_) = &schema {
     if let Some(index) = self.variant_index.get(&SchemaKind::Map).copied() {
         if self.schemas[index] != schema {
             return Err(
-                Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
+                Details::GetUnionDuplicateMap(self.schemas[index].clone(), schema).into(),
             );
         }
     } else {
         self.variant_index
             .insert(SchemaKind::Map, self.schemas.len());
         self.schemas.push(schema);
     }
 } else if let Schema::Array(_) = &schema {
     if let Some(index) = self.variant_index.get(&SchemaKind::Array).copied() {
         if self.schemas[index] != schema {
             return Err(
-                Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into(),
+                Details::GetUnionDuplicateArray(self.schemas[index].clone(), schema).into(),
             );
         }
     } else {
         self.variant_index
             .insert(SchemaKind::Array, self.schemas.len());
         self.schemas.push(schema);
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/union.rs` around lines 256 - 273, The
variant_ignore_duplicates branch mutates builder state by calling
self.schemas.remove(index) before returning Err and also uses the wrong error
enum for arrays; update both the Map and Array branches in
variant_ignore_duplicates to avoid mutating self.schemas or self.variant_index
when encountering a duplicate (do not call remove), instead construct and return
the appropriate error using the existing values (clone or reference as needed)
so internal indices remain valid, and change the Array branch to return
Details::GetUnionDuplicateArray(...) while the Map branch returns
Details::GetUnionDuplicateMap(...).
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The removal of the duplicate schema suggests that the caller could re-attempt to insert the schema after correcting it, but the indices (names and variant_index) should also be updated. Prevents leaving the UnionSchemaBuilder in inconsistent state.
```rust
pub fn build(mut self) -> UnionSchema {
    self.schemas.shrink_to_fit();
    UnionSchema {
        variant_index: self.variant_index,
        named_index: self.names.into_values().collect(),
        schemas: self.schemas,
```
Preserve schema-order precedence when building named_index.
self.names is a BTreeMap, so into_values() is keyed-name order, not union declaration/index order. This can change which named branch is picked first when multiple candidates match.
🔧 Proposed fix
```diff
 pub fn build(mut self) -> UnionSchema {
     self.schemas.shrink_to_fit();
+    let mut named_index: Vec<_> = self.names.into_values().collect();
+    named_index.sort_unstable();
     UnionSchema {
         variant_index: self.variant_index,
-        named_index: self.names.into_values().collect(),
+        named_index,
         schemas: self.schemas,
     }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/union.rs` around lines 340 - 345, The named_index is being
built from self.names.into_values(), which yields BTreeMap key order instead of
the union declaration/schema order; change build() so named_index is constructed
by iterating schema-order (the order of self.schemas or using
self.variant_index) and for each schema index find and collect the corresponding
name from self.names (matching its stored index/value), so the first matching
named branch follows the original union declaration order; update the build()
function to produce UnionSchema { named_index: /* names collected in schema
order */ , ... } rather than using into_values().
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema.
Code Review
This PR introduces a …

Bug 1: Wrong error variant for duplicate arrays (high severity)
In … The error arm reads:
```rust
Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into()
```
but should use …

Bug 2: Builder state corruption on Map/Array duplicate error (high severity)
In the same …
```rust
return Err(Details::GetUnionDuplicateMap(self.schemas.remove(index), schema).into());
```
The fix is to clone instead of remove:
```rust
Details::GetUnionDuplicateArray(self.schemas[index].clone(), schema).into()
```

Bug 3: …
| Issue | Severity |
|---|---|
| Wrong error variant for array duplicates (`GetUnionDuplicateMap` used instead of `GetUnionDuplicateArray`) | High |
| Builder state corruption via `schemas.remove` on error path | High |
| `named_index` sorted by name, not union order | Medium |
| Missing tests (array duplicate path, `contains`, ordering) | Medium |
| `Ord` on `Name` avoidable with `HashMap` | Low |
The overall direction is a genuine improvement -- the builder API and better logical-type handling in schema matching are both valuable additions. The two high-severity bugs are straightforward to fix before merging.
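The last table row suggests that `Ord` on `Name` is avoidable. One way to do that, sketched with a hypothetical `NamedVariants` type and `String` in place of `Name` (not crate code): a `HashMap` handles membership checks such as a `contains` query, while a separate `Vec` records union positions as they are inserted, so the named index is already in declaration order and needs no sort.

```rust
use std::collections::HashMap;

/// Tracks named union variants without requiring `Ord` on the name type.
#[derive(Default)]
struct NamedVariants {
    // name -> union position, for duplicate detection and lookups
    names: HashMap<String, usize>,
    // union positions in insertion (declaration) order
    named_index: Vec<usize>,
}

impl NamedVariants {
    /// Register a named variant at union position `position`.
    /// Returns `false` (and changes nothing) if the name is already present.
    fn insert(&mut self, name: &str, position: usize) -> bool {
        if self.names.contains_key(name) {
            return false;
        }
        self.names.insert(name.to_string(), position);
        self.named_index.push(position);
        true
    }

    fn contains(&self, name: &str) -> bool {
        self.names.contains_key(name)
    }
}
```

The trade-off is one extra `Vec`, in exchange for dropping both the `Ord` bound and the sort step at build time.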
value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! The fix for supporting derivation of AvroSchema for Option<()> should also include a unit test, otherwise there is a chance that the feature/fix may regress in the future without being noticed.
value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There is a copy/paste error in the returned error for duplicate Schema::Array's. Prevents confusing the developer with the wrong error.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The removal of the duplicate schema suggests that the caller could re-attempt to insert the schema after correcting it, but the indices (names and variant_index) should also be updated. Prevents leaving the UnionSchemaBuilder in inconsistent state.
value:useful; category:bug; feedback: The Claude AI reviewer is correct! The iteration over the named schemas depends on the alphabetical order of the schemas' names, not their insertion order. This may lead to wrong resolving of named schema.
489: To review by AI