448: fix: flatten no longer causes duplicate names#44
448: fix: flatten no longer causes duplicate names#44martin-augment wants to merge 3 commits intomainfrom
flatten no longer causes duplicate names#44Conversation
When a type was used both via `flatten` and directly, the schema generated would contain duplicate names (and schemas). This is because `flatten` would use an empty `named_schemas` to get the record schema. If the existing `named_schemas` was used, `flatten` might get a `Schema::Ref` if the type was already used. Or when `flatten` was used first, if the type was used after it would create a `Schema::Ref` to a schema that does not exist. This is solved by adding a new function to the `AvroSchemaComponent` that returns the fields directly. To not break code currently implementing this trait, it has a default implementation that will work around the issues above. This default implementation is also used for fields with the `#[avro(with = ||)]` and `#[avro(with = path)]` attributes, as they don't have a way to provide the field directly. Users of `#[avro(with)]` will need to implement `get_record_fields_in_ctxt` in their module.
…hemaComponent` The implementation for `serde_json::Map<String, T>` has been removed. This is **not** a breaking change, as `serde_json::Map` can only be constructed for `<String, Value>` and `Value` does not implement `AvroSchemaComponent`.
WalkthroughThis pull request adds support for extracting record field metadata from Avro schemas. Changes include introducing a new public function ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 🧪 Unit Test Generation v2 is now available!We have significantly improved our unit test generation capabilities. To enable: Add this to your reviews:
finishing_touches:
unit_tests:
enabled: trueTry it out by using the Have feedback? Share your thoughts on our Discord thread! Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical issue where the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively resolves the issue of flatten causing duplicate name errors by introducing a new, non-polluting mechanism for retrieving record fields, utilizing get_record_fields_in_ctxt for sound schema flattening. The changes are well-structured and include comprehensive tests. No vulnerabilities or significant security issues were found. There is one suggestion to improve the diagnostics in the derive macro for a better developer experience.
| schema_fields.extend(flattened_fields) | ||
| } else { | ||
| panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema()) | ||
| panic!("#field does not have any fields to flatten to") |
There was a problem hiding this comment.
The panic message here could be more explicit about why flattening failed. The current message is a bit indirect. A more direct message stating that the field must be a record type would be more helpful for users.
| panic!("#field does not have any fields to flatten to") | |
| panic!("Field {} cannot be flattened because it is not a record type.", stringify!(#field)) |
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Gemini AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@avro_derive/src/lib.rs`:
- Around line 167-175: The panic message inside the quoted block uses the
literal "#field" instead of the actual field identifier; update the code that
pushes into record_field_exprs (the quote! block that references
get_record_fields and the variable get_record_fields from
get_field_get_record_fields_expr) so the panic prints the real field name by
interpolating the field identifier (e.g., via stringify!(`#field`) or
concat!(stringify!(`#field`), "...")) instead of the raw "#field" string; ensure
the change is applied where record_field_exprs.push(...) is constructed so
failures show which field failed to flatten.
| let get_record_fields = | ||
| get_field_get_record_fields_expr(&field, field_attrs.with)?; | ||
| record_field_exprs.push(quote! { | ||
| if let ::apache_avro::schema::Schema::Record(::apache_avro::schema::RecordSchema { fields, .. }) = #flatten_ty::get_schema() { | ||
| for mut field in fields { | ||
| field.position = schema_fields.len(); | ||
| schema_fields.push(field) | ||
| } | ||
| if let Some(flattened_fields) = #get_record_fields { | ||
| schema_fields.extend(flattened_fields) | ||
| } else { | ||
| panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema()) | ||
| panic!("#field does not have any fields to flatten to") | ||
| } | ||
| }); |
There was a problem hiding this comment.
Panic message does not interpolate the field identifier.
The #field in the panic message is a literal string inside quote!, not a Rust string interpolation. This will print "#field does not have any fields to flatten to" instead of showing which field failed.
🐛 Proposed fix to include the field name in the panic message
+ let field_name = field
+ .ident
+ .as_ref()
+ .map(|i| i.to_string())
+ .unwrap_or_else(|| "unnamed".to_string());
record_field_exprs.push(quote! {
if let Some(flattened_fields) = `#get_record_fields` {
schema_fields.extend(flattened_fields)
} else {
- panic!("#field does not have any fields to flatten to")
+ panic!("{} does not have any fields to flatten to", `#field_name`)
}
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let get_record_fields = | |
| get_field_get_record_fields_expr(&field, field_attrs.with)?; | |
| record_field_exprs.push(quote! { | |
| if let ::apache_avro::schema::Schema::Record(::apache_avro::schema::RecordSchema { fields, .. }) = #flatten_ty::get_schema() { | |
| for mut field in fields { | |
| field.position = schema_fields.len(); | |
| schema_fields.push(field) | |
| } | |
| if let Some(flattened_fields) = #get_record_fields { | |
| schema_fields.extend(flattened_fields) | |
| } else { | |
| panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema()) | |
| panic!("#field does not have any fields to flatten to") | |
| } | |
| }); | |
| let get_record_fields = | |
| get_field_get_record_fields_expr(&field, field_attrs.with)?; | |
| let field_name = field | |
| .ident | |
| .as_ref() | |
| .map(|i| i.to_string()) | |
| .unwrap_or_else(|| "unnamed".to_string()); | |
| record_field_exprs.push(quote! { | |
| if let Some(flattened_fields) = `#get_record_fields` { | |
| schema_fields.extend(flattened_fields) | |
| } else { | |
| panic!("{} does not have any fields to flatten to", `#field_name`) | |
| } | |
| }); |
🤖 Prompt for AI Agents
In `@avro_derive/src/lib.rs` around lines 167 - 175, The panic message inside the
quoted block uses the literal "#field" instead of the actual field identifier;
update the code that pushes into record_field_exprs (the quote! block that
references get_record_fields and the variable get_record_fields from
get_field_get_record_fields_expr) so the panic prints the real field name by
interpolating the field identifier (e.g., via stringify!(`#field`) or
concat!(stringify!(`#field`), "...")) instead of the raw "#field" string; ensure
the change is applied where record_field_exprs.push(...) is constructed so
failures show which field failed to flatten.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.
PR Review: Fix
|
🤖 Augment PR SummarySummary: This PR fixes AvroSchema derive behavior where Changes:
Technical Notes: The new helper tries to preserve 🤖 Was this summary useful? React with 👍 or 👎 |
| schema_fields.push(field) | ||
| } | ||
| if let Some(flattened_fields) = #get_record_fields { | ||
| schema_fields.extend(flattened_fields) |
There was a problem hiding this comment.
When flattening, schema_fields.extend(flattened_fields) preserves the inner fields’ existing RecordField.position values, which can become inconsistent with their new index in the parent schema_fields and break the lookup map built from position.
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! The positions of the flattened fields are calculated for their inner RecordSchema but once they are moved to the parent schema and appended to its fields the positions of the flattened field need to be re-calculated, i.e. increased with the number of the parent's fields.
| match with { | ||
| With::Trait => Ok(type_to_get_record_fields_expr(&field.ty)?), | ||
| With::Serde(path) => { | ||
| Ok(quote! { #path::get_record_fields_in_ctxt(named_schemas, enclosing_namespace) }) |
There was a problem hiding this comment.
With::Serde here generates #path::get_record_fields_in_ctxt(...), but existing apache_avro::serde::* “with” modules appear to only provide get_schema_in_ctxt; this can cause derive-time compile failures for transparent/flatten cases that use #[avro(with)] + #[serde(with = "...")].
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! The new code assumes that provided module for with has a function named get_record_fields_in_ctxt()and tries to use it but even the default implementations of With::Serde in avro/src/serde/with.rs do not provide this new function. Prevents compilation error when trying to use #[serde(flatten, with = ...)]
| } | ||
|
|
||
| impl<T> AvroSchemaComponent for Vec<T> | ||
| impl<T> AvroSchemaComponent for HashMap<String, T> |
There was a problem hiding this comment.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! The old implementation of AvroSchemaComponent for serde_json::Map<String, T> is lost in the refactoring. Prevents broken builds for external users of this implementation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| schema_fields.push(field) | ||
| } | ||
| if let Some(flattened_fields) = #get_record_fields { | ||
| schema_fields.extend(flattened_fields) |
There was a problem hiding this comment.
Flattened fields retain original positions instead of updated ones
High Severity
The new flatten handling uses schema_fields.extend(flattened_fields) without updating each field's position value. The old code had field.position = schema_fields.len() before pushing each field. When a regular field appears before a flatten field, the flattened fields will have incorrect positions (their original positions from the nested schema rather than their actual indices in the combined schema). This causes the lookup table to have wrong mappings, since it's built using field.position values.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The positions of the flattened fields are calculated for their inner RecordSchema but once they are moved to the parent schema and appended to its fields the positions of the flattened field need to be re-calculated, i.e. increased with the number of the parent's fields.
| schema_fields.extend(flattened_fields) | ||
| } else { | ||
| panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema()) | ||
| panic!("#field does not have any fields to flatten to") |
There was a problem hiding this comment.
Panic message shows literal text instead of diagnostic info
Low Severity
The panic message "#field does not have any fields to flatten to" is misleading because #field inside a string literal in quote! is not interpolated—it appears as literal text in the generated code. The old code used panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema()) which showed the actual schema type at runtime, providing useful diagnostic information for debugging. The regression makes it harder to understand what went wrong when flattening fails.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.
value:useful; category:bug; feedback:The Claude AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the |
value:useful; category:bug; feedback:The Claude AI reviewer is correct! The positions of the flattened fields are calculated for their inner RecordSchema but once they are moved to the parent schema and appended to its fields the positions of the flattened field need to be re-calculated, i.e. increased with the number of the parent's fields. |


448: To review by AI