448: fix: flatten no longer causes duplicate names #44

Open
martin-augment wants to merge 3 commits into main from pr-448-2026-01-30-04-20-09

Conversation

@martin-augment
Owner

448: To review by AI

Kriskras99 and others added 3 commits January 28, 2026 22:59
When a type was used both via `flatten` and directly, the schema generated would
contain duplicate names (and schemas). This is because `flatten` would use an empty
`named_schemas` to get the record schema. If the existing `named_schemas` was used,
`flatten` might get a `Schema::Ref` if the type was already used. Or when `flatten`
was used first, if the type was used after it would create a `Schema::Ref` to a schema
that does not exist.
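
The two failure modes of reusing the existing `named_schemas` can be sketched with a toy model. Everything below is a hypothetical simplification: `Schema`, `inner_schema`, and the `HashSet` registry are stand-ins invented for illustration, not the `apache_avro` types or API.

```rust
use std::collections::HashSet;

/// Toy stand-in for an Avro schema: a full record definition or a reference by name.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Record { name: String, fields: Vec<String> },
    Ref { name: String },
}

/// Hypothetical schema generator for a type `Inner`: the first call registers
/// the name and returns the full record, later calls return only a reference.
fn inner_schema(named_schemas: &mut HashSet<String>) -> Schema {
    let name = "Inner".to_string();
    if named_schemas.insert(name.clone()) {
        Schema::Record { name, fields: vec!["a".to_string(), "b".to_string()] }
    } else {
        Schema::Ref { name }
    }
}

/// Case 1: `Inner` is used directly first, then flattened with the same registry.
/// The flatten step receives a `Ref`, which carries no fields to inline.
fn direct_then_flatten() -> Schema {
    let mut named_schemas = HashSet::new();
    let _direct = inner_schema(&mut named_schemas);
    inner_schema(&mut named_schemas)
}

/// Case 2: `Inner` is flattened first, then used directly with the same registry.
/// Flatten inlines the fields and discards the record, so the later `Ref`
/// points at a definition that appears nowhere in the output.
fn flatten_then_direct() -> (Vec<String>, Schema) {
    let mut named_schemas = HashSet::new();
    let inlined = match inner_schema(&mut named_schemas) {
        Schema::Record { fields, .. } => fields,
        Schema::Ref { .. } => Vec::new(),
    };
    let direct = inner_schema(&mut named_schemas);
    (inlined, direct)
}
```

Giving the flatten step a fresh, empty registry avoids both cases, but as described above it lets the same named schema be defined more than once.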

This is solved by adding a new function to the `AvroSchemaComponent` that returns
the fields directly. To not break code currently implementing this trait, it has
a default implementation that will work around the issues above. This default
implementation is also used for fields with the `#[avro(with = ||)]` and `#[avro(with = path)]`
attributes, as they don't have a way to provide the field directly. Users of `#[avro(with)]`
will need to implement `get_record_fields_in_ctxt` in their module.
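
The backward-compatibility pattern described here (a new trait method with a default body) might look roughly like the sketch below. It is only an illustration under stated assumptions: `SchemaComponent`, `Schema`, `RecordField`, `Legacy`, and `Point` are hypothetical stand-ins, the methods take no `named_schemas` or namespace arguments, and the default body is far simpler than whatever the real default does to work around the `Schema::Ref` issues.

```rust
/// Hypothetical, simplified stand-ins for the real schema types.
#[derive(Debug, Clone, PartialEq)]
struct RecordField {
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Long,
    Record { name: String, fields: Vec<RecordField> },
}

trait SchemaComponent {
    fn get_schema() -> Schema;

    /// Newly added method. Because it has a default body, implementors
    /// written before it existed keep compiling unchanged.
    fn get_record_fields() -> Option<Vec<RecordField>> {
        match Self::get_schema() {
            Schema::Record { fields, .. } => Some(fields),
            _ => None,
        }
    }
}

/// An "old" implementor that only knows about `get_schema`.
struct Legacy;
impl SchemaComponent for Legacy {
    fn get_schema() -> Schema {
        Schema::Record {
            name: "Legacy".to_string(),
            fields: vec![RecordField { name: "id".to_string() }],
        }
    }
}

/// A "new" implementor that provides its fields directly.
struct Point;
impl SchemaComponent for Point {
    fn get_schema() -> Schema {
        Schema::Record {
            name: "Point".to_string(),
            fields: Self::get_record_fields().unwrap_or_default(),
        }
    }

    fn get_record_fields() -> Option<Vec<RecordField>> {
        Some(vec![
            RecordField { name: "x".to_string() },
            RecordField { name: "y".to_string() },
        ])
    }
}

/// A non-record type falls through to `None` via the default body.
impl SchemaComponent for i64 {
    fn get_schema() -> Schema {
        Schema::Long
    }
}
```

`Legacy::get_record_fields()` works without any change to `Legacy`, which is the property the default implementation is meant to preserve for existing code.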
…hemaComponent`

The implementation for `serde_json::Map<String, T>` has been removed. This is **not** a breaking change, as `serde_json::Map` can only be constructed for `<String, Value>` and `Value` does not implement `AvroSchemaComponent`.
@coderabbitai

coderabbitai bot commented Jan 30, 2026

Walkthrough

This pull request adds support for extracting record field metadata from Avro schemas. Changes include introducing a new public function get_record_fields_in_ctxt and expanding the AvroSchemaComponent trait with a corresponding method. Helper macros impl_passthrough_schema! and impl_array_schema! reduce boilerplate for wrapper types and collections. The derive macro code generator was refactored to emit separate implementations for schema and record-field extraction. Collection handling was updated to use map-based schemas. Comprehensive tests validate the new functionality for both non-recursive and recursive record types, as well as flattening and transparent serde scenarios.


@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue where the #[serde(flatten)] attribute could lead to duplicate field names in generated Avro schemas, particularly when dealing with recursive data structures. The solution involves introducing a new get_record_fields_in_ctxt method to the AvroSchemaComponent trait and a corresponding helper function. This new mechanism allows for the precise extraction and management of record fields during schema derivation, ensuring that named schemas are correctly referenced and no duplicate names are introduced, thereby enhancing the robustness of Avro schema generation for complex Rust types.

Highlights

  • Introduced get_record_fields_in_ctxt: A new trait method and a public helper function get_record_fields_in_ctxt were added to AvroSchemaComponent to facilitate the correct extraction of record fields, specifically addressing issues with #[serde(flatten)] and recursive types.
  • Fixed flatten name duplication: The derivation logic for #[serde(flatten)] now leverages the new get_record_fields_in_ctxt mechanism, which prevents the generation of duplicate field names in derived Avro schemas, especially in scenarios involving recursive data structures.
  • Improved schema derivation for complex types: The changes ensure that named schemas are properly managed within the named_schemas context during derivation, avoiding pollution and incorrect schema generation for types that involve references and recursion.
  • Expanded test coverage: New test cases have been added to validate the correct behavior of flatten with recurring and transparent types, and to ensure the get_record_fields_in_ctxt function operates as expected without introducing name conflicts.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves the issue of flatten causing duplicate name errors by introducing a new, non-polluting mechanism for retrieving record fields, utilizing get_record_fields_in_ctxt for sound schema flattening. The changes are well-structured and include comprehensive tests. No vulnerabilities or significant security issues were found. There is one suggestion to improve the diagnostics in the derive macro for a better developer experience.

+    schema_fields.extend(flattened_fields)
 } else {
-    panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema())
+    panic!("#field does not have any fields to flatten to")

medium

The panic message here could be more explicit about why flattening failed. The current message is a bit indirect. A more direct message stating that the field must be a record type would be more helpful for users.

Suggested change
-panic!("#field does not have any fields to flatten to")
+panic!("Field {} cannot be flattened because it is not a record type.", stringify!(#field))

Owner Author


value:useful; category:bug; feedback:The Gemini AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.
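
The same behaviour can be reproduced without the `quote` crate. The sketch below is an analogy only: it uses a declarative `macro_rules!` macro in place of `quote!`, and `$field` in place of `#field`. The macro and function names are made up for illustration. In both systems a string literal is a single opaque token, so a placeholder written inside it is not substituted.

```rust
/// The metavariable appears inside a string literal, which is one opaque
/// token, so it is left as-is rather than substituted.
macro_rules! literal_message {
    ($field:ident) => {
        "$field does not have any fields to flatten to"
    };
}

/// `stringify!` turns the identifier into a string, which is then passed
/// to `format!` as an ordinary argument.
macro_rules! interpolated_message {
    ($field:ident) => {
        format!("{} does not have any fields to flatten to", stringify!($field))
    };
}

/// Text produced when the placeholder is buried in the literal.
fn literal_text() -> &'static str {
    literal_message!(inner)
}

/// Text produced when the field name is passed as a format argument.
fn interpolated_text() -> String {
    interpolated_message!(inner)
}
```

`literal_text()` yields the placeholder text unchanged, while `interpolated_text()` names the field `inner`, which is the outcome the reviewers ask for in the derive macro.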


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In avro_derive/src/lib.rs, around lines 167-175: the panic message inside the quoted block uses the literal "#field" instead of the actual field identifier. Update the code that pushes into record_field_exprs (the quote! block that references get_record_fields, the variable produced by get_field_get_record_fields_expr) so the panic prints the real field name by interpolating the field identifier, e.g. via stringify!(#field) or concat!(stringify!(#field), "..."), instead of the raw "#field" string. Apply the change where record_field_exprs.push(...) is constructed so failures show which field failed to flatten.

Comment on lines +167 to 175
+let get_record_fields =
+    get_field_get_record_fields_expr(&field, field_attrs.with)?;
 record_field_exprs.push(quote! {
-    if let ::apache_avro::schema::Schema::Record(::apache_avro::schema::RecordSchema { fields, .. }) = #flatten_ty::get_schema() {
-        for mut field in fields {
-            field.position = schema_fields.len();
-            schema_fields.push(field)
-        }
+    if let Some(flattened_fields) = #get_record_fields {
+        schema_fields.extend(flattened_fields)
     } else {
-        panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema())
+        panic!("#field does not have any fields to flatten to")
     }
 });

⚠️ Potential issue | 🟡 Minor

Panic message does not interpolate the field identifier.

The #field in the panic message is a literal string inside quote!, not a Rust string interpolation. This will print "#field does not have any fields to flatten to" instead of showing which field failed.

🐛 Proposed fix to include the field name in the panic message
+                    let field_name = field
+                        .ident
+                        .as_ref()
+                        .map(|i| i.to_string())
+                        .unwrap_or_else(|| "unnamed".to_string());
                     record_field_exprs.push(quote! {
                         if let Some(flattened_fields) = #get_record_fields {
                             schema_fields.extend(flattened_fields)
                         } else {
-                            panic!("#field does not have any fields to flatten to")
+                            panic!("{} does not have any fields to flatten to", #field_name)
                         }
                     });
Suggested change
let get_record_fields =
    get_field_get_record_fields_expr(&field, field_attrs.with)?;
let field_name = field
    .ident
    .as_ref()
    .map(|i| i.to_string())
    .unwrap_or_else(|| "unnamed".to_string());
record_field_exprs.push(quote! {
    if let Some(flattened_fields) = #get_record_fields {
        schema_fields.extend(flattened_fields)
    } else {
        panic!("{} does not have any fields to flatten to", #field_name)
    }
});

Owner Author


value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.

@claude

claude bot commented Jan 30, 2026

PR Review: Fix flatten no longer causes duplicate names

Summary

This PR introduces a new get_record_fields_in_ctxt method to the AvroSchemaComponent trait to properly handle flattened fields without polluting the named_schemas registry or causing duplicate name errors. The implementation is well-thought-out and addresses issue apache#448.

Code Quality & Architecture

Strengths:

  • Clean separation of concerns with the new get_record_fields_in_ctxt helper function
  • Good use of macros to reduce code duplication
  • Comprehensive test coverage for edge cases
  • Proper documentation with clear explanations

Concerns:

  1. Complex Logic in get_record_fields_in_ctxt (avro/src/serde/derive.rs:106-214)

    • Handles three distinct cases with complex state management
    • Consider adding inline comments explaining state transitions around lines 187-210
  2. Panic in Production Code (avro/src/serde/derive.rs:118, 203)

    • Panics assume invariants that might not hold if schema is constructed manually
    • Consider Result returns instead
  3. Error Message Quality (avro_derive/src/lib.rs:173)

    • The error message uses #field which will literally print "#field"
    • Should interpolate the actual field name

Potential Bugs

  1. Missing Field Position Updates (avro_derive/src/lib.rs:164-175)

    • When flattening, schema_fields.extend(flattened_fields) doesn't update position field
    • Critical: This could cause incorrect field positions in flattened schemas
  2. Schema Mutation in find_first_ref

    • The function modifies the schema tree while searching
    • Add a test for calling get_record_fields_in_ctxt multiple times (idempotence)

Test Coverage

Excellent coverage for: recursion, transparent wrappers, custom with handlers

Missing:

  1. Deeply nested flattening
  2. Field position verification for flattened schemas
  3. Error cases when flattening non-record types
  4. Idempotence tests

Recommendations

Critical:

  • Fix field position issue in flattened records
  • Fix error message placeholder #field

Important:

  • Add field position test for flattened schemas
  • Add inline comments explaining state management
  • Add idempotence test

Overall Assessment

This is a solid PR that solves a real problem. The architecture is sound and test coverage is good. However, the field position issue needs to be addressed before merging.

Verdict: Approve with changes requested (primarily the field position issue)

@augmentcode

augmentcode bot commented Jan 30, 2026

🤖 Augment PR Summary

Summary: This PR fixes AvroSchema derive behavior where #[serde(flatten)] could lead to duplicate named schema definitions.

Changes:

  • Extends AvroSchemaComponent with get_record_fields_in_ctxt for extracting record fields without building/retaining the full record schema.
  • Adds a (doc-hidden) helper apache_avro::serde::get_record_fields_in_ctxt to retrieve record fields while avoiding Names pollution and handling recursive records.
  • Refactors several schema impls (passthrough wrappers, arrays, maps, option, logical types) to implement the new method.
  • Updates the derive macro to generate both get_schema_in_ctxt and get_record_fields_in_ctxt for structs/enums.
  • Changes flatten expansion to use record-field extraction instead of calling get_schema() and pattern-matching the resulting schema.
  • Re-exports the helper from apache_avro::serde (hidden) for proc-macro usage with #[avro(with = ...)] expressions.
  • Adds new tests covering record-field extraction with/without recursion and several flatten/transparent scenarios.

Technical Notes: The new helper tries to preserve Names state while still ensuring recursive schemas introduce a concrete record definition before subsequent self-references.



@augmentcode augmentcode bot left a comment


Review completed. 3 suggestions posted.


-            schema_fields.push(field)
-        }
+    if let Some(flattened_fields) = #get_record_fields {
+        schema_fields.extend(flattened_fields)

When flattening, schema_fields.extend(flattened_fields) preserves the inner fields’ existing RecordField.position values, which can become inconsistent with their new index in the parent schema_fields and break the lookup map built from position.

Owner Author


value:useful; category:bug; feedback:The Augment AI reviewer is correct! The positions of the flattened fields are calculated relative to their inner RecordSchema. Once they are appended to the parent schema's fields, their positions need to be recalculated, i.e. offset by the number of fields already present in the parent.
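
One way to apply that recalculation is sketched below. It mirrors the `field.position = schema_fields.len()` assignment from the removed code, but `RecordField` here is a hypothetical two-field stand-in rather than the `apache_avro` struct, and `extend_renumbered`, `build_lookup`, and `field` are names invented for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical, cut-down record field: just a name and its index in the record.
#[derive(Debug, Clone, PartialEq)]
struct RecordField {
    name: String,
    position: usize,
}

/// Convenience constructor used by the example.
fn field(name: &str, position: usize) -> RecordField {
    RecordField { name: name.to_string(), position }
}

/// Appends flattened fields to the parent's fields, overwriting each
/// `position` with the index the field ends up at in the parent.
fn extend_renumbered(schema_fields: &mut Vec<RecordField>, flattened_fields: Vec<RecordField>) {
    for mut field in flattened_fields {
        field.position = schema_fields.len();
        schema_fields.push(field);
    }
}

/// Builds a name-to-position lookup from the `position` values, which is
/// why stale positions would produce wrong mappings.
fn build_lookup(fields: &[RecordField]) -> BTreeMap<String, usize> {
    fields.iter().map(|f| (f.name.clone(), f.position)).collect()
}
```

With a parent that already holds `id` at position 0, flattening fields `x` (0) and `y` (1) through `extend_renumbered` leaves them at positions 1 and 2. A plain `extend` would keep 0 and 1, so `x` would collide with `id` in the lookup.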

match with {
With::Trait => Ok(type_to_get_record_fields_expr(&field.ty)?),
With::Serde(path) => {
Ok(quote! { #path::get_record_fields_in_ctxt(named_schemas, enclosing_namespace) })

With::Serde here generates #path::get_record_fields_in_ctxt(...), but existing apache_avro::serde::* “with” modules appear to only provide get_schema_in_ctxt; this can cause derive-time compile failures for transparent/flatten cases that use #[avro(with)] + #[serde(with = "...")].

Owner Author


value:useful; category:bug; feedback:The Augment AI reviewer is correct! The new code assumes that the module supplied via with has a function named get_record_fields_in_ctxt() and tries to call it, but even the default With::Serde modules in avro/src/serde/with.rs do not provide this new function. Fixing this prevents a compilation error when using #[serde(flatten, with = ...)].

}

impl<T> AvroSchemaComponent for Vec<T>
impl<T> AvroSchemaComponent for HashMap<String, T>

This change drops the AvroSchemaComponent impl for serde_json::Map<String, T> (only HashMap<String, T> remains); if downstream users relied on serde_json::Map schema generation, this may be an unintended regression.

Owner Author


value:useful; category:bug; feedback:The Augment AI reviewer is correct! The old implementation of AvroSchemaComponent for serde_json::Map<String, T> is lost in the refactoring. Prevents broken builds for external users of this implementation


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


-            schema_fields.push(field)
-        }
+    if let Some(flattened_fields) = #get_record_fields {
+        schema_fields.extend(flattened_fields)

Flattened fields retain original positions instead of updated ones

High Severity

The new flatten handling uses schema_fields.extend(flattened_fields) without updating each field's position value. The old code had field.position = schema_fields.len() before pushing each field. When a regular field appears before a flatten field, the flattened fields will have incorrect positions (their original positions from the nested schema rather than their actual indices in the combined schema). This causes the lookup table to have wrong mappings, since it's built using field.position values.

Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The positions of the flattened fields are calculated relative to their inner RecordSchema. Once they are appended to the parent schema's fields, their positions need to be recalculated, i.e. offset by the number of fields already present in the parent.

+    schema_fields.extend(flattened_fields)
 } else {
-    panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema())
+    panic!("#field does not have any fields to flatten to")

Panic message shows literal text instead of diagnostic info

Low Severity

The panic message "#field does not have any fields to flatten to" is misleading because #field inside a string literal in quote! is not interpolated—it appears as literal text in the generated code. The old code used panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema()) which showed the actual schema type at runtime, providing useful diagnostic information for debugging. The regression makes it harder to understand what went wrong when flattening fails.

Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.

@martin-augment
Owner Author

Error Message Quality (avro_derive/src/lib.rs:173)

  • The error message uses #field which will literally print "#field"
  • Should interpolate the actual field name

value:useful; category:bug; feedback:The Claude AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.

@martin-augment
Owner Author

Missing Field Position Updates (avro_derive/src/lib.rs:164-175)

  • When flattening, schema_fields.extend(flattened_fields) doesn't update position field
  • Critical: This could cause incorrect field positions in flattened schemas

value:useful; category:bug; feedback:The Claude AI reviewer is correct! The positions of the flattened fields are calculated relative to their inner RecordSchema. Once they are appended to the parent schema's fields, their positions need to be recalculated, i.e. offset by the number of fields already present in the parent.
