448: fix: `flatten` no longer causes duplicate names by martin-augment · Pull Request #44 · martin-augment/avro-rs

martin-augment · 2026-01-30T04:20:12Z

448: To review by AI

When a type was used both via `flatten` and directly, the generated schema would contain duplicate names (and schemas). This happened because `flatten` used an empty `named_schemas` to get the record schema. Had the existing `named_schemas` been used instead, `flatten` might have received a `Schema::Ref` if the type was already in use; and if `flatten` came first, a later use of the type would create a `Schema::Ref` to a schema that does not exist. This is solved by adding a new function to `AvroSchemaComponent` that returns the fields directly. To avoid breaking code that currently implements this trait, the function has a default implementation that works around the issues above. The default implementation is also used for fields with the `#[avro(with = ||)]` and `#[avro(with = path)]` attributes, as they have no way to provide the fields directly. Users of `#[avro(with)]` will need to implement `get_record_fields_in_ctxt` in their module.
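The duplication described above can be illustrated with a toy model. This is only a sketch under assumptions: `Schema`, `Names`, `leaf_schema`, `inner_fields`, and `outer_fields` are hypothetical stand-ins, not the `apache_avro` types, and the scenario (a flattened `Inner` that contains a named type `Leaf` which is also used directly) is one plausible reading of the description.

```rust
use std::collections::HashSet;

// Toy stand-ins for illustration only; these are not the apache_avro types.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Record { name: String },
    Ref { name: String },
}

/// Registry of names that already have a full definition.
type Names = HashSet<String>;

/// Schema of a hypothetical named type `Leaf`: a full definition on first
/// use within `names`, a reference on every later use.
fn leaf_schema(names: &mut Names) -> Schema {
    let name = "Leaf".to_string();
    if names.insert(name.clone()) {
        Schema::Record { name }
    } else {
        Schema::Ref { name }
    }
}

/// Fields of a hypothetical record `Inner { leaf: Leaf }`.
fn inner_fields(names: &mut Names) -> Vec<(String, Schema)> {
    vec![("leaf".to_string(), leaf_schema(names))]
}

/// Fields of a hypothetical `Outer` that flattens `Inner` and then uses
/// `Leaf` directly.
fn outer_fields(share_registry: bool) -> Vec<(String, Schema)> {
    let mut names = Names::new();
    let mut fields = if share_registry {
        inner_fields(&mut names)
    } else {
        // Old behaviour as described: flatten resolves against an empty registry.
        inner_fields(&mut Names::new())
    };
    fields.push(("direct".to_string(), leaf_schema(&mut names)));
    fields
}

/// Counts how many fields carry a full definition of `Leaf`.
fn leaf_definitions(fields: &[(String, Schema)]) -> usize {
    fields
        .iter()
        .filter(|(_, s)| matches!(s, Schema::Record { name } if name == "Leaf"))
        .count()
}

fn main() {
    println!("empty registry:  {} definition(s)", leaf_definitions(&outer_fields(false)));
    println!("shared registry: {} definition(s)", leaf_definitions(&outer_fields(true)));
}
```

The toy covers only the duplication half of the problem. The dangling `Schema::Ref` cases that arise when the existing registry is reused are what the new `get_record_fields_in_ctxt` function is meant to address.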

…hemaComponent` The implementation for `serde_json::Map<String, T>` has been removed. This is **not** a breaking change, as `serde_json::Map` can only be constructed for `<String, Value>` and `Value` does not implement `AvroSchemaComponent`.

coderabbitai · 2026-01-30T04:20:31Z

Walkthrough

This pull request adds support for extracting record field metadata from Avro schemas. Changes include introducing a new public function get_record_fields_in_ctxt and expanding the AvroSchemaComponent trait with a corresponding method. Helper macros impl_passthrough_schema! and impl_array_schema! reduce boilerplate for wrapper types and collections. The derive macro code generator was refactored to emit separate implementations for schema and record-field extraction. Collection handling was updated to use map-based schemas. Comprehensive tests validate the new functionality for both non-recursive and recursive record types, as well as flattening and transparent serde scenarios.

gemini-code-assist · 2026-01-30T04:20:33Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue where the #[serde(flatten)] attribute could lead to duplicate field names in generated Avro schemas, particularly when dealing with recursive data structures. The solution involves introducing a new get_record_fields_in_ctxt method to the AvroSchemaComponent trait and a corresponding helper function. This new mechanism allows for the precise extraction and management of record fields during schema derivation, ensuring that named schemas are correctly referenced and no duplicate names are introduced, thereby enhancing the robustness of Avro schema generation for complex Rust types.

Highlights

Introduced get_record_fields_in_ctxt: A new trait method and a public helper function get_record_fields_in_ctxt were added to AvroSchemaComponent to facilitate the correct extraction of record fields, specifically addressing issues with #[serde(flatten)] and recursive types.
Fixed flatten name duplication: The derivation logic for #[serde(flatten)] now leverages the new get_record_fields_in_ctxt mechanism, which prevents the generation of duplicate field names in derived Avro schemas, especially in scenarios involving recursive data structures.
Improved schema derivation for complex types: The changes ensure that named schemas are properly managed within the named_schemas context during derivation, avoiding pollution and incorrect schema generation for types that involve references and recursion.
Expanded test coverage: New test cases have been added to validate the correct behavior of flatten with recurring and transparent types, and to ensure the get_record_fields_in_ctxt function operates as expected without introducing name conflicts.

gemini-code-assist

Code Review

This pull request effectively resolves the issue of flatten causing duplicate name errors by introducing a new, non-polluting mechanism for retrieving record fields, utilizing get_record_fields_in_ctxt for sound schema flattening. The changes are well-structured and include comprehensive tests. No vulnerabilities or significant security issues were found. There is one suggestion to improve the diagnostics in the derive macro for a better developer experience.

gemini-code-assist · 2026-01-30T04:22:51Z

avro_derive/src/lib.rs

+                            schema_fields.extend(flattened_fields)
                        } else {
-                            panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema())
+                            panic!("#field does not have any fields to flatten to")


The panic message here could be more explicit about why flattening failed. The current message is a bit indirect. A more direct message stating that the field must be a record type would be more helpful for users.

Suggested change

- panic!("#field does not have any fields to flatten to")
+ panic!("Field {} cannot be flattened because it is not a record type.", stringify!(#field))

value:useful; category:bug; feedback:The Gemini AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.
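The point about string literals can be reproduced without a proc-macro crate. The sketch below is an analogy, not the derive code: it uses `macro_rules!` (where `$field` plays the role of `#field` in `quote!`), and the macro names `literal_message` and `interpolated_message` are made up for illustration. In both macro systems a string literal is a single token, so a placeholder written inside it stays literal text.

```rust
// Analogy using macro_rules!, since `quote!` needs an external crate.
macro_rules! literal_message {
    ($field:ident) => {
        // `$field` inside the string literal is plain text, not a metavariable.
        String::from("$field does not have any fields to flatten to")
    };
}

macro_rules! interpolated_message {
    ($field:ident) => {
        // stringify! turns the captured tokens into a string at compile time.
        format!("{} does not have any fields to flatten to", stringify!($field))
    };
}

fn main() {
    println!("{}", literal_message!(inner));
    println!("{}", interpolated_message!(inner));
}
```

Only the second form names the failing field, which is the behaviour the reviewers ask for in the generated panic.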

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@avro_derive/src/lib.rs`:
- Around line 167-175: The panic message inside the quoted block uses the
literal "#field" instead of the actual field identifier; update the code that
pushes into record_field_exprs (the quote! block that references
get_record_fields and the variable get_record_fields from
get_field_get_record_fields_expr) so the panic prints the real field name by
interpolating the field identifier (e.g., via stringify!(#field) or
concat!(stringify!(#field), "...")) instead of the raw "#field" string; ensure
the change is applied where record_field_exprs.push(...) is constructed so
failures show which field failed to flatten.

coderabbitai · 2026-01-30T04:24:12Z

avro_derive/src/lib.rs

+                    let get_record_fields =
+                        get_field_get_record_fields_expr(&field, field_attrs.with)?;
                    record_field_exprs.push(quote! {
-                        if let ::apache_avro::schema::Schema::Record(::apache_avro::schema::RecordSchema { fields, .. }) = #flatten_ty::get_schema() {
-                            for mut field in fields {
-                                field.position = schema_fields.len();
-                                schema_fields.push(field)
-                            }
+                        if let Some(flattened_fields) = #get_record_fields {
+                            schema_fields.extend(flattened_fields)
                        } else {
-                            panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema())
+                            panic!("#field does not have any fields to flatten to")
                        }
                    });


⚠️ Potential issue | 🟡 Minor

Panic message does not interpolate the field identifier.

The #field in the panic message is a literal string inside quote!, not a Rust string interpolation. This will print "#field does not have any fields to flatten to" instead of showing which field failed.

🐛 Proposed fix to include the field name in the panic message

+                    let field_name = field
+                        .ident
+                        .as_ref()
+                        .map(|i| i.to_string())
+                        .unwrap_or_else(|| "unnamed".to_string());
                     record_field_exprs.push(quote! {
                         if let Some(flattened_fields) = #get_record_fields {
                             schema_fields.extend(flattened_fields)
                         } else {
-                            panic!("#field does not have any fields to flatten to")
+                            panic!("{} does not have any fields to flatten to", #field_name)
                         }
                     });

value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.

claude · 2026-01-30T04:24:48Z

PR Review: Fix `flatten` no longer causes duplicate names

Summary

This PR introduces a new get_record_fields_in_ctxt method to the AvroSchemaComponent trait to properly handle flattened fields without polluting the named_schemas registry or causing duplicate name errors. The implementation is well-thought-out and addresses issue apache#448.

Code Quality & Architecture

Strengths:

Clean separation of concerns with the new get_record_fields_in_ctxt helper function
Good use of macros to reduce code duplication
Comprehensive test coverage for edge cases
Proper documentation with clear explanations

Concerns:

Complex Logic in get_record_fields_in_ctxt (avro/src/serde/derive.rs:106-214)
- Handles three distinct cases with complex state management
- Consider adding inline comments explaining state transitions around lines 187-210
Panic in Production Code (avro/src/serde/derive.rs:118, 203)
- Panics assume invariants that might not hold if schema is constructed manually
- Consider Result returns instead
Error Message Quality (avro_derive/src/lib.rs:173)
- The error message uses #field which will literally print "#field"
- Should interpolate the actual field name

Potential Bugs

Missing Field Position Updates (avro_derive/src/lib.rs:164-175)
- When flattening, schema_fields.extend(flattened_fields) doesn't update position field
- Critical: This could cause incorrect field positions in flattened schemas
Schema Mutation in find_first_ref
- The function modifies the schema tree while searching
- Add a test for calling get_record_fields_in_ctxt multiple times (idempotence)

Test Coverage

Excellent coverage for: recursion, transparent wrappers, custom with handlers

Missing:

Deeply nested flattening
Field position verification for flattened schemas
Error cases when flattening non-record types
Idempotence tests

Recommendations

Critical:

Fix field position issue in flattened records
Fix error message placeholder #field

Important:

Add field position test for flattened schemas
Add inline comments explaining state management
Add idempotence test

Overall Assessment

This is a solid PR that solves a real problem. The architecture is sound and test coverage is good. However, the field position issue needs to be addressed before merging.

Verdict: Approve with changes requested (primarily the field position issue)

augmentcode · 2026-01-30T04:28:27Z

🤖 Augment PR Summary

Summary: This PR fixes AvroSchema derive behavior where #[serde(flatten)] could lead to duplicate named schema definitions.

Changes:

Extends AvroSchemaComponent with get_record_fields_in_ctxt for extracting record fields without building/retaining the full record schema.
Adds a (doc-hidden) helper apache_avro::serde::get_record_fields_in_ctxt to retrieve record fields while avoiding Names pollution and handling recursive records.
Refactors several schema impls (passthrough wrappers, arrays, maps, option, logical types) to implement the new method.
Updates the derive macro to generate both get_schema_in_ctxt and get_record_fields_in_ctxt for structs/enums.
Changes flatten expansion to use record-field extraction instead of calling get_schema() and pattern-matching the resulting schema.
Re-exports the helper from apache_avro::serde (hidden) for proc-macro usage with #[avro(with = ...)] expressions.
Adds new tests covering record-field extraction with/without recursion and several flatten/transparent scenarios.

Technical Notes: The new helper tries to preserve Names state while still ensuring recursive schemas introduce a concrete record definition before subsequent self-references.

augmentcode

Review completed. 3 suggestions posted.

augmentcode · 2026-01-30T04:28:28Z

avro_derive/src/lib.rs

-                                schema_fields.push(field)
-                            }
+                        if let Some(flattened_fields) = #get_record_fields {
+                            schema_fields.extend(flattened_fields)


When flattening, schema_fields.extend(flattened_fields) preserves the inner fields’ existing RecordField.position values, which can become inconsistent with their new index in the parent schema_fields and break the lookup map built from position.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The positions of the flattened fields are calculated for their inner RecordSchema but once they are moved to the parent schema and appended to its fields the positions of the flattened field need to be re-calculated, i.e. increased with the number of the parent's fields.
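The position problem can be sketched with a toy `RecordField`. This is a minimal illustration under assumptions: the struct below has only the two members needed here, and `field`, `extend_reindexed`, and `lookup` are hypothetical helpers. The re-indexing loop mirrors the removed code, which set `field.position = schema_fields.len()` before each push.

```rust
use std::collections::BTreeMap;

// Toy stand-in; the real RecordField has more members.
#[derive(Debug, Clone, PartialEq)]
struct RecordField {
    name: String,
    position: usize,
}

fn field(name: &str, position: usize) -> RecordField {
    RecordField { name: name.to_string(), position }
}

/// Appends flattened fields and rewrites each position to its index in the parent.
fn extend_reindexed(parent: &mut Vec<RecordField>, flattened: Vec<RecordField>) {
    for mut f in flattened {
        f.position = parent.len();
        parent.push(f);
    }
}

/// Name-to-position lookup, built from the stored `position` values.
fn lookup(fields: &[RecordField]) -> BTreeMap<String, usize> {
    fields.iter().map(|f| (f.name.clone(), f.position)).collect()
}

fn main() {
    // Positions as computed for the inner record on its own.
    let inner = vec![field("x", 0), field("y", 1)];

    // Plain extend keeps the inner positions, so "x" collides with "id".
    let mut stale = vec![field("id", 0)];
    stale.extend(inner.clone());
    println!("stale: {:?}", lookup(&stale));

    // Re-indexing gives every field its actual index in the parent.
    let mut fixed = vec![field("id", 0)];
    extend_reindexed(&mut fixed, inner);
    println!("fixed: {:?}", lookup(&fixed));
}
```

With the plain `extend`, both `id` and `x` claim position 0, which is the inconsistency the reviewers flag for the lookup map.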

augmentcode · 2026-01-30T04:28:28Z

avro_derive/src/lib.rs

+    match with {
+        With::Trait => Ok(type_to_get_record_fields_expr(&field.ty)?),
+        With::Serde(path) => {
+            Ok(quote! { #path::get_record_fields_in_ctxt(named_schemas, enclosing_namespace) })


With::Serde here generates #path::get_record_fields_in_ctxt(...), but existing apache_avro::serde::* “with” modules appear to only provide get_schema_in_ctxt; this can cause derive-time compile failures for transparent/flatten cases that use #[avro(with)] + #[serde(with = "...")].

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The new code assumes that the module provided for with has a function named get_record_fields_in_ctxt() and tries to call it, but even the default implementations of With::Serde in avro/src/serde/with.rs do not provide this new function. Fixing this prevents a compilation error when using #[serde(flatten, with = ...)].
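As a rough sketch of what a `with` module might need to expose once the derive macro calls `#path::get_record_fields_in_ctxt(named_schemas, enclosing_namespace)`: the module name `my_with`, the toy `Schema`, `RecordField`, `Names`, and `Namespace` types, and the exact signatures below are all assumptions for illustration, not the `apache_avro` definitions. The `Option` return type is inferred from the `if let Some(flattened_fields)` in the generated code.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration; not the apache_avro definitions.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Long,
    Record(Vec<RecordField>),
}

#[derive(Debug, Clone, PartialEq)]
struct RecordField {
    name: String,
    schema: Schema,
}

type Names = HashMap<String, Schema>;
type Namespace = Option<String>;

/// Hypothetical module that a field would name in `#[avro(with = my_with)]`.
mod my_with {
    use super::{Names, Namespace, RecordField, Schema};

    /// The function such modules already provide.
    pub fn get_schema_in_ctxt(_named_schemas: &mut Names, _enclosing_namespace: &Namespace) -> Schema {
        Schema::Record(vec![RecordField {
            name: "millis".to_string(),
            schema: Schema::Long,
        }])
    }

    /// The extra function the flatten expansion would call.
    /// Returns `None` when the schema is not a record.
    pub fn get_record_fields_in_ctxt(
        named_schemas: &mut Names,
        enclosing_namespace: &Namespace,
    ) -> Option<Vec<RecordField>> {
        match get_schema_in_ctxt(named_schemas, enclosing_namespace) {
            Schema::Record(fields) => Some(fields),
            _ => None,
        }
    }
}

fn main() {
    let mut names = Names::new();
    let fields = my_with::get_record_fields_in_ctxt(&mut names, &None);
    println!("{fields:?}");
}
```

A module that offers only `get_schema_in_ctxt` would fail to compile against the generated call, which is the failure mode the reviewer describes.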

augmentcode · 2026-01-30T04:28:28Z

avro/src/serde/derive.rs

 }

-impl<T> AvroSchemaComponent for Vec<T>
+impl<T> AvroSchemaComponent for HashMap<String, T>


This change drops the AvroSchemaComponent impl for serde_json::Map<String, T> (only HashMap<String, T> remains); if downstream users relied on serde_json::Map schema generation, this may be an unintended regression.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The old implementation of AvroSchemaComponent for serde_json::Map<String, T> is lost in the refactoring. Prevents broken builds for external users of this implementation

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-01-30T04:32:21Z

avro_derive/src/lib.rs

-                                schema_fields.push(field)
-                            }
+                        if let Some(flattened_fields) = #get_record_fields {
+                            schema_fields.extend(flattened_fields)


Flattened fields retain original positions instead of updated ones

High Severity

The new flatten handling uses schema_fields.extend(flattened_fields) without updating each field's position value. The old code had field.position = schema_fields.len() before pushing each field. When a regular field appears before a flatten field, the flattened fields will have incorrect positions (their original positions from the nested schema rather than their actual indices in the combined schema). This causes the lookup table to have wrong mappings, since it's built using field.position values.

value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The positions of the flattened fields are calculated for their inner RecordSchema but once they are moved to the parent schema and appended to its fields the positions of the flattened field need to be re-calculated, i.e. increased with the number of the parent's fields.

cursor · 2026-01-30T04:32:21Z

avro_derive/src/lib.rs

+                            schema_fields.extend(flattened_fields)
                        } else {
-                            panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema())
+                            panic!("#field does not have any fields to flatten to")


Panic message shows literal text instead of diagnostic info

Low Severity

The panic message "#field does not have any fields to flatten to" is misleading because #field inside a string literal in quote! is not interpolated—it appears as literal text in the generated code. The old code used panic!("Can only flatten RecordSchema, got {:?}", #flatten_ty::get_schema()) which showed the actual schema type at runtime, providing useful diagnostic information for debugging. The regression makes it harder to understand what went wrong when flattening fails.

value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.

martin-augment · 2026-01-30T06:48:31Z

Error Message Quality (avro_derive/src/lib.rs:173)

The error message uses #field which will literally print "#field"

Should interpolate the actual field name

value:useful; category:bug; feedback:The Claude AI reviewer is correct! The quote!() macro does not interpolate inside strings, so a normal string interpolation will have to be used instead. The stringify!() macro has to be used to get a string view of the TokenTree. If it is too verbose then only the #field's Ident could be used.

martin-augment · 2026-01-30T06:54:07Z

Missing Field Position Updates (avro_derive/src/lib.rs:164-175)

When flattening, schema_fields.extend(flattened_fields) doesn't update position field

Critical: This could cause incorrect field positions in flattened schemas

value:useful; category:bug; feedback:The Claude AI reviewer is correct! The positions of the flattened fields are calculated for their inner RecordSchema but once they are moved to the parent schema and appended to its fields the positions of the flattened field need to be re-calculated, i.e. increased with the number of the parent's fields.

Kriskras99 and others added 3 commits January 28, 2026 22:59

fix: Add more tests

f68a652
