512: feat!: Support enums and tuples in SchemaAwareSerializer and implement SchemaAwareDeserializer (#63)

Closed
martin-augment wants to merge 6 commits into main from pr-512-2026-03-20-10-52-29

Conversation

@martin-augment
Owner

512: To review by AI

@coderabbitai

coderabbitai bot commented Mar 20, 2026

Walkthrough

This pull request introduces schema-aware Serde serialization and deserialization support throughout the apache-avro crate. New modules implement custom Serde deserializers (SchemaAwareDeserializer) and serializers (SchemaAwareSerializer) for Avro schema types including arrays, maps, records, tuples, enums, and unions. Reader and writer APIs are extended with human_readable and target_block_size configuration options, new read_deser and read_next_deser methods for direct deserialization, and builder-based constructors. Supporting changes include error type refinements, schema utility methods, Serde derive improvements for arrays and tuples, documentation modules describing Serde-Avro mappings, and test updates using the new schema-aware APIs.



@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.


Schema::Fixed(fixed) if fixed.size == 16 && fixed.name.name() == "u128" => {
visitor.visit_u128(u128::from_le_bytes(self.read_array()?))
}
Schema::Union(union) => self.with_union(union)?.deserialize_i128(visitor),

deserialize_u128 union path incorrectly calls deserialize_i128

High Severity

In deserialize_u128, the Schema::Union branch calls self.with_union(union)?.deserialize_i128(visitor) instead of deserialize_u128(visitor). This is a copy-paste bug that causes u128 values inside unions to be deserialized as i128, leading to incorrect data interpretation or runtime errors.
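The hazard can be shown standalone: the same 16 little-endian bytes read back through the signed path come out wrong for any value above i128::MAX. A minimal sketch, independent of the crate's types:

```rust
fn main() {
    // A u128 above i128::MAX, as it would sit in a 16-byte fixed field.
    let value: u128 = u128::MAX - 41;
    let bytes = value.to_le_bytes();

    // Correct branch: interpret the bytes as u128.
    assert_eq!(u128::from_le_bytes(bytes), value);

    // Buggy branch: the same bytes routed through the i128 path
    // come back negative, silently corrupting the value.
    assert!(i128::from_le_bytes(bytes) < 0);
}
```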


self.writer,
self.schema,
self.config,
)?)?;

BufferedBlockSerializer writes to writer instead of buffer

High Severity

BufferedBlockSerializer::serialize_element and serialize_key write to self.writer (the final output) instead of self.buffer. This means data bypasses buffering entirely, self.buffer.len() is always 0 so block splitting never triggers, and end() writes corrupt block headers (with item count > 0 but 0 bytes of data) on top of already-written raw data.



fn size_hint(&self) -> Option<usize> {
todo!()
}

ManyTupleDeserializer::size_hint panics with todo!()

Medium Severity

ManyTupleDeserializer::size_hint uses todo!() which will panic at runtime if called. The implementation is straightforward: Some(self.schema.fields.len() - self.current_field), matching the pattern used in RecordDeserializer::size_hint.



fn read_next_deser<T: DeserializeOwned>(&mut self) -> AvroResult<Option<T>> {
self.block.read_next_deser(self.reader_schema)
}

read_next_deser unconditionally passes reader schema causing panic

Medium Severity

read_next_deser always passes self.reader_schema to block.read_next_deser, unlike read_next which conditionally passes it based on should_resolve_schema. If a reader schema is provided (even one matching the writer schema), this hits a todo!() panic in the block layer.
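One way to mirror read_next's reported behavior, sketched with &str standing in for Schema (the function name and types are illustrative): only hand the reader schema down when resolution is actually required.

```rust
// Illustrative sketch: pass the reader schema to the block layer only when
// it differs from the writer schema, so the unimplemented resolution path
// (currently a todo!()) is never reached for matching schemas.
fn effective_reader_schema<'a>(
    writer_schema: &'a str,
    reader_schema: Option<&'a str>,
) -> Option<&'a str> {
    match reader_schema {
        Some(r) if r != writer_schema => Some(r), // resolution required
        _ => None, // identical or absent: read with the writer schema
    }
}

fn main() {
    assert_eq!(effective_reader_schema("\"int\"", None), None);
    assert_eq!(effective_reader_schema("\"int\"", Some("\"int\"")), None);
    assert_eq!(
        effective_reader_schema("\"int\"", Some("\"long\"")),
        Some("\"long\"")
    );
}
```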


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the apache-avro crate's integration with Rust's serde framework by introducing a new schema-aware deserializer. It expands support for complex data types like enums, tuples, and fixed-size arrays, ensuring more robust and idiomatic serialization and deserialization. The changes aim to provide a more seamless experience for developers working with Avro data in Rust, making the library more powerful and user-friendly for diverse data structures.

Highlights

  • Schema-Aware Deserializer Implementation: Introduced a new SchemaAwareDeserializer to enable direct deserialization of Avro data into Rust types, significantly improving the serde integration for reading Avro records.
  • Enhanced Serde Integration for Complex Types: Added comprehensive support for serializing and deserializing Rust enums and tuples ((T1, T2, ...)) directly to and from Avro schemas. This includes handling various enum tagging strategies (external, internal, adjacent, untagged) and mapping tuples to Avro records.
  • Array Type Support in AvroSchema Derive: Implemented AvroSchemaComponent for fixed-size arrays ([T; N]), allowing them to be correctly represented in Avro schemas. This also includes new serde::with modules for explicit array serialization/deserialization.
  • Refactored Reader and Writer APIs: Updated GenericSingleObjectReader, Reader, and GenericDatumReader to leverage the new SchemaAwareDeserializer, and GenericDatumWriter and SpecificSingleObjectWriter to use the SchemaAwareSerializer, streamlining the data flow and improving consistency.
  • Improved Error Handling and Documentation: Refined error messages for serialization and deserialization failures, providing more context and clarity. New documentation files (avro_data_model_to_serde.rs, serde_data_model_to_avro.rs) were added to detail the mapping between Avro and Serde data models.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude

claude bot commented Mar 20, 2026

Code Review: PR #63 — Enum/Tuple Support in SchemaAwareSerializer + SchemaAwareDeserializer

This is a large and well-structured PR introducing significant new functionality. Below is my detailed review.


Critical Issues (Must Fix)

1. todo!() macros in production code paths

avro/src/reader/block.rs and avro/src/reader/datum.rs

// reader/datum.rs ~line 152
if let Some((_, _)) = &self.reader {
    todo!("Schema aware deserialisation does not resolve schemas yet");
}
// reader/block.rs ~line 219
let item = if read_schema.is_some() {
    todo!("Schema aware deserialisation does not resolve schemas yet");

These todo!() calls will panic at runtime if a reader schema is configured. Since the public API exposes builder patterns that accept a reader schema, users have no indication this path will panic.

Fix: Either return a proper Err(...) with a descriptive message, or add # Panics sections to all relevant public API docs until the feature is implemented.


2. Unimplemented size_hint() in ManyTupleDeserializer

avro/src/serde/deser_schema/tuple.rs ~line 124

fn size_hint(&self) -> Option<usize> {
    todo!()
}

Serde visitors may call size_hint() — this will panic. Should be:

fn size_hint(&self) -> Option<usize> {
    Some(self.schema.fields.len().saturating_sub(self.current_field))
}

Medium Issues (Should Fix)

3. Breaking change in error type structure

avro/src/error.rs ~lines 559–578

SerializeRecordFieldWithSchema changed:

  • record_schema: Schema → record_schema: RecordSchema
  • error: Box<Error> → error: String

The error message format also changed from "Failed to serialize field 'foo' for record" to "Failed to serialize field 'foo' of record". This is a semver breaking change if users pattern-match on these error types.

Please ensure this is documented in the CHANGELOG and that the PR description notes this is breaking.


4. Missing enum variant ordering warning in documentation

avro/src/documentation/serde_data_model_to_avro.rs

The doc says:

"A enum with the same name and unit variants for every symbol"

But there's no explicit warning that variant ordering must match the Avro symbol index. Reordering variants will silently deserialize to wrong values — a very hard-to-debug class of error.

Please add a prominent warning, e.g.:

⚠️ Order matters: The index of each Rust enum variant must match the index of the corresponding Avro enum symbol. Reordering variants will cause incorrect deserialization without any error.
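The warning can be demonstrated without the crate: Avro encodes an enum value as its symbol index, so decoding is purely positional. A minimal sketch:

```rust
// Standalone sketch of why variant order matters: an Avro enum value is
// serialized as the *index* of its symbol, and decoding maps that integer
// back to whatever symbol happens to sit at the same position.
fn decode_symbol<'a>(symbols: &[&'a str], index: usize) -> Option<&'a str> {
    symbols.get(index).copied()
}

fn main() {
    let writer_symbols = ["RED", "GREEN", "BLUE"];
    let encoded = 1; // the writer serialized GREEN as index 1

    // A reader whose variants match the schema order decodes correctly.
    assert_eq!(decode_symbol(&writer_symbols, encoded), Some("GREEN"));

    // A reader that reordered its variants silently gets the wrong value,
    // with no error raised anywhere.
    let reordered = ["GREEN", "RED", "BLUE"];
    assert_eq!(decode_symbol(&reordered, encoded), Some("RED"));
}
```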


5. Generic ReadBlock error lacks context

avro/src/reader/block.rs ~line 238–241

if b_original != 0 && b_original == block_bytes.len() {
    return Err(Details::ReadBlock.into());
}

This gives no information about expected vs. actual bytes, which field failed, or whether it was a partial vs. complete read failure. Consider a richer error variant or at least a more descriptive message for debuggability.
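A sketch of what a richer variant could carry (the type and field names are assumptions for illustration, not the crate's API):

```rust
use std::fmt;

// Hypothetical richer error: recording expected vs. still-unread byte
// counts makes partial reads distinguishable from complete read failures.
#[derive(Debug)]
struct ReadBlockError {
    expected_bytes: usize,
    remaining_bytes: usize,
}

impl fmt::Display for ReadBlockError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "failed to fill block buffer: expected {} bytes, {} still unread",
            self.expected_bytes, self.remaining_bytes
        )
    }
}

fn main() {
    let err = ReadBlockError { expected_bytes: 4096, remaining_bytes: 4096 };
    assert_eq!(
        err.to_string(),
        "failed to fill block buffer: expected 4096 bytes, 4096 still unread"
    );
}
```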


Minor Issues (Nice to Have)

6. self-referential documentation link

avro/src/documentation/serde_data_model_to_avro.rs line 23

//! see [the documentation for the reverse](super::serde_data_model_to_avro).

This link in serde_data_model_to_avro.rs points to itself. It should link to super::avro_data_model_to_serde.


7. unwrap_or(usize::MAX) pattern is reasonable but uncommented

avro/src/serde/deser_schema/array.rs ~line 98–100

fn size_hint(&self) -> Option<usize> {
    self.remaining
        .map(|x| usize::try_from(x).unwrap_or(usize::MAX))
}

This is a correct and safe pattern (Serde treats usize::MAX as "unknown capacity"), but a brief comment would help future readers understand the intent.


Positive Highlights

  • Builder pattern adoption across GenericSingleObjectReader, GenericDatumWriter, and GenericDatumReader is consistent and well-designed with sensible defaults.
  • Test coverage for the new deserializer is extensive — covers unit/newtype/tuple/struct enum variants, logical types, nested structures, and error cases. Excellent work here.
  • Documentation modules (avro_data_model_to_serde.rs, serde_data_model_to_avro.rs) are comprehensive and will significantly help users avoid common mapping mistakes.
  • The two-strategy block serialization (direct write vs. buffered) is a clean performance optimization.

Summary

Severity Issue
🔴 Critical todo!() panics in reader schema code path
🔴 Critical Unimplemented size_hint() in ManyTupleDeserializer
🟡 Medium Breaking change in error type not noted
🟡 Medium Missing enum variant ordering warning in docs
🟡 Medium Generic ReadBlock error lacks context
🟢 Minor Self-referential doc link in serde_data_model_to_avro.rs
🟢 Minor Missing comment on unwrap_or(usize::MAX) pattern

The core functionality looks solid, but the two critical issues need to be resolved before this is safe to merge. Happy to re-review once those are addressed.

@augmentcode

augmentcode bot commented Mar 20, 2026

🤖 Augment PR Summary

Summary: This PR introduces schema-aware Serde decoding for Avro binary data and expands schema-aware Serde encoding to cover more Serde data model shapes.

Changes:

  • Adds SchemaAwareDeserializer (plus array/map/record/enum/tuple helpers) for direct Deserialize from Avro datum bytes.
  • Extends reader APIs with Serde-first entry points (GenericDatumReader::read_deser, Reader::into_deser_iter, and GenericSingleObjectReader::read_deser).
  • Expands schema derivation to support tuples and fixed-size arrays, and revises some derived mappings (e.g., std::time::Duration).
  • Adds new serde::with helpers for arrays and BigDecimal (and optional variants) and updates tests to use the new Serde-aware reader/writer APIs.
  • Introduces new union/record helper methods and refines error variants/messages for schema-aware (de)serialization paths.

Technical Notes: Reader/writer paths now thread human_readable and block-size configuration into schema-aware Serde (de)serialization to better control Avro binary representation.

🤖 Was this summary useful? React with 👍 or 👎


@augmentcode augmentcode bot left a comment


Review completed. 6 suggestions posted.


where
T: ?Sized + Serialize,
{
value.serialize(SchemaAwareSerializer::new(

BufferedBlockSerializer currently serializes elements into self.writer, but write_block() later emits block headers based on self.buffer (which never receives those bytes), so buffered array/map encoding will be invalid.

Severity: high

Other Locations
  • avro/src/serde/ser_schema/block.rs:343



fn serialize_u64(mut self, v: u64) -> Result<Self::Ok, Self::Error> {
match self.union.find_named_schema("u64", self.config.names)? {
Some((index, Schema::Fixed(FixedSchema { size: 16, .. }))) => {

serialize_u64 matches FixedSchema { size: 16 } but writes u64::to_le_bytes() (8 bytes); this will either never match the intended u64 schema (size 8) or produce an invalid fixed encoding.

Severity: high
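The size mismatch is easy to confirm in isolation:

```rust
fn main() {
    // u64::to_le_bytes always yields 8 bytes, so a fixed schema of
    // size 16 can never hold it correctly; the matching arm should
    // require size: 8 (or the value should be widened before writing).
    assert_eq!(42u64.to_le_bytes().len(), 8);

    // By contrast, a 16-byte fixed matches the u128 representation.
    assert_eq!(42u128.to_le_bytes().len(), 16);
}
```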


Schema::Fixed(fixed) if fixed.size == 16 && fixed.name.name() == "u128" => {
visitor.visit_u128(u128::from_le_bytes(self.read_array()?))
}
Schema::Union(union) => self.with_union(union)?.deserialize_i128(visitor),

In deserialize_u128, the union branch calls deserialize_i128, so a u128 inside a union will be deserialized via the signed path instead of u128.

Severity: high


}

fn size_hint(&self) -> Option<usize> {
todo!()

ManyTupleDeserializer::size_hint() is todo!(), which can panic during normal tuple deserialization if Serde queries the size hint (e.g., for preallocation).

Severity: medium


//!
//! When manually mapping Rust types to an Avro schema, or the reverse, it is important to understand
//! how the different data models are mapped. When mapping from an Avro schema to Rust types,
//! see [the documentation for the reverse](super::serde_data_model_to_avro).

This “reverse” documentation link points to super::serde_data_model_to_avro (the same module), so it won’t take readers to the Avro→Serde mapping doc.

Severity: low


GetUnionVariant { index: i64, num_variants: usize },

#[deprecated(since = "0.20.0", note = "This error variant is not generated anymore")]
#[error("Enum symbol index out of bounds: {num_variants}")]

EnumSymbolIndex’s display string only includes {num_variants}, not the offending {index}, which can make the error misleading when the index differs from the variant count.

Severity: low
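A standalone sketch of a message carrying both values (a plain Display impl here; in the crate, the equivalent would be the #[error(...)] format string including {index} as well):

```rust
use std::fmt;

// Illustrative stand-in for the error variant: reporting the offending
// index alongside the variant count makes the message unambiguous.
struct EnumSymbolIndex {
    index: i64,
    num_variants: usize,
}

impl fmt::Display for EnumSymbolIndex {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Enum symbol index out of bounds: {} (enum has {} variants)",
            self.index, self.num_variants
        )
    }
}

fn main() {
    let err = EnumSymbolIndex { index: 9, num_variants: 3 };
    assert_eq!(
        err.to_string(),
        "Enum symbol index out of bounds: 9 (enum has 3 variants)"
    );
}
```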



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the Avro (de)serialization logic to directly use Serde's Serialize and Deserialize traits with schema awareness, replacing the previous from_value and to_value approaches. Key changes include introducing SchemaAwareDeserializer and SchemaAwareSerializer for direct schema-guided (de)serialization, enhancing GenericDatumReader and GenericSingleObjectReader with builder patterns and read_deser methods, and adding new documentation for Avro-Serde data model mapping. Error handling for serialization/deserialization is improved with more specific error types. Additionally, schema derivation for Rust arrays ([T; N]) and tuples ((T1, T2, ...)) is updated to generate Schema::Record types, and core::time::Duration now maps to a Schema::Record. A high-severity review comment indicates a potential regression in avro_derive/tests/serde.rs, where a test now passes despite a mismatch between struct fields and the schema (missing field c without a default, and extra field b), suggesting invalid records might be serialized without error.

Comment on lines +301 to 305
serde_assert(Foo {
a: "spam".to_string(),
b: 321,
});
}

high

This test was changed from expecting an error to expecting success. However, it seems serialization should still fail. The Foo struct has a field b, which is not present in the schema. The schema has a field c, which is not in the struct and has no default value.

The new RecordSerializer should error when trying to serialize field b as it's not in the schema, or when it finishes and notices field c is missing without a default. This change might be hiding a regression where invalid records are serialized without error.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Nitpick comments (7)
avro/tests/serde_human_readable_false.rs (1)

80-83: Make this error assertion less brittle.

Line 82 compares the full error string; this can fail on harmless formatting/message updates. Prefer asserting key fragments.

Suggested test hardening
-    assert_eq!(
-        writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-        "Failed to serialize value of type `bytes` using Schema::Uuid(String): Expected Schema::Bytes | Schema::Fixed | Schema::BigDecimal | Schema::Decimal | Schema::Uuid(Bytes | Fixed) | Schema::Duration"
-    );
+    let err = writer.write(uuid, &mut buffer).unwrap_err().to_string();
+    assert!(err.contains("value of type `bytes`"));
+    assert!(err.contains("using Schema::Uuid(String)"));
avro/tests/serde_human_readable_true.rs (1)

104-107: Avoid exact-match assertions for verbose schema errors.

Lines 106 and 132 assert full error text (including debug-structured schema details). This is brittle for message-shape changes; assert stable substrings instead.

Suggested test hardening
-    assert_eq!(
-        writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-        r#"Failed to serialize value of type `str` using Schema::Uuid(Bytes): Expected Schema::String | Schema::Uuid(String)"#
-    );
+    let err = writer.write(uuid, &mut buffer).unwrap_err().to_string();
+    assert!(err.contains("value of type `str`"));
+    assert!(err.contains("Schema::Uuid(Bytes)"));
+    assert!(err.contains("Schema::Uuid(String)"));
...
-    assert_eq!(
-        writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-        r#"Failed to serialize value of type `str` using Schema::Uuid(Fixed(FixedSchema { name: Name { name: "uuid", .. }, size: 16, .. })): Expected Schema::String | Schema::Uuid(String)"#
-    );
+    let err = writer.write(uuid, &mut buffer).unwrap_err().to_string();
+    assert!(err.contains("value of type `str`"));
+    assert!(err.contains("Schema::Uuid(Fixed("));
+    assert!(err.contains("Schema::Uuid(String)"));

Also applies to: 130-133

avro/src/serde/deser_schema/record.rs (1)

69-80: Consider adding a comment explaining the union-of-records special case.

The conditional logic checking for org.apache.avro.rust.union_of_records attribute implements a specific deserialization pattern that may not be immediately obvious to readers. A brief comment explaining when this pattern applies and why it's needed would improve maintainability.

📖 Suggested documentation
         let schema = &self.schema.fields[self.current_field].schema;
+        // Special handling for union-of-records pattern: when the field schema is a
+        // single-field record marked with the "org.apache.avro.rust.union_of_records"
+        // attribute, deserialize using the inner field's schema instead of the wrapper.
         let value = if let Schema::Record(record) = schema
avro/src/serde/with.rs (2)

656-726: Fix lifetime parameter in doc example.

The doc example declares a lifetime 'a on StructWithBytes<'a> but doesn't use any borrowed types. This could confuse users.

📝 Suggested doc fix
 /// See usage with below example:
 /// ```
 /// # use apache_avro::AvroSchema;
 /// # use serde::{Deserialize, Serialize};
 /// #[derive(AvroSchema, Serialize, Deserialize)]
-/// struct StructWithBytes<'a> {
+/// struct StructWithArray {
 ///     #[avro(with = apache_avro::serde::array::get_schema_in_ctxt::<i32>)]
 ///     #[serde(with = "apache_avro::serde::array")]
 ///     array: [i32; 10],
 /// }
 /// ```

728-815: Same doc example issue with unused lifetime.

The array_opt module has the same unnecessary lifetime parameter in its doc example.

📝 Suggested doc fix
 /// See usage with below example:
 /// ```
 /// # use apache_avro::AvroSchema;
 /// # use serde::{Deserialize, Serialize};
 /// #[derive(AvroSchema, Serialize, Deserialize)]
-/// struct StructWithBytes<'a> {
+/// struct StructWithOptionalArray {
 ///     #[avro(with = apache_avro::serde::array_opt::get_schema_in_ctxt::<i32>)]
 ///     #[serde(with = "apache_avro::serde::array_opt")]
 ///     array: Option<[i32; 10]>,
 /// }
 /// ```
avro/src/serde/deser_schema/mod.rs (1)

728-734: Consider implementing seek optimization for deserialize_ignored_any.

The TODO suggests adding a Seek bound to skip over ignored data efficiently. This would be beneficial for performance when deserializing large ignored fields. Consider tracking this as a follow-up improvement.

Would you like me to open an issue to track implementing the Seek optimization for deserialize_ignored_any?

avro/src/serde/ser_schema/tuple.rs (1)

155-158: Consider returning an error instead of asserting on field count mismatch.

Using assert_eq! will panic in release builds if the field count doesn't match. While this indicates a programming error, returning an error would provide better diagnostics in production and maintain the library's error-handling contract.

♻️ Proposed fix to return error instead of panic
     fn end(self) -> Result<Self::Ok, Self::Error> {
-        assert_eq!(self.field_position, self.schema.fields.len());
+        if self.field_position != self.schema.fields.len() {
+            return Err(Error::new(Details::SerializeValueWithSchema {
+                value_type: "tuple",
+                value: format!(
+                    "Expected {} fields but serialized {}",
+                    self.schema.fields.len(),
+                    self.field_position
+                ),
+                schema: Schema::Record(self.schema.clone()),
+            }));
+        }
         Ok(self.bytes_written)
     }
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro/src/documentation/avro_data_model_to_serde.rs`:
- Line 24: Update the module-level documentation comment in
avro_data_model_to_serde.rs: remove the duplicated word so "Only the the
mapping..." becomes "Only the mapping..." and change the phrase "most match" to
"must match" where it describes required matching; both fixes are in the top
documentation comment for the avro_data_model_to_serde module.

In `@avro/src/documentation/serde_data_model_to_avro.rs`:
- Around line 20-23: The module doc comment contains a broken "reverse"
intra-doc link pointing to super::serde_data_model_to_avro; update that link to
point to super::avro_data_model_to_serde so the Avro→Serde mapping page is
referenced correctly (edit the top-of-file doc comment in
serde_data_model_to_avro.rs where the "reverse" link is defined).

In `@avro/src/schema/mod.rs`:
- Around line 789-847: Summary: unique_normalized_name currently collapses
different named fullnames by mapping all non-ASCII-alphanumerics to '_' which is
not injective; replace that with a reversible escape so different fullnames map
to distinct names. Fix: in unique_normalized_name (the match arm handling
Record/Enum/Fixed/Decimal/Uuid/Duration/Ref where name is converted), replace
the simple chars().map(...) logic with a small escaping routine that leaves
ASCII alphanumerics as-is but encodes every other byte/char into a reversible
token (e.g., percent-encoding or an _xHH_ hex escape); use the escaped string
for both the name and its length in format!("r{}_{}", ...). Update only that
conversion so other schema arms (Array/Map/Union/Decimal bytes/UUID cases)
remain unchanged.

In `@avro/src/schema/union.rs`:
- Around line 105-190: These helpers (find_named_schema, find_fixed_of_size_n,
find_record_with_n_fields) currently return the first match and thus silently
accept ambiguous union branches; change each to detect multiple distinct matches
and return an error instead of the first match: iterate as today but store the
first matched (index and resolved schema pointer); on any subsequent match where
the resolved schema (or index) differs from the stored one, return an
Err(Details::SchemaResolutionError(...) or a new
AmbiguousUnion/SchemaResolutionError variant) describing the ambiguity; if
iteration finishes with exactly one match return Ok(Some(...)) and if zero
matches return Ok(None). Keep the existing name-resolution logic (using
names.get(...).borrow()) when comparing resolved schemas.

In `@avro/src/serde/deser_schema/map.rs`:
- Around line 56-68: The comment inside read_block_header incorrectly mentions
an "array" — update the comment to refer to a "map" (e.g., change "If the block
size is zero the array is finished" to "If the block size is zero the map is
finished") so the documentation matches the map deserializer behavior in
read_block_header.

In `@avro/src/serde/deser_schema/mod.rs`:
- Around line 417-428: In deserialize_u128, the Union branch incorrectly
delegates to deserialize_i128; update the union handling in fn deserialize_u128
(the match arm Schema::Union(union) =>
self.with_union(union)?.deserialize_i128(visitor)) to call deserialize_u128 on
the returned deserializer instead so unsigned 128-bit values are handled
correctly (i.e., replace the deserialize_i128 call with deserialize_u128 while
keeping the with_union(union)? flow and visitor propagation).

In `@avro/src/serde/deser_schema/tuple.rs`:
- Around line 123-125: ManyTupleDeserializer currently panics because size_hint
uses todo!(); replace it with a safe implementation analogous to
OneTupleDeserializer's size_hint: compute and return the remaining number of
elements (or delegate to the inner iterator/vec's size_hint) as Some(remaining)
or propagate the inner iterator's Option<usize>, ensuring size_hint returns a
non-panicking Option<usize> based on ManyTupleDeserializer's internal
state/iterator rather than calling todo!().
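A non-panicking `size_hint` for the many-tuple case could follow this shape (a minimal sketch; `TupleSeqState`, `len`, and `index` are hypothetical stand-ins for the real deserializer's internal state):

```rust
// Minimal stand-in for ManyTupleDeserializer's sequence-access state:
// the total element count and how many elements were already yielded.
struct TupleSeqState {
    len: usize,
    index: usize,
}

impl TupleSeqState {
    // Mirrors serde's SeqAccess::size_hint contract: report the number
    // of elements still to be deserialized, and never panic.
    fn size_hint(&self) -> Option<usize> {
        Some(self.len.saturating_sub(self.index))
    }
}
```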

In `@avro/src/serde/mod.rs`:
- Around line 30-31: Update the module-level documentation in
avro/src/serde/mod.rs to correct the typo "Sere" to "Serde" (the doc block that
references AvroSchema and the documentation links
`crate::documentation::avro_data_model_to_serde` and
`crate::documentation::serde_data_model_to_avro`); simply replace the misspelled
word so the sentence reads "between Serde and Avro".

In `@avro/src/serde/ser_schema/block.rs`:
- Around line 309-333: The serialize_element implementation of
BufferedBlockSerializer is writing directly to self.writer instead of the
in-memory buffer, so calls to SchemaAwareSerializer::new should be passed a
writer that writes into self.buffer (not self.writer), and similarly
SerializeMap::serialize_key must use the buffer writer; update serialize_element
and serialize_key to create SchemaAwareSerializer with a writer that writes into
self.buffer, keep incrementing items_in_buffer, and ensure buffer length checks
and write_block() work as intended so the block header and sizes are emitted
correctly.
- Around line 335-360: BufferedBlockSerializer's SerializeMap implementation
incorrectly writes map keys/values to self.writer instead of the internal
buffer, breaking buffering; update serialize_key to create SchemaAwareSerializer
with self.buffer (not self.writer) and the string schema (&Schema::String), and
ensure serialize_value/serialize_element also write to self.buffer (adjust any
helper calls to accept/use self.buffer) so all map entries are written into the
buffer before flush; keep the existing method names (serialize_key,
serialize_value, serialize_element, end) and preserve signatures while swapping
writer -> buffer usage.
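The buffer-vs-writer distinction in these two findings can be illustrated with a std-only sketch (the types and the block encoding here are simplified illustrations, not the crate's real `BufferedBlockSerializer`):

```rust
// Encode a non-negative count as an Avro zigzag varint (simplified).
fn zigzag_long(n: i64) -> Vec<u8> {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    let mut out = Vec::new();
    loop {
        let byte = (z & 0x7F) as u8;
        z >>= 7;
        if z == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

// Items must be serialized into `buffer`; only write_block touches `out`.
struct BufferedBlock {
    buffer: Vec<u8>, // pending encoded items
    items: usize,    // count of items in `buffer`
    out: Vec<u8>,    // stands in for the underlying writer
}

impl BufferedBlock {
    fn serialize_element(&mut self, encoded: &[u8]) {
        self.buffer.extend_from_slice(encoded); // NOT self.out
        self.items += 1;
    }

    fn write_block(&mut self) {
        if self.items > 0 {
            self.out.extend(zigzag_long(self.items as i64)); // block count
            self.out.extend_from_slice(&self.buffer);
            self.buffer.clear();
            self.items = 0;
        }
    }

    fn end(mut self) -> Vec<u8> {
        self.write_block();
        self.out.push(0); // zero count terminates the block sequence
        self.out
    }
}
```

Serializing items straight to `out` (as the current code does with `self.writer`) would emit them before the block count, which is what breaks the buffering here.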

In `@avro/src/serde/ser_schema/record/field_default.rs`:
- Around line 55-56: The code currently treats string-backed byte/defaults using
UTF-8 bytes (s.as_bytes(), s.len()) and only matches plain Bytes/Fixed in
recursive_type_check, which is wrong for Avro: convert string defaults by
iterating Unicode scalar values (code points) and map each code point to a
single byte only if 0..=255, returning an error if any code point >255; compute
lengths from the number of mapped bytes (code points), not UTF-8 length; for
Fixed defaults ensure the mapped byte count equals the fixed size. Also extend
recursive_type_check and any union-pattern matching to treat Schema::Bytes and
Schema::Fixed unions that contain logical types (Uuid, Decimal, BigDecimal,
Duration) the same as plain Bytes/Fixed so string-backed defaults for those
logical types are handled. Update the match arms that currently use
(Value::String(_), Schema::Bytes | Schema::String) and the functions
recursive_type_check / union handling to implement these changes and return
descriptive errors on out-of-range code points or wrong fixed sizes.
- Around line 165-170: The current branch (Value::Object(o),
Schema::Record(record)) pairs object iteration order with record.fields by
index, which can misalign schemas and panic; change it to iterate record.fields
in schema order and for each field use o.get(&field.name) (or
Value::Null/missing handling as appropriate) to construct the mapped pairs via
Self::new(value, &field.schema) before calling serializer.collect_map; ensure
any extra keys in o are ignored (or handled separately) so indexing past
record.fields cannot occur.
- Around line 57-72: The match guards currently accept only JSON numbers with a
floating internal representation for Schema::Float/Schema::Double, causing
integer defaults to be rejected; update the checks in field_default.rs (the
match arm handling Value::Number for Schema::Float | Schema::Double) and the
analogous checks inside recursive_type_check() so they accept integer JSON
numbers as well (e.g., allow numbers where n.is_i64() or n.is_u64() in addition
to n.is_f64()), and ensure you treat those integer defaults as valid
floats/doubles when performing the type validation/acceptance for
functions/match arms handling float/double schemas.
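The code-point-to-byte mapping required by the first finding above can be sketched in isolation (`string_default_to_bytes` is a hypothetical helper, not the crate's API):

```rust
// Map each Unicode scalar value of a string default to one byte,
// rejecting code points above 255 (so "é" becomes the single byte
// 0xE9, not the two UTF-8 bytes 0xC3 0xA9). Size checks for Fixed
// schemas should use the returned Vec's length, not s.len().
fn string_default_to_bytes(s: &str) -> Result<Vec<u8>, String> {
    s.chars()
        .map(|c| {
            let cp = u32::from(c);
            u8::try_from(cp).map_err(|_| {
                format!("code point U+{cp:04X} is out of range for a byte default")
            })
        })
        .collect()
}
```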

In `@avro/src/serde/ser_schema/record/mod.rs`:
- Around line 221-230: The expect message in the serialize_value method is
misspelled; update the panic string in the call to self.map_position.expect(...)
inside fn serialize_value to read "serialize_value called without calling
serialize_key" (referencing the serialize_value method, the map_position field,
and the serialize_next_field call) so the error message is spelled correctly.
- Around line 99-147: The error created when serializing a cached/out-of-order
field in serialize_next_field uses self.field_position instead of the actual
field index (position), causing wrong error context; update the call in the
Ordering::Less branch to pass position to field_error (i.e., use position rather
than self.field_position) so errors for cached fields reference the correct
field, and verify this applies to both the map insertion error path and the
propagate .map_err(...) call; check symbols serialize_next_field,
field_position, position, cache, and field_error to locate the change.

In `@avro/src/serde/ser_schema/union.rs`:
- Around line 190-202: In serialize_u64, the pattern match incorrectly expects
FixedSchema { size: 16 } while the code writes v.to_le_bytes() (8 bytes) and the
error message says size: 8; update the match to expect FixedSchema { size: 8 }
(in the match on union.find_named_schema("u64", ...)), ensuring Schema::Fixed /
FixedSchema size matches the 8-byte u64 representation so the success branch and
error message are consistent.

In `@avro/src/writer/datum.rs`:
- Around line 73-77: Fix the documentation typo for the field human_readable in
datum.rs: change "represenation" to "representation" in the doc comment above
the human_readable field (the same typo also appears in writer/mod.rs—update
there too) so the docs read "Should [`Serialize`] implementations pick a human
readable representation." and keep the existing `#[builder(default =
is_human_readable())]` and field name unchanged.

In `@avro/src/writer/mod.rs`:
- Around line 73-77: Fix the typo in the doc comment for the human_readable
field: change "represenation" to "representation" in the comment above the
human_readable field (the comment referencing Serialize and the builder default
is_human_readable()). Keep the rest of the comment text unchanged.

In `@avro/src/writer/single_object.rs`:
- Around line 107-111: Fix the typo in the doc comment for the struct field
`human_readable` in single_object.rs: change "represenation" to "representation"
in the triple-slash doc comment above the `human_readable: bool` field (the
comment starting "Should [`Serialize`] implementations pick a human readable
..."). Ensure the corrected spelling is applied consistently like in the other
files.

---

Nitpick comments:
In `@avro/src/serde/deser_schema/mod.rs`:
- Around line 728-734: The current deserialize_ignored_any implementation simply
delegates to deserialize_any and has a TODO to add a Seek bound to skip ignored
data; update the function signature and call sites to accept a reader
implementing Seek (or add a generic bound where DeserializeSeed/Deserializer is
parameterized) and implement logic in deserialize_ignored_any to efficiently
advance the underlying reader over the ignored bytes instead of fully
deserializing them; locate the function deserialize_ignored_any in mod.rs and
change its trait bounds to require Seek (or use a helper that takes &mut (impl
Read+Seek)), then use seek/stream position arithmetic to skip the encoded field
lengths/bytes according to Avro encoding so ignored fields are fast-skipped.
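The seek-based skip described above can be sketched with std-only stand-ins (`read_varint` and `skip_bytes_value` are hypothetical helpers; the crate's real decoder reads zigzag longs with overflow checks):

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Read one LEB128-style varint (the raw layer under Avro's zigzag
// longs); simplified: no overflow handling.
fn read_varint<R: Read>(r: &mut R) -> std::io::Result<u64> {
    let mut out = 0u64;
    let mut shift = 0;
    loop {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        out |= u64::from(b[0] & 0x7F) << shift;
        if b[0] & 0x80 == 0 {
            return Ok(out);
        }
        shift += 7;
    }
}

// Skip a length-prefixed value (Avro bytes/string encoding) without
// materializing it: decode the zigzag length, then seek past the payload.
fn skip_bytes_value<R: Read + Seek>(r: &mut R) -> std::io::Result<u64> {
    let raw = read_varint(r)?;
    let len = (raw >> 1) as i64 ^ -((raw & 1) as i64); // zigzag decode
    r.seek(SeekFrom::Current(len))
}
```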

In `@avro/src/serde/deser_schema/record.rs`:
- Around line 69-80: Add a concise inline comment above the conditional that
matches Schema::Record and the org.apache.avro.rust.union_of_records attribute
explaining this special-case: when a Record schema has exactly one field and the
attribute is true, we treat the record as a transparent union-of-records and
deserialize directly using SchemaAwareDeserializer for record.fields[0].schema;
mention why (to support union-of-records encoding/compatibility) and reference
the attribute and Schema::Record so future readers know the intent around the
SchemaAwareDeserializer usage.

In `@avro/src/serde/ser_schema/tuple.rs`:
- Around line 155-158: The end(self) method currently uses
assert_eq!(self.field_position, self.schema.fields.len()) which can panic;
change it to validate the field count and return a descriptive Err(Self::Error)
when they differ instead of asserting. In tuple.rs, inside the end method on the
serializer (referencing end, self.field_position, self.schema.fields.len(), and
returning self.bytes_written), construct and return an appropriate error value
of type Self::Error (with a message indicating expected vs actual field count)
when the counts mismatch; otherwise return Ok(self.bytes_written).

In `@avro/src/serde/with.rs`:
- Around line 656-726: Doc example declares an unused lifetime on
StructWithBytes; remove the unused lifetime and/or rename the struct to avoid
confusion. Update the example in module array (the doctest around
StructWithBytes and the get_schema_in_ctxt::<i32> / serde(with =
"apache_avro::serde::array") snippet) by changing "struct StructWithBytes<'a>"
to a lifetime-free name such as "struct StructWithArray" (or remove the "'a" if
you keep the name), so the example matches the field types and does not
advertise an unused lifetime.
- Around line 728-815: The doc example in module array_opt includes an unused
lifetime ('a) on the example struct; update the example to remove the lifetime
(rename StructWithBytes<'a> to StructWithOptionalArray or similar without
lifetimes) so the example compiles and matches the optional array usage
referenced by get_schema_in_ctxt and serde(with =
"apache_avro::serde::array_opt"); keep the field signature array: Option<[i32;
10]> and the two attributes as-is.

In `@avro/tests/serde_human_readable_false.rs`:
- Around line 80-83: The assertion in avro/tests/serde_human_readable_false.rs
is brittle because it compares the full error string from writer.write(uuid,
&mut buffer); update the test to assert on key fragments instead of exact
equality: call writer.write(...).unwrap_err().to_string() and assert that it
contains a short, stable set of substrings such as "Failed to serialize",
"Schema::Uuid(String)" and one or two expected schema type names like
"Schema::Bytes" (or "Expected Schema::Bytes") so the test verifies the error
semantics without depending on exact formatting or full message text.

In `@avro/tests/serde_human_readable_true.rs`:
- Around line 104-107: The assertions that compare the full error string from
writer.write(...) are brittle; replace the exact assert_eq checks with
assertions that the error string contains stable substrings (e.g., check that
the returned error includes "Failed to serialize value" and the expected schema
type like "Expected Schema::String" or "Expected Schema::Uuid") instead of
matching the entire debug-structured schema text — update the failing checks
around the writer.write(uuid, &mut buffer).unwrap_err().to_string() and the
similar assertion later to use contains-based assertions on those stable
substrings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 024abd1f-4810-4acb-bcba-a0e966af6c9a

📥 Commits

Reviewing files that changed from the base of the PR and between 453c251 and 757c25e.

📒 Files selected for processing (41)
  • avro/examples/test_interop_single_object_encoding.rs
  • avro/src/bigdecimal.rs
  • avro/src/documentation/avro_data_model_to_serde.rs
  • avro/src/documentation/mod.rs
  • avro/src/documentation/serde_data_model_to_avro.rs
  • avro/src/error.rs
  • avro/src/reader/block.rs
  • avro/src/reader/datum.rs
  • avro/src/reader/mod.rs
  • avro/src/reader/single_object.rs
  • avro/src/schema/mod.rs
  • avro/src/schema/union.rs
  • avro/src/serde/derive.rs
  • avro/src/serde/deser_schema/array.rs
  • avro/src/serde/deser_schema/enums.rs
  • avro/src/serde/deser_schema/identifier.rs
  • avro/src/serde/deser_schema/map.rs
  • avro/src/serde/deser_schema/mod.rs
  • avro/src/serde/deser_schema/record.rs
  • avro/src/serde/deser_schema/tuple.rs
  • avro/src/serde/mod.rs
  • avro/src/serde/ser_schema/block.rs
  • avro/src/serde/ser_schema/mod.rs
  • avro/src/serde/ser_schema/record/field_default.rs
  • avro/src/serde/ser_schema/record/mod.rs
  • avro/src/serde/ser_schema/tuple.rs
  • avro/src/serde/ser_schema/union.rs
  • avro/src/serde/with.rs
  • avro/src/writer/datum.rs
  • avro/src/writer/mod.rs
  • avro/src/writer/single_object.rs
  • avro/tests/avro-rs-226.rs
  • avro/tests/avro-rs-285-bytes_deserialization.rs
  • avro/tests/schema.rs
  • avro/tests/serde_human_readable_false.rs
  • avro/tests/serde_human_readable_true.rs
  • avro/tests/union_schema.rs
  • avro_derive/tests/derive.proptest-regressions
  • avro_derive/tests/derive.rs
  • avro_derive/tests/serde.rs
  • licenserc.toml

//! how the different data models are mapped. When mapping from Rust types to an Avro schema,
//! see [the documentation for the reverse](super::serde_data_model_to_avro).
//!
//! Only the the mapping as defined here is supported. Any other behavior might change in a minor version.

⚠️ Potential issue | 🟡 Minor

Fix two documentation typos.

Line 24 has duplicated “the” (Only the the mapping...), and Line 41 says “most match” instead of “must match”.

Suggested doc fix
-//! Only the the mapping as defined here is supported. Any other behavior might change in a minor version.
+//! Only the mapping as defined here is supported. Any other behavior might change in a minor version.
...
-//!     - The index of the symbol most match the index of the variant
+//!     - The index of the symbol must match the index of the variant

Also applies to: 41-41

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/documentation/avro_data_model_to_serde.rs` at line 24, Update the
module-level documentation comment in avro_data_model_to_serde.rs: remove the
duplicated word so "Only the the mapping..." becomes "Only the mapping..." and
change the phrase "most match" to "must match" where it describes required
matching; both fixes are in the top documentation comment for the
avro_data_model_to_serde module.

Comment on lines +20 to +23
//! When manually mapping Rust types to an Avro schema, or the reverse, it is important to understand
//! how the different data models are mapped. When mapping from an Avro schema to Rust types,
//! see [the documentation for the reverse](super::serde_data_model_to_avro).
//!

⚠️ Potential issue | 🟡 Minor

Fix the “reverse” doc link.

This points back to serde_data_model_to_avro, so readers never reach the Avro→Serde mapping page. It should link to super::avro_data_model_to_serde.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/documentation/serde_data_model_to_avro.rs` around lines 20 - 23, The
module doc comment contains a broken "reverse" intra-doc link pointing to
super::serde_data_model_to_avro; update that link to point to
super::avro_data_model_to_serde so the Avro→Serde mapping page is referenced
correctly (edit the top-of-file doc comment in serde_data_model_to_avro.rs where
the "reverse" link is defined).

Comment on lines +789 to +847
/// Derive a name for this schema.
///
/// The name is a valid schema name and will be unique if the named
/// schemas in this schema have unique names.
pub(crate) fn unique_normalized_name(&self) -> Cow<'static, str> {
match self {
Schema::Null => Cow::Borrowed("n"),
Schema::Boolean => Cow::Borrowed("B"),
Schema::Int => Cow::Borrowed("i"),
Schema::Long => Cow::Borrowed("l"),
Schema::Float => Cow::Borrowed("f"),
Schema::Double => Cow::Borrowed("d"),
Schema::Bytes => Cow::Borrowed("b"),
Schema::String => Cow::Borrowed("s"),
Schema::Array(array) => {
Cow::Owned(format!("a_{}", array.items.unique_normalized_name()))
}
Schema::Map(map) => Cow::Owned(format!("m_{}", map.types.unique_normalized_name())),
Schema::Union(union) => {
let mut name = format!("u{}", union.schemas.len());
for schema in &union.schemas {
name.push('_');
name.push_str(&schema.unique_normalized_name());
}
Cow::Owned(name)
}
Schema::BigDecimal => Cow::Borrowed("bd"),
Schema::Date => Cow::Borrowed("D"),
Schema::TimeMillis => Cow::Borrowed("t"),
Schema::TimeMicros => Cow::Borrowed("tm"),
Schema::TimestampMillis => Cow::Borrowed("T"),
Schema::TimestampMicros => Cow::Borrowed("TM"),
Schema::TimestampNanos => Cow::Borrowed("TN"),
Schema::LocalTimestampMillis => Cow::Borrowed("L"),
Schema::LocalTimestampMicros => Cow::Borrowed("LM"),
Schema::LocalTimestampNanos => Cow::Borrowed("LN"),
Schema::Decimal(DecimalSchema {
inner: InnerDecimalSchema::Bytes,
precision,
scale,
}) => Cow::Owned(format!("db_{precision}_{scale}")),
Schema::Uuid(UuidSchema::Bytes) => Cow::Borrowed("ub"),
Schema::Uuid(UuidSchema::String) => Cow::Borrowed("us"),
Schema::Record(RecordSchema { name, .. })
| Schema::Enum(EnumSchema { name, .. })
| Schema::Fixed(FixedSchema { name, .. })
| Schema::Decimal(DecimalSchema {
inner: InnerDecimalSchema::Fixed(FixedSchema { name, .. }),
..
})
| Schema::Uuid(UuidSchema::Fixed(FixedSchema { name, .. }))
| Schema::Duration(FixedSchema { name, .. })
| Schema::Ref { name } => {
let name: String = name
.to_string()
.chars()
.map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
.collect();
Cow::Owned(format!("r{}_{}", name.len(), name))

⚠️ Potential issue | 🟠 Major

Make unique_normalized_name() injective for named fullnames.

Flattening every non-ASCII alphanumeric to _ collapses distinct fullnames such as a.b_c and a_b.c into the same r5_a_b_c. That breaks the uniqueness guarantee in the doc comment and can generate duplicate synthetic names for different schema branches. Use a reversible escape for separators instead of a blanket replacement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` around lines 789 - 847, Summary:
unique_normalized_name currently collapses different named fullnames by mapping
all non-ASCII-alphanumerics to '_' which is not injective; replace that with a
reversible escape so different fullnames map to distinct names. Fix: in
unique_normalized_name (the match arm handling
Record/Enum/Fixed/Decimal/Uuid/Duration/Ref where name is converted), replace
the simple chars().map(...) logic with a small escaping routine that leaves
ASCII alphanumerics as-is but encodes every other byte/char into a reversible
token (e.g., percent-encoding or an _xHH_ hex escape); use the escaped string
for both the name and its length in format!("r{}_{}", ...). Update only that
conversion so other schema arms (Array/Map/Union/Decimal bytes/UUID cases)
remain unchanged.
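One reversible escape that satisfies the injectivity requirement, as a std-only sketch (`escape_name` is illustrative, not the crate's code):

```rust
// Reversible escape: ASCII alphanumerics pass through unchanged; every
// other char becomes an _xHH..._ hex token. Since '_' itself is escaped
// to _x5F_, every underscore in the output is a token delimiter, so the
// mapping is decodable and therefore injective.
fn escape_name(name: &str) -> String {
    let mut out = String::new();
    for c in name.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c);
        } else {
            out.push_str(&format!("_x{:X}_", u32::from(c)));
        }
    }
    out
}
```

With this scheme `a.b_c` and `a_b.c` no longer collapse to the same normalized name.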

Comment on lines +105 to +190
pub(crate) fn find_named_schema<'s>(
&'s self,
name: &str,
names: &'s HashMap<Name, impl Borrow<Schema>>,
) -> Result<Option<(usize, &'s Schema)>, Error> {
for index in self.named_index.iter().copied() {
let schema = &self.schemas[index];
if let Some(schema_name) = schema.name()
&& schema_name.name() == name
{
let schema = if let Schema::Ref { name } = schema {
names
.get(name)
.ok_or_else(|| Details::SchemaResolutionError(name.clone()))?
.borrow()
} else {
schema
};
return Ok(Some((index, schema)));
}
}
Ok(None)
}

/// Find a [`Schema::Fixed`] with the given size.
///
/// Will use `names` to resolve references.
pub(crate) fn find_fixed_of_size_n<'s>(
&'s self,
size: usize,
names: &'s HashMap<Name, impl Borrow<Schema>>,
) -> Result<Option<(usize, &'s FixedSchema)>, Error> {
for index in self.named_index.iter().copied() {
let schema = &self.schemas[index];
let schema = if let Schema::Ref { name } = schema {
names
.get(name)
.ok_or_else(|| Details::SchemaResolutionError(name.clone()))?
.borrow()
} else {
schema
};
match schema {
Schema::Fixed(fixed)
| Schema::Uuid(UuidSchema::Fixed(fixed))
| Schema::Decimal(DecimalSchema {
inner: InnerDecimalSchema::Fixed(fixed),
..
})
| Schema::Duration(fixed)
if fixed.size == size =>
{
return Ok(Some((index, fixed)));
}
_ => {}
}
}
Ok(None)
}

/// Find a [`Schema::Record`] with `n` fields.
///
/// Will use `names` to resolve references.
pub(crate) fn find_record_with_n_fields<'s>(
&'s self,
n_fields: usize,
names: &'s HashMap<Name, impl Borrow<Schema>>,
) -> Result<Option<(usize, &'s RecordSchema)>, Error> {
for index in self.named_index.iter().copied() {
let schema = &self.schemas[index];
let schema = if let Schema::Ref { name } = schema {
names
.get(name)
.ok_or_else(|| Details::SchemaResolutionError(name.clone()))?
.borrow()
} else {
schema
};
match schema {
Schema::Record(record) if record.fields.len() == n_fields => {
return Ok(Some((index, record)));
}
_ => {}
}
}
Ok(None)

⚠️ Potential issue | 🟠 Major

Don't let these union lookups silently pick the first ambiguous branch.

All three helpers return on the first match, but unions may legally contain ns1.Foo/ns2.Foo, multiple fixed schemas with the same size, or multiple records with the same field count. In those cases schema-aware deserialization becomes union-order dependent and can resolve to the wrong branch instead of failing deterministically. Please detect multiple matches and error out here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/union.rs` around lines 105 - 190, These helpers
(find_named_schema, find_fixed_of_size_n, find_record_with_n_fields) currently
return the first match and thus silently accept ambiguous union branches; change
each to detect multiple distinct matches and return an error instead of the
first match: iterate as today but store the first matched (index and resolved
schema pointer); on any subsequent match where the resolved schema (or index)
differs from the stored one, return an Err(Details::SchemaResolutionError(...)
or a new AmbiguousUnion/SchemaResolutionError variant) describing the ambiguity;
if iteration finishes with exactly one match return Ok(Some(...)) and if zero
matches return Ok(None). Keep the existing name-resolution logic (using
names.get(...).borrow()) when comparing resolved schemas.
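The detect-duplicates loop suggested here has this general shape (a crate-agnostic sketch; `find_unique` stands in for the three helpers, with plain equality in place of the resolved-schema comparison):

```rust
// Scan candidates, remember the first hit, and fail if a later hit
// resolves to a different item than the first one.
fn find_unique<T: PartialEq + Copy>(
    items: &[T],
    is_match: impl Fn(&T) -> bool,
) -> Result<Option<(usize, T)>, String> {
    let mut found: Option<(usize, T)> = None;
    for (index, item) in items.iter().enumerate() {
        if is_match(item) {
            match found {
                None => found = Some((index, *item)),
                // A later hit resolving to the same item keeps the first.
                Some((_, prev)) if prev == *item => {}
                Some((first, _)) => {
                    return Err(format!(
                        "ambiguous union: branches {first} and {index} both match"
                    ));
                }
            }
        }
    }
    Ok(found)
}
```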

Comment on lines +56 to +68
fn read_block_header(reader: &mut R) -> Result<Option<u32>, Error> {
let remaining = zag_i32(reader)?;
if remaining < 0 {
// If the block size is negative the next number is the size of the block in bytes
let _bytes = zag_i64(reader)?;
}
if remaining == 0 {
// If the block size is zero the array is finished
Ok(None)
} else {
Ok(Some(remaining.unsigned_abs()))
}
}

⚠️ Potential issue | 🟡 Minor

Minor typo in comment.

Line 63 says "array is finished" but this is a map deserializer.

📝 Suggested fix
         if remaining == 0 {
-            // If the block size is zero the array is finished
+            // If the block size is zero the map is finished
             Ok(None)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/deser_schema/map.rs` around lines 56 - 68, The comment inside
read_block_header incorrectly mentions an "array" — update the comment to refer
to a "map" (e.g., change "If the block size is zero the array is finished" to
"If the block size is zero the map is finished") so the documentation matches
the map deserializer behavior in read_block_header.

Comment on lines +221 to +230
fn serialize_value<T>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: ?Sized + Serialize,
{
self.serialize_next_field(
self.map_position
.expect("serialze_value called without calling serialize_key"),
value,
)
}

⚠️ Potential issue | 🟡 Minor

Typo in expect message.

"serialze_value" should be "serialize_value".

📝 Suggested fix
     fn serialize_value<T>(&mut self, value: &T) -> Result<(), Self::Error>
     where
         T: ?Sized + Serialize,
     {
         self.serialize_next_field(
             self.map_position
-                .expect("serialze_value called without calling serialize_key"),
+                .expect("serialize_value called without calling serialize_key"),
             value,
         )
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/record/mod.rs` around lines 221 - 230, The expect
message in the serialize_value method is misspelled; update the panic string in
the call to self.map_position.expect(...) inside fn serialize_value to read
"serialize_value called without calling serialize_key" (referencing the
serialize_value method, the map_position field, and the serialize_next_field
call) so the error message is spelled correctly.

Comment on lines +190 to +202
fn serialize_u64(mut self, v: u64) -> Result<Self::Ok, Self::Error> {
match self.union.find_named_schema("u64", self.config.names)? {
Some((index, Schema::Fixed(FixedSchema { size: 16, .. }))) => {
let mut bytes_written = zig_i32(index as i32, &mut *self.writer)?;
bytes_written += self.write_array(v.to_le_bytes())?;
Ok(bytes_written)
}
_ => Err(self.error(
"u64",
r#"Expected Schema::Fixed(name: "u64", size: 8) in variants"#,
)),
}
}

⚠️ Potential issue | 🔴 Critical

Size mismatch between pattern matching and error message for u64 serialization.

The code matches Schema::Fixed with size: 16, but u64::to_le_bytes() produces 8 bytes, and the error message references size: 8. The pattern should match size: 8 to be consistent.

🐛 Proposed fix
```diff
 fn serialize_u64(mut self, v: u64) -> Result<Self::Ok, Self::Error> {
     match self.union.find_named_schema("u64", self.config.names)? {
-        Some((index, Schema::Fixed(FixedSchema { size: 16, .. }))) => {
+        Some((index, Schema::Fixed(FixedSchema { size: 8, .. }))) => {
             let mut bytes_written = zig_i32(index as i32, &mut *self.writer)?;
             bytes_written += self.write_array(v.to_le_bytes())?;
             Ok(bytes_written)
         }
         _ => Err(self.error(
             "u64",
             r#"Expected Schema::Fixed(name: "u64", size: 8) in variants"#,
         )),
     }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/union.rs` around lines 190 - 202, In serialize_u64,
the pattern match incorrectly expects FixedSchema { size: 16 } while the code
writes v.to_le_bytes() (8 bytes) and the error message says size: 8; update the
match to expect FixedSchema { size: 8 } (in the match on
union.find_named_schema("u64", ...)), ensuring Schema::Fixed / FixedSchema size
matches the 8-byte u64 representation so the success branch and error message
are consistent.
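The size check is easy to confirm in isolation: `u64::to_le_bytes()` always produces exactly 8 bytes, while 16 bytes would correspond to a 128-bit value. A standalone sketch:

```rust
fn main() {
    // u64::to_le_bytes() yields exactly 8 bytes, so a Fixed schema matching
    // a u64 payload must declare size: 8.
    let bytes: [u8; 8] = 0x0102_0304_0506_0708_u64.to_le_bytes();
    assert_eq!(bytes.len(), 8);

    // A 16-byte Fixed would correspond to a 128-bit value instead.
    assert_eq!(0_u128.to_le_bytes().len(), 16);

    println!("{}", bytes.len()); // → 8
}
```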

Comment on lines +73 to +77
```rust
/// Should [`Serialize`] implementations pick a human readable represenation.
///
/// It is recommended to set this to `false`.
#[builder(default = is_human_readable())]
human_readable: bool,
```

⚠️ Potential issue | 🟡 Minor

Typo in documentation: "represenation" → "representation"

Same typo as in writer/mod.rs.

📝 Proposed fix
```diff
-        /// Should [`Serialize`] implementations pick a human readable represenation.
+        /// Should [`Serialize`] implementations pick a human readable representation.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/writer/datum.rs` around lines 73 - 77, Fix the documentation typo
for the field human_readable in datum.rs: change "represenation" to
"representation" in the doc comment above the human_readable field (the same
typo also appears in writer/mod.rs—update there too) so the docs read "Should
[`Serialize`] implementations pick a human readable representation." and keep
the existing `#[builder(default = is_human_readable())]` and field name
unchanged.

Comment on lines +73 to +77
```rust
/// Should [`Serialize`] implementations pick a human readable represenation.
///
/// It is recommended to set this to `false`.
#[builder(default = is_human_readable())]
human_readable: bool,
```

⚠️ Potential issue | 🟡 Minor

Typo in documentation: "represenation" → "representation"

📝 Proposed fix
```diff
-        /// Should [`Serialize`] implementations pick a human readable represenation.
+        /// Should [`Serialize`] implementations pick a human readable representation.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/writer/mod.rs` around lines 73 - 77, Fix the typo in the doc comment
for the human_readable field: change "represenation" to "representation" in the
comment above the human_readable field (the comment referencing Serialize and
the builder default is_human_readable()). Keep the rest of the comment text
unchanged.

Comment on lines +107 to +111
```rust
/// Should [`Serialize`] implementations pick a human readable represenation.
///
/// It is recommended to set this to `false`.
#[builder(default = is_human_readable())]
human_readable: bool,
```

⚠️ Potential issue | 🟡 Minor

Typo in documentation: "represenation" → "representation"

Same typo as in other files.

📝 Proposed fix
```diff
-    /// Should [`Serialize`] implementations pick a human readable represenation.
+    /// Should [`Serialize`] implementations pick a human readable representation.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/writer/single_object.rs` around lines 107 - 111, Fix the typo in the
doc comment for the struct field `human_readable` in single_object.rs: change
"represenation" to "representation" in the triple-slash doc comment above the
`human_readable: bool` field (the comment starting "Should [`Serialize`]
implementations pick a human readable ..."). Ensure the corrected spelling is
applied consistently like in the other files.
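Beyond the spelling fix, the flag itself is worth understanding: Serde's `is_human_readable` lets a `Serialize` impl choose between a textual and a compact encoding of the same value, and `false` is the sensible default here because Avro is a binary format. A dependency-free sketch of the kind of branching a `Serialize` impl typically does on this flag (the `Timestamp` type and `encode` helper are hypothetical, not from this crate):

```rust
// Hypothetical illustration of why human_readable matters: the same value can
// be encoded as a readable string (for text formats like JSON) or as its raw
// numeric form (for compact binary formats like Avro).
struct Timestamp {
    millis: i64,
}

fn encode(ts: &Timestamp, human_readable: bool) -> String {
    if human_readable {
        // A real impl might emit an RFC 3339 date-time string here.
        format!("{}ms", ts.millis)
    } else {
        // Binary formats prefer the raw integer representation.
        ts.millis.to_string()
    }
}

fn main() {
    let ts = Timestamp { millis: 42 };
    assert_eq!(encode(&ts, true), "42ms");
    assert_eq!(encode(&ts, false), "42");
    println!("ok");
}
```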

@martin-augment
Owner Author

Superseded by #65
