
512: feat!: Support enums and tuples in SchemaAwareSerializer and implement SchemaAwareDeserializer#65

Open
martin-augment wants to merge 12 commits into main from pr-512-2026-03-23-07-11-45

Conversation

@martin-augment
Owner

512: To review by AI

@coderabbitai

coderabbitai bot commented Mar 23, 2026

Walkthrough

This pull request adds schema-aware serialization and deserialization infrastructure to the apache_avro crate. It introduces new deser_schema and ser_schema modules that implement Serde integration for encoding and decoding Avro data with explicit schema validation. Reader and writer APIs are extended with builder patterns, configuration fields for human_readable and target_block_size settings, and new methods (read_deser, write_ser_to_vec) to directly convert between Rust types and Avro bytes without intermediate Value representations. Error types are refined with new contextual variants. Support is added for tuples, fixed-size arrays, and BigDecimal serialization through dedicated serde helper modules. Documentation files describing type mappings between Avro and Serde data models are introduced. Tests and examples throughout the crate are refactored to use the new deserialization paths.



@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


);

self.block.read_next_deser(self.reader_schema)
}


Reader passes reader_schema causing unexpected panic

High Severity

read_next_deser passes self.reader_schema directly to block.read_next_deser, but block.read_next_deser panics whenever read_schema.is_some(). When a user provides a reader_schema that equals the writer_schema, should_resolve_schema is false (so the assert passes), but the block still receives Some(...) and panics. The existing read_next method correctly passes None when !should_resolve_schema, but read_next_deser does not replicate this filtering logic.


Owner Author


value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The new serde deserializer does not support a reader schema, as explained in a TODO. But instead of panicking and crashing the application, it would be better to return an Err.
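The suggested fix can be sketched with stand-in types; `AvroError` and the function shape below are illustrative, not the crate's actual API:

```rust
// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum AvroError {
    NotYetImplemented(&'static str),
}

// Sketch of read_next_deser: return an Err instead of asserting/panicking
// when schema resolution would be required, and pass nothing downstream
// when the reader schema equals the writer schema (mirroring read_next).
fn read_next_deser(should_resolve_schema: bool) -> Result<&'static str, AvroError> {
    if should_resolve_schema {
        return Err(AvroError::NotYetImplemented(
            "schema-aware deserialization does not resolve schemas yet",
        ));
    }
    // Downstream receives no reader schema, so its `read_schema.is_some()`
    // check cannot fire for a reader schema equal to the writer schema.
    Ok("deserialized")
}
```

With this shape, the iterator yields `Some(Err(...))` that callers can handle, instead of aborting the process.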

&& union.is_nullable()
{
let index = zag_i32(self.reader)?;
let schema = union.get_variant(index as usize)?;


Unsafe as usize cast on potentially negative index

Medium Severity

In deserialize_option, index as usize is used to convert the result of zag_i32 without checking for negative values. Elsewhere in this file (e.g., with_union on line 137), usize::try_from(index) is correctly used to produce a proper error on negative indices. With corrupt data, the as usize cast silently wraps a negative value, leading to a confusing out-of-bounds error message instead of the expected ConvertI32ToUsize error.


Owner Author


value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! Any cast between integer types where the value might not fit (narrowing, or signed to unsigned) should use the checked APIs and return an Err when the value does not fit in the target type. This prevents silently wrapping the value and carrying a totally wrong number into any following operations.
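A minimal sketch of the checked conversion, using a plain String error in place of the crate's ConvertI32ToUsize variant:

```rust
// usize::try_from rejects negative indices with a clear error instead of
// letting `as usize` wrap them into a huge, misleading value.
fn union_variant_index(index: i32) -> Result<usize, String> {
    usize::try_from(index).map_err(|_| format!("invalid union variant index: {index}"))
}
```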

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces schema awareness to the serialization and deserialization processes, enabling more robust and reliable data handling. It includes support for enums and tuples, along with comprehensive documentation to guide developers in mapping Avro schemas to Rust types and vice versa. The changes enhance the library's ability to work with complex data structures while maintaining type safety.

Highlights

  • Schema Aware Serializer: Adds support for serializing enums and tuples with schema awareness, ensuring data is written according to the defined schema.
  • Schema Aware Deserializer: Implements a schema aware deserializer, enabling the reading of Avro data into Rust types based on a provided schema.
  • Documentation: Adds documentation for mapping between Avro and Serde data models, improving clarity on how different types are handled.

@augmentcode

augmentcode bot commented Mar 23, 2026

🤖 Augment PR Summary

Summary: This PR expands schema-aware Serde support by adding a raw-Avro SchemaAwareDeserializer and extending SchemaAwareSerializer to cover enums, tuples, and fixed-size arrays.

Changes:

  • Introduced serde::deser_schema (schema-aware deserialization directly from Avro binary) and wired it into datum readers, container readers, and single-object readers.
  • Refactored writers to use the new schema-aware serializer config (including human_readable and block sizing for arrays/maps).
  • Added tuple and fixed-size array schema generation (record-based) and support for serializing/deserializing these shapes.
  • Added new serde::with helpers for Avro array encoding of Rust arrays and byte-based BigDecimal encoding.
  • Extended UnionSchema helpers and improved/expanded error variants/messages for schema-aware (de)serialization.
  • Updated and added tests across reader/writer/derive/interop to exercise the new serde paths and mappings.

Technical Notes: Several reader-schema resolution paths are still marked WIP (panicking when resolution would be required), and documentation modules were added to clarify Serde↔Avro data model mappings.



@augmentcode augmentcode bot left a comment


Review completed. 4 suggestions posted.


"Schema aware deserialisation does not resolve schemas yet"
);

self.block.read_next_deser(self.reader_schema)


read_next_deser always forwards self.reader_schema into Block::read_next_deser, but Block::read_next_deser panics whenever read_schema.is_some() (even when should_resolve_schema is false because schemas match). This makes Reader::into_deser_iter panic for a valid configured reader_schema that happens to equal the writer schema.

Severity: high

Other Locations
  • avro/src/reader/block.rs:232


Owner Author


value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new serde deserializer does not support a reader schema, as explained in a TODO. But instead of panicking and crashing the application, it would be better to return an Err.

&& union.is_nullable()
{
let index = zag_i32(self.reader)?;
let schema = union.get_variant(index as usize)?;


index is an i32 but is cast to usize (index as usize), which can wrap negative values and yield confusing out-of-bounds behavior. Other union paths here use usize::try_from(index) with a dedicated conversion error, which seems safer/consistent.

Severity: medium


Owner Author


value:useful; category:bug; feedback: The Augment AI reviewer is correct! Any cast between integer types where the value might not fit (narrowing, or signed to unsigned) should use the checked APIs and return an Err when the value does not fit in the target type. This prevents silently wrapping the value and carrying a totally wrong number into any following operations.

{
match self.schema {
Schema::Record(record) if record.name.name() == name && record.fields.len() == len => {
visitor.visit_map(RecordDeserializer::new(self.reader, record, self.config))


deserialize_tuple_struct calls visitor.visit_map(...), but tuple-struct Deserialize implementations typically expect sequence access from deserialize_tuple_struct. If the visitor doesn’t implement visit_map, tuple-struct deserialization from Avro records will fail at runtime.

Severity: medium
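The mismatch can be modeled with a toy visitor; `Access` and `visit_tuple_struct` below are illustrative stand-ins for serde's SeqAccess/MapAccess machinery, not the crate's code:

```rust
// Toy model: a derived tuple-struct visitor only understands sequence
// access. Driving it with map access fails at runtime, which is why
// deserialize_tuple_struct should call visit_seq rather than visit_map.
enum Access {
    Seq(Vec<i32>),
    Map(Vec<(String, i32)>),
}

fn visit_tuple_struct(access: Access) -> Result<(i32, i32), &'static str> {
    match access {
        Access::Seq(v) if v.len() == 2 => Ok((v[0], v[1])),
        Access::Seq(_) => Err("invalid length, expected 2 elements"),
        // Mirrors serde's "invalid type: map, expected tuple struct" error.
        Access::Map(_) => Err("invalid type: map, expected tuple struct"),
    }
}
```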


| Schema::BigDecimal
| Schema::Decimal(_)
| Schema::Duration(_)
| Schema::Date,


Schema::Date is included in the branch that serializes string defaults as bytes, which would accept an invalid default type for date and write an incompatible encoding. It seems Schema::Date should only accept numeric defaults (like the Schema::Int | Schema::Date | Schema::TimeMillis arm above).

Severity: low


Owner Author


value:useful; category:bug; feedback: The Augment AI reviewer is correct! The Avro specification does not support encoding dates (and times) as strings, so this branch should be removed. Dates are encoded as numbers (days since the epoch), so the default value must be an integer.

@claude

claude bot commented Mar 23, 2026

PR Review: feat!: Support enums and tuples in SchemaAwareSerializer and implement SchemaAwareDeserializer

This is a substantial PR adding enum/tuple support to the serializer and implementing the full schema-aware deserializer. The overall design is solid and the modular decomposition (separate files for enums, record, tuple, block, identifier) makes the code easier to follow. Here are my findings:

Bugs / Correctness

1. Unsafe cast in deserialize_option

In deser_schema/mod.rs, the option handling does:

let index = zag_i32(self.reader)?;
let schema = union.get_variant(index as usize)?;

Every other place in this PR uses usize::try_from(index).map_err(...) before the bounds check. A negative i32 cast to usize wraps to a huge number on 64-bit; get_variant will still catch it, but the error will show a nonsensical large index rather than a clear "negative index" message. Be consistent with the rest of the codebase.

2. usize as u32 truncation in UnionEnumDeserializer::variant_seed

seed.deserialize(IdentifierDeserializer::index(index as u32))? — union indices are usize here but IdentifierDeserializer takes u32. Indices above u32::MAX would silently truncate. Low-priority but worth a try_from.
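A sketch of the suggested checked narrowing, with a plain String standing in for the crate's error type:

```rust
// u32::try_from reports an error for indices above u32::MAX instead of
// silently truncating the way `as u32` would.
fn identifier_index(index: usize) -> Result<u32, String> {
    u32::try_from(index).map_err(|_| format!("union variant index {index} does not fit in u32"))
}
```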

API Design Concerns

3. Panics in public iterator path — these should return Err

reader/mod.rs and reader/block.rs have assert!/panic! for the not-yet-implemented schema-resolution path. ReaderDeser is a public iterator. A user who calls Reader::builder(...).reader_schema(&schema).build()?.into_deser_iter::<T>() and then iterates will get a panic rather than Some(Err(...)). The builder-level doc comment says "will panic during reading" which is unusual and user-hostile for an iterator. Consider returning Err(Details::NotYetImplemented(...)) instead, or making into_deser_iter a compile error when a reader schema is set.

4. Breaking changes need prominent documentation

Two silent breaking changes are buried in the diff:

  • std::time::Duration schema changed from Schema::Duration (a 12-byte fixed) to a Schema::Record with secs: Fixed(u64) and nanos: Long fields. Any existing serialized data using Duration is no longer readable without migration.
  • [T; N] no longer implements AvroSchemaComponent by default (the manual impl was removed). Users who had array fields will get compile errors until they add the appropriate #[serde(with = ...)] attribute.

Both should be prominently documented in the PR description / CHANGELOG / migration guide.

Minor Issues

5. Error message regression

Old error messages included actionable hints like:

"Did you mean to use Schema::Uuid(UuidSchema::String) or utils::serde_set_human_readable(false)?"

The new messages drop these hints. Tests were updated to match the new (less helpful) wording. Consider preserving the actionability of these errors.

6. TODOs that affect correctness

  • deserialize_ignored_any always errors in IdentifierDeserializer — this breaks #[serde(flatten)] and similar patterns.
  • // TODO: Add Seek bound and skip over ignored data — unknown/extra fields during deserialization will error rather than be skipped gracefully.

Both are worth tracking as follow-up issues if not in scope for this PR.

7. #[expect(private_interfaces)] on public enum

TupleSerializer is pub with variants referencing private structs. Since it is a Serializer associated type it must be pub — but a comment explaining this forced design choice would help future readers.

8. unwrap() calls in field_default.rs

Lines like n.as_i64().unwrap() and n.as_f64().unwrap() are safe due to surrounding match guards, but prefer expect("unreachable: guarded by match arm") for clarity.
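A small, hypothetical illustration of the suggested style (none of these names come from field_default.rs):

```rust
// When a preceding check already proves the value is present, `expect`
// with a reason documents the invariant that a bare `unwrap` would hide.
fn first_even(values: &[i64]) -> Option<i64> {
    if values.iter().any(|v| v % 2 == 0) {
        let n = values
            .iter()
            .find(|v| *v % 2 == 0)
            .expect("unreachable: guarded by the any() check above");
        Some(*n)
    } else {
        None
    }
}
```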

Positive Highlights

  • Modular decomposition into block, enums, record, tuple, identifier sub-modules is clean.
  • Out-of-order field caching in RecordSerializer is well thought out.
  • Config being Copy via manual impl (to avoid unnecessary bounds) is a nice touch.
  • SchemaAwareRecordFieldDefault for handling skipped fields with defaults is thorough.
  • New documentation modules (avro_data_model_to_serde, serde_data_model_to_avro) are excellent additions.
  • #[track_caller] on serde_assert improves test diagnostics.

Summary

Main blocking issues: #3 (panics instead of errors in the public iterator) and #4 (undocumented breaking changes). Issues #1 and #5 are moderate. The rest are minor polish.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a substantial pull request that introduces a new schema-aware serializer and deserializer, refactors many parts of the library to use them, and improves the Avro schema mapping for several Rust types like [T; N] and core::time::Duration. The changes are well-structured and accompanied by extensive tests and new documentation, which is commendable.

My main feedback is regarding a regression in the quality of diagnostic error messages for mismatched types during serialization, particularly for uuid::Uuid. The new error messages are more generic and less helpful than before. I've left specific comments with suggestions on how to improve them.

Comment on lines 81 to 83
writer.write(uuid, &mut buffer).unwrap_err().to_string(),
"Failed to serialize value of type bytes using schema Uuid(String): 55e840e29b41d4a7164466554400. Cause: Expected a string, but got 16 bytes. Did you mean to use `Schema::Uuid(UuidSchema::Fixed)` or `utils::serde_set_human_readable(true)`?"
"Failed to serialize value of type `bytes` using Schema::Uuid(String): Expected Schema::Bytes | Schema::Fixed | Schema::BigDecimal | Schema::Decimal | Schema::Uuid(Bytes | Fixed) | Schema::Duration"
);


medium

The updated error message is less informative than the previous one. The old message was very specific, guiding the user on how to fix the issue by suggesting Schema::Uuid(UuidSchema::Fixed) or setting human_readable to true. The new message is a generic list of expected schemas for byte serialization.

While this is a consequence of the new generic serializer, it's a regression in diagnostics. Could we restore a more specific error message for this case? For example, by adding a special check for Schema::Uuid in serialize_bytes within avro/src/serde/ser_schema/mod.rs when the schema doesn't match.

Comment on lines 105 to 107
writer.write(uuid, &mut buffer).unwrap_err().to_string(),
r#"Failed to serialize value of type string using schema Uuid(Bytes): 550e8400-e29b-41d4-a716-446655440000. Cause: Expected bytes but got a string. Did you mean to use `Schema::Uuid(UuidSchema::String)` or `utils::serde_set_human_readable(false)`?"#
r#"Failed to serialize value of type `str` using Schema::Uuid(Bytes): Expected Schema::String | Schema::Uuid(String)"#
);


medium

Similar to my other comment, the new error message here is a regression in terms of helpfulness. The previous error message clearly stated the problem (expected bytes, got a string) and suggested solutions (Schema::Uuid(UuidSchema::String) or setting human_readable to false). The new message is a generic list of schemas that can handle a string.

It would be great to improve the diagnostics here to be as helpful as they were before. This could likely be addressed by adding more specific checks within serialize_str in avro/src/serde/ser_schema/mod.rs for Uuid-related schemas.

Comment on lines 131 to 133
writer.write(uuid, &mut buffer).unwrap_err().to_string(),
r#"Failed to serialize value of type string using schema Uuid(Fixed(FixedSchema { name: Name { name: "uuid", .. }, size: 16, .. })): 550e8400-e29b-41d4-a716-446655440000. Cause: Expected bytes but got a string. Did you mean to use `Schema::Uuid(UuidSchema::String)` or `utils::serde_set_human_readable(false)`?"#
r#"Failed to serialize value of type `str` using Schema::Uuid(Fixed(FixedSchema { name: Name { name: "uuid", .. }, size: 16, .. })): Expected Schema::String | Schema::Uuid(String)"#
);


medium

This is another instance where the error message has become less helpful. The old error message was very specific about the mismatch and how to resolve it. The new one is generic.

Improving the error message to be more specific about the Uuid type mismatch would enhance the developer experience.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (2)
avro_derive/src/attributes/mod.rs (1)

321-344: Consider clarifying the error message when default is auto-disabled.

The logic to auto-disable field_default when #[avro(with)] is used is sound—calling T::field_default() doesn't make sense when the schema comes from a different source.

However, the interaction with the subsequent validation (lines 336-344) could produce a confusing error. If a user writes:

#[avro(with)]
#[serde(with = "my_module", skip_serializing_if = "Option::is_none")]
field: Option<MyType>,

They'll receive an error stating skip_serializing_if is incompatible with #[avro(default = false)], even though they never explicitly set default = false. The auto-disabling behavior isn't obvious from the error message.

Consider either:

  1. Providing a more specific error message when the default was auto-disabled due to #[avro(with)]
  2. Or documenting this interaction in the TODO mentioned on line 330
💡 Sketch of a clearer error message
+        let default_was_auto_disabled = with != With::Trait && avro.default == FieldDefault::Trait;
         // TODO: Implement a better way to do this (maybe if user specifies `#[avro(with)]` also use that for the default)
         // Disable getting the field default, if the schema is not retrieved from the field type
         if with != With::Trait && avro.default == FieldDefault::Trait {
             avro.default = FieldDefault::Disabled;
         }
 
         if ((serde.skip_serializing && !serde.skip_deserializing)
             || serde.skip_serializing_if.is_some())
             && avro.default == FieldDefault::Disabled
         {
-            errors.push(syn::Error::new(
-                span,
-                "`#[serde(skip_serializing)]` and `#[serde(skip_serializing_if)]` are incompatible with `#[avro(default = false)]`"
-            ));
+            let msg = if default_was_auto_disabled {
+                "`#[serde(skip_serializing)]` and `#[serde(skip_serializing_if)]` require a default value, but `#[avro(with)]` disables auto-default. Please provide an explicit `#[avro(default = \"...\")]`"
+            } else {
+                "`#[serde(skip_serializing)]` and `#[serde(skip_serializing_if)]` are incompatible with `#[avro(default = false)]`"
+            };
+            errors.push(syn::Error::new(span, msg));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro_derive/src/attributes/mod.rs` around lines 321 - 344, The validation
error about incompatibility with `#[avro(default = false)]` can be misleading
when the default was auto-disabled because `#[avro(with)]` was used; update the
check around serde.skip_serializing/skip_serializing_if to detect that `with !=
With::Trait` and that the field default was originally `FieldDefault::Trait`
(i.e., auto-disabled to `FieldDefault::Disabled`) and, in that case, push a
clearer error via `errors.push(syn::Error::new(span, ...))` that mentions the
default was auto-disabled due to `#[avro(with)]` (or alternatively include that
context in the existing message), using the same symbols
(`With::from_avro_and_serde`, `With::Trait`, `avro.default`,
`FieldDefault::Trait`/`Disabled`, `serde.skip_serializing`,
`serde.skip_serializing_if`, and `span`) so callers know why `default` became
disabled.
avro/src/reader/block.rs (1)

218-254: Consider extracting duplicated infinite-loop protection logic.

Lines 247-250 duplicate the same check from read_next (lines 209-212). Consider extracting this into a helper method to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/block.rs` around lines 218 - 254, Duplicate infinite-loop
protection logic in read_next_deser and read_next should be extracted into a
shared helper: create a method (e.g., check_no_progress_or_error or
ensure_progress) that takes the original byte count and the current slice length
(or both as usize), performs the b_original != 0 && b_original == current_len
check, and returns AvroResult<()> with Err(Details::ReadBlock.into()) on
failure; call this helper from read_next_deser (replacing lines 247-250) and
from read_next (replacing lines 209-212), and update callers to adjust buf_idx
and message_count only after the helper confirms progress.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro/src/documentation/serde_data_model_to_avro.rs`:
- Around line 56-57: The doc comment in serde_data_model_to_avro.rs contains a
stray backtick in the sentence "This is different from normal tuples`."—remove
the stray backtick so the sentence ends "normal tuples." (Locate the comment
block in serde_data_model_to_avro.rs around the tuple-struct note and edit the
text accordingly.)

In `@avro/src/reader/block.rs`:
- Around line 232-234: Replace the panic! in the branch that checks read_schema
(the SchemaAwareResolvingDeserializer TODO) with returning a proper error result
instead; add a new Details enum variant like SchemaResolutionNotImplemented with
#[error("Schema-aware deserialization does not support schema resolution yet")]
and then change the code in block.rs (the branch that currently calls panic!) to
return Err(Details::SchemaResolutionNotImplemented.into()) (or construct the
crate's error type expected by the surrounding function) so callers can handle
the unimplemented feature gracefully.

In `@avro/src/reader/datum.rs`:
- Around line 145-160: read_deser currently panics any time a reader schema is
present; change it to only error when the reader schema actually differs from
the writer schema. In read_deser, inspect self.reader (the Some((reader_schema,
_)) tuple) and compare reader_schema to self.writer; if they are identical,
proceed with the same SchemaAwareDeserializer path you use in the else branch
(using Config { names: self.resolved.get_names(), human_readable:
self.human_readable }), otherwise return an AvroResult::Err (e.g. an AvroError
indicating schema resolution is not implemented) instead of unconditionally
panicking; references: read_deser, self.reader, self.writer,
SchemaAwareDeserializer, self.resolved.get_names().

In `@avro/src/reader/mod.rs`:
- Around line 133-140: Replace the panic assert in read_next_deser with
returning an AvroResult::Err so the iterator yields an error instead of
panicking: check the should_resolve_schema flag in fn read_next_deser and if
true return an appropriate Avro error (e.g., a variant indicating
schema-resolving deserialization is not implemented or unsupported) wrapped in
AvroResult::Err; otherwise proceed to call
self.block.read_next_deser(self.reader_schema). Ensure the returned error type
matches the crate's AvroError/AvroResult conventions so callers receive a
recoverable Err instead of a panic.

In `@avro/src/serde/derive.rs`:
- Around line 809-815: The synthesized helper record names (created where
T::get_schema_in_ctxt is used and via Name::new_with_enclosing_namespace) can
collide with user-defined types because they live in the caller's namespace and
are directly checked against named_schemas; change the name-generation to
guarantee uniqueness by either: (a) creating the helper names in a dedicated
internal/synthetic namespace instead of enclosing_namespace, or (b)
loop-appending a deterministic collision suffix (e.g., incrementing counter or
short hash/UUID) until Name::new_with_enclosing_namespace returns a name not
present in named_schemas; apply this change to both the tuple/array helper
generation sites (the block using format!("A{N}_{}",
t_schema.unique_normalized_name()) and the analogous T{len} generation) and
ensure subsequent code emits the newly-reserved name rather than producing a
Schema::Ref to an existing user type.

In `@avro/src/serde/deser_schema/mod.rs`:
- Around line 1474-1476: The test constructs value2 using
TupleExternalEnum::Val1 again, so the tuple-variant path for the three-field
variant is never exercised; update the TestTupleExternalEnum construction for
value2 to use TupleExternalEnum::Val2 with three fields (matching the enum's
three-field tuple variant) so the Val2 branch is executed (refer to
TestTupleExternalEnum, TupleExternalEnum::Val2, and value2).
- Around line 633-645: In deserialize_tuple_struct, the code incorrectly calls
visitor.visit_map for tuple-structs; change it to call visitor.visit_seq so
Serde's tuple-struct visitors are satisfied. Specifically, in the match arm
handling Schema::Record where record.name.name() == name && record.fields.len()
== len, replace the visitor.visit_map(RecordDeserializer::new(self.reader,
record, self.config)) call with visitor.visit_seq using the same
RecordDeserializer instance (same params), mirroring how deserialize_tuple uses
visit_seq with ManyTupleDeserializer. Ensure the rest of the method's branches
remain unchanged.

In `@avro/src/serde/deser_schema/tuple.rs`:
- Around line 105-120: In next_element_seed, avoid incrementing current_field
before deserialization: retrieve schema using
self.schema.fields[self.current_field].schema, create the
SchemaAwareDeserializer and call seed.deserialize first (propagating any error),
and only after deserialize succeeds increment self.current_field and return
Some(value); this ensures SchemaAwareDeserializer::new or seed.deserialize
failures do not advance current_field and preserves caller state for retries or
inspection.
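The ordering the prompt describes can be sketched with stand-in types (`TupleCursor` and the boolean stand-in for seed.deserialize are hypothetical):

```rust
// Sketch: advance current_field only after deserialization succeeds, so a
// failed element leaves the cursor unchanged for callers to inspect.
struct TupleCursor {
    current_field: usize,
}

impl TupleCursor {
    // `deserialize_ok` stands in for the result of seed.deserialize(...).
    fn next_element(&mut self, deserialize_ok: bool) -> Result<usize, &'static str> {
        let value = if deserialize_ok {
            Ok(self.current_field)
        } else {
            Err("deserialization failed")
        }?;
        self.current_field += 1; // only reached on success
        Ok(value)
    }
}
```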

In `@avro/src/serde/ser_schema/record/field_default.rs`:
- Around line 83-92: The match on Value::String for byte-like schemas (the arm
matching Schema::Fixed, Decimal with InnerDecimalSchema::Fixed, Uuid::Fixed, and
Duration) should validate length by Unicode code points (s.chars().count()) not
UTF-8 bytes, and the serialization should convert each char to a raw byte by
mapping char as u32 & 0xFF into a Vec<u8> (instead of using s.as_bytes()); also
remove Schema::Date from the string-handling pattern so date logical types are
not treated as byte-strings. Locate and update the match arm(s) in
field_default.rs where Value::String(s) is validated and serialized (the
fixed/decimal/uuid/duration string branches) to use chars().count() for length
checks and iterate chars to produce bytes, and drop Schema::Date from that
pattern.

In `@avro/src/serde/ser_schema/tuple.rs`:
- Around line 282-297: The OneUnionTupleSerializer SerializeTuple impl currently
doesn't enforce exactly one element; add an element counter field (e.g.,
elements_written or element_count) to OneUnionTupleSerializer, increment it
inside serialize_element (alongside bytes_written), and return an appropriate
Error if serialize_element is called when the counter is already 1 (prevent >1
elements). In end(), validate the counter is exactly 1 and return an Error if
it's 0; update serialize_element and end to use this counter to enforce the
single-element requirement while keeping bytes_written logic intact.
- Around line 235-262: OneTupleSerializer currently doesn't enforce that exactly
one element was serialized; add a counter field (e.g., elements_serialized:
usize) to OneTupleSerializer, increment it in serialize_element (alongside
updating bytes_written), and change end() to validate that elements_serialized
== 1 returning an appropriate Err(Error::... ) when it's 0 or >1 (mirroring the
validation logic used by ManyTupleSerializer) so invalid tuple lengths produce
an error instead of silently succeeding.

In `@avro/src/serde/ser_schema/union.rs`:
- Around line 258-310: The serialize_bytes function currently only treats
SchemaKind::Bytes and fixed-size Schema::Fixed as byte-backed union variants, so
Schema::BigDecimal variants are ignored; update the logic in serialize_bytes
(including the BytesType::Bytes branch, the BytesType::Unset matching where
bytes_index is computed, and any error messages) to also consider
SchemaKind::BigDecimal as a bytes-backed variant (e.g., use
union.index_of_schema_kind for both SchemaKind::Bytes and SchemaKind::BigDecimal
or add a helper that checks either kind) so BigDecimal union members are
selected and handled with with_len semantics the same as bytes.

In `@avro/src/serde/with.rs`:
- Around line 546-550: The new helpers bigdecimal, bigdecimal_opt, array, and
array_opt currently define get_record_fields_in_ctxt without the leading usize
parameter, breaking the ABI; update each helper's get_record_fields_in_ctxt
signature to accept the unused leading usize parameter (i.e. change fn
get_record_fields_in_ctxt(_: &mut HashSet<Name>, _: NamespaceRef) ->
Option<Vec<RecordField>> to fn get_record_fields_in_ctxt(_: usize, _: &mut
HashSet<Name>, _: NamespaceRef) -> Option<Vec<RecordField>>) so they match the
existing helpers' signature and remain backward compatible with callers and the
macro-generated 3-argument form.

---

Nitpick comments:
In `@avro_derive/src/attributes/mod.rs`:
- Around line 321-344: The validation error about incompatibility with
`#[avro(default = false)]` can be misleading when the default was auto-disabled
because `#[avro(with)]` was used; update the check around
serde.skip_serializing/skip_serializing_if to detect that `with != With::Trait`
and that the field default was originally `FieldDefault::Trait` (i.e.,
auto-disabled to `FieldDefault::Disabled`) and, in that case, push a clearer
error via `errors.push(syn::Error::new(span, ...))` that mentions the default
was auto-disabled due to `#[avro(with)]` (or alternatively include that context
in the existing message), using the same symbols (`With::from_avro_and_serde`,
`With::Trait`, `avro.default`, `FieldDefault::Trait`/`Disabled`,
`serde.skip_serializing`, `serde.skip_serializing_if`, and `span`) so callers
know why `default` became disabled.

In `@avro/src/reader/block.rs`:
- Around line 218-254: Duplicate infinite-loop protection logic in
read_next_deser and read_next should be extracted into a shared helper: create a
method (e.g., check_no_progress_or_error or ensure_progress) that takes the
original byte count and the current slice length (or both as usize), performs
the b_original != 0 && b_original == current_len check, and returns
AvroResult<()> with Err(Details::ReadBlock.into()) on failure; call this helper
from read_next_deser (replacing lines 247-250) and from read_next (replacing
lines 209-212), and update callers to adjust buf_idx and message_count only
after the helper confirms progress.
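The shared helper suggested above could look roughly like this std-only sketch (the name `ensure_progress` and the `String` error are placeholders for the crate's `Details::ReadBlock` error):

```rust
// A non-empty buffer whose length did not shrink means the decoder
// consumed nothing: bail out instead of looping forever.
fn ensure_progress(bytes_before: usize, bytes_after: usize) -> Result<(), String> {
    if bytes_before != 0 && bytes_before == bytes_after {
        Err("read_next made no progress (possible infinite loop)".to_string())
    } else {
        Ok(())
    }
}
```

Both `read_next` and `read_next_deser` would then call this once per iteration before adjusting `buf_idx` and `message_count`.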

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 66191e49-13e3-4c2e-a621-d0c50719d431

📥 Commits

Reviewing files that changed from the base of the PR and between fbec01b and ecaf964.

📒 Files selected for processing (42)
  • avro/examples/test_interop_single_object_encoding.rs
  • avro/src/bigdecimal.rs
  • avro/src/documentation/avro_data_model_to_serde.rs
  • avro/src/documentation/mod.rs
  • avro/src/documentation/serde_data_model_to_avro.rs
  • avro/src/error.rs
  • avro/src/lib.rs
  • avro/src/reader/block.rs
  • avro/src/reader/datum.rs
  • avro/src/reader/mod.rs
  • avro/src/reader/single_object.rs
  • avro/src/schema/mod.rs
  • avro/src/schema/union.rs
  • avro/src/serde/derive.rs
  • avro/src/serde/deser_schema/block.rs
  • avro/src/serde/deser_schema/enums.rs
  • avro/src/serde/deser_schema/identifier.rs
  • avro/src/serde/deser_schema/mod.rs
  • avro/src/serde/deser_schema/record.rs
  • avro/src/serde/deser_schema/tuple.rs
  • avro/src/serde/mod.rs
  • avro/src/serde/ser_schema/block.rs
  • avro/src/serde/ser_schema/mod.rs
  • avro/src/serde/ser_schema/record/field_default.rs
  • avro/src/serde/ser_schema/record/mod.rs
  • avro/src/serde/ser_schema/tuple.rs
  • avro/src/serde/ser_schema/union.rs
  • avro/src/serde/with.rs
  • avro/src/writer/datum.rs
  • avro/src/writer/mod.rs
  • avro/src/writer/single_object.rs
  • avro/tests/avro-rs-226.rs
  • avro/tests/avro-rs-285-bytes_deserialization.rs
  • avro/tests/schema.rs
  • avro/tests/serde_human_readable_false.rs
  • avro/tests/serde_human_readable_true.rs
  • avro/tests/union_schema.rs
  • avro_derive/src/attributes/mod.rs
  • avro_derive/tests/derive.proptest-regressions
  • avro_derive/tests/derive.rs
  • avro_derive/tests/serde.rs
  • licenserc.toml

Comment on lines +56 to +57
//! - **Note:** Tuple structs with 0 or 1 elements will also be (de)serialized as a [`Schema::Record`]. This
//! is different from normal tuples`.

⚠️ Potential issue | 🟡 Minor

Minor typo: stray backtick before period.

Line 57 has "normal tuples`." with a backtick before the period that appears unintentional.

Suggested fix
-//!     - **Note:** Tuple structs with 0 or 1 elements will also be (de)serialized as a [`Schema::Record`]. This
-//!       is different from normal tuples`.
+//!     - **Note:** Tuple structs with 0 or 1 elements will also be (de)serialized as a [`Schema::Record`]. This
+//!       is different from normal tuples.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/documentation/serde_data_model_to_avro.rs` around lines 56 - 57, The
doc comment in serde_data_model_to_avro.rs contains a stray backtick in the
sentence "This is different from normal tuples`."—remove the stray backtick so
the sentence ends "normal tuples." (Locate the comment block in
serde_data_model_to_avro.rs around the tuple-struct note and edit the text
accordingly.)

Owner Author

value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! There is a trailing backtick in the sentence and it may break the rendering of the rustdoc. It should be removed.

Comment on lines +232 to +234
let item = if read_schema.is_some() {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");

⚠️ Potential issue | 🟠 Major

Replace panic! with a proper error return.

Using panic! in library code for an unimplemented feature is problematic — it crashes the application without giving callers a chance to handle the situation gracefully. Return an error instead.

🛡️ Proposed fix
-        let item = if read_schema.is_some() {
-            // TODO: Implement SchemaAwareResolvingDeserializer
-            panic!("Schema aware deserialisation does not resolve schemas yet");
-        } else {
+        let item = if let Some(_read_schema) = read_schema {
+            return Err(crate::error::Details::SchemaResolutionNotImplemented.into());
+        } else {

You'll need to add the corresponding error variant to Details:

#[error("Schema-aware deserialization does not support schema resolution yet")]
SchemaResolutionNotImplemented,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/block.rs` around lines 232 - 234, Replace the panic! in the
branch that checks read_schema (the SchemaAwareResolvingDeserializer TODO) with
returning a proper error result instead; add a new Details enum variant like
SchemaResolutionNotImplemented with #[error("Schema-aware deserialization does
not support schema resolution yet")] and then change the code in block.rs (the
branch that currently calls panic!) to return
Err(Details::SchemaResolutionNotImplemented.into()) (or construct the crate's
error type expected by the surrounding function) so callers can handle the
unimplemented feature gracefully.

Owner Author

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.

Comment on lines +145 to +160
pub fn read_deser<T: DeserializeOwned>(&self, reader: &mut impl Read) -> AvroResult<T> {
// `reader` is `impl Read` instead of a generic on the function like T so it's easier to
// specify the type wanted (`read_deser<String>` vs `read_deser<String, _>`)
if let Some((_, _)) = &self.reader {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");
} else {
T::deserialize(SchemaAwareDeserializer::new(
reader,
self.writer,
Config {
names: self.resolved.get_names(),
human_readable: self.human_readable,
},
)?)
}

⚠️ Potential issue | 🟠 Major

Only reject actual schema adaptation here.

This panics whenever a reader schema was configured, even in the no-op case where the reader schema is equivalent to the writer schema. That makes read_deser stricter than read_value and turns an AvroResult<T> API into a panic path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/datum.rs` around lines 145 - 160, read_deser currently panics
any time a reader schema is present; change it to only error when the reader
schema actually differs from the writer schema. In read_deser, inspect
self.reader (the Some((reader_schema, _)) tuple) and compare reader_schema to
self.writer; if they are identical, proceed with the same
SchemaAwareDeserializer path you use in the else branch (using Config { names:
self.resolved.get_names(), human_readable: self.human_readable }), otherwise
return an AvroResult::Err (e.g. an AvroError indicating schema resolution is not
implemented) instead of unconditionally panicking; references: read_deser,
self.reader, self.writer, SchemaAwareDeserializer, self.resolved.get_names().

Owner Author

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.

Comment on lines +133 to +140
fn read_next_deser<T: DeserializeOwned>(&mut self) -> AvroResult<Option<T>> {
// TODO: Implement SchemaAwareResolvingDeserializer
assert!(
!self.should_resolve_schema,
"Schema aware deserialisation does not resolve schemas yet"
);

self.block.read_next_deser(self.reader_schema)

⚠️ Potential issue | 🟠 Major

Return an Err here instead of panicking.

ReaderDeser promises Iterator<Item = AvroResult<T>>, but this assert! breaks that contract and panics on an unsupported configuration instead of surfacing the failure as the iterator's first error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/mod.rs` around lines 133 - 140, Replace the panic assert in
read_next_deser with returning an AvroResult::Err so the iterator yields an
error instead of panicking: check the should_resolve_schema flag in fn
read_next_deser and if true return an appropriate Avro error (e.g., a variant
indicating schema-resolving deserialization is not implemented or unsupported)
wrapped in AvroResult::Err; otherwise proceed to call
self.block.read_next_deser(self.reader_schema). Ensure the returned error type
matches the crate's AvroError/AvroResult conventions so callers receive a
recoverable Err instead of a panic.

Owner Author

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.

Comment on lines +809 to +815
let t_schema = T::get_schema_in_ctxt(named_schemas, enclosing_namespace);
let name = Name::new_with_enclosing_namespace(
format!("A{N}_{}", t_schema.unique_normalized_name()),
enclosing_namespace,
)
.expect("Name is valid");
if named_schemas.contains(&name) {

⚠️ Potential issue | 🟠 Major

Reserve a collision-proof name for synthesized tuple/array records.

A{N}_... and T{len}_... are created in the caller's namespace and checked against named_schemas like ordinary user types. If a real record already has that name, this code will silently emit a Schema::Ref to the existing type instead of the generated helper record.

Also applies to: 919-924

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/derive.rs` around lines 809 - 815, The synthesized helper
record names (created where T::get_schema_in_ctxt is used and via
Name::new_with_enclosing_namespace) can collide with user-defined types because
they live in the caller's namespace and are directly checked against
named_schemas; change the name-generation to guarantee uniqueness by either: (a)
creating the helper names in a dedicated internal/synthetic namespace instead of
enclosing_namespace, or (b) loop-appending a deterministic collision suffix
(e.g., incrementing counter or short hash/UUID) until
Name::new_with_enclosing_namespace returns a name not present in named_schemas;
apply this change to both the tuple/array helper generation sites (the block
using format!("A{N}_{}", t_schema.unique_normalized_name()) and the analogous
T{len} generation) and ensure subsequent code emits the newly-reserved name
rather than producing a Schema::Ref to an existing user type.
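Option (b) above — appending a deterministic collision suffix — can be sketched with std only. The function name `reserve_unique_name` is hypothetical, and `String` stands in for the crate's `Name` type:

```rust
use std::collections::HashSet;

// Deterministically suffix the synthesized name until it no longer collides
// with a user-defined type. `base` stands in for the "A{N}_{...}" string
// built by the derive code.
fn reserve_unique_name(base: &str, named_schemas: &mut HashSet<String>) -> String {
    let mut candidate = base.to_string();
    let mut counter = 0usize;
    while named_schemas.contains(&candidate) {
        counter += 1;
        candidate = format!("{base}_{counter}");
    }
    // Reserve the name so later generation sites cannot reuse it.
    named_schemas.insert(candidate.clone());
    candidate
}
```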

Comment on lines +83 to +92
(
Value::String(s),
Schema::Fixed(fixed)
| Schema::Decimal(DecimalSchema {
inner: InnerDecimalSchema::Fixed(fixed),
..
})
| Schema::Uuid(UuidSchema::Fixed(fixed))
| Schema::Duration(fixed),
) => s.len() == fixed.size,

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

According to the Apache Avro specification, how are record field defaults for `bytes` and `fixed` decoded from JSON strings, and are `date` logical-type defaults represented as JSON numbers rather than strings?

💡 Result:

According to the Apache Avro specification, record field defaults for bytes and fixed types are provided as JSON strings. These strings use Unicode code points 0-255 mapped directly to unsigned 8-bit byte values 0-255 (e.g., "\u00FF" for the byte 255). When decoding the schema from JSON, these strings are interpreted as the corresponding byte sequences or fixed values. For date logical types (which annotate an underlying int type representing days since the Unix epoch), defaults are represented as JSON numbers (integers), following the general rule for int/long field defaults, rather than strings. Examples from schema usage confirm defaults like 1502250227187 for timestamp logical types (on long) and similar integer defaults for date on int. The specification's field default table lists int,long as JSON integer, with no special override for logical types, which are schema attributes ignored during default value parsing.


🏁 Script executed:

cat -n avro/src/serde/ser_schema/record/field_default.rs | sed -n '75,100p'

Repository: martin-augment/avro-rs

Length of output: 1441


🏁 Script executed:

cat -n avro/src/serde/ser_schema/record/field_default.rs | sed -n '145,170p'

Repository: martin-augment/avro-rs

Length of output: 1381


Decode Avro byte defaults by code point, not UTF-8.

For bytes/fixed defaults, Avro maps JSON string code points 0-255 to raw bytes 0-255. The code at line 92 uses s.len() to validate size, which counts UTF-8 bytes rather than code points—so "\u00FF" (1 code point, 2 UTF-8 bytes) would fail validation. Similarly, line 160 uses s.as_bytes() to serialize, which produces UTF-8 bytes instead of the raw byte values specified by Avro.

Line 159 also incorrectly includes Schema::Date in the byte-string pattern. Per the Avro specification, date logical-type defaults are JSON numbers (integers), not strings, and should not be handled by string-based serialization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/record/field_default.rs` around lines 83 - 92, The
match on Value::String for byte-like schemas (the arm matching Schema::Fixed,
Decimal with InnerDecimalSchema::Fixed, Uuid::Fixed, and Duration) should
validate length by Unicode code points (s.chars().count()) not UTF-8 bytes, and
the serialization should convert each char to a raw byte by mapping char as u32
& 0xFF into a Vec<u8> (instead of using s.as_bytes()); also remove Schema::Date
from the string-handling pattern so date logical types are not treated as
byte-strings. Locate and update the match arm(s) in field_default.rs where
Value::String(s) is validated and serialized (the fixed/decimal/uuid/duration
string branches) to use chars().count() for length checks and iterate chars to
produce bytes, and drop Schema::Date from that pattern.
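The code-point decoding described above can be sketched with std only (the function name `decode_fixed_default` is hypothetical):

```rust
// Decode an Avro bytes/fixed default per the spec: each Unicode code point
// 0-255 maps to one raw byte. Returns None when the string has the wrong
// number of code points for the fixed size.
fn decode_fixed_default(s: &str, fixed_size: usize) -> Option<Vec<u8>> {
    // Validate by code points, not UTF-8 bytes: "\u{00FF}" is one code point
    // but two UTF-8 bytes, so s.len() would reject it incorrectly.
    if s.chars().count() != fixed_size {
        return None;
    }
    // Map each code point to its low byte, as the Avro spec prescribes.
    Some(s.chars().map(|c| (c as u32 & 0xFF) as u8).collect())
}
```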

Comment on lines +235 to +262
impl<'s, 'w, W: Write, S: Borrow<Schema>> SerializeTuple for OneTupleSerializer<'s, 'w, W, S> {
type Ok = usize;
type Error = Error;

fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: ?Sized + Serialize,
{
match self.schema {
Schema::Union(union) => {
self.bytes_written +=
value.serialize(UnionSerializer::new(self.writer, union, self.config))?;
}
schema => {
self.bytes_written += value.serialize(SchemaAwareSerializer::new(
self.writer,
schema,
self.config,
)?)?;
}
}
Ok(())
}

fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.bytes_written)
}
}

⚠️ Potential issue | 🟡 Minor

OneTupleSerializer doesn't validate that exactly one element was serialized.

Unlike ManyTupleSerializer which validates element count in end(), OneTupleSerializer::end() unconditionally returns success. If zero or more than one element is serialized, this could lead to incorrect Avro output without error.

💡 Proposed fix
 struct OneTupleSerializer<'s, 'w, W: Write, S: Borrow<Schema>> {
     writer: &'w mut W,
     schema: &'s Schema,
     config: Config<'s, S>,
     bytes_written: usize,
+    element_written: bool,
 }

 impl<'s, 'w, W: Write, S: Borrow<Schema>> OneTupleSerializer<'s, 'w, W, S> {
     pub fn new(
         writer: &'w mut W,
         schema: &'s Schema,
         config: Config<'s, S>,
         bytes_written: Option<usize>,
     ) -> Self {
         Self {
             writer,
             schema,
             config,
             bytes_written: bytes_written.unwrap_or(0),
+            element_written: false,
         }
     }
 }

 impl<'s, 'w, W: Write, S: Borrow<Schema>> SerializeTuple for OneTupleSerializer<'s, 'w, W, S> {
     // ...
     fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
     where
         T: ?Sized + Serialize,
     {
+        if self.element_written {
+            return Err(Error::new(Details::SerializeTupleTooManyElements {
+                expected: 1,
+                actual: 2,
+            }));
+        }
         match self.schema {
             // ... existing code ...
         }
+        self.element_written = true;
         Ok(())
     }

     fn end(self) -> Result<Self::Ok, Self::Error> {
+        if !self.element_written {
+            return Err(Error::new(Details::SerializeTupleMissingElements {
+                position: 0,
+                total_elements: 1,
+            }));
+        }
         Ok(self.bytes_written)
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/tuple.rs` around lines 235 - 262,
OneTupleSerializer currently doesn't enforce that exactly one element was
serialized; add a counter field (e.g., elements_serialized: usize) to
OneTupleSerializer, increment it in serialize_element (alongside updating
bytes_written), and change end() to validate that elements_serialized == 1
returning an appropriate Err(Error::... ) when it's 0 or >1 (mirroring the
validation logic used by ManyTupleSerializer) so invalid tuple lengths produce
an error instead of silently succeeding.
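The counter pattern described in the prompt can be sketched with std only; the real serializer would return Avro's `Error` type rather than `String`, and `OneTuple` is a trimmed stand-in:

```rust
// Enforce that exactly one element is written to a length-1 tuple.
struct OneTuple {
    elements_serialized: usize,
}

impl OneTuple {
    fn serialize_element(&mut self) -> Result<(), String> {
        if self.elements_serialized >= 1 {
            return Err("tuple of length 1 received a second element".to_string());
        }
        self.elements_serialized += 1;
        Ok(())
    }

    fn end(self) -> Result<(), String> {
        if self.elements_serialized != 1 {
            return Err(format!(
                "tuple of length 1 received {} elements",
                self.elements_serialized
            ));
        }
        Ok(())
    }
}
```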

Comment on lines +282 to +297
impl<'s, 'w, W: Write, S: Borrow<Schema>> SerializeTuple for OneUnionTupleSerializer<'s, 'w, W, S> {
type Ok = usize;
type Error = Error;

fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: ?Sized + Serialize,
{
self.bytes_written +=
value.serialize(UnionSerializer::new(self.writer, self.union, self.config))?;
Ok(())
}

fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.bytes_written)
}

⚠️ Potential issue | 🟡 Minor

Same issue: OneUnionTupleSerializer doesn't validate element count.

Similar to OneTupleSerializer, this should validate that exactly one element is serialized.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/tuple.rs` around lines 282 - 297, The
OneUnionTupleSerializer SerializeTuple impl currently doesn't enforce exactly
one element; add an element counter field (e.g., elements_written or
element_count) to OneUnionTupleSerializer, increment it inside serialize_element
(alongside bytes_written), and return an appropriate Error if serialize_element
is called when the counter is already 1 (prevent >1 elements). In end(),
validate the counter is exactly 1 and return an Error if it's 0; update
serialize_element and end to use this counter to enforce the single-element
requirement while keeping bytes_written logic intact.

Comment on lines +258 to +310
fn serialize_bytes(mut self, v: &[u8]) -> Result<Self::Ok, Self::Error> {
let (index, with_len) = match SER_BYTES_TYPE.get() {
BytesType::Bytes => {
if let Some(index) = self.union.index_of_schema_kind(SchemaKind::Bytes) {
(index, true)
} else {
return Err(self.error("bytes", "Expected Schema::Bytes in variants"));
}
}
BytesType::Fixed => {
if let Some((index, _)) = self
.union
.find_fixed_of_size_n(v.len(), self.config.names)?
{
(index, false)
} else {
return Err(self.error(
"bytes",
format!("Expected Schema::Fixed(size: {}) in variants", v.len()),
));
}
}
BytesType::Unset => {
let bytes_index = self.union.index_of_schema_kind(SchemaKind::Bytes);
let fixed_index = self
.union
.find_fixed_of_size_n(v.len(), self.config.names)?;
// Find the first variant that matches the bytes or fixed
match (bytes_index, fixed_index) {
(Some(bytes_index), Some((fixed_index, _))) => {
(bytes_index.min(fixed_index), bytes_index < fixed_index)
}
(Some(bytes_index), None) => (bytes_index, true),
(None, Some((fixed_index, _))) => (fixed_index, false),
(None, None) => {
return Err(self.error(
"bytes",
format!(
"Expected Schema::Bytes | Schema::Fixed(size: {}) in variants",
v.len()
),
));
}
}
}
};
let mut bytes_written = zig_i32(index as i32, &mut *self.writer)?;
if with_len {
bytes_written += self.write_bytes_with_len(v)?;
} else {
bytes_written += self.write_bytes(v)?;
}
Ok(bytes_written)

⚠️ Potential issue | 🟠 Major

Schema::BigDecimal union members are unreachable from serialize_bytes.

This selector only matches SchemaKind::Bytes or fixed-size branches. A union containing Schema::BigDecimal therefore never selects its valid branch on the bytes path, so the new big-decimal adapters will still fail once they serialize into a non-null union. Please treat big-decimal as a bytes-backed union member here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/union.rs` around lines 258 - 310, The
serialize_bytes function currently only treats SchemaKind::Bytes and fixed-size
Schema::Fixed as byte-backed union variants, so Schema::BigDecimal variants are
ignored; update the logic in serialize_bytes (including the BytesType::Bytes
branch, the BytesType::Unset matching where bytes_index is computed, and any
error messages) to also consider SchemaKind::BigDecimal as a bytes-backed
variant (e.g., use union.index_of_schema_kind for both SchemaKind::Bytes and
SchemaKind::BigDecimal or add a helper that checks either kind) so BigDecimal
union members are selected and handled with with_len semantics the same as
bytes.
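The helper suggested above — checking for either bytes-backed kind — can be sketched with std only. `SchemaKind` here is a trimmed stand-in for the crate's enum, and `index_of_bytes_backed` is a hypothetical name:

```rust
enum SchemaKind {
    Null,
    Bytes,
    BigDecimal,
    Fixed,
}

// Return the index of the first union variant that can carry raw bytes with a
// length prefix (i.e. Bytes or BigDecimal).
fn index_of_bytes_backed(variants: &[SchemaKind]) -> Option<usize> {
    variants
        .iter()
        .position(|k| matches!(k, SchemaKind::Bytes | SchemaKind::BigDecimal))
}
```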

Comment on lines +546 to +550
pub fn get_record_fields_in_ctxt(
_: &mut HashSet<Name>,
_: NamespaceRef,
) -> Option<Vec<RecordField>> {
None

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Current helper signatures in avro/src/serde/with.rs"
rg -n -C1 'pub fn get_record_fields_in_ctxt\s*\(' avro/src/serde/with.rs

echo
echo "Call sites / macro references outside avro/src/serde/with.rs"
rg -n -C2 '\bget_record_fields_in_ctxt\s*\(' -g '!avro/src/serde/with.rs'

Repository: martin-augment/avro-rs

Length of output: 19287


Update new helper signatures to match existing get_record_fields_in_ctxt ABI.

The new bigdecimal, bigdecimal_opt, array, and array_opt helpers at lines 546, 619, 695, and 771 drop the leading usize parameter compared to all existing helpers in the file. This breaks the public API contract: all prior helpers expect (_: usize, _: &mut HashSet<Name>, _: NamespaceRef), but the new ones only expose (_: &mut HashSet<Name>, _: NamespaceRef). The macro generates 2-argument calls, so the new signature technically works with the current derive macro, but it makes these new modules incompatible with the established pattern and any code expecting the historical ABI. For consistency and backward compatibility, add the usize parameter to all new helpers, even if unused.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/with.rs` around lines 546 - 550, The new helpers bigdecimal,
bigdecimal_opt, array, and array_opt currently define get_record_fields_in_ctxt
without the leading usize parameter, breaking the ABI; update each helper's
get_record_fields_in_ctxt signature to accept the unused leading usize parameter
(i.e. change fn get_record_fields_in_ctxt(_: &mut HashSet<Name>, _:
NamespaceRef) -> Option<Vec<RecordField>> to fn get_record_fields_in_ctxt(_:
usize, _: &mut HashSet<Name>, _: NamespaceRef) -> Option<Vec<RecordField>>) so
they match the existing helpers' signature and remain backward compatible with
callers and the macro-generated 3-argument form.

@martin-augment
Owner Author

1. Unsafe cast in deserialize_option

In deser_schema/mod.rs, the option handling does:

let index = zag_i32(self.reader)?;
let schema = union.get_variant(index as usize)?;

Every other place in this PR uses usize::try_from(index).map_err(...) before the bounds check. A negative i32 cast to usize wraps to a huge number on 64-bit; get_variant will still catch it, but the error will show a nonsensical large index rather than a clear "negative index" message. Be consistent with the rest of the codebase.

value:useful; category:bug; feedback: The Claude AI reviewer is correct! Any casting from a wider integer type to a narrower one should be done by using the checked APIs and report Err in case the value does not fit in the target type. Prevents silently overflowing the value and using a totally wrong value for any following operations with that number.
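The checked conversion asked for above can be sketched in a few lines (the function name `union_variant_index` is hypothetical): a negative zig-zag index becomes a clear error instead of wrapping to a huge `usize`.

```rust
// Convert a decoded union-variant index with an explicit bounds check.
fn union_variant_index(raw: i32) -> Result<usize, String> {
    usize::try_from(raw).map_err(|_| format!("negative union variant index: {raw}"))
}
```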
