
512: feat!: Support enums and tuples in SchemaAwareSerializer and implement SchemaAwareDeserializer#64

Closed
martin-augment wants to merge 9 commits into main from
pr-512-2026-03-22-11-16-41

Conversation

@martin-augment
Owner

512: To review by AI

@coderabbitai

coderabbitai bot commented Mar 22, 2026

Walkthrough

This pull request introduces schema-aware serialization and deserialization infrastructure throughout the codebase. New SchemaAwareSerializer and SchemaAwareDeserializer types enable direct serde conversion between Rust types and Avro binary format using schema information. Reader and writer APIs are extended with configurable human_readable and target_block_size parameters. Array and tuple schemas are redefined using record-like representations instead of array or fixed representations. New documentation modules describe mappings between Avro and Serde data models. Error types are refined with schema-specific variants. Tests are updated to use the new deserialization APIs and helpers instead of Value-based intermediate conversion flows.


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust schema-aware deserialization mechanism and significantly expands the serialization capabilities to support more complex Rust types like enums and tuples. These changes streamline the process of converting between Rust data structures and Avro's binary format, offering greater flexibility and more informative error messages. The updates also include new documentation to clarify data model mappings and refactorings across reader and writer components to integrate the new serialization/deserialization logic.

Highlights

  • Schema-Aware Deserialization: Implemented a new SchemaAwareDeserializer to directly deserialize Avro data into Rust types, significantly enhancing the deserialization process.
  • Expanded Serialization Support: The SchemaAwareSerializer now supports a wider range of Rust types, including enums and tuples, improving compatibility with complex data structures.
  • Improved Schema Derivation: Updated the AvroSchemaComponent trait to correctly derive schemas for std::time::Duration, fixed-size arrays ([T; N]), and tuples, aligning them more closely with Avro's data model.
  • Enhanced Reader/Writer APIs: Refactored GenericSingleObjectReader, Reader, and GenericDatumReader to leverage the new schema-aware deserialization, and updated GenericDatumWriter and Writer to use the enhanced SchemaAwareSerializer.
  • New Documentation: Added comprehensive documentation detailing the mapping between the Avro data model and the Serde data model, and vice-versa, to guide users in manual schema mapping.
  • Refined Error Handling: Introduced more specific error variants for serialization and deserialization failures, providing clearer diagnostics for issues like missing record fields or invalid enum indices.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


&Schema::String,
self.config,
)?)?;
Ok(())


Buffered map serializer writes keys to wrong destination

High Severity

In BufferedBlockSerializer's SerializeMap implementation, serialize_key writes the key to &mut self.writer (the actual output writer) instead of &mut self.buffer. Meanwhile, serialize_value calls self.serialize_element(value) which correctly writes to &mut self.buffer. This means map keys bypass the buffering mechanism entirely — they go directly to the output stream outside any block — while values are buffered. The resulting Avro output is corrupt because the block header only accounts for value bytes, and keys aren't tracked in bytes_written or items_in_buffer.
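The fix the reviewers describe is to route key bytes through the same buffer as value bytes so the block header accounts for them. A minimal stand-in sketch of the pattern (the `BufferedBlock` type and its methods are illustrative, not the crate's actual API; real Avro block headers use zigzag varints, simplified here to fixed-width integers):

```rust
// Minimal stand-in for the buffering pattern described above.
// `BufferedBlock` is hypothetical, not the apache-avro crate's type.
struct BufferedBlock {
    writer: Vec<u8>,  // final output stream
    buffer: Vec<u8>,  // pending block contents
    items_in_buffer: usize,
}

impl BufferedBlock {
    fn new() -> Self {
        Self { writer: Vec::new(), buffer: Vec::new(), items_in_buffer: 0 }
    }

    // Both key and value bytes must land in `buffer`, never directly in
    // `writer`, so the block header can account for every byte.
    fn write_entry(&mut self, key: &[u8], value: &[u8]) {
        self.buffer.extend_from_slice(key);
        self.buffer.extend_from_slice(value);
        self.items_in_buffer += 1;
    }

    // Emit header (item count, byte length) followed by the buffered bytes.
    // Real Avro encodes these as zigzag varints; plain u64 LE keeps the
    // sketch short.
    fn flush_block(&mut self) {
        self.writer.extend_from_slice(&(self.items_in_buffer as u64).to_le_bytes());
        self.writer.extend_from_slice(&(self.buffer.len() as u64).to_le_bytes());
        self.writer.append(&mut self.buffer);
        self.items_in_buffer = 0;
    }
}
```

With keys bypassing the buffer, as the comment describes, the byte-length field would undercount and the reader would desynchronize mid-block.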


);

self.block.read_next_deser(self.reader_schema)
}

Reader passes reader_schema unconditionally causing spurious panic

Medium Severity

Reader::read_next_deser passes self.reader_schema directly to self.block.read_next_deser(), but Block::read_next_deser panics when read_schema.is_some(). Unlike read_next, which conditionally passes None when should_resolve_schema is false, read_next_deser always passes the raw reader_schema. This causes an unnecessary panic when a user provides a reader_schema that happens to equal the writer_schema.
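The suggested behavior can be sketched with stand-in types (this `ReaderSketch` is hypothetical, not the crate's Reader): hand the reader schema to the block only when resolution is actually required.

```rust
// Stand-in illustrating the suggested fix. `ReaderSketch` is hypothetical;
// a &str stands in for Option<&Schema>.
struct ReaderSketch<'s> {
    should_resolve_schema: bool,
    reader_schema: Option<&'s str>,
}

impl<'s> ReaderSketch<'s> {
    // Mirrors what the comment says read_next already does: pass None
    // when no resolution is needed, so an equal reader schema cannot
    // trigger the "not implemented" panic path.
    fn schema_for_block(&self) -> Option<&'s str> {
        if self.should_resolve_schema {
            self.reader_schema // schemas differ: resolution is needed
        } else {
            None // schemas match: skip resolution
        }
    }
}
```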


@augmentcode

augmentcode bot commented Mar 22, 2026

🤖 Augment PR Summary

Summary: This PR adds schema-aware Serde-based Avro binary (de)serialization, expanding support for enums, tuples, and fixed-size arrays.

Changes:

  • Implemented SchemaAwareDeserializer (records/unions/enums/arrays/maps/tuples) and integrated it into GenericDatumReader, Reader::into_deser_iter, and single-object readers.
  • Updated schema-aware serialization path in writers, adding human_readable and block sizing controls and widening supported Serde shapes.
  • Extended AvroSchemaComponent derivations: tuples and [T; N] now map to derived record schemas with deterministic, unique names.
  • Added union-schema lookup helpers plus Schema::unique_normalized_name() to support stable generated names.
  • Added Serde with helpers (array, bigdecimal, and optional variants) and new rustdoc docs describing Avro↔Serde data-model mapping.
  • Refreshed tests to use the new schema-aware datum reader/writer APIs and updated licensing excludes for proptest regression seeds.

🤖 Was this summary useful? React with 👍 or 👎


@augmentcode augmentcode bot left a comment


Review completed. 4 suggestions posted.


Comment augment review to trigger a new review at any time.

T: ?Sized + Serialize,
{
key.serialize(SchemaAwareSerializer::new(
&mut self.writer,


BufferedBlockSerializer’s SerializeMap::serialize_key writes keys directly to the underlying writer instead of buffering them, so the block byte-size header won’t include key bytes and the encoded map will be corrupted. Keys need to be serialized into self.buffer and only flushed via write_block().

Severity: high


/// This will resolve the read schema if it is a reference.
fn with_union(self, schema: &'s UnionSchema) -> Result<Self, Error> {
let index = zag_i32(self.reader)?;
let variant = schema.get_variant(index as usize)?;

with_union casts the decoded union index from i32 to usize (index as usize), so corrupted data producing a negative index will wrap and yield confusing out-of-bounds behavior. Consider validating with usize::try_from(index) (as done for enums) here and in deserialize_option.

Severity: medium

Other Locations
  • avro/src/serde/deser_schema/mod.rs:532
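The validation the comment suggests can be sketched independently of the crate (the function name and String error type are illustrative):

```rust
// Validate a decoded union index before using it as an array position.
// A negative i32 from corrupted data must become a clear error rather
// than wrapping to a huge usize via `as usize`.
fn variant_index(decoded: i32, num_variants: usize) -> Result<usize, String> {
    let idx = usize::try_from(decoded)
        .map_err(|_| format!("negative union index {decoded}"))?;
    if idx >= num_variants {
        return Err(format!("union index {idx} out of range (0..{num_variants})"));
    }
    Ok(idx)
}
```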


"Schema aware deserialisation does not resolve schemas yet"
);

self.block.read_next_deser(self.reader_schema)

Reader::read_next_deser passes self.reader_schema into Block::read_next_deser, but Block::read_next_deser panics whenever read_schema.is_some(), so into_deser_iter() will panic even when reader & writer schemas are identical. Consider passing None when !should_resolve_schema (or relaxing the panic condition) to match the doc comment.

Severity: medium


where
T: ?Sized + Serialize,
{
let schema = &self.schema.fields[self.field_position].schema;

ManyTupleSerializer::serialize_element indexes self.schema.fields[self.field_position] without a bounds check, so serializing more elements than expected will panic instead of returning an Error. Consider returning an error when field_position >= fields.len() to avoid panics from malformed Serialize impls.

Severity: low
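A bounds-checked lookup of the kind suggested here can be sketched with plain slices (names are illustrative, not the crate's API):

```rust
// Return an error instead of panicking when a Serialize impl emits more
// tuple elements than the schema declares. Field names stand in for the
// crate's RecordField schemas.
fn next_field<'a>(fields: &[&'a str], position: usize) -> Result<&'a str, String> {
    fields.get(position).copied().ok_or_else(|| {
        format!(
            "too many elements: schema declares {} fields, got index {}",
            fields.len(),
            position
        )
    })
}
```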


@claude

claude bot commented Mar 22, 2026


PR Review: 512 — Support enums/tuples in SchemaAwareSerializer + implement SchemaAwareDeserializer

This is a large, well-structured PR. The core idea — adding a schema-aware deserializer that reads binary Avro data directly into Rust types without materializing the intermediate Value representation — is the right direction. The code is generally clean and well-documented.

Below are my findings grouped by severity.


Breaking Changes (no migration path provided)

1. std::time::Duration schema changed

The AvroSchemaComponent impl for core::time::Duration changed from Schema::Duration (a 12-byte fixed Avro type) to a Schema::Record with secs: u64 and nanos: long. Any existing data serialized as Duration will be silently misread as the new format. There is no migration guide or deprecation path.

2. [T; N] schema changed

[T; N] previously derived Schema::Array(element_schema). It now derives a tuple Schema::Record with N fields. This is a silent wire-format break for any existing stored data.

3. GenericSingleObjectReader API change

GenericSingleObjectReader::new(schema) and ::new_with_header_builder(schema, header_builder) are removed. Callers must switch to the builder pattern. This is a breaking API change that should be called out in the changelog.


Panics in public API

4. read_deser and read_next_deser panic instead of returning Err

In reader/datum.rs, read_deser panics with "Schema aware deserialisation does not resolve schemas yet" when a reader schema is configured.

In reader/mod.rs, read_next_deser uses assert! with the same message.

These are exposed public methods. Users who configure a reader schema and call read_deser will get an unrecoverable panic. These should return Err with a clear Details message until the feature is complete.


Error quality regressions

5. SerializeRecordFieldWithSchema.error changed from Box to String

Converting the nested error to a String drops the #[source] chain. Downstream error handlers that walk the error chain (e.g. error.source()) will no longer see the root cause. Consider keeping Box or at least annotating the field with #[source].

6. UUID mismatch error messages are now less actionable

Old: "Expected a string, but got 16 bytes. Did you mean to use Schema::Uuid(UuidSchema::Fixed) or utils::serde_set_human_readable(true)?"

New: "Expected Schema::String | Schema::Uuid(String)"

The old messages guided users toward the fix. The new messages are correct but generic. Consider keeping the actionable hint.


Test regression

7. Schema resolution test converted to an error test

In reader/datum.rs, test_avro_3240_schema_resolution_with_three_levels previously verified that schema resolution worked end-to-end. It is now changed to assert on a decoding error because SchemaAwareResolvingDeserializer is not yet implemented, with a TODO comment.

This silently removes schema resolution coverage. The old read_value + from_value test path should be preserved alongside the new test.


Code quality concerns

8. Err(...)? pattern in union.rs

There are several instances of Err(self.error(...))? which should be return Err(self.error(...)). The ? on an already-Err value works but is misleading.
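A minimal illustration of the two forms (a hypothetical check function, not code from the PR):

```rust
// Both forms compile and behave identically on an Err value, but the `?`
// variant misleadingly suggests a fallible sub-expression whose Ok case
// continues execution.
fn check(v: i32) -> Result<i32, String> {
    if v < 0 {
        // Misleading alternative: Err(format!("negative: {v}"))?
        return Err(format!("negative: {v}")); // intent is a plain early return
    }
    Ok(v * 2)
}
```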

9. Pointer comparison hack in deser_schema/mod.rs

A static &str is used as a sentinel and compared by pointer address to bypass name checks. This is fragile. Consider using a dedicated *const () sentinel pointer or a newtype.

10. deprecated removed from EnumSymbolIndex

The deprecation note said "This error variant is not generated anymore" since 0.20.0, but it is now regenerated. The since version and the note should be updated.


Edge cases / subtle behavior

11. [T; 0] maps to Schema::Null, [T; 1] maps to the element schema

This asymmetry is surprising: [String; 0] maps to Schema::Null, [String; 1] maps to Schema::String (same as bare String), [String; 2] maps to Schema::Record. A type wrapped in a 1-element array is indistinguishable from the plain type at the schema level. Consider whether always generating a Record (even for N=1) would be safer.

12. deserialize_option only handles 2-variant nullable unions

Multi-variant nullable unions (e.g. ["null", "string", "int"]) will return an error for Option. This is likely intentional but should be documented.


Positive observations

  • The overall architecture of SchemaAwareDeserializer is clean and follows Serde idioms well.
  • The new documentation modules (avro_data_model_to_serde.rs, serde_data_model_to_avro.rs) are excellent and long overdue.
  • The builder refactor for GenericSingleObjectReader improves consistency with the rest of the API.
  • unique_normalized_name for generating schema names for tuples/arrays is a clean approach; the collision-freedom test is good.
  • Test coverage for the new array/tuple schema derivation is thorough.

Summary

The feature direction is correct and the implementation quality is good, but there are a few items to address before merging:

  1. The breaking schema changes for Duration and [T; N] need migration documentation or explicit deprecation.
  2. The panicking paths in read_deser/read_next_deser should return errors instead of panicking.
  3. The schema resolution test coverage should be preserved.
  4. The SerializeRecordFieldWithSchema error source chain loss should be fixed.

The rest are suggestions rather than blockers.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant improvements to the apache-avro crate by implementing schema-aware serialization and deserialization using Serde. Key changes include adding new SchemaAwareDeserializer and SchemaAwareSerializer modules, which allow direct serialization and deserialization of Rust types to and from Avro data without intermediate Value conversions. This involved modifying Reader, Writer, GenericDatumReader, GenericDatumWriter, GenericSingleObjectReader, and SpecificSingleObjectReader to incorporate human_readable flags and target_block_size configurations for better control over the serialization process.

Additionally, new documentation files (avro_data_model_to_serde.rs and serde_data_model_to_avro.rs) were added to clarify the mapping between Avro and Serde data models. Error messages were refined for better clarity, and the AvroSchemaComponent trait was updated to handle array and tuple types more robustly, including a new unique_normalized_name method for schema naming. The changes also include new serde::with modules for bigdecimal and array types to facilitate custom serialization behavior.

The only remaining issue is that the error message for uuid serialization in serde_human_readable_false.rs is less helpful than the previous version, as it no longer suggests specific fixes.

Comment on lines 81 to 83
writer.write(uuid, &mut buffer).unwrap_err().to_string(),
"Failed to serialize value of type bytes using schema Uuid(String): 55e840e29b41d4a7164466554400. Cause: Expected a string, but got 16 bytes. Did you mean to use `Schema::Uuid(UuidSchema::Fixed)` or `utils::serde_set_human_readable(true)`?"
"Failed to serialize value of type `bytes` using Schema::Uuid(String): Expected Schema::Bytes | Schema::Fixed | Schema::BigDecimal | Schema::Decimal | Schema::Uuid(Bytes | Fixed) | Schema::Duration"
);


medium

The new error message seems less helpful than the old one. The old message provided a specific hint on how to fix the issue (Did you mean to use `Schema::Uuid(UuidSchema::Fixed)` or `utils::serde_set_human_readable(true)`?), while the new one just lists all possible schemas for bytes. Would it be possible to restore or improve the new error message to be more specific for this uuid case?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (5)
avro/src/schema/mod.rs (1)

5187-5209: This regression does not exercise unique_normalized_name().

The assertions only prove the four Schema::Ref values are different. The test still passes if unique_normalized_name() returns the same string for all of them, so it will not catch the collision this helper is meant to prevent.

🧪 Suggested assertion change
-        assert_ne!(one, two);
-        assert_ne!(one, three);
-        assert_ne!(one, four);
-        assert_ne!(two, three);
-        assert_ne!(two, four);
-        assert_ne!(three, four);
+        assert_ne!(one.unique_normalized_name(), two.unique_normalized_name());
+        assert_ne!(one.unique_normalized_name(), three.unique_normalized_name());
+        assert_ne!(one.unique_normalized_name(), four.unique_normalized_name());
+        assert_ne!(two.unique_normalized_name(), three.unique_normalized_name());
+        assert_ne!(two.unique_normalized_name(), four.unique_normalized_name());
+        assert_ne!(three.unique_normalized_name(), four.unique_normalized_name());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` around lines 5187 - 5209, The test
avro_rs_512_unique_normalized_name_must_be_unique currently only compares
Schema::Ref instances for inequality but doesn't call the helper it intends to
test; update the test to call unique_normalized_name() on each Schema::Ref (or
otherwise obtain the normalized name for `one`, `two`, `three`, `four`) and
assert those normalized strings are pairwise distinct, e.g. replace or add
assertions so that unique_normalized_name(&one), unique_normalized_name(&two),
unique_normalized_name(&three), and unique_normalized_name(&four) are all not
equal to each other.
avro/src/reader/mod.rs (1)

133-141: Runtime panic for schema resolution is documented but could surprise users.

The assert! on line 135-138 will cause a runtime panic if reader_schema differs from writer schema. While this is documented in the builder (lines 72-75), consider returning an error instead of panicking to provide a better user experience. The TODO suggests this is temporary, so tracking this for future implementation is reasonable.

♻️ Consider returning an error instead of panicking
     fn read_next_deser<T: DeserializeOwned>(&mut self) -> AvroResult<Option<T>> {
-        // TODO: Implement SchemaAwareResolvingDeserializer
-        assert!(
-            !self.should_resolve_schema,
-            "Schema aware deserialisation does not resolve schemas yet"
-        );
+        // TODO: Implement SchemaAwareResolvingDeserializer
+        if self.should_resolve_schema {
+            return Err(crate::Error::new(crate::error::Details::SchemaResolutionNotSupported));
+        }
 
         self.block.read_next_deser(self.reader_schema)
     }

This would require adding a new error variant SchemaResolutionNotSupported to Details.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/mod.rs` around lines 133 - 141, Replace the runtime panic in
read_next_deser by returning a proper error: instead of asserting on
self.should_resolve_schema, return an AvroResult::Err that uses a new
Details::SchemaResolutionNotSupported variant (include a clear message
referencing reader_schema vs writer schema and that schema-aware resolving
deserialiser is not yet implemented). Add the SchemaResolutionNotSupported
variant to the Details error enum and ensure any construction of
AvroError/Details is used consistently where read_next_deser currently calls
self.block.read_next_deser(self.reader_schema).
avro/src/serde/ser_schema/block.rs (1)

276-293: Consider potential integer overflow for large collections.

Lines 279-280 cast items_in_buffer and buffer.len() to i32. For very large collections (>2^31 items or >2GB buffer), this could overflow. While unlikely in practice, consider using i64 for consistency with Avro's long type for large counts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/block.rs` around lines 276 - 293, The write_block
function can overflow by casting items_in_buffer and buffer.len() to i32; change
the two encode_int calls to the i64-aware encoder (e.g., encode_long) and cast
items_in_buffer and buffer.len() to i64 (using - (self.items_in_buffer as i64)
for the negative count) so the block count and byte size use Avro's long type;
ensure the called encoder signature (encode_long) and any related imports are
used and that bytes_written arithmetic still compiles with the widened integer
type.
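For reference, zigzag-varint encoding of a 64-bit value is how Avro's long type is written on the wire, which is what this suggestion amounts to. A standalone sketch (not the crate's encoder):

```rust
// Zigzag + varint encoding of an i64, as used for Avro's `long` type.
// Encoding counts as i64 rather than i32 avoids overflow for very large
// block item counts or byte sizes.
fn encode_long(n: i64, out: &mut Vec<u8>) {
    // Zigzag maps small magnitudes (positive or negative) to small codes.
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    loop {
        let byte = (z & 0x7f) as u8;
        z >>= 7;
        if z == 0 {
            out.push(byte); // final byte: high bit clear
            break;
        }
        out.push(byte | 0x80); // continuation byte: high bit set
    }
}
```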
avro/src/serde/deser_schema/mod.rs (1)

728-734: Consider tracking the TODO for deserialize_ignored_any optimization.

The current implementation reads and discards the data, which works but is inefficient for large ignored values. Adding a Seek bound would allow skipping without reading.

Would you like me to open an issue to track implementing the Seek-based skip optimization?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/deser_schema/mod.rs` around lines 728 - 734, The TODO in
deserialize_ignored_any should be implemented so we skip ignored data via
seeking instead of reading; update the function to require a Seek-capable input
(add a Seek bound on the underlying reader or Self as appropriate) and implement
logic in deserialize_ignored_any (instead of calling deserialize_any) to
compute/sketch the byte length of the ignored value (for fixed, length-prefixed,
or primitive types) and advance the reader with seek (or stream-position
advance) to skip bytes, falling back to the existing deserialize_any path only
when the value length cannot be determined; target the deserialize_ignored_any
function and the reader abstraction used by the deserializer when making this
change.
avro/src/error.rs (1)

577-582: Keep the underlying field error in the source chain.

error: String makes this variant opaque to source(), so the custom Debug implementation below can no longer surface the original serde/Avro failure for a record field. Keeping the nested error as a real source preserves that context without changing the top-level message.

♻️ Suggested change
     SerializeRecordFieldWithSchema {
         field_name: String,
         record_schema: RecordSchema,
-        error: String,
+        #[source]
+        error: Box<Error>,
     },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/error.rs` around lines 577 - 582, The SerializeRecordFieldWithSchema
error variant currently stores error: String which prevents the original nested
error from being exposed via source(); change the variant to hold a boxed error
type (e.g., error: Box<dyn std::error::Error + Send + Sync + 'static> or the
crate's concrete error type) so the nested serde/Avro error is preserved, update
the #[error(...)] message to reference the nested error appropriately (keeping
the human message) and adjust places that construct
SerializeRecordFieldWithSchema to box/convert the underlying error into the new
field; this ensures source() returns the underlying cause while preserving the
existing display text for SerializeRecordFieldWithSchema.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro_derive/src/attributes/mod.rs`:
- Around line 343-346: The normalization that sets avro.default =
FieldDefault::Disabled when with != With::Trait must occur before the
skip-serialization validation (the guard that checks serde skip_serializing /
skip_serializing_if) so the skip-serialization rules see the normalized default;
move the block that checks with != With::Trait and mutates avro.default (the
FieldDefault::Trait -> FieldDefault::Disabled logic) to run earlier (i.e.,
before the skip-serialization check) or, alternatively, re-run the
skip-serialization validation immediately after you mutate avro.default so the
check uses the updated value; adjust references to With, FieldDefault and
avro.default accordingly.

In `@avro/src/documentation/avro_data_model_to_serde.rs`:
- Around line 33-50: Update the intra-doc links so they point to the correct
Serde traits and the correct apache_avro helper: change references of
serialize_bytes, serialize_seq, serialize_map to link to serde::Serializer
(these are methods on Serializer) and change deserialize_bytes, deserialize_seq,
deserialize_map to link to serde::Deserializer (these are methods on
Deserializer); also replace the apache_avro::serde::bytes reference for the
`fixed` type with apache_avro::serde::fixed. Locate mentions of
serialize_bytes/serialize_seq/serialize_map,
deserialize_bytes/deserialize_seq/deserialize_map, and apache_avro::serde::bytes
in the doc block and update their targets to serde::Serializer,
serde::Deserializer, and apache_avro::serde::fixed respectively.

In `@avro/src/documentation/serde_data_model_to_avro.rs`:
- Around line 45-62: Replace the broken markdown links `[Enums](##Enums)` in the
documentation comments (appearing in the list items for "unit variant", "newtype
variant", "tuple_variant", and "struct_variant") so they point to the actual
Enums section; update each occurrence to use a correct fragment link (e.g.,
`#enums`) or an intra-doc reference to the Enums section so the "See Enums"
references are navigable.

In `@avro/src/reader/block.rs`:
- Around line 232-245: Replace the panic! in the read_schema.is_some() branch
with a proper AvroResult error return so callers can recover; instead of
panicking in that branch, return Err(AvroError::new(/* message */)) or an
existing variant appropriate for unimplemented schema-resolving (using the
AvroResult and AvroError types) and include a clear message like "Schema aware
deserialisation not implemented: SchemaAwareResolvingDeserializer required" so
the error is propagated from the branch where read_schema is Some(), referencing
the read_schema check, SchemaAwareResolvingDeserializer and the surrounding
T::deserialize call to locate the change.

In `@avro/src/reader/datum.rs`:
- Around line 143-160: The function GenericDatumReader::read_deser currently
panics when self.reader (the reader schema) is Some; instead return an explicit
AvroResult error: replace the panic in read_deser with returning Err(...) that
clearly indicates schema-aware resolving/deserialization is unimplemented (e.g.
an AvroError::SchemaResolutionNotImplemented or similar new variant/message),
and if that error variant doesn't exist add it to the AvroError enum; keep the
rest of the branch using SchemaAwareDeserializer for the None case and reference
self.reader, read_deser, SchemaAwareDeserializer and Config when making the
change.

In `@avro/src/schema/mod.rs`:
- Around line 842-848: The current Schema::unique_normalized_name() collapses
dots and underscores (e.g., a._b and a_.b both become a___b); make the encoding
bijective by using an escape scheme instead of collapsing: first replace every
'_' with "__", then replace every '.' with "._" (or map '.' -> "_."
consistently), then map any non-alphanumeric characters still to '_' only after
escaping, and keep the final Cow::Owned(format!("r{}_{}", name.len(), name))
output; update the block that builds name (the variable named name and the
mapping that currently does .replace('_', "__').chars().map(...)) to implement
this reversible escape so generated names are unique for distinct fullnames.
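A reversible escape along these lines can be sketched as follows. The exact escape pairs chosen here ('_' to "__", then '.' to "_0") are illustrative, not the crate's implementation; the point is that an injective mapping cannot collide where collapsing both characters to '_' does.

```rust
// Reversible escape of an Avro fullname into an identifier-safe string.
// Escaping '_' first ("__") guarantees a lone '_' never appears, so the
// dot escape "_0" is unambiguous; distinct fullnames yield distinct names.
fn escape_fullname(name: &str) -> String {
    let escaped = name.replace('_', "__").replace('.', "_0");
    format!("r{}_{}", escaped.len(), escaped)
}
```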

In `@avro/src/serde/derive.rs`:
- Around line 624-643: The code now constructs a Schema::record for
core::time::Duration (via Name::new_with_enclosing_namespace("Duration", ...)
and Schema::record(...)) but get_record_fields_in_ctxt still returns None for
Duration; update get_record_fields_in_ctxt to return the corresponding record
field descriptions for core::time::Duration (secs as u64 and nanos as i64/long
per existing Schema::Long use) so callers of AvroSchemaComponent that expect
record fields can flatten/reuse Duration; locate logic that matches on type
names or schemas and add a branch matching
Name::new_with_enclosing_namespace("Duration", ...) or Schema::Ref/Record for
Duration to return the two RecordField descriptors consistent with the record
built above.

In `@avro/src/serde/deser_schema/array.rs`:
- Line 35: The field remaining is declared as Option<u32> but Avro array block
counts are encoded as zigzag 64-bit longs; change remaining to Option<i64> (or
Option<usize> with a checked conversion) and update read_block_header to call
zag_i64 instead of zag_i32, then validate non-negative counts and convert to
usize where used. Update all related usages (including the other occurrences
around lines 58-68) to handle i64 -> usize conversion with bounds checks and
propagate the new type through functions that read/consume block counts.

In `@avro/src/serde/deser_schema/map.rs`:
- Around line 33-67: The MapDeserializer currently stores block counts in
remaining as Option<u32> and reads them with zag_i32, but Avro encodes map block
counts as longs; change the remaining field to Option<u64>, update
MapDeserializer::new to call the updated read_block_header and adjust any
usages, and modify read_block_header to use zag_i64() (returning Ok(None) on
zero, handle negative blocks by reading the following zag_i64() bytes, and
return Ok(Some(count.unsigned_abs() as u64))). Also apply the same change where
the other block-header logic appears (the code referenced around lines 146-148),
ensuring all places use Option<u64> and zag_i64().
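Both the array and map findings rest on the same spec detail: Avro block counts are zigzag-encoded longs, and a negative count is followed by the block's size in bytes. A self-contained sketch of the suggested reader, with a local varint decoder standing in for the crate's `zag_i64`:

```rust
use std::io::{Cursor, Read};

// Minimal varint + zigzag decode for Avro longs (sketch only; the crate's
// own zag_i64 would be used in practice).
fn read_long(r: &mut impl Read) -> std::io::Result<i64> {
    let mut buf = [0u8; 1];
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        r.read_exact(&mut buf)?;
        value |= u64::from(buf[0] & 0x7f) << shift;
        if buf[0] & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    Ok(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Returns Ok(None) at the end-of-items marker (count 0); otherwise the
/// item count of the next block. Per the spec, a negative count is
/// followed by the block's byte size, which is read and discarded here.
fn read_block_header(r: &mut impl Read) -> std::io::Result<Option<u64>> {
    match read_long(r)? {
        0 => Ok(None),
        n if n < 0 => {
            let _byte_size = read_long(r)?;
            Ok(Some(n.unsigned_abs()))
        }
        n => Ok(Some(n as u64)),
    }
}

fn main() -> std::io::Result<()> {
    assert_eq!(read_block_header(&mut Cursor::new([0x06]))?, Some(3));
    assert_eq!(read_block_header(&mut Cursor::new([0x00]))?, None);
    // count -2 (zigzag 3) followed by a byte size of 10 (zigzag 20)
    assert_eq!(read_block_header(&mut Cursor::new([0x03, 0x14]))?, Some(2));
    println!("ok");
    Ok(())
}
```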

In `@avro/src/serde/deser_schema/record.rs`:
- Around line 68-87: The field deserialization currently matches Schema::Record
directly and misses cases where the field schema is a Schema::Ref pointing to a
record; update next_value_seed to first resolve any Schema::Ref for
self.schema.fields[self.current_field].schema (same approach used in
UnionVariantAccess::new) and then perform the Schema::Record pattern match and
union_of_records attribute check so the wrapper record is unwrapped and the
inner field schema is used for SchemaAwareDeserializer.

In `@avro/src/serde/ser_schema/block.rs`:
- Around line 339-349: BufferedBlockSerializer::serialize_key currently
serializes map keys into self.writer instead of the in-memory buffer, breaking
buffering semantics; change BufferedBlockSerializer::serialize_key to use
SchemaAwareSerializer::new(&mut self.buffer, &Schema::String, self.config) so
keys are written into the same buffer as values (mirroring DirectBlockSerializer
behavior which uses self.writer only for direct output), ensure the Result
propagation remains the same and return Ok(()) on success.

In `@avro/src/serde/ser_schema/record/field_default.rs`:
- Around line 52-66: The bytes/fixed/default handling in recursive_type_check is
using UTF-8 length/bytes, causing multi-code-point JSON string defaults to be
mis-decoded and allowing string defaults for int-backed types like Schema::Date;
update the arms that accept Value::String for bytes-like schemas (the match arm
in recursive_type_check and the similar arms at the other locations noted) to
decode the JSON string by iterating its Unicode code points and mapping each
code point 0..=255 to a byte (rejecting any code point > 0xFF), and do not
accept string defaults for Schema::Date (ensure Schema::Date only accepts
numeric Value types); use the existing helper used elsewhere in the module for
this code-point-to-byte validation so behavior is consistent.
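The suggested code-point-to-byte helper is small enough to sketch in full; the name `avro_default_bytes` mirrors the helper proposed elsewhere in this review and may differ from what the crate actually names it:

```rust
/// Decode an Avro bytes/fixed JSON default string: each Unicode code point
/// in 0..=255 maps to exactly one raw byte; higher code points are invalid.
fn avro_default_bytes(s: &str) -> Option<Vec<u8>> {
    s.chars().map(|ch| u8::try_from(ch as u32).ok()).collect()
}

fn main() {
    // "\u{00FF}" is one logical byte (0xFF) even though UTF-8 stores it
    // in two bytes, which is exactly the mis-sizing the finding describes.
    let s = "\u{00FF}";
    assert_eq!(s.len(), 2); // UTF-8 length, not the default's byte length
    assert_eq!(avro_default_bytes(s), Some(vec![0xFF]));
    assert_eq!(avro_default_bytes("\u{0100}"), None); // code point > 0xFF
    println!("ok");
}
```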

In `@avro/src/serde/ser_schema/tuple.rs`:
- Around line 141-165: The serialize_element implementation for SerializeTuple
(function serialize_element) reads self.schema.fields[self.field_position]
without checking bounds, which can panic if too many elements are provided; add
a bounds check at the start of serialize_element that returns a new error
variant (e.g. Details::SerializeTupleTooManyElements) when self.field_position
>= self.schema.fields.len(), and then proceed to serialize and increment
field_position as before; also ensure the end(self) behavior stays (it already
errors on too few elements) and add similar validation for OneTupleSerializer
and OneUnionTupleSerializer to validate that exactly one element was provided
(return SerializeTupleTooManyElements if extra serialize_element calls occur and
missing-elements error if none were provided).
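The shape of the guard can be modeled without the real serializer types; `TupleSerializer` and `SerError` below are stand-ins for the crate's serializer and its `Details` variants:

```rust
// Toy model of the bounds check: serialize_element rejects extra elements
// instead of indexing fields out of bounds, while end still rejects
// missing ones.
#[derive(Debug)]
enum SerError {
    TooManyElements { position: usize, total_elements: usize },
    MissingElements { position: usize, total_elements: usize },
}

struct TupleSerializer {
    field_position: usize,
    total_fields: usize,
}

impl TupleSerializer {
    fn serialize_element(&mut self) -> Result<(), SerError> {
        if self.field_position >= self.total_fields {
            return Err(SerError::TooManyElements {
                position: self.field_position,
                total_elements: self.total_fields,
            });
        }
        // ... serialize the value against the field schema here ...
        self.field_position += 1;
        Ok(())
    }

    fn end(self) -> Result<(), SerError> {
        if self.field_position != self.total_fields {
            Err(SerError::MissingElements {
                position: self.field_position,
                total_elements: self.total_fields,
            })
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut ser = TupleSerializer { field_position: 0, total_fields: 1 };
    assert!(ser.serialize_element().is_ok());
    assert!(matches!(
        ser.serialize_element(),
        Err(SerError::TooManyElements { .. })
    ));
    assert!(ser.end().is_ok());
    println!("ok");
}
```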

In `@avro/src/serde/ser_schema/union.rs`:
- Around line 344-358: serialize_unit_variant can panic by indexing
schema.symbols[variant_index as usize] without bounds checking; modify
serialize_unit_variant to first match the found schema from
self.union.find_named_schema(name, self.config.names) and then check that
variant_index as usize is within schema.symbols.len() before comparing symbols,
returning the same error via self.error when out of bounds or symbol mismatch;
reference serialize_unit_variant, union.find_named_schema, Schema::Enum, and
schema.symbols to locate and implement the bounds check and safe handling.

In `@avro/src/serde/with.rs`:
- Around line 679-682: The generic bound T: DeserializeOwned used in
array::deserialize (and the similar spots around 713-717) is unnecessarily
restrictive; change the trait bound to T: Deserialize<'de> so the function can
borrow during deserialization (matching array_opt::deserialize), and update the
imports to remove DeserializeOwned if unused (ensure the use
serde::de::DeserializeOwned reference is replaced by using
serde::Deserialize<'de> where required). Locate and modify the
array::deserialize signature and any other functions that currently require
DeserializeOwned to instead require Deserialize<'de>, keeping the existing
deserialization-to-Vec<T> then try_into() flow.

---

Nitpick comments:
In `@avro/src/error.rs`:
- Around line 577-582: The SerializeRecordFieldWithSchema error variant
currently stores error: String which prevents the original nested error from
being exposed via source(); change the variant to hold a boxed error type (e.g.,
error: Box<dyn std::error::Error + Send + Sync + 'static> or the crate's
concrete error type) so the nested serde/Avro error is preserved, update the
#[error(...)] message to reference the nested error appropriately (keeping the
human message) and adjust places that construct SerializeRecordFieldWithSchema
to box/convert the underlying error into the new field; this ensures source()
returns the underlying cause while preserving the existing display text for
SerializeRecordFieldWithSchema.
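A std-only sketch of the reshaped variant (the crate appears to use a thiserror-style enum, where a boxed `#[source]` field achieves the same effect; names and the display text here are illustrative):

```rust
use std::error::Error;
use std::fmt;

// Sketch: holding the nested error boxed makes it reachable via source()
// while the Display text stays the human-readable message.
#[derive(Debug)]
struct SerializeRecordFieldWithSchema {
    field_name: String,
    cause: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for SerializeRecordFieldWithSchema {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to serialize record field `{}`", self.field_name)
    }
}

impl Error for SerializeRecordFieldWithSchema {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause as &(dyn Error + 'static))
    }
}

fn main() {
    let inner = std::io::Error::new(std::io::ErrorKind::Other, "disk full");
    let err = SerializeRecordFieldWithSchema {
        field_name: "payload".into(),
        cause: Box::new(inner),
    };
    // The underlying cause is preserved instead of being flattened to String.
    assert!(err.source().is_some());
    assert_eq!(err.source().unwrap().to_string(), "disk full");
    println!("ok");
}
```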

In `@avro/src/reader/mod.rs`:
- Around line 133-141: Replace the runtime panic in read_next_deser by returning
a proper error: instead of asserting on self.should_resolve_schema, return an
AvroResult::Err that uses a new Details::SchemaResolutionNotSupported variant
(include a clear message referencing reader_schema vs writer schema and that
schema-aware resolving deserialiser is not yet implemented). Add the
SchemaResolutionNotSupported variant to the Details error enum and ensure any
construction of AvroError/Details is used consistently where read_next_deser
currently calls self.block.read_next_deser(self.reader_schema).

In `@avro/src/schema/mod.rs`:
- Around line 5187-5209: The test
avro_rs_512_unique_normalized_name_must_be_unique currently only compares
Schema::Ref instances for inequality but doesn't call the helper it intends to
test; update the test to call unique_normalized_name() on each Schema::Ref (or
otherwise obtain the normalized name for `one`, `two`, `three`, `four`) and
assert those normalized strings are pairwise distinct, e.g. replace or add
assertions so that unique_normalized_name(&one), unique_normalized_name(&two),
unique_normalized_name(&three), and unique_normalized_name(&four) are all not
equal to each other.

In `@avro/src/serde/deser_schema/mod.rs`:
- Around line 728-734: The TODO in deserialize_ignored_any should be implemented
so we skip ignored data via seeking instead of reading; update the function to
require a Seek-capable input (add a Seek bound on the underlying reader or Self
as appropriate) and implement logic in deserialize_ignored_any (instead of
calling deserialize_any) to compute/sketch the byte length of the ignored value
(for fixed, length-prefixed, or primitive types) and advance the reader with
seek (or stream-position advance) to skip bytes, falling back to the existing
deserialize_any path only when the value length cannot be determined; target the
deserialize_ignored_any function and the reader abstraction used by the
deserializer when making this change.
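For seekable readers the skip reduces to a one-liner; `skip_bytes` is a hypothetical helper, and the read-and-discard fallback described above would remain for non-seekable inputs:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Skip `n` bytes without copying them into a buffer, assuming the reader
/// is seekable. Returns the new stream position.
fn skip_bytes<R: Seek>(r: &mut R, n: u64) -> std::io::Result<u64> {
    let offset = i64::try_from(n).map_err(|_| {
        std::io::Error::new(std::io::ErrorKind::InvalidInput, "skip too large")
    })?;
    r.seek(SeekFrom::Current(offset))
}

fn main() -> std::io::Result<()> {
    let mut cur = Cursor::new(vec![1u8, 2, 3, 4, 5]);
    skip_bytes(&mut cur, 3)?; // e.g. an ignored fixed(3) value
    let mut rest = Vec::new();
    cur.read_to_end(&mut rest)?;
    assert_eq!(rest, vec![4, 5]);
    println!("ok");
    Ok(())
}
```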

In `@avro/src/serde/ser_schema/block.rs`:
- Around line 276-293: The write_block function can overflow by casting
items_in_buffer and buffer.len() to i32; change the two encode_int calls to the
i64-aware encoder (e.g., encode_long) and cast items_in_buffer and buffer.len()
to i64 (using - (self.items_in_buffer as i64) for the negative count) so the
block count and byte size use Avro's long type; ensure the called encoder
signature (encode_long) and any related imports are used and that bytes_written
arithmetic still compiles with the widened integer type.
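The widened encoder can be sketched independently; `encode_long` below reimplements Avro's zigzag + varint encoding for the `long` type and stands in for the crate's own function:

```rust
/// Zigzag + varint encode an i64 the way Avro writes a `long`; using this
/// for block counts and byte sizes avoids the i32 truncation described
/// above. (Sketch; the crate's encode_long would be used in practice.)
fn encode_long(value: i64, out: &mut Vec<u8>) {
    let mut z = ((value << 1) ^ (value >> 63)) as u64;
    loop {
        if z < 0x80 {
            out.push(z as u8);
            return;
        }
        out.push((z & 0x7f) as u8 | 0x80);
        z >>= 7;
    }
}

fn main() {
    let mut buf = Vec::new();
    encode_long(3, &mut buf);
    assert_eq!(buf, [0x06]); // zigzag(3) = 6

    buf.clear();
    encode_long(-5, &mut buf); // e.g. a negative block count
    assert_eq!(buf, [0x09]); // zigzag(-5) = 9

    buf.clear();
    // A byte size larger than i32::MAX still encodes correctly as a long.
    encode_long(i64::from(i32::MAX) + 1, &mut buf);
    assert_eq!(buf.len(), 5);
    println!("ok");
}
```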

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8b41cec-97a5-4238-90f7-f9f50ef9d65b

📥 Commits

Reviewing files that changed from the base of the PR and between fbec01b and faeb5ba.

📒 Files selected for processing (43)
  • avro/examples/test_interop_single_object_encoding.rs
  • avro/src/bigdecimal.rs
  • avro/src/documentation/avro_data_model_to_serde.rs
  • avro/src/documentation/mod.rs
  • avro/src/documentation/serde_data_model_to_avro.rs
  • avro/src/error.rs
  • avro/src/lib.rs
  • avro/src/reader/block.rs
  • avro/src/reader/datum.rs
  • avro/src/reader/mod.rs
  • avro/src/reader/single_object.rs
  • avro/src/schema/mod.rs
  • avro/src/schema/union.rs
  • avro/src/serde/derive.rs
  • avro/src/serde/deser_schema/array.rs
  • avro/src/serde/deser_schema/enums.rs
  • avro/src/serde/deser_schema/identifier.rs
  • avro/src/serde/deser_schema/map.rs
  • avro/src/serde/deser_schema/mod.rs
  • avro/src/serde/deser_schema/record.rs
  • avro/src/serde/deser_schema/tuple.rs
  • avro/src/serde/mod.rs
  • avro/src/serde/ser_schema/block.rs
  • avro/src/serde/ser_schema/mod.rs
  • avro/src/serde/ser_schema/record/field_default.rs
  • avro/src/serde/ser_schema/record/mod.rs
  • avro/src/serde/ser_schema/tuple.rs
  • avro/src/serde/ser_schema/union.rs
  • avro/src/serde/with.rs
  • avro/src/writer/datum.rs
  • avro/src/writer/mod.rs
  • avro/src/writer/single_object.rs
  • avro/tests/avro-rs-226.rs
  • avro/tests/avro-rs-285-bytes_deserialization.rs
  • avro/tests/schema.rs
  • avro/tests/serde_human_readable_false.rs
  • avro/tests/serde_human_readable_true.rs
  • avro/tests/union_schema.rs
  • avro_derive/src/attributes/mod.rs
  • avro_derive/tests/derive.proptest-regressions
  • avro_derive/tests/derive.rs
  • avro_derive/tests/serde.rs
  • licenserc.toml

Comment on lines +343 to +346
// TODO: Implement a better way to do this (maybe if user specifies `#[avro(with)]` also use that for the default)
// Disable getting the field default, if the schema is not retrieved from the field type
if with != With::Trait && avro.default == FieldDefault::Trait {
avro.default = FieldDefault::Disabled;

⚠️ Potential issue | 🟠 Major

Normalize FieldDefault before validating skip-serialization rules.

with != With::Trait can flip an implicit FieldDefault::Trait into Disabled after the guard on Line 321 has already run. That lets #[serde(skip_serializing)] / #[serde(skip_serializing_if)] pass with #[avro(with)] and no usable default, which breaks the fallback this derive path expects. Please move this normalization above the skip-serialization check, or rerun that check afterwards.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro_derive/src/attributes/mod.rs` around lines 343 - 346, The normalization
that sets avro.default = FieldDefault::Disabled when with != With::Trait must
occur before the skip-serialization validation (the guard that checks serde
skip_serializing / skip_serializing_if) so the skip-serialization rules see the
normalized default; move the block that checks with != With::Trait and mutates
avro.default (the FieldDefault::Trait -> FieldDefault::Disabled logic) to run
earlier (i.e., before the skip-serialization check) or, alternatively, re-run
the skip-serialization validation immediately after you mutate avro.default so
the check uses the updated value; adjust references to With, FieldDefault and
avro.default accordingly.

Comment on lines +33 to +50
//! - `bytes`: [`Vec<u8>`](std::vec::Vec) (or any type that uses [`Serialize::serialize_bytes`](serde::Serialize), [`Deserialize::deserialize_bytes`](serde::Deserialize), [`Deserialize::deserialize_byte_buf`](serde::Deserialize))
//! - It is required to use [`apache_avro::serde::bytes`] as otherwise Serde will (de)serialize a `Vec` as an array of integers instead.
//! - `string`: [`String`] (or any type that uses [`Serialize::serialize_str`](serde::Serialize), [`Deserialize::deserialize_str`](serde::Deserialize), [`Deserialize::deserialize_string`](serde::Deserialize))
//!
//! ## Complex Types
//! - `records`: A struct with the same name and fields or a tuple with the same fields.
//! - Extra fields can be added to the struct if they are marked with `#[serde(skip)]`
//! - `enums`: An enum with the same name and unit variants for every symbol
//! - The index of the symbol must match the index of the variant
//! - `arrays`: [`Vec`] (or any type that uses [`Serialize::serialize_seq`](serde::Serialize), [`Deserialize::deserialize_seq`](serde::Deserialize))
//! - `[T; N]` is (de)serialized as a tuple by Serde, to (de)serialize them as an `array` use [`apache_avro::serde::array`]
//! - `maps`: [`HashMap<String, _>`](std::collections::HashMap) (or any type that uses [`Serialize::serialize_map`](serde::Serialize), [`Deserialize::deserialize_map`](serde::Deserialize))
//! - `unions`: An enum with a variant for each union variant
//! - The index of the union variant must match the enum variant
//! - A `null` can be a unit variant or a newtype variant with a unit type
//! - All other variants must be newtype variants, struct variants, or tuple variants.
//! - `fixed`: [`Vec<u8>`](std::vec::Vec) (or any type that uses [`Serialize::serialize_bytes`](serde::Serialize), [`Deserialize::deserialize_bytes`](serde::Deserialize), [`Deserialize::deserialize_byte_buf`](serde::Deserialize))
//! - It is required to use [`apache_avro::serde::bytes`] as otherwise Serde will (de)serialize a `Vec` as an array of integers instead.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file in question
cat -n avro/src/documentation/avro_data_model_to_serde.rs | sed -n '33,50p'

Repository: martin-augment/avro-rs

Length of output: 2260


🏁 Script executed:

# Check the full file to understand context
wc -l avro/src/documentation/avro_data_model_to_serde.rs

Repository: martin-augment/avro-rs

Length of output: 119


🏁 Script executed:

# Check if crate::serde module exists and what it contains
fd -t f 'serde' avro/src/ | grep -E '\.(rs)$' | head -20

Repository: martin-augment/avro-rs

Length of output: 167


🏁 Script executed:

# Search for serde::bytes and serde::fixed modules in the codebase
rg 'pub mod (bytes|fixed)' avro/src/

Repository: martin-augment/avro-rs

Length of output: 229


🏁 Script executed:

# Let's check the actual serde module structure to understand the modules
cat avro/src/serde/with.rs | head -100

Repository: martin-augment/avro-rs

Length of output: 3158


🏁 Script executed:

# Check if there's a fixed module and what it does
rg -A 10 'pub mod fixed' avro/src/serde/with.rs

Repository: martin-augment/avro-rs

Length of output: 544


🏁 Script executed:

# Look at the Serde trait definitions to confirm where these methods live
# Let's check if we can find documentation or references to Serializer vs Serialize
web_search
Serde Serialize vs Serializer trait methods

Repository: martin-augment/avro-rs

Length of output: 160


🌐 Web query:

Serde Serializer trait methods serialize_bytes serialize_seq serialize_map documentation

💡 Result:

The Serde Serializer trait in Rust defines methods for serializing data structures into various formats. Here is documentation for the requested methods, sourced from official Serde documentation.

serialize_bytes(self, v: &[u8]) -> Result<Self::Ok, Self::Error>
Serializes a chunk of raw byte data. This enables serializers to handle byte slices more compactly or efficiently than regular sequences. If no efficient implementation exists, it can forward to serialize_seq:
let mut seq = self.serialize_seq(Some(v.len()))?;
for b in v { seq.serialize_element(b)?; }
seq.end()

serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>
Begins serialization of a seq (variably sized heterogeneous sequence like Vec or HashSet). Returns a SerializeSeq instance. Example usage for Vec:
let mut seq = serializer.serialize_seq(Some(self.len()))?;
for element in self { seq.serialize_element(element)?; }
seq.end()

SerializeSeq trait (returned by serialize_seq):
- serialize_element(&mut self, value: &T) -> Result<(), Self::Error> where T: ?Sized + Serialize: Serializes one sequence element.
- end(self) -> Result<Self::Ok, Self::Error>: Finishes serializing the sequence.

serialize_map(self, len: Option<usize>) -> Result<Self::SerializeMap, Self::Error>
Begins serialization of a map (heterogeneous key-value pairs like BTreeMap). Returns a SerializeMap instance. Example usage for HashMap<K, V>:
let mut map = serializer.serialize_map(Some(self.len()))?;
for (k, v) in self { map.serialize_entry(k, v)?; }
map.end()

SerializeMap trait (returned by serialize_map):
- serialize_key(&mut self, key: &T) -> Result<(), Self::Error> where T: ?Sized + Serialize: Serializes a map key.
- serialize_value(&mut self, value: &T) -> Result<(), Self::Error> where T: ?Sized + Serialize: Serializes a map value.
- serialize_entry(&mut self, key: &K, value: &V) -> Result<(), Self::Error> where K: ?Sized + Serialize, V: ?Sized + Serialize: Serializes a key-value entry (default impl uses serialize_key then serialize_value; preferred for optimization).
- end(self) -> Result<Self::Ok, Self::Error>: Finishes serializing the map.

These methods are required in the Serializer trait (serde::ser::Serializer). Serializer implementations must define associated types like type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>; and similarly for SerializeMap.



🌐 Web query:

Serde Deserializer trait methods deserialize_bytes deserialize_seq

💡 Result:

The Serde Deserializer trait (in serde::de) defines methods for deserializing different data types by driving a Visitor. The methods deserialize_bytes and deserialize_seq are required methods of this trait.

deserialize_bytes signature:
fn deserialize_bytes<V>(self, visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de>
Hint that the Deserialize type is expecting a byte array and does not benefit from taking ownership of buffered data. Deserializers should prefer deserialize_bytes over deserialize_byte_buf when deserializing &[u8] to avoid unnecessary allocations.

deserialize_seq signature:
fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de>
Hint that the Deserialize type is expecting a sequence of values (like Vec). The deserializer provides a SeqAccess implementation via the visitor's visit_seq method, allowing iteration over sequence elements using next_element or next_element_seed.

These methods consume self (the Deserializer), pass a Visitor expecting the hinted type, and return the visitor's value or an error. They are part of the 29 methods corresponding to Serde's data model types. Implementing a custom deserializer requires providing these (and others) based on the input format, such as parsing JSON arrays for deserialize_seq or byte literals for deserialize_bytes. Examples from docs show usage in Deserialize impls, e.g.: deserializer.deserialize_bytes(SomeVisitor) or deserializer.deserialize_seq(VecVisitor)



Intra-doc links target incorrect Serde traits.

Lines 33-50 incorrectly link serialize_bytes, serialize_seq, and serialize_map to serde::Serialize, and deserialize_bytes, deserialize_seq to serde::Deserialize. These methods are defined on serde::Serializer and serde::Deserializer respectively.

Additionally, line 50 directs users to apache_avro::serde::bytes for the fixed type, but should point to apache_avro::serde::fixed instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/documentation/avro_data_model_to_serde.rs` around lines 33 - 50,
Update the intra-doc links so they point to the correct Serde traits and the
correct apache_avro helper: change references of serialize_bytes, serialize_seq,
serialize_map to link to serde::Serializer (these are methods on Serializer) and
change deserialize_bytes, deserialize_seq, deserialize_map to link to
serde::Deserializer (these are methods on Deserializer); also replace the
apache_avro::serde::bytes reference for the `fixed` type with
apache_avro::serde::fixed. Locate mentions of
serialize_bytes/serialize_seq/serialize_map,
deserialize_bytes/deserialize_seq/deserialize_map, and apache_avro::serde::bytes
in the doc block and update their targets to serde::Serializer,
serde::Deserializer, and apache_avro::serde::fixed respectively.

Comment on lines +45 to +62
//! - **unit variant** => See [Enums](##Enums)
//! - **newtype struct** => [`Schema::Record`] with the unqualified name equal to the name of the struct and one field
//! - **newtype variant** => See [Enums](##Enums)
//! - **seq** => [`Schema::Array`]
//! - **tuple**
//! - For tuples with one element, the schema will be the schema of the only element
//! - For tuples with more than one element, the schema will be a [`Schema::Record`] with as many fields as there are elements.
//! The schema must have the attribute `org.apache.avro.rust.tuple` with the value set to `true`.
//! - **Note:** Serde (de)serializes arrays (`[T; N]`) as tuples. To (de)serialize an array as a
//! [`Schema::Array`] use [`apache_avro::serde::array`].
//! - **tuple_struct** => [`Schema::Record`] with the unqualified name equal to the name of the struct and as many fields as there are elements
//! - **Note:** Tuple structs with 0 or 1 elements will also be (de)serialized as a [`Schema::Record`]. This
//! is different from normal tuples.
//! - **tuple_variant** => See [Enums](##Enums)
//! - **map** => [`Schema::Map`]
//! - **Note:** the key type of the map will be (de)serialized as a [`Schema::String`]
//! - **struct** => [`Schema::Record`]
//! - **struct_variant** => See [Enums](##Enums)

⚠️ Potential issue | 🟡 Minor

Fix the enum section links.

[Enums](##Enums) produces a broken fragment in rustdoc. These should point at #enums (or use an implicit intra-doc link) so the “See Enums” references remain navigable.

Suggested patch
-//! - **unit variant** => See [Enums](##Enums)
+//! - **unit variant** => See [Enums](#enums)
-//! - **newtype variant** => See [Enums](##Enums)
+//! - **newtype variant** => See [Enums](#enums)
-//! - **tuple_variant** => See [Enums](##Enums)
+//! - **tuple_variant** => See [Enums](#enums)
-//! - **struct_variant** => See [Enums](##Enums)
+//! - **struct_variant** => See [Enums](#enums)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/documentation/serde_data_model_to_avro.rs` around lines 45 - 62,
Replace the broken markdown links `[Enums](##Enums)` in the documentation
comments (appearing in the list items for "unit variant", "newtype variant",
"tuple_variant", and "struct_variant") so they point to the actual Enums
section; update each occurrence to use a correct fragment link (e.g., `#enums`)
or an intra-doc reference to the Enums section so the "See Enums" references are
navigable.

Comment on lines +232 to +245
let item = if read_schema.is_some() {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");
} else {
let config = Config {
names: &self.names_refs,
human_readable: self.human_readable,
};
T::deserialize(SchemaAwareDeserializer::new(
&mut block_bytes,
&self.writer_schema,
config,
)?)?
};

⚠️ Potential issue | 🟠 Major

Return an AvroResult error here instead of panicking.

A caller can legitimately reach this path by configuring a reader schema and then using the typed reader/iterator APIs. The current panic! turns that runtime configuration into process abort instead of a recoverable error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/block.rs` around lines 232 - 245, Replace the panic! in the
read_schema.is_some() branch with a proper AvroResult error return so callers
can recover; instead of panicking in that branch, return Err(AvroError::new(/*
message */)) or an existing variant appropriate for unimplemented
schema-resolving (using the AvroResult and AvroError types) and include a clear
message like "Schema aware deserialisation not implemented:
SchemaAwareResolvingDeserializer required" so the error is propagated from the
branch where read_schema is Some(), referencing the read_schema check,
SchemaAwareResolvingDeserializer and the surrounding T::deserialize call to
locate the change.

Comment on lines +143 to +160
/// # Panics
/// Will panic if a reader schema has been configured, this is a WIP.
pub fn read_deser<T: DeserializeOwned>(&self, reader: &mut impl Read) -> AvroResult<T> {
// `reader` is `impl Read` instead of a generic on the function like T so it's easier to
// specify the type wanted (`read_deser<String>` vs `read_deser<String, _>`)
if let Some((_, _)) = &self.reader {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");
} else {
T::deserialize(SchemaAwareDeserializer::new(
reader,
self.writer,
Config {
names: self.resolved.get_names(),
human_readable: self.human_readable,
},
)?)
}

⚠️ Potential issue | 🟠 Major

read_deser should not panic when reader_schema is configured.

The builder already accepts a reader schema, so this makes GenericDatumReader::read_deser crash on a normal runtime configuration. Please return an explicit error until resolving deserialization is implemented, or reject the configuration earlier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/datum.rs` around lines 143 - 160, The function
GenericDatumReader::read_deser currently panics when self.reader (the reader
schema) is Some; instead return an explicit AvroResult error: replace the panic
in read_deser with returning Err(...) that clearly indicates schema-aware
resolving/deserialization is unimplemented (e.g. an
AvroError::SchemaResolutionNotImplemented or similar new variant/message), and
if that error variant doesn't exist add it to the AvroError enum; keep the rest
of the branch using SchemaAwareDeserializer for the None case and reference
self.reader, read_deser, SchemaAwareDeserializer and Config when making the
change.

Comment on lines +339 to +349
fn serialize_key<T>(&mut self, key: &T) -> Result<(), Self::Error>
where
T: ?Sized + Serialize,
{
key.serialize(SchemaAwareSerializer::new(
&mut self.writer,
&Schema::String,
self.config,
)?)?;
Ok(())
}

⚠️ Potential issue | 🔴 Critical

Bug: serialize_key writes directly to writer instead of buffer.

In BufferedBlockSerializer::serialize_key, the key is written directly to self.writer (line 344) instead of self.buffer. This breaks the buffering semantics for maps - keys will be written immediately while values are buffered, causing malformed output.

Compare with DirectBlockSerializer::serialize_key which correctly writes to self.writer since it doesn't buffer.

🐛 Proposed fix
     fn serialize_key<T>(&mut self, key: &T) -> Result<(), Self::Error>
     where
         T: ?Sized + Serialize,
     {
         key.serialize(SchemaAwareSerializer::new(
-            &mut self.writer,
+            &mut self.buffer,
             &Schema::String,
             self.config,
         )?)?;
         Ok(())
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/block.rs` around lines 339 - 349,
BufferedBlockSerializer::serialize_key currently serializes map keys into
self.writer instead of the in-memory buffer, breaking buffering semantics;
change BufferedBlockSerializer::serialize_key to use
SchemaAwareSerializer::new(&mut self.buffer, &Schema::String, self.config) so
keys are written into the same buffer as values (mirroring DirectBlockSerializer
behavior which uses self.writer only for direct output), ensure the Result
propagation remains the same and return Ok(()) on success.

Comment on lines +52 to +66
fn recursive_type_check(value: &Value, schema: &'s Schema) -> bool {
match (value, schema) {
(Value::Null, Schema::Null)
| (Value::Bool(_), Schema::Boolean)
| (
Value::String(_),
Schema::Bytes
| Schema::String
| Schema::Decimal(DecimalSchema {
inner: InnerDecimalSchema::Bytes,
..
})
| Schema::BigDecimal
| Schema::Uuid(UuidSchema::Bytes | UuidSchema::String),
) => true,

⚠️ Potential issue | 🟠 Major

Decode Avro byte defaults by code point, not UTF-8.

Avro encodes bytes/fixed defaults as JSON strings whose code points 0..=255 map 1:1 to raw bytes. s.len() and s.as_bytes() use UTF-8, so defaults like "\u00FF" get mis-sized and serialized as two bytes. This arm also currently accepts string defaults for Schema::Date, which is int-backed.

Fix sketch
+fn avro_default_bytes(s: &str) -> Option<Vec<u8>> {
+    s.chars().map(|ch| u8::try_from(ch as u32).ok()).collect()
+}
+
 ...
-            ) => s.len() == fixed.size,
+            ) => match avro_default_bytes(s) {
+                Some(bytes) => bytes.len() == fixed.size,
+                None => false,
+            },

 ...
-                | Schema::Duration(_)
-                | Schema::Date,
-            ) => serializer.serialize_bytes(s.as_bytes()),
+                | Schema::Duration(_),
+            ) => serializer.serialize_bytes(
+                &avro_default_bytes(s)
+                    .ok_or_else(|| S::Error::custom("byte default contains non-byte code point"))?,
+            ),

Use the same helper in the bytes-backed recursive_type_check arm as well, so code points above U+00FF are rejected consistently.

Also applies to: 83-93, 151-160

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/record/field_default.rs` around lines 52 - 66, The
bytes/fixed/default handling in recursive_type_check is using UTF-8
length/bytes, causing multi-code-point JSON string defaults to be mis-decoded
and allowing string defaults for int-backed types like Schema::Date; update the
arms that accept Value::String for bytes-like schemas (the match arm in
recursive_type_check and the similar arms at the other locations noted) to
decode the JSON string by iterating its Unicode code points and mapping each
code point 0..=255 to a byte (rejecting any code point > 0xFF), and do not
accept string defaults for Schema::Date (ensure Schema::Date only accepts
numeric Value types); use the existing helper used elsewhere in the module for
this code-point-to-byte validation so behavior is consistent.
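The UTF-8 vs. code-point mismatch is easy to demonstrate in isolation. Below is a minimal sketch of the `avro_default_bytes` helper proposed in the fix above (the name comes from that sketch, not from the library): it maps each Unicode code point in `0..=255` to one raw byte and rejects anything above U+00FF.

```rust
// Sketch of the proposed helper: decode an Avro JSON string default into raw
// bytes by code point, not by UTF-8 encoding.
fn avro_default_bytes(s: &str) -> Option<Vec<u8>> {
    // u8::try_from fails for any code point above 0xFF, making the whole
    // collect() yield None for non-byte defaults.
    s.chars().map(|ch| u8::try_from(ch as u32).ok()).collect()
}

fn main() {
    let s = "\u{00FF}";
    // One code point, but two bytes in UTF-8 -- this is the mis-sizing bug.
    assert_eq!(s.len(), 2);
    // Decoded by code point, the default is the single byte 0xFF.
    assert_eq!(avro_default_bytes(s), Some(vec![0xFFu8]));
    // Code points above U+00FF are not valid Avro byte defaults.
    assert_eq!(avro_default_bytes("\u{0100}"), None);
}
```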

Comment on lines +141 to +165
fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: ?Sized + Serialize,
{
let schema = &self.schema.fields[self.field_position].schema;
self.bytes_written += value.serialize(SchemaAwareSerializer::new(
self.writer,
schema,
self.config,
)?)?;
self.field_position += 1;
Ok(())
}

fn end(self) -> Result<Self::Ok, Self::Error> {
if self.field_position != self.schema.fields.len() {
Err(Details::SerializeTupleMissingElements {
position: self.field_position,
total_elements: self.schema.fields.len(),
}
.into())
} else {
Ok(self.bytes_written)
}
}

⚠️ Potential issue | 🟡 Minor

Missing bounds check in serialize_element could cause panic.

Line 145 accesses self.schema.fields[self.field_position] without checking if field_position < fields.len(). If the caller provides more elements than the schema expects, this will panic. Additionally, OneTupleSerializer and OneUnionTupleSerializer don't validate that exactly one element is provided.

🛡️ Proposed fix with bounds check
     fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
     where
         T: ?Sized + Serialize,
     {
+        if self.field_position >= self.schema.fields.len() {
+            return Err(Details::SerializeTupleTooManyElements {
+                position: self.field_position,
+                total_elements: self.schema.fields.len(),
+            }
+            .into());
+        }
         let schema = &self.schema.fields[self.field_position].schema;
         self.bytes_written += value.serialize(SchemaAwareSerializer::new(
             self.writer,
             schema,
             self.config,
         )?)?;
         self.field_position += 1;
         Ok(())
     }

This would require adding a new error variant SerializeTupleTooManyElements to Details.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/tuple.rs` around lines 141 - 165, The
serialize_element implementation for SerializeTuple (function serialize_element)
reads self.schema.fields[self.field_position] without checking bounds, which can
panic if too many elements are provided; add a bounds check at the start of
serialize_element that returns a new error variant (e.g.
Details::SerializeTupleTooManyElements) when self.field_position >=
self.schema.fields.len(), and then proceed to serialize and increment
field_position as before; also ensure the end(self) behavior stays (it already
errors on too few elements) and add similar validation for OneTupleSerializer
and OneUnionTupleSerializer to validate that exactly one element was provided
(return SerializeTupleTooManyElements if extra serialize_element calls occur and
missing-elements error if none were provided).
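The shape of the guard can be shown outside the serializer. This is a minimal sketch using a hypothetical `TupleState` type; the real implementation would return the proposed `Details::SerializeTupleTooManyElements` variant rather than a `String`.

```rust
// Hypothetical stand-in for the tuple serializer's position tracking.
struct TupleState {
    field_count: usize, // schema.fields.len()
    position: usize,    // field_position
}

impl TupleState {
    // Bounds-checked equivalent of `schema.fields[field_position]`: an extra
    // element produces an error instead of an index panic.
    fn next_field(&mut self) -> Result<usize, String> {
        if self.position >= self.field_count {
            return Err(format!(
                "too many tuple elements: schema expects {}",
                self.field_count
            ));
        }
        let idx = self.position;
        self.position += 1;
        Ok(idx)
    }
}

fn main() {
    let mut st = TupleState { field_count: 2, position: 0 };
    assert_eq!(st.next_field(), Ok(0));
    assert_eq!(st.next_field(), Ok(1));
    // A third element is rejected cleanly, mirroring the proposed fix.
    assert!(st.next_field().is_err());
}
```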

Comment on lines +344 to +358
fn serialize_unit_variant(
self,
name: &'static str,
variant_index: u32,
variant: &'static str,
) -> Result<Self::Ok, Self::Error> {
match self.union.find_named_schema(name, self.config.names)? {
Some((index, Schema::Enum(schema))) if schema.symbols[variant_index as usize] == variant => {
let mut bytes_written = zig_i32(index as i32, &mut *self.writer)?;
bytes_written += zig_i32(variant_index as i32, &mut *self.writer)?;
Ok(bytes_written)
}
_ => Err(self.error("unit variant", format!("Expected Schema::Enum(name: {name}, symbols[{variant_index}] == {variant}) in variants"))),
}
}

⚠️ Potential issue | 🟡 Minor

Potential out-of-bounds panic when accessing enum symbols.

Line 351 accesses schema.symbols[variant_index as usize] without bounds checking. While variant_index typically comes from serde-derived code matching the enum definition, a mismatch between the Rust enum and Avro schema could cause a panic.

🛡️ Proposed fix with bounds check
     fn serialize_unit_variant(
         self,
         name: &'static str,
         variant_index: u32,
         variant: &'static str,
     ) -> Result<Self::Ok, Self::Error> {
         match self.union.find_named_schema(name, self.config.names)? {
-            Some((index, Schema::Enum(schema))) if schema.symbols[variant_index as usize] == variant => {
+            Some((index, Schema::Enum(schema))) if schema.symbols.get(variant_index as usize) == Some(&variant.to_string()) => {
                 let mut bytes_written = zig_i32(index as i32, &mut *self.writer)?;
                 bytes_written += zig_i32(variant_index as i32, &mut *self.writer)?;
                 Ok(bytes_written)
             }
             _ => Err(self.error("unit variant", format!("Expected Schema::Enum(name: {name}, symbols[{variant_index}] == {variant}) in variants"))),
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/union.rs` around lines 344 - 358,
serialize_unit_variant can panic by indexing schema.symbols[variant_index as
usize] without bounds checking; modify serialize_unit_variant to first match the
found schema from self.union.find_named_schema(name, self.config.names) and then
check that variant_index as usize is within schema.symbols.len() before
comparing symbols, returning the same error via self.error when out of bounds or
symbol mismatch; reference serialize_unit_variant, union.find_named_schema,
Schema::Enum, and schema.symbols to locate and implement the bounds check and
safe handling.
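The safe lookup pattern is just `slice::get` plus a comparison, which folds the out-of-bounds case into the existing mismatch path. A minimal sketch with a hypothetical `symbol_matches` helper:

```rust
// `get` returns None past the end of the slice, so a Rust enum with more
// variants than the Avro schema has symbols becomes an error, not a panic.
fn symbol_matches(symbols: &[String], variant_index: u32, variant: &str) -> bool {
    symbols
        .get(variant_index as usize)
        .is_some_and(|s| s.as_str() == variant)
}

fn main() {
    let symbols = vec!["A".to_string(), "B".to_string()];
    assert!(symbol_matches(&symbols, 1, "B"));
    // Out of bounds: no panic, just a mismatch.
    assert!(!symbol_matches(&symbols, 5, "B"));
    // In bounds but wrong symbol.
    assert!(!symbol_matches(&symbols, 0, "B"));
}
```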

Comment on lines +679 to +682
use serde::{
Deserialize, Deserializer, Serialize, Serializer,
de::{DeserializeOwned, Error as _},
};

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n avro/src/serde/with.rs | sed -n '670,730p'

Repository: martin-augment/avro-rs



🏁 Script executed:

cat -n avro/src/serde/with.rs | sed -n '728,820p'

Repository: martin-augment/avro-rs



Use Deserialize<'de> instead of DeserializeOwned.

The array::deserialize function only deserializes into Vec<T> before converting to an array via try_into(), so DeserializeOwned is unnecessarily restrictive and prevents borrowing types. The parallel array_opt::deserialize implementation uses T: Deserialize<'de> with identical deserialization logic, demonstrating the looser bound is sufficient.

🩹 Suggested change
     use serde::{
         Deserialize, Deserializer, Serialize, Serializer,
-        de::{DeserializeOwned, Error as _},
+        de::Error as _,
     };
…
-        T: DeserializeOwned,
+        T: Deserialize<'de>,

Also applies to: 713-717

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/with.rs` around lines 679 - 682, The generic bound T:
DeserializeOwned used in array::deserialize (and the similar spots around
713-717) is unnecessarily restrictive; change the trait bound to T:
Deserialize<'de> so the function can borrow during deserialization (matching
array_opt::deserialize), and update the imports to remove DeserializeOwned if
unused (ensure the use serde::de::DeserializeOwned reference is replaced by
using serde::Deserialize<'de> where required). Locate and modify the
array::deserialize signature and any other functions that currently require
DeserializeOwned to instead require Deserialize<'de>, keeping the existing
deserialization-to-Vec<T> then try_into() flow.
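The Vec-then-array flow the finding refers to can be shown on its own: `Vec<T>` converts to `[T; N]` via `TryInto`, erroring (not panicking) on a length mismatch. This sketch uses a hypothetical `vec_to_array` helper and leaves serde out entirely; the point is only that nothing in the flow requires owning the deserialized data longer than the call, which is why `Deserialize<'de>` suffices.

```rust
// Fallible Vec<T> -> [T; N] conversion, as used after deserializing into a Vec.
fn vec_to_array<T, const N: usize>(v: Vec<T>) -> Result<[T; N], String> {
    let len = v.len(); // capture before the move into try_into
    v.try_into()
        .map_err(|_| format!("expected {N} elements, got {len}"))
}

fn main() {
    assert_eq!(vec_to_array::<i32, 3>(vec![1, 2, 3]), Ok([1, 2, 3]));
    // Wrong length surfaces as an Err, matching the custom-error path in the
    // suggested change above.
    assert!(vec_to_array::<i32, 3>(vec![1, 2]).is_err());
}
```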

@martin-augment
Owner Author

Superseded by #65

