
512: feat!: Support enums and tuples in SchemaAwareSerializer and implement SchemaAwareDeserializer #67

Open
martin-augment wants to merge 15 commits into main from pr-512-2026-03-31-09-05-44

Conversation

@martin-augment
Owner

512: To review by AI

@augmentcode

augmentcode bot commented Mar 31, 2026

This pull request is too large for Augment to review. The PR exceeds the maximum size limit of 100000 tokens (approximately 400000 characters) for automated code review. Please consider breaking this PR into smaller, more focused changes.

@coderabbitai

coderabbitai bot commented Mar 31, 2026

Walkthrough

This pull request introduces comprehensive schema-aware serialization and deserialization infrastructure to the Apache Avro crate. New deser_schema and ser_schema modules implement direct Serde-based datum reading and writing with full schema awareness. Documentation modules describe Avro-to-Serde and Serde-to-Avro data model mappings. Configuration options for human-readable encoding and target block sizes are threaded through reader and writer builder APIs. Error types are expanded with new variants for schema-aware operations and better type information. The schema module gains utility methods for union variant lookup and normalized name generation. Serde integration extends with new with helpers for arrays and BigDecimal. Tuple and fixed-size array schema generation in derives now produces record schemas for multi-element cases. Tests throughout are updated to use the new datum-based readers and writers.




@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


);

self.block.read_next_deser(self.reader_schema)
}


Panic when reader schema equals writer schema during deserialization

High Severity

read_next_deser passes self.reader_schema to block.read_next_deser, which panics if read_schema.is_some(). However, self.reader_schema can be Some(...) even when self.should_resolve_schema is false — specifically when the reader schema equals the writer schema. The outer assert! passes in this case, but the block unconditionally panics on any Some value, crashing at runtime.

Additional Locations (1)
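The flow Bugbot describes can be modeled in isolation. The sketch below uses a `String` as a stand-in for the crate's `Schema` type (illustrative, not the crate's real API): `reader_schema` can be `Some(..)` even when the reader schema equals the writer schema, so one possible fix is to forward it to the block only when resolution is actually required.

```rust
// Stand-in model of the reported flow; `DatumReader`, `schema_for_block`,
// and the use of String instead of Schema are illustrative assumptions.
struct DatumReader {
    reader_schema: Option<String>,
    should_resolve_schema: bool,
}

impl DatumReader {
    fn schema_for_block(&self) -> Option<&String> {
        if self.should_resolve_schema {
            self.reader_schema.as_ref()
        } else {
            // Reader schema equals writer schema: no resolution is needed,
            // so the block must never receive a Some value here.
            None
        }
    }
}
```

With this guard, the block's `read_next_deser` only ever sees `Some` when resolution was genuinely requested, so the unconditional panic on `Some` cannot be reached in the reader-equals-writer case.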

self.reader,
&record.fields[0].schema,
self.config,
)?)?

Deserializer incorrectly unwraps union_of_records at field level

Medium Severity

RecordDeserializer::next_value_seed unwraps a field's Record schema when it has the org.apache.avro.rust.union_of_records attribute and one field, reading only the inner field's data. The serializer's RecordSerializer does NOT perform this unwrapping — it writes all fields of the Record normally. This asymmetry means deserialization would read the wrong bytes if a record field directly (not via a union) uses such a schema.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements a schema-aware Serde serialization and deserialization framework, allowing direct conversion between Rust types and Avro data. It introduces SchemaAwareSerializer and SchemaAwareDeserializer, updates reader and writer APIs with builder patterns, and adds comprehensive documentation for Avro-Serde data model mapping. Review feedback recommends replacing panic! and assert! calls in unimplemented code paths with proper error handling to improve library stability, and identifies minor typos in the documentation.

Comment on lines +232 to +234
let item = if read_schema.is_some() {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");

high

This code will panic if read_schema is Some. While there's a TODO comment, merging code with a panic! is generally discouraged as it can lead to unexpected crashes. It would be better to return an Err with a Details::Unsupported variant, or similar, to indicate that this feature is not yet implemented.

Comment on lines +148 to +150
if let Some((_, _)) = &self.reader {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");

high

This code will panic if a reader schema is configured. While there's a TODO comment, merging code with a panic! is generally discouraged as it can lead to unexpected crashes. It would be better to return an Err with a Details::Unsupported variant, or similar, to indicate that this feature is not yet implemented.

Comment on lines +135 to +138
assert!(
!self.should_resolve_schema,
"Schema aware deserialisation does not resolve schemas yet"
);

high

Using assert! here will cause a panic in debug builds if should_resolve_schema is true. This can lead to unexpected crashes during development and testing. It would be more robust to return an Err to indicate that schema resolution is not yet supported for deserialization, similar to how other unsupported features are handled.
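The three findings above all suggest the same pattern: return an error instead of panicking on the not-yet-implemented resolution path. A minimal sketch, assuming a hypothetical `Details::Unsupported` variant (the names and signature here are illustrative, not the crate's actual error API):

```rust
// Hypothetical error variant standing in for the crate's Details enum.
#[derive(Debug)]
enum Details {
    Unsupported(String),
}

// Sketch of the suggested fix: the unimplemented schema-resolution path
// surfaces as an Err the caller can handle, rather than a panic!/assert!.
fn read_next_deser(should_resolve_schema: bool) -> Result<(), Details> {
    if should_resolve_schema {
        // TODO: Implement SchemaAwareResolvingDeserializer
        return Err(Details::Unsupported(
            "Schema aware deserialisation does not resolve schemas yet".to_string(),
        ));
    }
    Ok(())
}
```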

//! - **unit_variant** => [`Schema::Record`] named as the variant and with no fields.
//! - **newtype_variant** => [`Schema::Record`] named as the variant and with one field.
//! The schema must have the attribute `org.apache.avro.rust.union_of_records` with the value set to `true`.
//! - **tuple_variant** => [`Schema::Record`] named as the variant and with as many fields as there are element.

medium

There's a small typo here. element should be elements. This typo also appears on lines 79 and 98.

    - **tuple_variant** => [`Schema::Record`] named as the variant and with as many fields as there are elements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
avro/src/reader/single_object.rs (1)

366-370: ⚠️ Potential issue | 🟡 Minor

Test doesn't assert on the matches! result.

Line 369 uses matches! but doesn't assert on the result. The match result is discarded, so the test doesn't actually verify the error type.

💚 Fix the assertion
-        matches!(read_result, Err(Details::ReadBytes(_)));
+        assert!(matches!(read_result, Err(Details::ReadBytes(_))));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/single_object.rs` around lines 366 - 370, The test currently
discards the boolean returned by matches! and so never asserts the error type;
update the assertion by checking read_result and asserting it matches
Details::ReadBytes (e.g., replace the bare matches! call with an assertion such
as assert!(matches!(read_result, Err(Details::ReadBytes(_))))), referencing the
read_result variable and the generic_reader.read_value call so the test actually
verifies the expected Details::ReadBytes error variant.
🧹 Nitpick comments (16)
CLAUDE.md (1)

1-1: Clarify policy scope and fix wording on Line 1.

The sentence is ambiguous and has a naming typo (Github → GitHub). Consider making the scope explicit (where this rule applies) to avoid misinterpretation.

Proposed wording
-Never link to any issue or a pull request in any Github repository
+Do not link to issues or pull requests in any GitHub repository.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 1, Update the first line of CLAUDE.md to clarify the
rule's scope and fix the typo "Github" → "GitHub": replace the ambiguous
sentence with an explicit statement that defines where it applies (e.g., "Never
link to any issue or pull request in any GitHub repository referenced by this
project or its documentation") so readers understand the context and the
spelling is corrected.
avro/tests/serde_human_readable_true.rs (1)

104-107: Prefer less brittle error assertions in tests.

Asserting the full formatted error (especially schema debug output) makes tests fragile to non-functional message changes.

Proposed stabilization
-    assert_eq!(
-        writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-        r#"Failed to serialize value of type `str` using Schema::Uuid(Bytes): Expected Schema::String | Schema::Uuid(String)"#
-    );
+    let err = writer.write(uuid, &mut buffer).unwrap_err().to_string();
+    assert!(err.contains("Failed to serialize value of type `str` using Schema::Uuid(Bytes)"));
+    assert!(err.contains("Expected Schema::String | Schema::Uuid(String)"));

...

-    assert_eq!(
-        writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-        r#"Failed to serialize value of type `str` using Schema::Uuid(Fixed(FixedSchema { name: Name { name: "uuid", .. }, size: 16, .. })): Expected Schema::String | Schema::Uuid(String)"#
-    );
+    let err = writer.write(uuid, &mut buffer).unwrap_err().to_string();
+    assert!(err.contains("Failed to serialize value of type `str` using Schema::Uuid(Fixed"));
+    assert!(err.contains("Expected Schema::String | Schema::Uuid(String)"));

Also applies to: 130-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/tests/serde_human_readable_true.rs` around lines 104 - 107, The test
currently asserts the entire error string from writer.write(uuid, &mut
buffer).unwrap_err().to_string(), which is brittle; update the assertion to
check for a stable property instead—e.g., assert that the error kind or a
substring is present (use contains) or match on the error enum/variant if
available (call writer.write(...).unwrap_err() and assert that its
variant/Display contains "Expected Schema::String" or that it is a
serialization/schema-mismatch error) so tests no longer depend on full debug
formatting of Schema::Uuid/Schema::String in serde_human_readable_true.rs around
the writer.write assertion. Ensure the change is applied to both occurrences
(the one at lines ~104-107 and the similar block at ~130-133) and reference
writer.write and the error result to locate the code.
avro/tests/serde_human_readable_false.rs (1)

80-83: Consider substring assertions for resilience.

This full message assertion is likely to break on benign formatting updates; checking key fragments is more stable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/tests/serde_human_readable_false.rs` around lines 80 - 83, The test
currently asserts exact equality of the error string from writer.write (in
serde_human_readable_false.rs) which is fragile; change the assertion to check
for key substrings instead (e.g., that
writer.write(...).unwrap_err().to_string() contains the important fragment(s)
like "Expected Schema::Bytes" or "Schema::Uuid(Bytes | Fixed)") so the test
tolerates harmless formatting changes—replace the assert_eq! on
writer.write(...) with one or more assert!(...contains(...)) checks referencing
writer.write and the error string.
avro/src/serde/ser_schema/union.rs (1)

406-439: Clarify the 1-tuple union serialization behavior.

The 1-tuple case calls TupleSerializer::one_union, which re-enters union serialization for the single element. This is correct for handling (T,) where T needs to match a union variant, but the flow could be surprising.

Consider adding a brief comment explaining why 1-tuples delegate back to union selection rather than finding a record with 1 field.

📝 Suggested documentation
         } else if len == 1 {
+            // For 1-tuples, the inner value determines the union branch,
+            // so we delegate to union serialization for the single element.
             Ok(TupleSerializer::one_union(
                 self.writer,
                 self.union,
                 self.config,
             ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/union.rs` around lines 406 - 439, Add a short
clarifying comment in serialize_tuple explaining why the len == 1 branch
delegates to TupleSerializer::one_union (re-entering union selection) instead of
matching a Schema::Record with one field: this handles 1-tuples like (T,) where
the single element must be resolved against the union variants, distinct from a
record variant found by union.find_record_with_n_fields; mention the relation to
union.index_of_schema_kind and the zig_i32 index write so reviewers can see
where the union-discriminant is emitted.
avro/src/serde/deser_schema/record.rs (1)

74-92: Consider documenting the union-of-records attribute handling.

This special case deserializes using the inner field's schema when org.apache.avro.rust.union_of_records attribute is true and the record has exactly one field. This is a non-obvious behavior that affects data model mapping.

Consider adding a brief comment explaining the purpose of this attribute and when it's typically set (likely during derive macro processing for Rust enums mapped to Avro unions of records).

📝 Proposed documentation addition
         let schema = if let Schema::Ref { name } = schema {
             self.config.get_schema(name)?
         } else {
             schema
         };
+        // When a record is marked with "org.apache.avro.rust.union_of_records" and has exactly
+        // one field, deserialize using that field's schema directly. This supports Rust enums
+        // mapped to Avro unions of single-field records (newtype variants).
         let value = if let Schema::Record(record) = schema
             && record.fields.len() == 1
             && record
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/deser_schema/record.rs` around lines 74 - 92, Add a short
explanatory comment above the special-case branch that checks
Schema::Record(record) and the "org.apache.avro.rust.union_of_records"
attribute: state that when a Record has exactly one field and that attribute is
true (set by the Avro derive macro for Rust enums mapped to Avro unions of
records), we intentionally deserialize using the inner field schema
(record.fields[0].schema) via SchemaAwareDeserializer to preserve the original
union-member type; reference the attribute name, the Schema::Record pattern, and
the use of seed.deserialize(SchemaAwareDeserializer::new(...)) so future readers
understand why the inner schema is used instead of the outer record schema.
avro/src/serde/ser_schema/block.rs (1)

338-355: Map entry counting is split between serialize_key and serialize_value.

In BufferedBlockSerializer's SerializeMap impl, serialize_key (lines 338-347) doesn't increment items_in_buffer, but serialize_value calls serialize_element (line 354) which does. This works correctly since each map entry consists of one key + one value, but the asymmetry could be confusing.

A comment explaining this design would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/block.rs` around lines 338 - 355, Add a brief
comment in the BufferedBlockSerializer SerializeMap implementation explaining
why items_in_buffer is only incremented in serialize_element (called by
serialize_value) and not in serialize_key: note that map entries are serialized
as key then value and serialize_element increments the count once per complete
entry, so counting in serialize_value avoids double-counting and preserves the
one-increment-per-entry invariant; place the comment near the serialize_key and
serialize_value methods, referencing serialize_key, serialize_value,
serialize_element and the items_in_buffer field for clarity.
avro/src/serde/deser_schema/mod.rs (3)

217-221: Pointer comparison for DESERIALIZE_ANY is clever but subtle.

The comment explains the design well, but this pattern is unusual and could confuse future maintainers. Consider adding a brief inline comment at the comparison site (line 686) as well.
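The pattern being flagged can be shown in a self-contained form. The sketch below demonstrates sentinel detection by pointer identity rather than string content, which is why a user-provided string with identical content does not match (the `DESERIALIZE_ANY` value and `is_sentinel` helper here are illustrative, not the crate's actual definitions):

```rust
// Static sentinel: identity is what matters, not the text itself.
static DESERIALIZE_ANY: &str = "__deserialize_any__";

fn is_sentinel(name: &str) -> bool {
    // Compare data pointers, not contents: only the sentinel's own
    // &'static str can ever match, so user input cannot spoof it.
    std::ptr::eq(name.as_ptr(), DESERIALIZE_ANY.as_ptr())
}
```

Note that the test deliberately heap-allocates the lookalike string: two `&'static str` literals with equal content may be deduplicated by the compiler into the same address, which would defeat the demonstration.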

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/deser_schema/mod.rs` around lines 217 - 221, The code uses
pointer-address equality for the static DESERIALIZE_ANY sentinel which is clever
but subtle; add a brief inline comment at the place where DESERIALIZE_ANY is
compared (the deserialize_any comparison site) explaining that the comparison
intentionally uses pointer equality (not string content) to prevent
user-provided strings from bypassing validation and that DESERIALIZE_ANY is a
static sentinel so pointer equality is safe; reference DESERIALIZE_ANY and
DESERIALIZE_ANY_FIELDS in the comment so maintainers can locate the sentinel
definition and the comparison site in deserialize_any.

728-733: TODO for Seek optimization in deserialize_ignored_any.

The TODO on line 732 suggests adding a Seek bound to skip over ignored data more efficiently. This is a good optimization opportunity for large ignored fields.

Would you like me to open an issue to track this optimization?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/deser_schema/mod.rs` around lines 728 - 733, The current
deserialize_ignored_any implementation just delegates to deserialize_any;
replace this with logic that requires a Seek bound and skips over ignored data
efficiently: add a Seek constraint to the deserializer type or method signature
(e.g., require Read + Seek on the underlying reader used by the deserializer),
then implement skipping by calculating the ignored field length from the
wire/type info and advancing the reader with seek or seek_relative instead of
reading into memory; update the signature of deserialize_ignored_any and any
related trait impls that reference it (function name: deserialize_ignored_any)
so the method uses the reader's seek to skip bytes for large ignored fields.

256-269: Extract "org.apache.avro.rust.union_of_records" to a shared constant.

The string "org.apache.avro.rust.union_of_records" appears in three implementation files (ser_schema/mod.rs, deser_schema/record.rs, deser_schema/enums.rs). The tuple attribute string (line 257) appears only in this deserialization code, but both are Rust-specific Avro schema attributes that would benefit from centralized definition to improve maintainability.
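A sketch of the suggested extraction, using a `HashMap<String, String>` as a simplified stand-in for the schema's attribute map (the module name, helper, and map type are illustrative; the tuple attribute key from line 257 would be added alongside it in the same module):

```rust
use std::collections::HashMap;

// Central home for the Rust-specific Avro schema attribute keys,
// so ser_schema/mod.rs, deser_schema/record.rs, and deser_schema/enums.rs
// all reference one definition instead of repeating the literal.
mod avro_rust_attrs {
    pub const UNION_OF_RECORDS: &str = "org.apache.avro.rust.union_of_records";
}

// Example call site: checking the attribute through the shared constant.
fn is_union_of_records(attributes: &HashMap<String, String>) -> bool {
    attributes
        .get(avro_rust_attrs::UNION_OF_RECORDS)
        .is_some_and(|v| v == "true")
}
```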

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/deser_schema/mod.rs` around lines 256 - 269, Extract the
hard-coded Avro-Rust attribute keys into shared constants and replace the
literal occurrences with those constants: define e.g.
RUST_AVRO_ATTR_UNION_OF_RECORDS = "org.apache.avro.rust.union_of_records" (and
optionally RUST_AVRO_ATTR_TUPLE = "org.apache.avro.rust.tuple") in a central
module (e.g., a new constant in the serde/schema module), then update the
Schema::Record match in deserialize (the branch that calls
deserialize_tuple/deserialize_struct) to use RUST_AVRO_ATTR_TUPLE and update the
checks in the other implementations that currently use
"org.apache.avro.rust.union_of_records" (ser_schema/mod.rs,
deser_schema/record.rs, deser_schema/enums.rs) to use
RUST_AVRO_ATTR_UNION_OF_RECORDS so all files reference the single shared
constant.
avro/src/serde/ser_schema/record/mod.rs (1)

221-230: expect() in serialize_value will panic if called without prior serialize_key.

The expect() on line 227 will panic if serialize_value is called without a preceding serialize_key call. While this is technically a violation of the Serde contract, returning an error would be more defensive.

🛡️ Suggested defensive approach
     fn serialize_value<T>(&mut self, value: &T) -> Result<(), Self::Error>
     where
         T: ?Sized + Serialize,
     {
-        self.serialize_next_field(
-            self.map_position
-                .expect("serialize_value called without calling serialize_key"),
-            value,
-        )
+        let position = self.map_position.ok_or_else(|| {
+            Error::new(Details::Custom("serialize_value called without calling serialize_key".into()))
+        })?;
+        self.serialize_next_field(position, value)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/record/mod.rs` around lines 221 - 230,
serialize_value currently unwraps map_position with expect and will panic if
called without a prior serialize_key; change it to check map_position and return
a Serde-compatible error instead of panicking: if self.map_position is Some(pos)
call self.serialize_next_field(pos, value) as before, otherwise construct and
return an appropriate Self::Error (a descriptive "missing map key before value"
or similar) so the function returns Result::Err instead of panicking; reference
serialize_value, serialize_next_field and the map_position field to find where
to implement this guard.
avro/src/serde/ser_schema/tuple.rs (1)

239-262: OneTupleSerializer doesn't enforce single-element constraint.

The serialize_element method (lines 239-257) allows multiple elements to be serialized, accumulating bytes. However, for a single-element tuple, only one call should be allowed. Consider adding validation or documenting the expected behavior.
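A minimal sketch of the suggested guard, with a boolean tracking whether the single element has been written (the struct and error type here are illustrative stand-ins, not the crate's real `OneTupleSerializer`):

```rust
// Stand-in for OneTupleSerializer with the suggested single-element guard.
struct OneTupleSerializer {
    element_written: bool,
}

impl OneTupleSerializer {
    fn new() -> Self {
        Self { element_written: false }
    }

    // Reject a second element instead of silently accumulating bytes.
    fn serialize_element(&mut self) -> Result<(), String> {
        if self.element_written {
            return Err("1-tuple accepts exactly one element".to_string());
        }
        self.element_written = true;
        Ok(())
    }

    // Optionally validate that an element was written at all.
    fn end(self) -> Result<(), String> {
        if !self.element_written {
            return Err("1-tuple must contain one element".to_string());
        }
        Ok(())
    }
}
```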

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/tuple.rs` around lines 239 - 262, The
OneTupleSerializer currently allows multiple calls to serialize_element and just
accumulates bytes in bytes_written; add a guard to enforce the single-element
tuple constraint by adding an element_written (or seen) boolean field to
OneTupleSerializer, check it at the start of serialize_element and return an
appropriate error if it's already true, set it to true after successfully
serializing the first element, and optionally have end validate that at least
one element was written; update serialize_element’s logic around
Schema::Union/SchemaAwareSerializer and preserve bytes_written behavior but bail
on any subsequent calls.
avro/src/reader/single_object.rs (1)

61-101: Duplicated header validation logic between read_value and read_deser.

Both methods (lines 61-81 and 83-101) duplicate the header reading and validation logic. Consider extracting a helper method to reduce duplication and ensure consistent behavior.

♻️ Suggested refactor
+    fn verify_header<R: Read>(&self, reader: &mut R) -> AvroResult<()> {
+        let mut header = vec![0; self.expected_header.len()];
+        reader.read_exact(&mut header).map_err(Details::ReadHeader)?;
+        if self.expected_header == header {
+            Ok(())
+        } else {
+            Err(Details::SingleObjectHeaderMismatch(self.expected_header.clone(), header).into())
+        }
+    }
+
     pub fn read_value<R: Read>(&self, reader: &mut R) -> AvroResult<Value> {
-        let mut header = vec![0; self.expected_header.len()];
-        match reader.read_exact(&mut header) {
-            Ok(_) => {
-                if self.expected_header == header {
-                    decode_internal(
-                        self.write_schema.get_root_schema(),
-                        self.write_schema.get_names(),
-                        None,
-                        reader,
-                    )
-                } else {
-                    Err(
-                        Details::SingleObjectHeaderMismatch(self.expected_header.clone(), header)
-                            .into(),
-                    )
-                }
-            }
-            Err(io_error) => Err(Details::ReadHeader(io_error).into()),
-        }
+        self.verify_header(reader)?;
+        decode_internal(
+            self.write_schema.get_root_schema(),
+            self.write_schema.get_names(),
+            None,
+            reader,
+        )
     }

     pub fn read_deser<T: DeserializeOwned>(&self, reader: &mut impl Read) -> AvroResult<T> {
-        let mut header = vec![0; self.expected_header.len()];
-        reader
-            .read_exact(&mut header)
-            .map_err(Details::ReadHeader)?;
-        if self.expected_header == header {
-            let config = Config {
-                names: self.write_schema.get_names(),
-                human_readable: self.human_readable,
-            };
-            T::deserialize(SchemaAwareDeserializer::new(
-                reader,
-                self.write_schema.get_root_schema(),
-                config,
-            )?)
-        } else {
-            Err(Details::SingleObjectHeaderMismatch(self.expected_header.clone(), header).into())
-        }
+        self.verify_header(reader)?;
+        let config = Config {
+            names: self.write_schema.get_names(),
+            human_readable: self.human_readable,
+        };
+        T::deserialize(SchemaAwareDeserializer::new(
+            reader,
+            self.write_schema.get_root_schema(),
+            config,
+        )?)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/single_object.rs` around lines 61 - 101, Extract a small
helper method (e.g., fn validate_and_consume_header(&self, reader: &mut impl
Read) -> AvroResult<()> ) that allocates a header buffer of
self.expected_header.len(), calls reader.read_exact(&mut header) mapping IO
errors to Details::ReadHeader, compares header to self.expected_header and
returns Err(Details::SingleObjectHeaderMismatch(...).into()) on mismatch or
Ok(()) on success; then replace the duplicated header-reading blocks in
read_value and read_deser with a call to this helper so both methods reuse the
same behavior while keeping the remaining decode/deserialization logic (use of
self.write_schema, Config and SchemaAwareDeserializer) unchanged.
avro/src/writer/single_object.rs (1)

97-100: Builder default uses expect() which could panic at runtime.

The expect() in the builder default for resolved will panic if T::get_schema() returns an invalid schema. While the comment suggests this shouldn't happen for valid AvroSchema implementations, a malformed derived schema could cause unexpected panics.

Consider whether this should return a Result instead, or document this panic condition more prominently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/writer/single_object.rs` around lines 97 - 100, The builder default
for the field resolved currently calls
ResolvedOwnedSchema::new(T::get_schema()).expect(...), which can panic; replace
this by propagating the Result from ResolvedOwnedSchema::new instead of
unwrapping so the error is handled at compile/runtime: change the builder
default expression to return the Result<ResolvedOwnedSchema, Error> coming from
ResolvedOwnedSchema::new(T::get_schema()), and update the builder/field type for
resolved (or remove the default and require callers to pass a resolved value) so
that failures from ResolvedOwnedSchema::new are returned as an Error rather than
causing a panic; reference the builder attribute on resolved,
ResolvedOwnedSchema::new, and T::get_schema() when making the change.
avro/src/reader/datum.rs (1)

393-401: Test assertion on error message string is fragile.

Asserting on the exact error message string (line 399-401) is brittle and may break with unrelated changes. Consider asserting on a more stable property like the error variant or using contains() for partial matching.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/datum.rs` around lines 393 - 401, The test currently asserts
the full error message string from
GenericDatumReader::builder(...).build()?.read_deser::<TestRecord3240>(&mut
encoded).unwrap_err(), which is brittle; change the assertion to check a stable
property instead — e.g., match on the error variant or use
error.to_string().contains("failed to fill whole buffer"), or use a
pattern/assertion like matches!(error, crate::error::Error::Io(..)) or
assert!(error.to_string().contains("variable length integer")) so the test
verifies the error kind/partial message rather than the exact full string.
avro/src/serde/deser_schema/enums.rs (1)

194-214: Extract the magic attribute string to a shared constant.

The attribute name "org.apache.avro.rust.union_of_records" appears 8 times across the codebase (record.rs, enums.rs, mod.rs, ser_schema/mod.rs). Extracting it to a single constant will prevent typos, improve maintainability, and make future changes easier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/deser_schema/enums.rs` around lines 194 - 214, The repeated
literal "org.apache.avro.rust.union_of_records" should be extracted to a single
shared constant (e.g., pub(crate) const UNION_OF_RECORDS_ATTR: &str =
"org.apache.avro.rust.union_of_records") declared in a common module that all
serde modules can import; then replace uses like
attributes.get("org.apache.avro.rust.union_of_records") in enums.rs (the match
branch checking record.attributes), record.rs, mod.rs, and ser_schema/mod.rs
with attributes.get(UNION_OF_RECORDS_ATTR) (or equivalent constant lookup) so
all eight occurrences use the same constant and have appropriate visibility for
the modules that reference it.
avro/src/serde/with.rs (1)

710-722: Consider renaming bytes to vec or items.

The variable bytes (line 715) holds a Vec<T>, not bytes. This naming is misleading and appears to be copy-pasted from the bytes modules.

✏️ Suggested rename
     pub fn deserialize<'de, const N: usize, D, T>(deserializer: D) -> Result<[T; N], D::Error>
     where
         D: Deserializer<'de>,
         T: Deserialize<'de>,
     {
-        let bytes = <Vec<T> as Deserialize>::deserialize(deserializer)?;
-        bytes.try_into().map_err(|v: Vec<T>| {
+        let items = <Vec<T> as Deserialize>::deserialize(deserializer)?;
+        items.try_into().map_err(|v: Vec<T>| {
             D::Error::custom(format!(
                 "Deserialized array has length {} which does not match array length of {N}",
                 v.len()
             ))
         })
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/with.rs` around lines 710 - 722, The local variable named
`bytes` in the function `deserialize<'de, const N: usize, D, T>(deserializer: D)
-> Result<[T; N], D::Error>` is misleading because it holds a Vec<T>; rename it
to something accurate like `vec` or `items` and update all uses in that function
(the call to `<Vec<T> as Deserialize>::deserialize(deserializer)?` assignment
and the subsequent `try_into().map_err(|v: Vec<T>| { ... v.len() ... })`) so the
name reflects the stored values rather than implying raw bytes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro/src/reader/block.rs`:
- Around line 232-234: The current branch that handles read_schema uses panic!
for the unimplemented SchemaAwareResolvingDeserializer; replace that panic with
a graceful error return by adding a new error variant (e.g.,
SchemaAwareDeserializationDoesNotResolve) to the Details enum in error.rs and
returning that error from the block reader code path where read_schema.is_some()
is true (reference the read_schema check and the
SchemaAwareResolvingDeserializer TODO comment to locate the code and the Details
enum in error.rs to add the variant).

In `@avro/src/reader/datum.rs`:
- Around line 145-161: The function read_deser currently panics when a reader
schema is present (the branch that references SchemaAwareResolvingDeserializer);
replace the panic! with a proper AvroResult error return instead (e.g., return
Err(AvroError::Unimplemented("schema resolving not implemented")) or add a
specific AvroError variant like ResolveSchemaNotImplemented) so callers get an
error rather than a runtime panic, and update any tests to expect that error;
ensure the change is applied in read_deser and consider adding a short comment
referencing SchemaAwareResolvingDeserializer for future implementation tracking.

In `@avro/src/reader/mod.rs`:
- Around line 133-141: In read_next_deser (method read_next_deser<T:
DeserializeOwned>) replace the assert! on should_resolve_schema with an early
return of an AvroResult::Err so the function returns a proper error instead of
panicking; specifically, check if self.should_resolve_schema is true and return
an Avro error indicating schema-aware resolving is not implemented (use the same
error variant/message pattern as the one used in block.rs), otherwise proceed to
call self.block.read_next_deser(self.reader_schema).

In `@avro/src/schema/mod.rs`:
- Around line 842-848: The current unique_normalized_name() escape scheme
conflates '.' and '_' (e.g., "a._b" vs "a_.b" => "a___b"); update the
normalization so non-alphanumeric characters are encoded uniquely instead of
simply converting '_' to "__" and then mapping others to '_' — for example,
remove the initial .replace('_','__') and change the .map(...) so any
non-ASCII-alphanumeric char is percent/hex-encoded (e.g. format!("_x{:02X}_", c
as u32)) or otherwise encode '.' and '_' to distinct tokens before collecting;
apply this change inside the function that builds the name (the code building
name and returning Cow::Owned(format!("r{}_{}", name.len(), name))) so produced
names remain collision-free and deterministic.
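A standalone sketch of the distinct-token encoding described above (the function name mirrors the crate's `unique_normalized_name`, but this is not its actual implementation):

```rust
// Every non-alphanumeric character gets its own hex-escaped token, so
// "a._b" and "a_.b" no longer normalize to the same string.
fn unique_normalized_name(name: &str) -> String {
    let encoded: String = name
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_string()
            } else {
                // '.' becomes "_x2E_", '_' becomes "_x5F_": distinct tokens.
                format!("_x{:02X}_", c as u32)
            }
        })
        .collect();
    format!("r{}_{}", encoded.len(), encoded)
}
```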

In `@avro/src/schema/union.rs`:
- Around line 105-127: The union lookup in find_named_schema currently compares
only the short name (schema_name.name()) which can return wrong variants across
namespaces; update the equality check to use the fully-qualified name (e.g.,
schema_name.fullname() or schema_name.to_string()) when comparing to the input
name, keeping the rest of the logic (including the Schema::Ref resolution using
names.get(name)) unchanged so that both direct named schemas and refs are
matched by their fully-qualified name; ensure the comparison uses the same
fully-qualified form used elsewhere in resolution so names with different
namespaces don't collide.
- Around line 132-191: The current find_fixed_of_size_n and
find_record_with_n_fields scan named_index and return the first schema matching
by shape only, which can silently pick the wrong union branch; instead collect
all matching candidates while resolving refs via names (same resolution logic
used now on self.schemas and Schema::Ref), and if zero return Ok(None), if
exactly one return Ok(Some((index, &schema))), but if >1 return an Error
indicating ambiguous union resolution (create/raise a
Details::AmbiguousUnionResolution or similar error) so callers can handle
ambiguity; update both functions (referencing find_fixed_of_size_n,
find_record_with_n_fields, Schema::Fixed, UuidSchema::Fixed,
DecimalSchema::InnerDecimalSchema::Fixed, Duration variant, and Schema::Record)
to implement this candidate-collection-and-disambiguation behavior.
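The collect-then-disambiguate idea can be shown with toy types (`ToySchema` and the `String` error are stand-ins for the crate's `Schema` and `Details` types):

```rust
#[derive(Debug, PartialEq)]
enum ToySchema {
    Fixed { size: usize },
    Other,
}

// Collect every branch that matches by shape, then decide: none, one, or
// ambiguous. The real code would also resolve Schema::Ref via `names`.
fn find_fixed_of_size_n(schemas: &[ToySchema], n: usize) -> Result<Option<usize>, String> {
    let candidates: Vec<usize> = schemas
        .iter()
        .enumerate()
        .filter(|(_, s)| matches!(s, ToySchema::Fixed { size } if *size == n))
        .map(|(i, _)| i)
        .collect();
    match candidates.as_slice() {
        [] => Ok(None),              // no branch matches
        [index] => Ok(Some(*index)), // unique match: safe to pick
        _ => Err(format!(
            "ambiguous union resolution: {} fixed branches of size {n}",
            candidates.len()
        )),
    }
}
```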

In `@avro/src/serde/ser_schema/record/field_default.rs`:
- Around line 52-113: recursive_type_check (and the corresponding serialize
path) fails to handle Schema::Ref returned by AvroSchema::get_schema, so resolve
Schema::Ref to its target before matching; update recursive_type_check (and
serialize) to detect Schema::Ref and follow the referenced schema (resolve the
ref to the actual Schema variant) and then continue the existing match logic,
ensuring named/repeated types (e.g., duplicate record/uuid refs) validate and
serialize defaults correctly.
- Around line 84-92: The code currently treats JSON string defaults for
byte/fixed as UTF-8 by using s.len()/s.as_bytes(), which rejects
single-code-point bytes like "\u00ff" and encodes them incorrectly; update the
handling in the match arm that matches Value::String(s) with Schema::Fixed /
Decimal(inner: Fixed) / Uuid(Fixed) / Duration by decoding the JSON string into
raw bytes: iterate the string by char, ensure each char's code point is in
0..=255, push (char as u32 as u8) into a Vec<u8>, then validate Vec<u8>.len() ==
fixed.size and use that byte buffer for serialization instead of s.as_bytes();
apply the same change to the analogous block around lines 151-159 that handles
byte/fixed defaults elsewhere in field_default.rs.

In `@avro/src/serde/with.rs`:
- Around line 618-624: The function get_record_fields_in_ctxt currently has only
two parameters but should match the module API by accepting a third usize
parameter; update its signature to pub fn get_record_fields_in_ctxt(_: &mut
HashSet<Name>, _: NamespaceRef, _: usize) -> Option<Vec<RecordField>> and adjust
any internal callers or trait implementations to pass the size argument (even if
unused) so the function signature is consistent with
bigdecimal::get_record_fields_in_ctxt and the rest of the module.
- Around line 770-776: The function get_record_fields_in_ctxt currently has a
mismatched API (only two params) — update its signature to accept the third
usize parameter to match the other modules: change get_record_fields_in_ctxt(_:
&mut HashSet<Name>, _: NamespaceRef) -> Option<Vec<RecordField>> to include a
third usize argument (e.g. _: usize) and propagate/ignore it consistently in the
body (still returning None); ensure the symbol name get_record_fields_in_ctxt
and return type Option<Vec<RecordField>> remain unchanged so callers expecting
the 3-arg form compile.
- Around line 694-700: The function get_record_fields_in_ctxt currently has only
two parameters but should accept a third usize parameter to match the module API
(same as bigdecimal modules); update the signature of get_record_fields_in_ctxt
to add a third parameter (usize) and then adjust all call sites to pass the
appropriate usize argument, ensuring the function name and return type
(Option<Vec<RecordField>>) remain unchanged; verify any trait impls or
references to NamespaceRef and HashSet<Name> still align after the signature
change.
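The consistent three-argument signature asked for in the comments above, shown with stand-in aliases for the crate's `Name`, `NamespaceRef`, and `RecordField` types:

```rust
use std::collections::HashSet;

// Stand-ins so the signature change can be shown in isolation.
type Name = String;
type NamespaceRef<'a> = Option<&'a str>;
struct RecordField;

// Three-argument form matching the bigdecimal module's API; the trailing
// `usize` (the size hint) is accepted but deliberately unused here.
pub fn get_record_fields_in_ctxt(
    _named: &mut HashSet<Name>,
    _ns: NamespaceRef<'_>,
    _size: usize,
) -> Option<Vec<RecordField>> {
    None
}
```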
- Around line 731-732: The doc comment inside avro/src/serde/with.rs contains a
copy-paste sentence ("When used with different serialization formats, this is
equivalent to [`serde_bytes`].") that is incorrect for the array serialization
docs; remove that sentence (or replace it with a brief correct statement about
array serialization semantics) from the array-related doc comment so the
documentation no longer references serde_bytes incorrectly—look for the array
module/doc block in with.rs and edit the docstring accordingly.

---

Outside diff comments:
In `@avro/src/reader/single_object.rs`:
- Around line 366-370: The test currently discards the boolean returned by
matches! and so never asserts the error type; update the assertion by checking
read_result and asserting it matches Details::ReadBytes (e.g., replace the bare
matches! call with an assertion such as assert!(matches!(read_result,
Err(Details::ReadBytes(_))))), referencing the read_result variable and the
generic_reader.read_value call so the test actually verifies the expected
Details::ReadBytes error variant.

---

Nitpick comments:
In `@avro/src/reader/datum.rs`:
- Around line 393-401: The test currently asserts the full error message string
from GenericDatumReader::builder(...).build()?.read_deser::<TestRecord3240>(&mut
encoded).unwrap_err(), which is brittle; change the assertion to check a stable
property instead — e.g., match on the error variant or use
error.to_string().contains("failed to fill whole buffer"), or use a
pattern/assertion like matches!(error, crate::error::Error::Io(..)) or
assert!(error.to_string().contains("variable length integer")) so the test
verifies the error kind/partial message rather than the exact full string.

In `@avro/src/reader/single_object.rs`:
- Around line 61-101: Extract a small helper method (e.g., fn
validate_and_consume_header(&self, reader: &mut impl Read) -> AvroResult<()> )
that allocates a header buffer of self.expected_header.len(), calls
reader.read_exact(&mut header) mapping IO errors to Details::ReadHeader,
compares header to self.expected_header and returns
Err(Details::SingleObjectHeaderMismatch(...).into()) on mismatch or Ok(()) on
success; then replace the duplicated header-reading blocks in read_value and
read_deser with a call to this helper so both methods reuse the same behavior
while keeping the remaining decode/deserialization logic (use of
self.write_schema, Config and SchemaAwareDeserializer) unchanged.
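A toy version of the suggested helper; the `String` error is a stand-in for the crate's `Details::ReadHeader` / `Details::SingleObjectHeaderMismatch` variants:

```rust
use std::io::Read;

// Read exactly `expected_header.len()` bytes, compare, and consume them,
// leaving the reader positioned at the payload.
fn validate_and_consume_header(
    expected_header: &[u8],
    reader: &mut impl Read,
) -> Result<(), String> {
    let mut header = vec![0u8; expected_header.len()];
    reader
        .read_exact(&mut header)
        .map_err(|e| format!("failed to read header: {e}"))?;
    if header.as_slice() == expected_header {
        Ok(())
    } else {
        Err(format!("header mismatch: got {header:?}"))
    }
}
```

Both `read_value` and `read_deser` would then start with a single call to this helper.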

In `@avro/src/serde/deser_schema/enums.rs`:
- Around line 194-214: The repeated literal
"org.apache.avro.rust.union_of_records" should be extracted to a single shared
constant (e.g., pub(crate) const UNION_OF_RECORDS_ATTR: &str =
"org.apache.avro.rust.union_of_records") declared in a common module that all
serde modules can import; then replace uses like
attributes.get("org.apache.avro.rust.union_of_records") in enums.rs (the match
branch checking record.attributes), record.rs, mod.rs, and ser_schema/mod.rs
with attributes.get(UNION_OF_RECORDS_ATTR) (or equivalent constant lookup) so
all eight occurrences use the same constant and have appropriate visibility for
the modules that reference it.

In `@avro/src/serde/deser_schema/mod.rs`:
- Around line 217-221: The code uses pointer-address equality for the static
DESERIALIZE_ANY sentinel which is clever but subtle; add a brief inline comment
at the place where DESERIALIZE_ANY is compared (the deserialize_any comparison
site) explaining that the comparison intentionally uses pointer equality (not
string content) to prevent user-provided strings from bypassing validation and
that DESERIALIZE_ANY is a static sentinel so pointer equality is safe; reference
DESERIALIZE_ANY and DESERIALIZE_ANY_FIELDS in the comment so maintainers can
locate the sentinel definition and the comparison site in deserialize_any.
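The pointer-identity pattern the comment should document, reduced to a self-contained sketch (the static's value here is illustrative, not the crate's actual sentinel string):

```rust
static DESERIALIZE_ANY: &str = "any";

// Intentionally pointer equality, not string equality: only the static
// itself matches, so a user-provided string with the same content cannot
// masquerade as the internal marker.
fn is_sentinel(field: &str) -> bool {
    std::ptr::eq(field.as_ptr(), DESERIALIZE_ANY.as_ptr())
}
```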
- Around line 728-733: The current deserialize_ignored_any implementation just
delegates to deserialize_any; replace this with logic that requires a Seek bound
and skips over ignored data efficiently: add a Seek constraint to the
deserializer type or method signature (e.g., require Read + Seek on the
underlying reader used by the deserializer), then implement skipping by
calculating the ignored field length from the wire/type info and advancing the
reader with seek or seek_relative instead of reading into memory; update the
signature of deserialize_ignored_any and any related trait impls that reference
it (function name: deserialize_ignored_any) so the method uses the reader's seek
to skip bytes for large ignored fields.
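The seek-based skipping can be sketched like this; how `len` is obtained from the wire format (e.g. a decoded byte-count prefix) is elided:

```rust
use std::io::{Seek, SeekFrom};

// Advance past an ignored field without reading it into memory.
fn skip_ignored<R: Seek>(reader: &mut R, len: u64) -> std::io::Result<()> {
    reader.seek(SeekFrom::Current(len as i64))?;
    Ok(())
}
```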
- Around line 256-269: Extract the hard-coded Avro-Rust attribute keys into
shared constants and replace the literal occurrences with those constants:
define e.g. RUST_AVRO_ATTR_UNION_OF_RECORDS =
"org.apache.avro.rust.union_of_records" (and optionally RUST_AVRO_ATTR_TUPLE =
"org.apache.avro.rust.tuple") in a central module (e.g., a new constant in the
serde/schema module), then update the Schema::Record match in deserialize (the
branch that calls deserialize_tuple/deserialize_struct) to use
RUST_AVRO_ATTR_TUPLE and update the checks in the other implementations that
currently use "org.apache.avro.rust.union_of_records" (ser_schema/mod.rs,
deser_schema/record.rs, deser_schema/enums.rs) to use
RUST_AVRO_ATTR_UNION_OF_RECORDS so all files reference the single shared
constant.

In `@avro/src/serde/deser_schema/record.rs`:
- Around line 74-92: Add a short explanatory comment above the special-case
branch that checks Schema::Record(record) and the
"org.apache.avro.rust.union_of_records" attribute: state that when a Record has
exactly one field and that attribute is true (set by the Avro derive macro for
Rust enums mapped to Avro unions of records), we intentionally deserialize using
the inner field schema (record.fields[0].schema) via SchemaAwareDeserializer to
preserve the original union-member type; reference the attribute name, the
Schema::Record pattern, and the use of
seed.deserialize(SchemaAwareDeserializer::new(...)) so future readers understand
why the inner schema is used instead of the outer record schema.

In `@avro/src/serde/ser_schema/block.rs`:
- Around line 338-355: Add a brief comment in the BufferedBlockSerializer
SerializeMap implementation explaining why items_in_buffer is only incremented
in serialize_element (called by serialize_value) and not in serialize_key: note
that map entries are serialized as key then value and serialize_element
increments the count once per complete entry, so counting in serialize_value
avoids double-counting and preserves the one-increment-per-entry invariant;
place the comment near the serialize_key and serialize_value methods,
referencing serialize_key, serialize_value, serialize_element and the
items_in_buffer field for clarity.

In `@avro/src/serde/ser_schema/record/mod.rs`:
- Around line 221-230: serialize_value currently unwraps map_position with
expect and will panic if called without a prior serialize_key; change it to
check map_position and return a Serde-compatible error instead of panicking: if
self.map_position is Some(pos) call self.serialize_next_field(pos, value) as
before, otherwise construct and return an appropriate Self::Error (a descriptive
"missing map key before value" or similar) so the function returns Result::Err
instead of panicking; reference serialize_value, serialize_next_field and the
map_position field to find where to implement this guard.
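The guard reduces to checking `map_position` before use; a toy version with stand-in types (the real `serialize_value` would call `serialize_next_field(pos, value)` where the `Ok` arm is):

```rust
struct MapState {
    map_position: Option<usize>,
}

impl MapState {
    // Return a Serde-style error instead of panicking when no key was
    // serialized first.
    fn serialize_value(&mut self) -> Result<usize, String> {
        match self.map_position.take() {
            Some(pos) => Ok(pos),
            None => Err("serialize_value called before serialize_key".to_string()),
        }
    }
}
```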

In `@avro/src/serde/ser_schema/tuple.rs`:
- Around line 239-262: The OneTupleSerializer currently allows multiple calls to
serialize_element and just accumulates bytes in bytes_written; add a guard to
enforce the single-element tuple constraint by adding an element_written (or
seen) boolean field to OneTupleSerializer, check it at the start of
serialize_element and return an appropriate error if it's already true, set it
to true after successfully serializing the first element, and optionally have
end validate that at least one element was written; update serialize_element’s
logic around Schema::Union/SchemaAwareSerializer and preserve bytes_written
behavior but bail on any subsequent calls.
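A sketch of the single-element guard, with the real serializer's byte-writing machinery replaced by a counter:

```rust
struct OneTupleSerializer {
    element_written: bool,
    bytes_written: usize,
}

impl OneTupleSerializer {
    fn serialize_element(&mut self, element_len: usize) -> Result<(), String> {
        // Enforce the 1-tuple invariant: a second element is an error.
        if self.element_written {
            return Err("1-tuple already has its element".to_string());
        }
        self.element_written = true;
        self.bytes_written += element_len;
        Ok(())
    }

    fn end(self) -> Result<usize, String> {
        // Optionally also require that exactly one element was written.
        if self.element_written {
            Ok(self.bytes_written)
        } else {
            Err("1-tuple ended without an element".to_string())
        }
    }
}
```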

In `@avro/src/serde/ser_schema/union.rs`:
- Around line 406-439: Add a short clarifying comment in serialize_tuple
explaining why the len == 1 branch delegates to TupleSerializer::one_union
(re-entering union selection) instead of matching a Schema::Record with one
field: this handles 1-tuples like (T,) where the single element must be resolved
against the union variants, distinct from a record variant found by
union.find_record_with_n_fields; mention the relation to
union.index_of_schema_kind and the zig_i32 index write so reviewers can see
where the union-discriminant is emitted.

In `@avro/src/serde/with.rs`:
- Around line 710-722: The local variable named `bytes` in the function
`deserialize<'de, const N: usize, D, T>(deserializer: D) -> Result<[T; N],
D::Error>` is misleading because it holds a Vec<T>; rename it to something
accurate like `vec` or `items` and update all uses in that function (the call to
`<Vec<T> as Deserialize>::deserialize(deserializer)?` assignment and the
subsequent `try_into().map_err(|v: Vec<T>| { ... v.len() ... })`) so the name
reflects the stored values rather than implying raw bytes.

In `@avro/src/writer/single_object.rs`:
- Around line 97-100: The builder default for the field resolved currently calls
ResolvedOwnedSchema::new(T::get_schema()).expect(...), which can panic; replace
this by propagating the Result from ResolvedOwnedSchema::new instead of
unwrapping so the error is handled at compile/runtime: change the builder
default expression to return the Result<ResolvedOwnedSchema, Error> coming from
ResolvedOwnedSchema::new(T::get_schema()), and update the builder/field type for
resolved (or remove the default and require callers to pass a resolved value) so
that failures from ResolvedOwnedSchema::new are returned as an Error rather than
causing a panic; reference the builder attribute on resolved,
ResolvedOwnedSchema::new, and T::get_schema() when making the change.
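The shape of the change, with `ResolvedOwnedSchema` and its validation reduced to stand-ins (a `bool` replaces `T::get_schema()`):

```rust
struct ResolvedOwnedSchema;

impl ResolvedOwnedSchema {
    fn new(valid: bool) -> Result<Self, String> {
        if valid {
            Ok(ResolvedOwnedSchema)
        } else {
            Err("unresolvable schema".to_string())
        }
    }
}

struct Writer {
    _resolved: ResolvedOwnedSchema,
}

impl Writer {
    // Builder path: `?` propagates the failure to the caller instead of
    // panicking inside a field default via `expect`.
    fn build(schema_is_valid: bool) -> Result<Self, String> {
        let resolved = ResolvedOwnedSchema::new(schema_is_valid)?;
        Ok(Writer { _resolved: resolved })
    }
}
```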

In `@avro/tests/serde_human_readable_false.rs`:
- Around line 80-83: The test currently asserts exact equality of the error
string from writer.write (in serde_human_readable_false.rs) which is fragile;
change the assertion to check for key substrings instead (e.g., that
writer.write(...).unwrap_err().to_string() contains the important fragment(s)
like "Expected Schema::Bytes" or "Schema::Uuid(Bytes | Fixed)") so the test
tolerates harmless formatting changes—replace the assert_eq! on
writer.write(...) with one or more assert!(...contains(...)) checks referencing
writer.write and the error string.

In `@avro/tests/serde_human_readable_true.rs`:
- Around line 104-107: The test currently asserts the entire error string from
writer.write(uuid, &mut buffer).unwrap_err().to_string(), which is brittle;
update the assertion to check for a stable property instead—e.g., assert that
the error kind or a substring is present (use contains) or match on the error
enum/variant if available (call writer.write(...).unwrap_err() and assert that
its variant/Display contains "Expected Schema::String" or that it is a
serialization/schema-mismatch error) so tests no longer depend on full debug
formatting of Schema::Uuid/Schema::String in serde_human_readable_true.rs around
the writer.write assertion. Ensure the change is applied to both occurrences
(the one at lines ~104-107 and the similar block at ~130-133) and reference
writer.write and the error result to locate the code.
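The robust-assertion pattern both test comments ask for, as a small helper (the error message in the usage below is illustrative):

```rust
// Check a stable fragment of the message rather than asserting exact
// equality on the full formatted string.
fn assert_error_mentions(error_message: &str, fragment: &str) {
    assert!(
        error_message.contains(fragment),
        "error {error_message:?} should mention {fragment:?}"
    );
}
```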

In `@CLAUDE.md`:
- Line 1: Update the first line of CLAUDE.md to clarify the rule's scope and fix
the typo "Github" → "GitHub": replace the ambiguous sentence with an explicit
statement that defines where it applies (e.g., "Never link to any issue or pull
request in any GitHub repository referenced by this project or its
documentation") so readers understand the context and the spelling is corrected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: deb04df3-4cb8-4a26-a07a-d03ce7e5d335

📥 Commits

Reviewing files that changed from the base of the PR and between 7aee76d and 110ecda.

📒 Files selected for processing (43)
  • CLAUDE.md
  • avro/examples/test_interop_single_object_encoding.rs
  • avro/src/bigdecimal.rs
  • avro/src/documentation/avro_data_model_to_serde.rs
  • avro/src/documentation/mod.rs
  • avro/src/documentation/serde_data_model_to_avro.rs
  • avro/src/error.rs
  • avro/src/lib.rs
  • avro/src/reader/block.rs
  • avro/src/reader/datum.rs
  • avro/src/reader/mod.rs
  • avro/src/reader/single_object.rs
  • avro/src/schema/mod.rs
  • avro/src/schema/union.rs
  • avro/src/serde/derive.rs
  • avro/src/serde/deser_schema/block.rs
  • avro/src/serde/deser_schema/enums.rs
  • avro/src/serde/deser_schema/identifier.rs
  • avro/src/serde/deser_schema/mod.rs
  • avro/src/serde/deser_schema/record.rs
  • avro/src/serde/deser_schema/tuple.rs
  • avro/src/serde/mod.rs
  • avro/src/serde/ser_schema/block.rs
  • avro/src/serde/ser_schema/mod.rs
  • avro/src/serde/ser_schema/record/field_default.rs
  • avro/src/serde/ser_schema/record/mod.rs
  • avro/src/serde/ser_schema/tuple.rs
  • avro/src/serde/ser_schema/union.rs
  • avro/src/serde/with.rs
  • avro/src/writer/datum.rs
  • avro/src/writer/mod.rs
  • avro/src/writer/single_object.rs
  • avro/tests/avro-rs-226.rs
  • avro/tests/avro-rs-285-bytes_deserialization.rs
  • avro/tests/schema.rs
  • avro/tests/serde_human_readable_false.rs
  • avro/tests/serde_human_readable_true.rs
  • avro/tests/union_schema.rs
  • avro_derive/src/attributes/mod.rs
  • avro_derive/tests/derive.proptest-regressions
  • avro_derive/tests/derive.rs
  • avro_derive/tests/serde.rs
  • licenserc.toml

Comment on lines +232 to +234
let item = if read_schema.is_some() {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");

⚠️ Potential issue | 🟠 Major

Replace panic! with a proper error return for better user experience.

Using panic! for an unimplemented feature path could crash production code unexpectedly. Since this is a known limitation (schema resolution not yet supported for deserialization), returning an error would be more graceful.

🛡️ Proposed fix to return an error instead of panicking
         let item = if read_schema.is_some() {
-            // TODO: Implement SchemaAwareResolvingDeserializer
-            panic!("Schema aware deserialisation does not resolve schemas yet");
+            return Err(Details::SchemaAwareDeserializationDoesNotResolve.into());
         } else {

This requires adding a new error variant to Details enum in error.rs:

SchemaAwareDeserializationDoesNotResolve,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/block.rs` around lines 232 - 234, The current branch that
handles read_schema uses panic! for the unimplemented
SchemaAwareResolvingDeserializer; replace that panic with a graceful error
return by adding a new error variant (e.g.,
SchemaAwareDeserializationDoesNotResolve) to the Details enum in error.rs and
returning that error from the block reader code path where read_schema.is_some()
is true (reference the read_schema check and the
SchemaAwareResolvingDeserializer TODO comment to locate the code and the Details
enum in error.rs to add the variant).

Comment on lines +145 to +161
pub fn read_deser<T: DeserializeOwned>(&self, reader: &mut impl Read) -> AvroResult<T> {
// `reader` is `impl Read` instead of a generic on the function like T so it's easier to
// specify the type wanted (`read_deser<String>` vs `read_deser<String, _>`)
if let Some((_, _)) = &self.reader {
// TODO: Implement SchemaAwareResolvingDeserializer
panic!("Schema aware deserialisation does not resolve schemas yet");
} else {
T::deserialize(SchemaAwareDeserializer::new(
reader,
self.writer,
Config {
names: self.resolved.get_names(),
human_readable: self.human_readable,
},
)?)
}
}

⚠️ Potential issue | 🟡 Minor

Panic on reader schema configuration should be tracked.

The panic! on line 150 when a reader schema is configured is a documented limitation, but it can still crash at runtime for users who migrate existing code. Consider returning an error instead or adding a compile-time check.

Do you want me to open an issue to track the SchemaAwareResolvingDeserializer implementation?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/datum.rs` around lines 145 - 161, The function read_deser
currently panics when a reader schema is present (the branch that references
SchemaAwareResolvingDeserializer); replace the panic! with a proper AvroResult
error return instead (e.g., return Err(AvroError::Unimplemented("schema
resolving not implemented")) or add a specific AvroError variant like
ResolveSchemaNotImplemented) so callers get an error rather than a runtime
panic, and update any tests to expect that error; ensure the change is applied
in read_deser and consider adding a short comment referencing
SchemaAwareResolvingDeserializer for future implementation tracking.

Comment on lines +133 to +141
fn read_next_deser<T: DeserializeOwned>(&mut self) -> AvroResult<Option<T>> {
// TODO: Implement SchemaAwareResolvingDeserializer
assert!(
!self.should_resolve_schema,
"Schema aware deserialisation does not resolve schemas yet"
);

self.block.read_next_deser(self.reader_schema)
}

⚠️ Potential issue | 🟠 Major

Replace assert! with a proper error return (same issue as block.rs).

This assertion will panic if a user provides a reader_schema that differs from the writer schema and then calls read_next_deser. This should return an error instead.

🛡️ Proposed fix to return an error
     fn read_next_deser<T: DeserializeOwned>(&mut self) -> AvroResult<Option<T>> {
-        // TODO: Implement SchemaAwareResolvingDeserializer
-        assert!(
-            !self.should_resolve_schema,
-            "Schema aware deserialisation does not resolve schemas yet"
-        );
+        if self.should_resolve_schema {
+            return Err(Details::SchemaAwareDeserializationDoesNotResolve.into());
+        }

         self.block.read_next_deser(self.reader_schema)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/reader/mod.rs` around lines 133 - 141, In read_next_deser (method
read_next_deser<T: DeserializeOwned>) replace the assert! on
should_resolve_schema with an early return of an AvroResult::Err so the function
returns a proper error instead of panicking; specifically, check if
self.should_resolve_schema is true and return an Avro error indicating
schema-aware resolving is not implemented (use the same error variant/message
pattern as the one used in block.rs), otherwise proceed to call
self.block.read_next_deser(self.reader_schema).

Comment on lines +842 to +848
let name: String = name
.to_string()
.replace('_', "__")
.chars()
.map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
.collect();
Cow::Owned(format!("r{}_{}", name.len(), name))

⚠️ Potential issue | 🟠 Major

unique_normalized_name() is still ambiguous for valid Avro names.

This escape scheme still collides on mixed ./_ boundaries: a._b and a_.b both become a___b. That means two distinct named types can still generate the same tuple/array helper name, which can resolve to the wrong Schema::Ref or fail with a duplicate-name collision.

Possible direction
-                let name: String = name
-                    .to_string()
-                    .replace('_', "__")
-                    .chars()
-                    .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
-                    .collect();
-                Cow::Owned(format!("r{}_{}", name.len(), name))
+                let encoded = name
+                    .to_string()
+                    .bytes()
+                    .map(|b| format!("{b:02x}"))
+                    .collect::<String>();
+                Cow::Owned(format!("r{encoded}"))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let name: String = name
.to_string()
.replace('_', "__")
.chars()
.map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
.collect();
Cow::Owned(format!("r{}_{}", name.len(), name))
let encoded = name
.to_string()
.bytes()
.map(|b| format!("{b:02x}"))
.collect::<String>();
Cow::Owned(format!("r{encoded}"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` around lines 842 - 848, The current
unique_normalized_name() escape scheme conflates '.' and '_' (e.g., "a._b" vs
"a_.b" => "a___b"); update the normalization so non-alphanumeric characters are
encoded uniquely instead of simply converting '_' to "__" and then mapping
others to '_' — for example, remove the initial .replace('_','__') and change
the .map(...) so any non-ASCII-alphanumeric char is percent/hex-encoded (e.g.
format!("_x{:02X}_", c as u32)) or otherwise encode '.' and '_' to distinct
tokens before collecting; apply this change inside the function that builds the
name (the code building name and returning Cow::Owned(format!("r{}_{}",
name.len(), name))) so produced names remain collision-free and deterministic.

Comment on lines +105 to +127
pub(crate) fn find_named_schema<'s>(
&'s self,
name: &str,
names: &'s HashMap<Name, impl Borrow<Schema>>,
) -> Result<Option<(usize, &'s Schema)>, Error> {
for index in self.named_index.iter().copied() {
let schema = &self.schemas[index];
if let Some(schema_name) = schema.name()
&& schema_name.name() == name
{
let schema = if let Schema::Ref { name } = schema {
names
.get(name)
.ok_or_else(|| Details::SchemaResolutionError(name.clone()))?
.borrow()
} else {
schema
};
return Ok(Some((index, schema)));
}
}
Ok(None)
}

⚠️ Potential issue | 🟠 Major

Use fully-qualified name matching for named union lookup.

Line 113 matches only schema_name.name(). If a union contains records/enums/fixed with the same short name in different namespaces, this can resolve the wrong variant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/union.rs` around lines 105 - 127, The union lookup in
find_named_schema currently compares only the short name (schema_name.name())
which can return wrong variants across namespaces; update the equality check to
use the fully-qualified name (e.g., schema_name.fullname() or
schema_name.to_string()) when comparing to the input name, keeping the rest of
the logic (including the Schema::Ref resolution using names.get(name)) unchanged
so that both direct named schemas and refs are matched by their fully-qualified
name; ensure the comparison uses the same fully-qualified form used elsewhere in
resolution so names with different namespaces don't collide.
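A toy demonstration of why short-name matching is unsafe: two named schemas share the short name "Id" in different namespaces, and only fullname matching distinguishes them (types are simplified stand-ins for the crate's `Name` and union machinery):

```rust
#[derive(Clone, PartialEq, Debug)]
struct ToyName {
    name: String,
    namespace: Option<String>,
}

impl ToyName {
    fn fullname(&self) -> String {
        match &self.namespace {
            Some(ns) => format!("{ns}.{}", self.name),
            None => self.name.clone(),
        }
    }
}

// Match union variants on the fully-qualified name, not the short name.
fn find_named_schema(variants: &[ToyName], wanted: &str) -> Option<usize> {
    variants.iter().position(|n| n.fullname() == wanted)
}
```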

Comment on lines +84 to +92
Value::String(s),
Schema::Fixed(fixed)
| Schema::Decimal(DecimalSchema {
inner: InnerDecimalSchema::Fixed(fixed),
..
})
| Schema::Uuid(UuidSchema::Fixed(fixed))
| Schema::Duration(fixed),
) => s.len() == fixed.size,
⚠️ Potential issue | 🟠 Major

Byte and fixed defaults are being encoded as UTF-8.

Avro byte/fixed defaults use one byte per JSON-string code point 0..=255, not UTF-8. s.len() / s.as_bytes() therefore reject valid defaults like "\u00ff" for size-1 fixed values and would serialize that default as 0xc3 0xbf instead of 0xff. Decode the string char-by-char into raw bytes first, then validate fixed sizes against that decoded byte buffer.

Also applies to: 151-159

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/ser_schema/record/field_default.rs` around lines 84 - 92, The
code currently treats JSON string defaults for byte/fixed as UTF-8 by using
s.len()/s.as_bytes(), which rejects single-code-point bytes like "\u00ff" and
encodes them incorrectly; update the handling in the match arm that matches
Value::String(s) with Schema::Fixed / Decimal(inner: Fixed) / Uuid(Fixed) /
Duration by decoding the JSON string into raw bytes: iterate the string by char,
ensure each char's code point is in 0..=255, push (char as u32 as u8) into a
Vec<u8>, then validate Vec<u8>.len() == fixed.size and use that byte buffer for
serialization instead of s.as_bytes(); apply the same change to the analogous
block around lines 151-159 that handles byte/fixed defaults elsewhere in
field_default.rs.
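The decoding rule described above can be sketched as follows; `decode_default_bytes` is a hypothetical helper name, not the crate's API:

```rust
// Avro JSON byte/fixed defaults map each string code point 0..=255 to one
// raw byte; re-encoding the string as UTF-8 is wrong for code points >= 0x80.
fn decode_default_bytes(s: &str) -> Result<Vec<u8>, String> {
    s.chars()
        .map(|c| {
            u8::try_from(c as u32)
                .map_err(|_| format!("code point U+{:04X} out of byte range", c as u32))
        })
        .collect()
}

fn main() {
    // "\u{00ff}" is one code point: a valid size-1 fixed default of 0xFF...
    assert_eq!(decode_default_bytes("\u{00ff}").unwrap(), vec![0xff]);
    // ...whereas its UTF-8 encoding is two bytes, which is what
    // s.as_bytes() would incorrectly yield.
    assert_eq!("\u{00ff}".as_bytes(), &[0xc3, 0xbf]);
    // Code points above 255 are rejected outright.
    assert!(decode_default_bytes("€").is_err());
}
```

Validating `fixed.size` against the decoded `Vec<u8>` (not the string's UTF-8 length) then accepts `"\u00ff"` for a size-1 fixed and serializes it as the single byte `0xff`.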

Comment on lines +618 to +624
/// Returns `None`
pub fn get_record_fields_in_ctxt(
_: &mut HashSet<Name>,
_: NamespaceRef,
) -> Option<Vec<RecordField>> {
None
}
⚠️ Potential issue | 🔴 Critical

Same API inconsistency: missing usize parameter.

Same issue as bigdecimal::get_record_fields_in_ctxt — this function should take 3 parameters to match the existing module APIs.

🐛 Proposed fix
     /// Returns `None`
     pub fn get_record_fields_in_ctxt(
+        _: usize,
         _: &mut HashSet<Name>,
         _: NamespaceRef,
     ) -> Option<Vec<RecordField>> {
         None
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/with.rs` around lines 618 - 624, the function
get_record_fields_in_ctxt currently has only two parameters but should match the
module API by accepting a leading usize parameter; update its signature to pub fn
get_record_fields_in_ctxt(_: usize, _: &mut HashSet<Name>, _: NamespaceRef) ->
Option<Vec<RecordField>> and adjust any internal callers or trait
implementations to pass the size argument (even if unused) so the function
signature is consistent with bigdecimal::get_record_fields_in_ctxt and the rest
of the module.

Comment on lines +694 to +700
/// Returns `None`
pub fn get_record_fields_in_ctxt(
_: &mut HashSet<Name>,
_: NamespaceRef,
) -> Option<Vec<RecordField>> {
None
}
⚠️ Potential issue | 🔴 Critical

Same API inconsistency: missing usize parameter.

Same issue as the bigdecimal modules — this function should take 3 parameters to match the existing module APIs.

🐛 Proposed fix
     /// Returns `None`
     pub fn get_record_fields_in_ctxt(
+        _: usize,
         _: &mut HashSet<Name>,
         _: NamespaceRef,
     ) -> Option<Vec<RecordField>> {
         None
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/with.rs` around lines 694 - 700, the function
get_record_fields_in_ctxt currently has only two parameters but should accept a
leading usize parameter to match the module API (same as the bigdecimal
modules); update the signature of get_record_fields_in_ctxt to add the usize
parameter first and then adjust all call sites to pass the appropriate usize
argument, ensuring the function name and return type (Option<Vec<RecordField>>)
remain unchanged; verify any trait impls or references to NamespaceRef and
HashSet<Name> still align after the signature change.

Comment on lines +731 to +732
/// When used with different serialization formats, this is equivalent to [`serde_bytes`].
///
⚠️ Potential issue | 🟡 Minor

Incorrect documentation: copy-paste error.

The statement "When used with different serialization formats, this is equivalent to [serde_bytes]" doesn't apply to array serialization. This appears to be copy-pasted from the bytes/fixed modules.

📝 Proposed fix: remove or correct the statement
 /// Use [`apache_avro::serde::array`] for non-optional array values.
 ///
-/// When used with different serialization formats, this is equivalent to [`serde_bytes`].
-///
 /// See usage with below example:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/with.rs` around lines 731 - 732, The doc comment inside
avro/src/serde/with.rs contains a copy-paste sentence ("When used with different
serialization formats, this is equivalent to [`serde_bytes`].") that is
incorrect for the array serialization docs; remove that sentence (or replace it
with a brief correct statement about array serialization semantics) from the
array-related doc comment so the documentation no longer references serde_bytes
incorrectly—look for the array module/doc block in with.rs and edit the
docstring accordingly.

Comment on lines +770 to +776
/// Returns `None`
pub fn get_record_fields_in_ctxt(
_: &mut HashSet<Name>,
_: NamespaceRef,
) -> Option<Vec<RecordField>> {
None
}
⚠️ Potential issue | 🔴 Critical

Same API inconsistency: missing usize parameter.

Same issue as the other new modules — this function should take 3 parameters to match the existing module APIs.

🐛 Proposed fix
     /// Returns `None`
     pub fn get_record_fields_in_ctxt(
+        _: usize,
         _: &mut HashSet<Name>,
         _: NamespaceRef,
     ) -> Option<Vec<RecordField>> {
         None
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/with.rs` around lines 770 - 776, the function
get_record_fields_in_ctxt has a mismatched API (only two params); update its
signature to accept the leading usize parameter used by the other modules:
change get_record_fields_in_ctxt(_: &mut HashSet<Name>, _: NamespaceRef) ->
Option<Vec<RecordField>> to get_record_fields_in_ctxt(_: usize, _: &mut
HashSet<Name>, _: NamespaceRef) -> Option<Vec<RecordField>>, ignoring the new
argument in the body (still returning None); ensure the symbol name
get_record_fields_in_ctxt and return type Option<Vec<RecordField>> remain
unchanged so callers expecting the 3-arg form compile.
