feat!: Support enums and tuples in SchemaAwareSerializer and implement SchemaAwareDeserializer #65

martin-augment wants to merge 12 commits into main

Conversation
… documented model (no tests)
…ng `#[avro(with)]`
…se `fake_variadic`
---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Walkthrough

This pull request adds schema-aware serialization and deserialization infrastructure to the apache_avro crate. It introduces new …
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```rust
);

self.block.read_next_deser(self.reader_schema)
}
```
Reader passes reader_schema causing unexpected panic
High Severity
read_next_deser passes self.reader_schema directly to block.read_next_deser, but block.read_next_deser panics whenever read_schema.is_some(). When a user provides a reader_schema that equals the writer_schema, should_resolve_schema is false (so the assert passes), but the block still receives Some(...) and panics. The existing read_next method correctly passes None when !should_resolve_schema, but read_next_deser does not replicate this filtering logic.
Additional Locations (1)
value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.
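The filtering the reviewers describe can be sketched as follows. This is a minimal, hypothetical illustration using a stand-in `Schema` type (not the crate's real types or function signatures): only forward a reader schema downstream when resolution is actually required, mirroring what `read_next` already does.

```rust
// Simplified stand-in for the crate's Schema type (illustration only).
#[derive(PartialEq, Debug)]
struct Schema(&'static str);

/// Decide which reader schema to pass to the block reader.
/// When the configured reader schema equals the writer schema, no
/// resolution is needed, so `None` is forwarded instead of `Some(...)`,
/// avoiding the panic path described in the finding above.
fn effective_read_schema<'a>(
    writer_schema: &Schema,
    reader_schema: Option<&'a Schema>,
) -> Option<&'a Schema> {
    match reader_schema {
        // Schemas match: skip resolution entirely.
        Some(r) if r == writer_schema => None,
        other => other,
    }
}
```

With this filter in place, a reader schema that happens to equal the writer schema degrades gracefully to the no-resolution path instead of triggering the panic.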
```rust
&& union.is_nullable()
{
    let index = zag_i32(self.reader)?;
    let schema = union.get_variant(index as usize)?;
```
Unsafe as usize cast on potentially negative index
Medium Severity
In deserialize_option, index as usize is used to convert the result of zag_i32 without checking for negative values. Elsewhere in this file (e.g., with_union on line 137), usize::try_from(index) is correctly used to produce a proper error on negative indices. With corrupt data, the as usize cast silently wraps a negative value, leading to a confusing out-of-bounds error message instead of the expected ConvertI32ToUsize error.
value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! Any casting from a wider integer type to a narrower one should be done by using the checked APIs and report Err in case the value does not fit in the target type. Prevents silently overflowing the value and using a totally wrong value for any following operations with that number.
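The checked conversion the reviewers recommend can be sketched like this (the `String` error is a hypothetical stand-in for the crate's `ConvertI32ToUsize` error variant):

```rust
/// Convert a decoded union variant index to usize, rejecting negative
/// values instead of letting `as usize` sign-extend and wrap them into
/// huge, meaningless indices.
fn variant_index(index: i32) -> Result<usize, String> {
    usize::try_from(index)
        .map_err(|_| format!("ConvertI32ToUsize: invalid union index {index}"))
}
```

The difference matters with corrupt data: `-1i32 as usize` silently becomes `usize::MAX`, producing a confusing out-of-bounds error far from the real cause, while the checked conversion fails immediately with a precise message.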
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces schema awareness to the serialization and deserialization processes, enabling more robust and reliable data handling. It includes support for enums and tuples, along with comprehensive documentation to guide developers in mapping Avro schemas to Rust types and vice versa. The changes enhance the library's ability to work with complex data structures while maintaining type safety.
🤖 Augment PR Summary

Summary: This PR expands schema-aware Serde support by adding a raw-Avro …

Changes: …

Technical Notes: Several reader-schema resolution paths are still marked WIP (panicking when resolution would be required), and documentation modules were added to clarify Serde↔Avro data model mappings.
```rust
    "Schema aware deserialisation does not resolve schemas yet"
);

self.block.read_next_deser(self.reader_schema)
```
read_next_deser always forwards self.reader_schema into Block::read_next_deser, but Block::read_next_deser panics whenever read_schema.is_some() (even when should_resolve_schema is false because schemas match). This makes Reader::into_deser_iter panic for a valid configured reader_schema that happens to equal the writer schema.
Severity: high
Other Locations
avro/src/reader/block.rs:232
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.
```rust
&& union.is_nullable()
{
    let index = zag_i32(self.reader)?;
    let schema = union.get_variant(index as usize)?;
```
index is an i32 but is cast to usize (index as usize), which can wrap negative values and yield confusing out-of-bounds behavior. Other union paths here use usize::try_from(index) with a dedicated conversion error, which seems safer/consistent.
Severity: medium
value:useful; category:bug; feedback: The Augment AI reviewer is correct! Any casting from a wider integer type to a narrower one should be done by using the checked APIs and report Err in case the value does not fit in the target type. Prevents silently overflowing the value and using a totally wrong value for any following operations with that number.
```rust
{
    match self.schema {
        Schema::Record(record) if record.name.name() == name && record.fields.len() == len => {
            visitor.visit_map(RecordDeserializer::new(self.reader, record, self.config))
```
deserialize_tuple_struct calls visitor.visit_map(...), but tuple-struct Deserialize implementations typically expect sequence access from deserialize_tuple_struct. If the visitor doesn’t implement visit_map, tuple-struct deserialization from Avro records will fail at runtime.
Severity: medium
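A stripped-down illustration of the mismatch, using hypothetical stand-ins rather than serde's real traits: a derived tuple-struct visitor only knows how to consume a sequence, so handing it map entries fails at runtime.

```rust
// Minimal stand-in for a serde-style visitor. Tuple-struct visitors
// implement sequence access; map access falls back to an error,
// mirroring serde's default "invalid type" behavior.
trait Visitor {
    type Value;
    fn visit_seq(&self, items: &[i64]) -> Result<Self::Value, String>;
    fn visit_map(&self, _entries: &[(&str, i64)]) -> Result<Self::Value, String> {
        Err("invalid type: map, expected a tuple struct".to_string())
    }
}

// Visitor for a hypothetical `struct Pair(i64, i64);`
struct PairVisitor;

impl Visitor for PairVisitor {
    type Value = (i64, i64);
    fn visit_seq(&self, items: &[i64]) -> Result<Self::Value, String> {
        match items {
            [a, b] => Ok((*a, *b)),
            _ => Err(format!("expected 2 elements, got {}", items.len())),
        }
    }
}
```

This is why the comment suggests driving the same record deserializer through `visit_seq` instead of `visit_map` for tuple structs.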
```rust
| Schema::BigDecimal
| Schema::Decimal(_)
| Schema::Duration(_)
| Schema::Date,
```
Schema::Date is included in the branch that serializes string defaults as bytes, which would accept an invalid default type for date and write an incompatible encoding. It seems Schema::Date should only accept numeric defaults (like the Schema::Int | Schema::Date | Schema::TimeMillis arm above).
Severity: low
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The Avro specification does not support encoding dates (and times) as strings, so this branch should be removed. Dates are encoded as numbers (days since epoch), so the default value must be an integer.
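The suggested fix can be sketched with simplified stand-in enums (not the crate's real `Schema` or default-value types): `date` is a logical type over `int` (days since the epoch), so its default should only be accepted by the numeric arm, never by the string-as-bytes arm.

```rust
// Hypothetical, simplified schema and default-value models for illustration.
enum Schema { Int, Date, TimeMillis, Bytes, Fixed }
enum Default { Int(i32), Str(String) }

fn validate_default(schema: &Schema, default: &Default) -> Result<(), String> {
    match (schema, default) {
        // Numeric logical types: a date default is days since epoch.
        (Schema::Int | Schema::Date | Schema::TimeMillis, Default::Int(_)) => Ok(()),
        // Byte-backed schemas may accept a string default; note that
        // Schema::Date is deliberately absent from this arm.
        (Schema::Bytes | Schema::Fixed, Default::Str(_)) => Ok(()),
        _ => Err("default value type does not match schema".to_string()),
    }
}
```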
PR Review: feat!: Support enums and tuples in SchemaAwareSerializer and implement SchemaAwareDeserializer

This is a substantial PR adding enum/tuple support to the serializer and implementing the full schema-aware deserializer. The overall design is solid and the modular decomposition (separate files for …).

Bugs / Correctness

1. Unsafe cast in … In …:

```rust
let index = zag_i32(self.reader)?;
let schema = union.get_variant(index as usize)?;
```

Every other place in this PR uses …

2. …

API Design Concerns

3. Panics in public iterator path: these should return …

4. Breaking changes need prominent documentation. Two silent breaking changes are buried in the diff: …

Both should be prominently documented in the PR description / CHANGELOG / migration guide.

Minor Issues

5. Error message regression. Old error messages included actionable hints like: …

The new messages drop these hints. Tests were updated to match the new (less helpful) wording. Consider preserving the actionability of these errors.

6. TODOs that affect correctness: …

Both are worth tracking as follow-up issues if not in scope for this PR.

7. …

8. Lines like …

Positive Highlights

…

Summary

Main blocking issues: #3 (panics instead of errors in the public iterator) and #4 (undocumented breaking changes). Issues #1 and #5 are moderate. The rest are minor polish.
Code Review
This is a substantial pull request that introduces a new schema-aware serializer and deserializer, refactors many parts of the library to use them, and improves the Avro schema mapping for several Rust types like `[T; N]` and `core::time::Duration`. The changes are well-structured and accompanied by extensive tests and new documentation, which is commendable.
My main feedback is regarding a regression in the quality of diagnostic error messages for mismatched types during serialization, particularly for uuid::Uuid. The new error messages are more generic and less helpful than before. I've left specific comments with suggestions on how to improve them.
```diff
 writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-"Failed to serialize value of type bytes using schema Uuid(String): 55e840e29b41d4a7164466554400. Cause: Expected a string, but got 16 bytes. Did you mean to use `Schema::Uuid(UuidSchema::Fixed)` or `utils::serde_set_human_readable(true)`?"
+"Failed to serialize value of type `bytes` using Schema::Uuid(String): Expected Schema::Bytes | Schema::Fixed | Schema::BigDecimal | Schema::Decimal | Schema::Uuid(Bytes | Fixed) | Schema::Duration"
 );
```
The updated error message is less informative than the previous one. The old message was very specific, guiding the user on how to fix the issue by suggesting Schema::Uuid(UuidSchema::Fixed) or setting human_readable to true. The new message is a generic list of expected schemas for byte serialization.
While this is a consequence of the new generic serializer, it's a regression in diagnostics. Could we restore a more specific error message for this case? For example, by adding a special check for Schema::Uuid in serialize_bytes within avro/src/serde/ser_schema/mod.rs when the schema doesn't match.
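One way to restore the actionable hint, sketched with simplified stand-in types; the real change would live in `serialize_bytes` in `avro/src/serde/ser_schema/mod.rs` and use the crate's actual `Schema` enum, so everything below is an illustrative assumption:

```rust
// Hypothetical, simplified mirrors of the crate's schema types.
enum UuidSchema { String, Bytes, Fixed }
enum Schema { Bytes, Uuid(UuidSchema) }

/// Build the error message for the case where raw bytes arrive but the
/// schema expects something else; special-case Uuid(String) so the old,
/// actionable hint is preserved instead of the generic schema list.
fn bytes_mismatch_message(schema: &Schema) -> String {
    match schema {
        Schema::Uuid(UuidSchema::String) => {
            "Expected a string, but got bytes. Did you mean to use `Schema::Uuid(UuidSchema::Fixed)` or `utils::serde_set_human_readable(true)`?".to_string()
        }
        _ => {
            "Expected Schema::Bytes | Schema::Fixed | Schema::BigDecimal | Schema::Decimal | Schema::Uuid(Bytes | Fixed) | Schema::Duration".to_string()
        }
    }
}
```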
```diff
 writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-r#"Failed to serialize value of type string using schema Uuid(Bytes): 550e8400-e29b-41d4-a716-446655440000. Cause: Expected bytes but got a string. Did you mean to use `Schema::Uuid(UuidSchema::String)` or `utils::serde_set_human_readable(false)`?"#
+r#"Failed to serialize value of type `str` using Schema::Uuid(Bytes): Expected Schema::String | Schema::Uuid(String)"#
 );
```
Similar to my other comment, the new error message here is a regression in terms of helpfulness. The previous error message clearly stated the problem (expected bytes, got a string) and suggested solutions (Schema::Uuid(UuidSchema::String) or setting human_readable to false). The new message is a generic list of schemas that can handle a string.
It would be great to improve the diagnostics here to be as helpful as they were before. This could likely be addressed by adding more specific checks within serialize_str in avro/src/serde/ser_schema/mod.rs for Uuid-related schemas.
```diff
 writer.write(uuid, &mut buffer).unwrap_err().to_string(),
-r#"Failed to serialize value of type string using schema Uuid(Fixed(FixedSchema { name: Name { name: "uuid", .. }, size: 16, .. })): 550e8400-e29b-41d4-a716-446655440000. Cause: Expected bytes but got a string. Did you mean to use `Schema::Uuid(UuidSchema::String)` or `utils::serde_set_human_readable(false)`?"#
+r#"Failed to serialize value of type `str` using Schema::Uuid(Fixed(FixedSchema { name: Name { name: "uuid", .. }, size: 16, .. })): Expected Schema::String | Schema::Uuid(String)"#
 );
```
Actionable comments posted: 13
🧹 Nitpick comments (2)
avro_derive/src/attributes/mod.rs (1)

321-344: Consider clarifying the error message when default is auto-disabled.

The logic to auto-disable `field_default` when `#[avro(with)]` is used is sound: calling `T::field_default()` doesn't make sense when the schema comes from a different source. However, the interaction with the subsequent validation (lines 336-344) could produce a confusing error. If a user writes:

```rust
#[avro(with)]
#[serde(with = "my_module", skip_serializing_if = "Option::is_none")]
field: Option<MyType>,
```

They'll receive an error stating `skip_serializing_if` is incompatible with `#[avro(default = false)]`, even though they never explicitly set `default = false`. The auto-disabling behavior isn't obvious from the error message.

Consider either:

- Providing a more specific error message when the default was auto-disabled due to `#[avro(with)]`
- Or documenting this interaction in the TODO mentioned on line 330

💡 Sketch of a clearer error message

```diff
+let default_was_auto_disabled = with != With::Trait && avro.default == FieldDefault::Trait;
 // TODO: Implement a better way to do this (maybe if user specifies `#[avro(with)]` also use that for the default)
 // Disable getting the field default, if the schema is not retrieved from the field type
 if with != With::Trait && avro.default == FieldDefault::Trait {
     avro.default = FieldDefault::Disabled;
 }
 if ((serde.skip_serializing && !serde.skip_deserializing)
     || serde.skip_serializing_if.is_some())
     && avro.default == FieldDefault::Disabled
 {
-    errors.push(syn::Error::new(
-        span,
-        "`#[serde(skip_serializing)]` and `#[serde(skip_serializing_if)]` are incompatible with `#[avro(default = false)]`"
-    ));
+    let msg = if default_was_auto_disabled {
+        "`#[serde(skip_serializing)]` and `#[serde(skip_serializing_if)]` require a default value, but `#[avro(with)]` disables auto-default. Please provide an explicit `#[avro(default = \"...\")]`"
+    } else {
+        "`#[serde(skip_serializing)]` and `#[serde(skip_serializing_if)]` are incompatible with `#[avro(default = false)]`"
+    };
+    errors.push(syn::Error::new(span, msg));
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro_derive/src/attributes/mod.rs` around lines 321-344: The validation error about incompatibility with `#[avro(default = false)]` can be misleading when the default was auto-disabled because `#[avro(with)]` was used; update the check around serde.skip_serializing/skip_serializing_if to detect that `with != With::Trait` and that the field default was originally `FieldDefault::Trait` (i.e., auto-disabled to `FieldDefault::Disabled`) and, in that case, push a clearer error via `errors.push(syn::Error::new(span, ...))` that mentions the default was auto-disabled due to `#[avro(with)]` (or alternatively include that context in the existing message), using the same symbols (`With::from_avro_and_serde`, `With::Trait`, `avro.default`, `FieldDefault::Trait`/`Disabled`, `serde.skip_serializing`, `serde.skip_serializing_if`, and `span`) so callers know why `default` became disabled.

avro/src/reader/block.rs (1)

218-254: Consider extracting duplicated infinite-loop protection logic.

Lines 247-250 duplicate the same check from `read_next` (lines 209-212). Consider extracting this into a helper method to reduce duplication.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@avro/src/reader/block.rs` around lines 218-254: Duplicate infinite-loop protection logic in read_next_deser and read_next should be extracted into a shared helper: create a method (e.g., check_no_progress_or_error or ensure_progress) that takes the original byte count and the current slice length (or both as usize), performs the b_original != 0 && b_original == current_len check, and returns AvroResult<()> with Err(Details::ReadBlock.into()) on failure; call this helper from read_next_deser (replacing lines 247-250) and from read_next (replacing lines 209-212), and update callers to adjust buf_idx and message_count only after the helper confirms progress.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@avro/src/documentation/serde_data_model_to_avro.rs`:
- Around line 56-57: The doc comment in serde_data_model_to_avro.rs contains a
stray backtick in the sentence "This is different from normal tuples`."—remove
the stray backtick so the sentence ends "normal tuples." (Locate the comment
block in serde_data_model_to_avro.rs around the tuple-struct note and edit the
text accordingly.)
In `@avro/src/reader/block.rs`:
- Around line 232-234: Replace the panic! in the branch that checks read_schema
(the SchemaAwareResolvingDeserializer TODO) with returning a proper error result
instead; add a new Details enum variant like SchemaResolutionNotImplemented with
#[error("Schema-aware deserialization does not support schema resolution yet")]
and then change the code in block.rs (the branch that currently calls panic!) to
return Err(Details::SchemaResolutionNotImplemented.into()) (or construct the
crate's error type expected by the surrounding function) so callers can handle
the unimplemented feature gracefully.
In `@avro/src/reader/datum.rs`:
- Around line 145-160: read_deser currently panics any time a reader schema is
present; change it to only error when the reader schema actually differs from
the writer schema. In read_deser, inspect self.reader (the Some((reader_schema,
_)) tuple) and compare reader_schema to self.writer; if they are identical,
proceed with the same SchemaAwareDeserializer path you use in the else branch
(using Config { names: self.resolved.get_names(), human_readable:
self.human_readable }), otherwise return an AvroResult::Err (e.g. an AvroError
indicating schema resolution is not implemented) instead of unconditionally
panicking; references: read_deser, self.reader, self.writer,
SchemaAwareDeserializer, self.resolved.get_names().
In `@avro/src/reader/mod.rs`:
- Around line 133-140: Replace the panic assert in read_next_deser with
returning an AvroResult::Err so the iterator yields an error instead of
panicking: check the should_resolve_schema flag in fn read_next_deser and if
true return an appropriate Avro error (e.g., a variant indicating
schema-resolving deserialization is not implemented or unsupported) wrapped in
AvroResult::Err; otherwise proceed to call
self.block.read_next_deser(self.reader_schema). Ensure the returned error type
matches the crate's AvroError/AvroResult conventions so callers receive a
recoverable Err instead of a panic.
In `@avro/src/serde/derive.rs`:
- Around line 809-815: The synthesized helper record names (created where
T::get_schema_in_ctxt is used and via Name::new_with_enclosing_namespace) can
collide with user-defined types because they live in the caller's namespace and
are directly checked against named_schemas; change the name-generation to
guarantee uniqueness by either: (a) creating the helper names in a dedicated
internal/synthetic namespace instead of enclosing_namespace, or (b)
loop-appending a deterministic collision suffix (e.g., incrementing counter or
short hash/UUID) until Name::new_with_enclosing_namespace returns a name not
present in named_schemas; apply this change to both the tuple/array helper
generation sites (the block using format!("A{N}_{}",
t_schema.unique_normalized_name()) and the analogous T{len} generation) and
ensure subsequent code emits the newly-reserved name rather than producing a
Schema::Ref to an existing user type.
In `@avro/src/serde/deser_schema/mod.rs`:
- Around line 1474-1476: The test constructs value2 using
TupleExternalEnum::Val1 again, so the tuple-variant path for the three-field
variant is never exercised; update the TestTupleExternalEnum construction for
value2 to use TupleExternalEnum::Val2 with three fields (matching the enum's
three-field tuple variant) so the Val2 branch is executed (refer to
TestTupleExternalEnum, TupleExternalEnum::Val2, and value2).
- Around line 633-645: In deserialize_tuple_struct, the code incorrectly calls
visitor.visit_map for tuple-structs; change it to call visitor.visit_seq so
Serde's tuple-struct visitors are satisfied. Specifically, in the match arm
handling Schema::Record where record.name.name() == name && record.fields.len()
== len, replace the visitor.visit_map(RecordDeserializer::new(self.reader,
record, self.config)) call with visitor.visit_seq using the same
RecordDeserializer instance (same params), mirroring how deserialize_tuple uses
visit_seq with ManyTupleDeserializer. Ensure the rest of the method's branches
remain unchanged.
In `@avro/src/serde/deser_schema/tuple.rs`:
- Around line 105-120: In next_element_seed, avoid incrementing current_field
before deserialization: retrieve schema using
self.schema.fields[self.current_field].schema, create the
SchemaAwareDeserializer and call seed.deserialize first (propagating any error),
and only after deserialize succeeds increment self.current_field and return
Some(value); this ensures SchemaAwareDeserializer::new or seed.deserialize
failures do not advance current_field and preserves caller state for retries or
inspection.
In `@avro/src/serde/ser_schema/record/field_default.rs`:
- Around line 83-92: The match on Value::String for byte-like schemas (the arm
matching Schema::Fixed, Decimal with InnerDecimalSchema::Fixed, Uuid::Fixed, and
Duration) should validate length by Unicode code points (s.chars().count()) not
UTF-8 bytes, and the serialization should convert each char to a raw byte by
mapping char as u32 & 0xFF into a Vec<u8> (instead of using s.as_bytes()); also
remove Schema::Date from the string-handling pattern so date logical types are
not treated as byte-strings. Locate and update the match arm(s) in
field_default.rs where Value::String(s) is validated and serialized (the
fixed/decimal/uuid/duration string branches) to use chars().count() for length
checks and iterate chars to produce bytes, and drop Schema::Date from that
pattern.
In `@avro/src/serde/ser_schema/tuple.rs`:
- Around line 282-297: The OneUnionTupleSerializer SerializeTuple impl currently
doesn't enforce exactly one element; add an element counter field (e.g.,
elements_written or element_count) to OneUnionTupleSerializer, increment it
inside serialize_element (alongside bytes_written), and return an appropriate
Error if serialize_element is called when the counter is already 1 (prevent >1
elements). In end(), validate the counter is exactly 1 and return an Error if
it's 0; update serialize_element and end to use this counter to enforce the
single-element requirement while keeping bytes_written logic intact.
- Around line 235-262: OneTupleSerializer currently doesn't enforce that exactly
one element was serialized; add a counter field (e.g., elements_serialized:
usize) to OneTupleSerializer, increment it in serialize_element (alongside
updating bytes_written), and change end() to validate that elements_serialized
== 1 returning an appropriate Err(Error::... ) when it's 0 or >1 (mirroring the
validation logic used by ManyTupleSerializer) so invalid tuple lengths produce
an error instead of silently succeeding.
In `@avro/src/serde/ser_schema/union.rs`:
- Around line 258-310: The serialize_bytes function currently only treats
SchemaKind::Bytes and fixed-size Schema::Fixed as byte-backed union variants, so
Schema::BigDecimal variants are ignored; update the logic in serialize_bytes
(including the BytesType::Bytes branch, the BytesType::Unset matching where
bytes_index is computed, and any error messages) to also consider
SchemaKind::BigDecimal as a bytes-backed variant (e.g., use
union.index_of_schema_kind for both SchemaKind::Bytes and SchemaKind::BigDecimal
or add a helper that checks either kind) so BigDecimal union members are
selected and handled with with_len semantics the same as bytes.
In `@avro/src/serde/with.rs`:
- Around line 546-550: The new helpers bigdecimal, bigdecimal_opt, array, and
array_opt currently define get_record_fields_in_ctxt without the leading usize
parameter, breaking the ABI; update each helper's get_record_fields_in_ctxt
signature to accept the unused leading usize parameter (i.e. change fn
get_record_fields_in_ctxt(_: &mut HashSet<Name>, _: NamespaceRef) ->
Option<Vec<RecordField>> to fn get_record_fields_in_ctxt(_: usize, _: &mut
HashSet<Name>, _: NamespaceRef) -> Option<Vec<RecordField>>) so they match the
existing helpers' signature and remain backward compatible with callers and the
macro-generated 3-argument form.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 66191e49-13e3-4c2e-a621-d0c50719d431
📒 Files selected for processing (42)
avro/examples/test_interop_single_object_encoding.rs
avro/src/bigdecimal.rs
avro/src/documentation/avro_data_model_to_serde.rs
avro/src/documentation/mod.rs
avro/src/documentation/serde_data_model_to_avro.rs
avro/src/error.rs
avro/src/lib.rs
avro/src/reader/block.rs
avro/src/reader/datum.rs
avro/src/reader/mod.rs
avro/src/reader/single_object.rs
avro/src/schema/mod.rs
avro/src/schema/union.rs
avro/src/serde/derive.rs
avro/src/serde/deser_schema/block.rs
avro/src/serde/deser_schema/enums.rs
avro/src/serde/deser_schema/identifier.rs
avro/src/serde/deser_schema/mod.rs
avro/src/serde/deser_schema/record.rs
avro/src/serde/deser_schema/tuple.rs
avro/src/serde/mod.rs
avro/src/serde/ser_schema/block.rs
avro/src/serde/ser_schema/mod.rs
avro/src/serde/ser_schema/record/field_default.rs
avro/src/serde/ser_schema/record/mod.rs
avro/src/serde/ser_schema/tuple.rs
avro/src/serde/ser_schema/union.rs
avro/src/serde/with.rs
avro/src/writer/datum.rs
avro/src/writer/mod.rs
avro/src/writer/single_object.rs
avro/tests/avro-rs-226.rs
avro/tests/avro-rs-285-bytes_deserialization.rs
avro/tests/schema.rs
avro/tests/serde_human_readable_false.rs
avro/tests/serde_human_readable_true.rs
avro/tests/union_schema.rs
avro_derive/src/attributes/mod.rs
avro_derive/tests/derive.proptest-regressions
avro_derive/tests/derive.rs
avro_derive/tests/serde.rs
licenserc.toml
```rust
//! - **Note:** Tuple structs with 0 or 1 elements will also be (de)serialized as a [`Schema::Record`]. This
//!   is different from normal tuples`.
```
Minor typo: stray backtick before period.
Line 57 has "normal tuples`." with a backtick before the period that appears unintentional.
Suggested fix
```diff
-//! - **Note:** Tuple structs with 0 or 1 elements will also be (de)serialized as a [`Schema::Record`]. This
-//!   is different from normal tuples`.
+//! - **Note:** Tuple structs with 0 or 1 elements will also be (de)serialized as a [`Schema::Record`]. This
+//!   is different from normal tuples.
```
value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! There is a trailing backtick in the sentence and it may break the rendering of the rustdoc. It should be removed.
```rust
let item = if read_schema.is_some() {
    // TODO: Implement SchemaAwareResolvingDeserializer
    panic!("Schema aware deserialisation does not resolve schemas yet");
```
Replace panic! with a proper error return.
Using panic! in library code for an unimplemented feature is problematic — it crashes the application without giving callers a chance to handle the situation gracefully. Return an error instead.
🛡️ Proposed fix

```diff
-let item = if read_schema.is_some() {
-    // TODO: Implement SchemaAwareResolvingDeserializer
-    panic!("Schema aware deserialisation does not resolve schemas yet");
-} else {
+let item = if let Some(_read_schema) = read_schema {
+    return Err(crate::error::Details::SchemaResolutionNotImplemented.into());
+} else {
```

You'll need to add the corresponding error variant to `Details`:

```rust
#[error("Schema-aware deserialization does not support schema resolution yet")]
SchemaResolutionNotImplemented,
```
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.
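The pattern the reviewers ask for can be sketched with plain std types. `Details`, its variant, and `read_next_deser` below are stand-ins for the crate's real error machinery (the variant name mirrors the one proposed in the review); this is a sketch of the panic-to-`Err` conversion, not the crate's implementation:

```rust
use std::fmt;

// Hypothetical stand-in for the crate's `Details` error enum.
#[derive(Debug, PartialEq)]
enum Details {
    SchemaResolutionNotImplemented,
}

impl fmt::Display for Details {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Schema-aware deserialization does not support schema resolution yet"
        )
    }
}

impl std::error::Error for Details {}

// Instead of `panic!`, return a recoverable Err when a reader schema is present.
fn read_next_deser(read_schema: Option<&str>) -> Result<Option<String>, Details> {
    if read_schema.is_some() {
        // TODO: Implement SchemaAwareResolvingDeserializer
        return Err(Details::SchemaResolutionNotImplemented);
    }
    Ok(Some("decoded item".to_string()))
}

fn main() {
    assert!(read_next_deser(None).is_ok());
    // The caller can now handle the unsupported configuration instead of crashing.
    assert_eq!(
        read_next_deser(Some("reader-schema")),
        Err(Details::SchemaResolutionNotImplemented)
    );
}
```

With this shape, callers keep working through the normal `Result` path, and the behavior can later be replaced by real schema resolution without an API change.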
pub fn read_deser<T: DeserializeOwned>(&self, reader: &mut impl Read) -> AvroResult<T> {
    // `reader` is `impl Read` instead of a generic on the function like T so it's easier to
    // specify the type wanted (`read_deser<String>` vs `read_deser<String, _>`)
    if let Some((_, _)) = &self.reader {
        // TODO: Implement SchemaAwareResolvingDeserializer
        panic!("Schema aware deserialisation does not resolve schemas yet");
    } else {
        T::deserialize(SchemaAwareDeserializer::new(
            reader,
            self.writer,
            Config {
                names: self.resolved.get_names(),
                human_readable: self.human_readable,
            },
        )?)
    }
Only reject actual schema adaptation here.
This panics whenever a reader schema was configured, even in the no-op case where the reader schema is equivalent to the writer schema. That makes read_deser stricter than read_value and turns an AvroResult<T> API into a panic path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/reader/datum.rs` around lines 145 - 160, read_deser currently panics
any time a reader schema is present; change it to only error when the reader
schema actually differs from the writer schema. In read_deser, inspect
self.reader (the Some((reader_schema, _)) tuple) and compare reader_schema to
self.writer; if they are identical, proceed with the same
SchemaAwareDeserializer path you use in the else branch (using Config { names:
self.resolved.get_names(), human_readable: self.human_readable }), otherwise
return an AvroResult::Err (e.g. an AvroError indicating schema resolution is not
implemented) instead of unconditionally panicking; references: read_deser,
self.reader, self.writer, SchemaAwareDeserializer, self.resolved.get_names().
There was a problem hiding this comment.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.
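A minimal sketch of the suggested no-op check, using strings in place of real `Schema` values (in the crate one would compare parsed or canonical schemas, not raw text; `read_deser_guard` is an illustrative name, not the crate's API):

```rust
// Only reject when the reader schema actually differs from the writer schema.
// `&str` stands in for the crate's `Schema` type in this sketch.
fn read_deser_guard(reader: Option<&str>, writer: &str) -> Result<&'static str, String> {
    match reader {
        // Reader schema equals writer schema: resolution is a no-op, proceed.
        Some(r) if r == writer => Ok("deserialized with writer schema"),
        None => Ok("deserialized with writer schema"),
        // Genuine schema adaptation is not implemented yet: recoverable error.
        Some(_) => Err("schema resolution not implemented yet".to_string()),
    }
}

fn main() {
    assert!(read_deser_guard(None, "\"long\"").is_ok());
    // Equivalent reader schema should not be stricter than `read_value`.
    assert!(read_deser_guard(Some("\"long\""), "\"long\"").is_ok());
    assert!(read_deser_guard(Some("\"int\""), "\"long\"").is_err());
}
```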
fn read_next_deser<T: DeserializeOwned>(&mut self) -> AvroResult<Option<T>> {
    // TODO: Implement SchemaAwareResolvingDeserializer
    assert!(
        !self.should_resolve_schema,
        "Schema aware deserialisation does not resolve schemas yet"
    );

    self.block.read_next_deser(self.reader_schema)
Return an Err here instead of panicking.
ReaderDeser promises Iterator<Item = AvroResult<T>>, but this assert! breaks that contract and panics on an unsupported configuration instead of surfacing the failure as the iterator's first error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/reader/mod.rs` around lines 133 - 140, Replace the panic assert in
read_next_deser with returning an AvroResult::Err so the iterator yields an
error instead of panicking: check the should_resolve_schema flag in fn
read_next_deser and if true return an appropriate Avro error (e.g., a variant
indicating schema-resolving deserialization is not implemented or unsupported)
wrapped in AvroResult::Err; otherwise proceed to call
self.block.read_next_deser(self.reader_schema). Ensure the returned error type
matches the crate's AvroError/AvroResult conventions so callers receive a
recoverable Err instead of a panic.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new serde deserializer does not support reader schema and this is explained in a TODO. But instead of panicking and crashing the application it would be better to return an Err.
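The iterator contract the comment refers to can be sketched as follows; the struct and field names are simplified stand-ins for the crate's `ReaderDeser`, and `String` stands in for the crate's error type:

```rust
// An iterator that surfaces the unsupported configuration as its first item
// (and then stops) instead of panicking inside `next`.
struct ReaderDeser {
    should_resolve_schema: bool,
    items: std::vec::IntoIter<i64>,
    errored: bool,
}

impl Iterator for ReaderDeser {
    type Item = Result<i64, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.errored {
            return None; // fused after the first error
        }
        if self.should_resolve_schema {
            self.errored = true;
            return Some(Err("schema resolution not implemented yet".to_string()));
        }
        self.items.next().map(Ok)
    }
}

fn main() {
    let mut bad = ReaderDeser {
        should_resolve_schema: true,
        items: vec![1, 2].into_iter(),
        errored: false,
    };
    // The failure arrives as the iterator's first item, not as a panic.
    assert!(matches!(bad.next(), Some(Err(_))));
    assert!(bad.next().is_none());

    let good = ReaderDeser {
        should_resolve_schema: false,
        items: vec![1, 2].into_iter(),
        errored: false,
    };
    assert_eq!(good.collect::<Result<Vec<_>, _>>(), Ok(vec![1, 2]));
}
```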
let t_schema = T::get_schema_in_ctxt(named_schemas, enclosing_namespace);
let name = Name::new_with_enclosing_namespace(
    format!("A{N}_{}", t_schema.unique_normalized_name()),
    enclosing_namespace,
)
.expect("Name is valid");
if named_schemas.contains(&name) {
Reserve a collision-proof name for synthesized tuple/array records.
A{N}_... and T{len}_... are created in the caller's namespace and checked against named_schemas like ordinary user types. If a real record already has that name, this code will silently emit a Schema::Ref to the existing type instead of the generated helper record.
Also applies to: 919-924
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/serde/derive.rs` around lines 809 - 815, The synthesized helper
record names (created where T::get_schema_in_ctxt is used and via
Name::new_with_enclosing_namespace) can collide with user-defined types because
they live in the caller's namespace and are directly checked against
named_schemas; change the name-generation to guarantee uniqueness by either: (a)
creating the helper names in a dedicated internal/synthetic namespace instead of
enclosing_namespace, or (b) loop-appending a deterministic collision suffix
(e.g., incrementing counter or short hash/UUID) until
Name::new_with_enclosing_namespace returns a name not present in named_schemas;
apply this change to both the tuple/array helper generation sites (the block
using format!("A{N}_{}", t_schema.unique_normalized_name()) and the analogous
T{len} generation) and ensure subsequent code emits the newly-reserved name
rather than producing a Schema::Ref to an existing user type.
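Option (b) from the prompt, loop-appending a deterministic suffix until the synthesized name is free, can be sketched with plain strings. `reserve_unique` and the name format are assumptions for illustration, not the crate's API:

```rust
use std::collections::HashSet;

// Reserve a helper-record name that cannot shadow (or be shadowed by) a
// user-defined type: append an incrementing counter until the name is free,
// then record it so later lookups see it as taken.
fn reserve_unique(base: &str, named: &mut HashSet<String>) -> String {
    let mut candidate = base.to_string();
    let mut n = 0u32;
    while named.contains(&candidate) {
        n += 1;
        candidate = format!("{base}_{n}");
    }
    named.insert(candidate.clone());
    candidate
}

fn main() {
    let mut named: HashSet<String> = ["A2_Long".to_string()].into_iter().collect();
    // A user record already claimed "A2_Long", so the helper gets a suffix.
    assert_eq!(reserve_unique("A2_Long", &mut named), "A2_Long_1");
    // A fresh base name is taken verbatim.
    assert_eq!(reserve_unique("T3_Rec", &mut named), "T3_Rec");
}
```

Because the suffix is derived deterministically from the collision count, repeated derive expansions over the same schema set produce the same helper names.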
(
    Value::String(s),
    Schema::Fixed(fixed)
    | Schema::Decimal(DecimalSchema {
        inner: InnerDecimalSchema::Fixed(fixed),
        ..
    })
    | Schema::Uuid(UuidSchema::Fixed(fixed))
    | Schema::Duration(fixed),
) => s.len() == fixed.size,
🧩 Analysis chain

🌐 Web query:
According to the Apache Avro specification, how are record field defaults for `bytes` and `fixed` decoded from JSON strings, and are `date` logical-type defaults represented as JSON numbers rather than strings?

💡 Result:
According to the Apache Avro specification, record field defaults for bytes and fixed types are provided as JSON strings. These strings use Unicode code points 0-255 mapped directly to unsigned 8-bit byte values 0-255 (e.g., "\u00FF" for the byte 255). When decoding the schema from JSON, these strings are interpreted as the corresponding byte sequences or fixed values. For date logical types (which annotate an underlying int type representing days since the Unix epoch), defaults are represented as JSON numbers (integers), following the general rule for int/long field defaults, rather than strings. Examples from schema usage confirm defaults like 1502250227187 for timestamp logical types (on long) and similar integer defaults for date on int. The specification's field default table lists int,long as JSON integer, with no special override for logical types, which are schema attributes ignored during default value parsing.

Citations:
- 1: https://avro.apache.org/docs/1.11.1/specification/
- 2: https://avro.apache.org/docs/1.5.1/spec.html
- 3: https://avro.apache.org/docs/%2B%2Bversion%2B%2B/specification/
- 4: https://avro.apache.org/docs/1.11.4/specification/

🏁 Script executed:
cat -n avro/src/serde/ser_schema/record/field_default.rs | sed -n '75,100p'

Repository: martin-augment/avro-rs
Length of output: 1441

🏁 Script executed:
cat -n avro/src/serde/ser_schema/record/field_default.rs | sed -n '145,170p'

Repository: martin-augment/avro-rs
Length of output: 1381
Length of output: 1381
Decode Avro byte defaults by code point, not UTF-8.
For bytes/fixed defaults, Avro maps JSON string code points 0-255 to raw bytes 0-255. The code at line 92 uses s.len() to validate size, which counts UTF-8 bytes rather than code points—so "\u00FF" (1 code point, 2 UTF-8 bytes) would fail validation. Similarly, line 160 uses s.as_bytes() to serialize, which produces UTF-8 bytes instead of the raw byte values specified by Avro.
Line 159 also incorrectly includes Schema::Date in the byte-string pattern. Per the Avro specification, date logical-type defaults are JSON numbers (integers), not strings, and should not be handled by string-based serialization.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/serde/ser_schema/record/field_default.rs` around lines 83 - 92, The
match on Value::String for byte-like schemas (the arm matching Schema::Fixed,
Decimal with InnerDecimalSchema::Fixed, Uuid::Fixed, and Duration) should
validate length by Unicode code points (s.chars().count()) not UTF-8 bytes, and
the serialization should convert each char to a raw byte by mapping char as u32
& 0xFF into a Vec<u8> (instead of using s.as_bytes()); also remove Schema::Date
from the string-handling pattern so date logical types are not treated as
byte-strings. Locate and update the match arm(s) in field_default.rs where
Value::String(s) is validated and serialized (the fixed/decimal/uuid/duration
string branches) to use chars().count() for length checks and iterate chars to
produce bytes, and drop Schema::Date from that pattern.
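The code-point-vs-UTF-8 distinction is easy to demonstrate. `decode_avro_byte_default` below is a hypothetical helper, and it uses `u8::try_from` (rejecting code points above 255) rather than the masking (`& 0xFF`) mentioned in the prompt, which is a stricter but arguably safer reading of the spec:

```rust
// "\u{00FF}" is one code point but two UTF-8 bytes, so `s.len()` (UTF-8 byte
// count) is the wrong length check and `s.as_bytes()` is the wrong encoding.
// Mapping each char's code point to a raw byte recovers the spec's intent.
fn decode_avro_byte_default(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| u8::try_from(c as u32).ok()) // code points > 255 are invalid here
        .collect()
}

fn main() {
    let s = "\u{00FF}";
    assert_eq!(s.len(), 2); // UTF-8 bytes: 0xC3 0xBF
    assert_eq!(s.chars().count(), 1); // one code point
    assert_eq!(decode_avro_byte_default(s), Some(vec![0xFF]));
    // A code point above 255 (here 0x65E5) cannot encode an Avro byte default.
    assert_eq!(decode_avro_byte_default("\u{65E5}"), None);
}
```

For a fixed-size check, the validation would then compare `s.chars().count()` (or the decoded `Vec<u8>` length) against `fixed.size`.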
impl<'s, 'w, W: Write, S: Borrow<Schema>> SerializeTuple for OneTupleSerializer<'s, 'w, W, S> {
    type Ok = usize;
    type Error = Error;

    fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: ?Sized + Serialize,
    {
        match self.schema {
            Schema::Union(union) => {
                self.bytes_written +=
                    value.serialize(UnionSerializer::new(self.writer, union, self.config))?;
            }
            schema => {
                self.bytes_written += value.serialize(SchemaAwareSerializer::new(
                    self.writer,
                    schema,
                    self.config,
                )?)?;
            }
        }
        Ok(())
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        Ok(self.bytes_written)
    }
}
OneTupleSerializer doesn't validate that exactly one element was serialized.
Unlike ManyTupleSerializer which validates element count in end(), OneTupleSerializer::end() unconditionally returns success. If zero or more than one element is serialized, this could lead to incorrect Avro output without error.
💡 Proposed fix
struct OneTupleSerializer<'s, 'w, W: Write, S: Borrow<Schema>> {
    writer: &'w mut W,
    schema: &'s Schema,
    config: Config<'s, S>,
    bytes_written: usize,
+   element_written: bool,
}

impl<'s, 'w, W: Write, S: Borrow<Schema>> OneTupleSerializer<'s, 'w, W, S> {
    pub fn new(
        writer: &'w mut W,
        schema: &'s Schema,
        config: Config<'s, S>,
        bytes_written: Option<usize>,
    ) -> Self {
        Self {
            writer,
            schema,
            config,
            bytes_written: bytes_written.unwrap_or(0),
+           element_written: false,
        }
    }
}

impl<'s, 'w, W: Write, S: Borrow<Schema>> SerializeTuple for OneTupleSerializer<'s, 'w, W, S> {
    // ...
    fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: ?Sized + Serialize,
    {
+       if self.element_written {
+           return Err(Error::new(Details::SerializeTupleTooManyElements {
+               expected: 1,
+               actual: 2,
+           }));
+       }
        match self.schema {
            // ... existing code ...
        }
+       self.element_written = true;
        Ok(())
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
+       if !self.element_written {
+           return Err(Error::new(Details::SerializeTupleMissingElements {
+               position: 0,
+               total_elements: 1,
+           }));
+       }
        Ok(self.bytes_written)
    }
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/serde/ser_schema/tuple.rs` around lines 235 - 262,
OneTupleSerializer currently doesn't enforce that exactly one element was
serialized; add a counter field (e.g., elements_serialized: usize) to
OneTupleSerializer, increment it in serialize_element (alongside updating
bytes_written), and change end() to validate that elements_serialized == 1
returning an appropriate Err(Error::... ) when it's 0 or >1 (mirroring the
validation logic used by ManyTupleSerializer) so invalid tuple lengths produce
an error instead of silently succeeding.
impl<'s, 'w, W: Write, S: Borrow<Schema>> SerializeTuple for OneUnionTupleSerializer<'s, 'w, W, S> {
    type Ok = usize;
    type Error = Error;

    fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: ?Sized + Serialize,
    {
        self.bytes_written +=
            value.serialize(UnionSerializer::new(self.writer, self.union, self.config))?;
        Ok(())
    }

    fn end(self) -> Result<Self::Ok, Self::Error> {
        Ok(self.bytes_written)
    }
Same issue: OneUnionTupleSerializer doesn't validate element count.
Similar to OneTupleSerializer, this should validate that exactly one element is serialized.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/serde/ser_schema/tuple.rs` around lines 282 - 297, The
OneUnionTupleSerializer SerializeTuple impl currently doesn't enforce exactly
one element; add an element counter field (e.g., elements_written or
element_count) to OneUnionTupleSerializer, increment it inside serialize_element
(alongside bytes_written), and return an appropriate Error if serialize_element
is called when the counter is already 1 (prevent >1 elements). In end(),
validate the counter is exactly 1 and return an Error if it's 0; update
serialize_element and end to use this counter to enforce the single-element
requirement while keeping bytes_written logic intact.
fn serialize_bytes(mut self, v: &[u8]) -> Result<Self::Ok, Self::Error> {
    let (index, with_len) = match SER_BYTES_TYPE.get() {
        BytesType::Bytes => {
            if let Some(index) = self.union.index_of_schema_kind(SchemaKind::Bytes) {
                (index, true)
            } else {
                return Err(self.error("bytes", "Expected Schema::Bytes in variants"));
            }
        }
        BytesType::Fixed => {
            if let Some((index, _)) = self
                .union
                .find_fixed_of_size_n(v.len(), self.config.names)?
            {
                (index, false)
            } else {
                return Err(self.error(
                    "bytes",
                    format!("Expected Schema::Fixed(size: {}) in variants", v.len()),
                ));
            }
        }
        BytesType::Unset => {
            let bytes_index = self.union.index_of_schema_kind(SchemaKind::Bytes);
            let fixed_index = self
                .union
                .find_fixed_of_size_n(v.len(), self.config.names)?;
            // Find the first variant that matches the bytes or fixed
            match (bytes_index, fixed_index) {
                (Some(bytes_index), Some((fixed_index, _))) => {
                    (bytes_index.min(fixed_index), bytes_index < fixed_index)
                }
                (Some(bytes_index), None) => (bytes_index, true),
                (None, Some((fixed_index, _))) => (fixed_index, false),
                (None, None) => {
                    return Err(self.error(
                        "bytes",
                        format!(
                            "Expected Schema::Bytes | Schema::Fixed(size: {}) in variants",
                            v.len()
                        ),
                    ));
                }
            }
        }
    };
    let mut bytes_written = zig_i32(index as i32, &mut *self.writer)?;
    if with_len {
        bytes_written += self.write_bytes_with_len(v)?;
    } else {
        bytes_written += self.write_bytes(v)?;
    }
    Ok(bytes_written)
Schema::BigDecimal union members are unreachable from serialize_bytes.
This selector only matches SchemaKind::Bytes or fixed-size branches. A union containing Schema::BigDecimal therefore never selects its valid branch on the bytes path, so the new big-decimal adapters will still fail once they serialize into a non-null union. Please treat big-decimal as a bytes-backed union member here as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/serde/ser_schema/union.rs` around lines 258 - 310, The
serialize_bytes function currently only treats SchemaKind::Bytes and fixed-size
Schema::Fixed as byte-backed union variants, so Schema::BigDecimal variants are
ignored; update the logic in serialize_bytes (including the BytesType::Bytes
branch, the BytesType::Unset matching where bytes_index is computed, and any
error messages) to also consider SchemaKind::BigDecimal as a bytes-backed
variant (e.g., use union.index_of_schema_kind for both SchemaKind::Bytes and
SchemaKind::BigDecimal or add a helper that checks either kind) so BigDecimal
union members are selected and handled with with_len semantics the same as
bytes.
pub fn get_record_fields_in_ctxt(
    _: &mut HashSet<Name>,
    _: NamespaceRef,
) -> Option<Vec<RecordField>> {
    None
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Current helper signatures in avro/src/serde/with.rs"
rg -n -C1 'pub fn get_record_fields_in_ctxt\s*\(' avro/src/serde/with.rs
echo
echo "Call sites / macro references outside avro/src/serde/with.rs"
rg -n -C2 '\bget_record_fields_in_ctxt\s*\(' -g '!avro/src/serde/with.rs'

Repository: martin-augment/avro-rs
Length of output: 19287
Update new helper signatures to match existing get_record_fields_in_ctxt ABI.
The new bigdecimal, bigdecimal_opt, array, and array_opt helpers at lines 546, 619, 695, and 771 drop the leading usize parameter compared to all existing helpers in the file. This breaks the public API contract: all prior helpers expect (_: usize, _: &mut HashSet<Name>, _: NamespaceRef), but the new ones only expose (_: &mut HashSet<Name>, _: NamespaceRef). The macro generates 2-argument calls, so the new signature technically works with the current derive macro, but it makes these new modules incompatible with the established pattern and any code expecting the historical ABI. For consistency and backward compatibility, add the usize parameter to all new helpers, even if unused.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/serde/with.rs` around lines 546 - 550, The new helpers bigdecimal,
bigdecimal_opt, array, and array_opt currently define get_record_fields_in_ctxt
without the leading usize parameter, breaking the ABI; update each helper's
get_record_fields_in_ctxt signature to accept the unused leading usize parameter
(i.e. change fn get_record_fields_in_ctxt(_: &mut HashSet<Name>, _:
NamespaceRef) -> Option<Vec<RecordField>> to fn get_record_fields_in_ctxt(_:
usize, _: &mut HashSet<Name>, _: NamespaceRef) -> Option<Vec<RecordField>>) so
they match the existing helpers' signature and remain backward compatible with
callers and the macro-generated 3-argument form.
value:useful; category:bug; feedback: The Claude AI reviewer is correct! Any cast from a wider integer type to a narrower one should use the checked APIs and return an Err when the value does not fit in the target type. This prevents silently overflowing the value and carrying a totally wrong number into any subsequent operations.
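The checked-narrowing pattern this feedback asks for, sketched with std's `TryFrom` (the function name is illustrative, not the crate's API):

```rust
// Checked narrowing: surface an Err instead of silently wrapping like `as` does.
fn to_i32_checked(v: i64) -> Result<i32, String> {
    i32::try_from(v).map_err(|_| format!("value {v} does not fit in i32"))
}

fn main() {
    assert_eq!(to_i32_checked(42), Ok(42));
    assert!(to_i32_checked(i64::from(i32::MAX) + 1).is_err());
    // For contrast: `as` wraps silently and yields a totally different number.
    assert_eq!((i64::from(i32::MAX) + 1) as i32, i32::MIN);
}
```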


To review by AI