Skip to content

477: feat: Allow types to provide default values#55

Open
martin-augment wants to merge 5 commits intomainfrom
pr-477-2026-02-25-06-30-31
Open

477: feat: Allow types to provide default values#55
martin-augment wants to merge 5 commits intomainfrom
pr-477-2026-02-25-06-30-31

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

477: To review by AI

Kriskras99 and others added 5 commits February 20, 2026 10:05
`AvroSchemaComponent` is extended with a function `field_default`
which will be called when deriving a record to set the default value
for a field. The default implementation is to return `None`, which
means no default.

On the derive side, it is now possible to specify a default for a type
using `#[avro(default = "..")]`. It is also possible to disable setting
a default for a field with `#[avro(default = false)]`.

This enables users to use `#[serde(skip_serializing{_if})]` on most fields
without having to provide a default value.
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 25, 2026

Walkthrough

This pull request adds field-level default value support to Avro schema generation. It introduces a new field_default() method to the AvroSchemaComponent trait with implementations across all types, returning None by default or Some(Null) for Option<T>. A new FieldDefault enum supports three modes: Disabled, Trait (delegating to type implementations), and Value (explicit JSON defaults). The derive macro now processes field defaults during schema generation, threading default expressions through the trait implementation. Additionally, UUID string-to-bytes conversion support is extended for Fixed schema contexts. Tests validate default handling across primitive types, collections, and complex wrappers.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-477-2026-02-25-06-30-31

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the apache-avro Rust library by introducing a more flexible and explicit mechanism for defining default values in generated Avro schemas. It allows developers to specify default JSON values directly on types and fields using new #[avro(default)] attributes, and provides a trait method for programmatic default value generation. This change improves schema generation accuracy, especially when integrating with serde's serialization skipping features, by ensuring that Avro schemas correctly reflect default field values.

Highlights

  • Type-level Default Values: Introduced a new #[avro(default = "...")] attribute for types, allowing them to define a default JSON value when used as a field in another record.
  • Enhanced Field-level Default Values: Enhanced the #[avro(default = "...")] attribute for fields, providing options to explicitly set a JSON default, disable default generation with #[avro(default = false)], or inherit the default from the type.
  • New field_default() Trait Method: Added a field_default() method to the AvroSchemaComponent trait, enabling types to programmatically provide their default Avro JSON value.
  • Updated avro_derive Macro: Updated the avro_derive macro to process the new #[avro(default)] attributes on both types and fields, and to generate the field_default() implementation.
  • Improved serde Integration: Refined the interaction between #[serde(skip_serializing)] and #[avro(default)], now requiring #[avro(default = false)] to explicitly disable default generation for skipped fields.
  • Expanded Test Coverage: Expanded test coverage with new unit and UI tests to validate the correct behavior of default value generation and attribute parsing.
Changelog
  • avro/src/serde/derive.rs
    • Updated documentation for #[avro(default = ...)] attribute on types to provide default values.
    • Revised documentation for #[avro(default = ...)] attribute on fields, detailing options for explicit JSON values, disabling defaults, and inheriting from type defaults.
    • Added field_default() method to AvroSchemaComponent trait with a default None implementation.
    • Implemented field_default() for various primitive types and Option<T>, with Option<T> returning Some(serde_json::Value::Null).
    • Updated impl_schema and impl_passthrough_schema macros to include field_default() implementation.
  • avro/src/types.rs
    • Added a conversion case for Value::String to Uuid when the schema is UuidSchema::Fixed.
  • avro_derive/src/attributes/avro.rs
    • Added a default: Option<String> field to ContainerAttributes for type-level defaults.
    • Changed FieldAttributes::default from Option<String> to the new FieldDefault enum.
  • avro_derive/src/attributes/mod.rs
    • Introduced FieldDefault enum (Trait, Disabled, Value(String)) to represent different default value strategies.
    • Updated NamedTypeOptions to include a default: TokenStream for type-level default implementations.
    • Modified FieldOptions::from_attributes to parse #[avro(default)] into FieldDefault and added validation for #[serde(skip_serializing)] combined with #[avro(default = false)].
  • avro_derive/src/lib.rs
    • Updated derive_avro_schema to pass the type-level field_default_impl to create_trait_definition.
    • Modified create_trait_definition to include the field_default() method implementation.
    • Adjusted get_struct_schema_def to handle FieldDefault enum when determining field default values, calling type_to_field_default_expr for FieldDefault::Trait.
    • Added type_to_field_default_expr function to generate field_default() calls for various types.
    • Updated internal test expectations to include the field_default() method in generated impl AvroSchemaComponent blocks.
  • avro_derive/tests/derive.rs
    • Added use statements for Duration and Uuid.
    • Modified test_basic_struct_with_defaults to include #[avro(default = false)] for a field.
    • Updated avro_rs_397_uuid test to include a default value for the UUID fixed schema.
    • Added new test cases: avro_rs_476_field_default, avro_rs_476_field_default_false, and avro_rs_476_field_default_provided to thoroughly test the new default value attributes and trait integration.
  • avro_derive/tests/ui/avro_rs_226_skip_serializing.rs
    • Added #[avro(default = false)] to a field with #[serde(skip_serializing)].
  • avro_derive/tests/ui/avro_rs_226_skip_serializing.stderr
    • Updated the expected error message for skip_serializing to reflect the new incompatibility with #[avro(default = false)].
  • avro_derive/tests/ui/avro_rs_226_skip_serializing_if.rs
    • Added #[avro(default = false)] to a field with #[serde(skip_serializing_if)].
  • avro_derive/tests/ui/avro_rs_226_skip_serializing_if.stderr
    • Updated the expected error message for skip_serializing_if to reflect the new incompatibility with #[avro(default = false)].
Activity
  • No human activity has been recorded on this pull request since its creation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable feature by allowing types to provide default values in Avro schemas. The implementation is comprehensive, with changes across both the avro and avro_derive crates. The new field_default method in the AvroSchemaComponent trait is well-designed, and the derive macro has been effectively updated to support the new #[avro(default)] attribute options. The addition of extensive tests ensures the new functionality is robust. I have one minor suggestion to improve the clarity of the documentation.

None
}

/// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation for field_default on [T; N] is misleading. It states that it will return an array of default elements if the inner type T has a default, but the implementation always returns None. This could cause confusion. The comment should be updated to accurately reflect that arrays do not have a default value by default.

Suggested change
/// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
/// Arrays do not have a default value by default.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.

@claude
Copy link
Copy Markdown

claude bot commented Feb 25, 2026

PR Review: feat: Allow types to provide default values

Thanks for this feature! The overall design is clean and the three-tier approach (Trait / Disabled / Value) for FieldDefault is a natural extension of the existing API. The documentation improvements and test coverage are solid. Below are some issues worth addressing before merging.


Bug: Misleading error variant in types.rs (new UUID case)

File: avro/src/types.rs

The new match arm:

(Value::String(ref string), UuidSchema::Fixed(_)) => {
    let bytes = string.as_bytes();
    if bytes.len() != 16 {
        return Err(Details::ConvertFixedToUuid(bytes.len()).into());
    }
    Value::Uuid(Uuid::from_slice(bytes).map_err(Details::ConvertSliceToUuid)?)
}

uses Details::ConvertFixedToUuid when the input is a Value::String, not a Value::Fixed. The error variant name and its message will be confusing to end users since it implies the conversion failed from a fixed value. A new variant (e.g. ConvertStringToUuid) would better describe the failure mode.

Additionally, this conversion is semantically unusual. A standard UUID string is 36 chars (with dashes) or 32 chars (without); 16 bytes is the raw binary representation. The check bytes.len() != 16 means only a string whose UTF-8 byte length is exactly 16 is accepted, which is a narrow and surprising constraint. Please add a test that exercises this new arm (both the happy path and the length-mismatch error path) to confirm the intended semantics.


Bug: Stale / misleading doc comment on Vec<T>::field_default()

File: avro/src/serde/derive.rs, ~line 618

/// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
fn field_default() -> Option<serde_json::Value> {
    None
}

The comment describes behaviour that is not implemented — the function always returns None. Either implement the described behaviour (constructing a singleton-array JSON value from T::field_default()) or replace the comment with something accurate such as "Returns None; Vec<T> does not provide a default value by default."


Validation gap: skip_serializing + FieldDefault::Trait with no runtime default

File: avro_derive/src/attributes/mod.rs, lines 321–329

The validation guard was changed from:

&& avro.default.is_none()       // old: error when no default attribute at all

to:

&& avro.default == FieldDefault::Disabled  // new: error only when explicitly false

This means a field annotated with #[serde(skip_serializing)] and no #[avro(default)] attribute now silently passes the compile-time check even when the type's field_default() returns None at runtime. The result is an Avro schema field with skip_serializing but no default — which the Avro spec requires — and no diagnostic. Consider emitting a warning (or staying with a hard error) when FieldDefault::Trait is selected and the type cannot be proven at compile time to provide a non-None default. At minimum, document this caveat clearly in the attribute docs.


Minor: format!(...).as_str() anti-pattern in generated code

Files: avro_derive/src/attributes/mod.rs line 132, avro_derive/src/lib.rs line 208

.expect(format!("Invalid JSON: {:?}", #default_value).as_str())

This allocates a String at runtime for an error path that is unreachable in practice (the JSON was already validated at compile time). Prefer:

.expect("Invalid JSON (validated at compile time)")
// or simply
.unwrap()

Clippy's clippy::expect_fun_call lint will flag this pattern.


Minor: Non-fully-qualified serde_json path in generated code

Files: avro_derive/src/attributes/mod.rs line 132, avro_derive/src/lib.rs line 208

The generated code uses serde_json::from_str without a :: prefix. If a user's crate does not have serde_json as a direct dependency this will fail to compile. The existing field-default code has the same issue, so this is not new, but the new container-level field_default() method propagates it further. Consider using ::serde_json::from_str or re-exporting serde_json from apache_avro and using that qualified path.


Minor: FieldDefault missing Clone

The FieldDefault enum does not derive Clone. This is fine for current usage (it is always moved), but the sibling With enum does derive Clone for consistency. It is worth deciding if Clone is intentionally omitted or just forgotten.


Minor: Doc comment accuracy

In avro/src/serde/derive.rs (the manual-implementation guide), the note:

///  - Implement `field_default()` if you want to use `#[serde(skip_serializing{,_if})]`.

is no longer quite right. After this PR, skip_serializing is allowed whenever the field does not have default = false; implementing field_default() is one way to satisfy this but not the only way. The comment should be updated to describe the actual requirement.


Test coverage suggestions

  1. UUID String→Fixed conversion – add a round-trip test covering the new (Value::String, UuidSchema::Fixed) match arm.
  2. Container default propagation – add a test checking that field_default() on a struct with #[avro(default = "{...}")] is called correctly when the struct is used as a field type.
  3. skip_serializing + Trait + None default – add a compile-test (UI test) showing the silent-pass case described above, so future reviewers are aware of the behaviour.

Overall

The feature is well-scoped and the attribute API (Trait / Disabled / Value) is intuitive. The primary items to address are the misleading error variant in types.rs, the stale Vec<T> doc comment, and the reduced compile-time safety around skip_serializing.

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Feb 25, 2026

🤖 Augment PR Summary

Summary: This PR extends the Avro derive support to allow types to supply default values that can be reused automatically when those types are used as record fields.

Changes:

  • Introduces AvroSchemaComponent::field_default() (defaulting to None) and wires it through the schema-derive flow.
  • Adds a container-level #[avro(default = "…JSON…")] attribute to define a type-level default value, emitted via field_default().
  • Reworks field-level #[avro(default = …)] handling via a new FieldDefault enum: trait-provided default, explicitly disabled (false), or an explicit JSON literal.
  • Updates derive codegen so record fields without explicit defaults can fall back to <T as AvroSchemaComponent>::field_default() (notably, Option<T> now defaults to JSON null).
  • Adds UUID resolution support for fixed-uuid schemas when the value is represented as a JSON string.
  • Expands unit/UI tests to cover the new default behaviors and error cases.

Technical Notes: Default values are validated as JSON during macro expansion, but compatibility with Avro’s encoding rules (e.g., fixed/bytes defaults represented as JSON strings) remains important for correctness and schema round-tripping.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Value::Uuid(Uuid::from_slice(bytes).map_err(Details::ConvertSliceToUuid)?)
}
(Value::String(ref string), UuidSchema::Fixed(_)) => {
let bytes = string.as_bytes();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For UuidSchema::Fixed, using string.as_bytes() interprets the JSON string as UTF-8 bytes; Avro fixed defaults are effectively “one Unicode code point per byte”, so any \u00XX values (>0x7F) will become multi-byte and either fail the length check or produce the wrong UUID.

Severity: high

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:annoying; category:bug; feedback: The Augment AI reviewer is not correct! By specification UUID v4 could contain only alphanumeric characters and the '-'. So, there are no multi-byte characters in it. If the String length is not 16 then the value is invalid.

/// - Array: `#[avro(default = r#"["One", "Two", "Three"]"#)]`.
/// - Object: `#[avro(default = r#"{"One": 1}"#)]`.
///
/// See [the specification](https://avro.apache.org/docs/++version++/specification/#schema-record)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc link contains ++version++, which looks like a placeholder and likely won’t resolve as-is; consider linking to a concrete Avro spec version or a stable URL.

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:incorrect-but-reasonable; category:documentation; feedback: The Augment AI reviewer is not correct! ++version++ is a special placeholder for the next version that is not yet released. It is always valid.

None
}

/// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment says the array implementation will derive a default from T’s field_default, but field_default() currently always returns None; could we align the comment with the actual behavior (or vice-versa)?

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
avro_derive/src/lib.rs (1)

473-493: type_to_field_default_expr duplicates the match structure of type_to_schema_expr and type_to_get_record_fields_expr.

All three functions share the same match arms and error messages, differing only in the generated expression. Consider extracting the common pattern into a single helper that takes a closure or token template to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro_derive/src/lib.rs` around lines 473 - 493, The three functions
type_to_field_default_expr, type_to_schema_expr, and
type_to_get_record_fields_expr duplicate the same match arms and error creation;
extract a single helper (e.g., map_type_to_expr or with_type_match) that takes a
closure/Fn(&Type) -> TokenStream (or an enum/template selector) and returns
Result<TokenStream, Vec<syn::Error>> so the shared match and error messages live
once; then replace each of the three functions with a thin wrapper that calls
the helper and supplies the closure producing the specific quoted expression
(for example the closure returns quote! {<#ty as
apache_avro::AvroSchemaComponent>::field_default()}, quote! {<#ty as
apache_avro::AvroSchemaComponent>::schema()}, etc.), preserving the
Ptr/Tuple/other syn::Error branches exactly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro_derive/tests/derive.rs`:
- Around line 2516-2518: The field `_r` is an Option<f64> annotated with
#[avro(default = "42.0")] which is invalid because Option unions are emitted as
["null","double"] and Avro union defaults must match the first branch (null);
change the attribute on `_r` to use a null default (e.g., #[avro(default =
"null")]) or remove the default entirely, or alternatively change the union
ordering to put double first (so the default 42.0 would be valid) — locate the
`_r: Option<f64>` field and update the #[avro(...)] attribute accordingly.

In `@avro/src/serde/derive.rs`:
- Around line 617-621: The docstring for fn field_default() currently claims it
"will return an array of elements with that default" for types with field
defaults, but the implementation always returns None; update the comment on
field_default() to accurately reflect the current behavior (e.g., "Returns None;
no field defaults are provided by this derive implementation") and remove the
misleading mention of arrays/field defaults, or alternatively implement the
described logic inside field_default() if you intend to support returning
default arrays for T—refer to the field_default() function and any surrounding
derive logic when making the change.

---

Nitpick comments:
In `@avro_derive/src/lib.rs`:
- Around line 473-493: The three functions type_to_field_default_expr,
type_to_schema_expr, and type_to_get_record_fields_expr duplicate the same match
arms and error creation; extract a single helper (e.g., map_type_to_expr or
with_type_match) that takes a closure/Fn(&Type) -> TokenStream (or an
enum/template selector) and returns Result<TokenStream, Vec<syn::Error>> so the
shared match and error messages live once; then replace each of the three
functions with a thin wrapper that calls the helper and supplies the closure
producing the specific quoted expression (for example the closure returns quote!
{<#ty as apache_avro::AvroSchemaComponent>::field_default()}, quote! {<#ty as
apache_avro::AvroSchemaComponent>::schema()}, etc.), preserving the
Ptr/Tuple/other syn::Error branches exactly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 113d7d1 and 28ebd24.

📒 Files selected for processing (10)
  • avro/src/serde/derive.rs
  • avro/src/types.rs
  • avro_derive/src/attributes/avro.rs
  • avro_derive/src/attributes/mod.rs
  • avro_derive/src/lib.rs
  • avro_derive/tests/derive.rs
  • avro_derive/tests/ui/avro_rs_226_skip_serializing.rs
  • avro_derive/tests/ui/avro_rs_226_skip_serializing.stderr
  • avro_derive/tests/ui/avro_rs_226_skip_serializing_if.rs
  • avro_derive/tests/ui/avro_rs_226_skip_serializing_if.stderr

Comment on lines +2516 to +2518
#[avro(default = "42.0")]
_r: Option<f64>,
#[avro(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd avro_derive && sed -n '2510,2525p' tests/derive.rs

Repository: martin-augment/avro-rs

Length of output: 696


🏁 Script executed:

# Search for how Option types generate union schemas in avro_derive
rg -A5 -B5 "Option" avro_derive/src --type rs | head -100

Repository: martin-augment/avro-rs

Length of output: 92


🏁 Script executed:

# Look for the derive macro implementation to understand union handling
fd "lib.rs" avro_derive/src -x cat -n {}

Repository: martin-augment/avro-rs

Length of output: 50380


🏁 Script executed:

# View the test function containing this code
sed -n '2450,2550p' avro_derive/tests/derive.rs

Repository: martin-augment/avro-rs

Length of output: 5148


🏁 Script executed:

# Search for how Option types are handled in schema generation
rg -A10 -B5 "Option" avro_derive/src/lib.rs | head -80

Repository: martin-augment/avro-rs

Length of output: 3852


🏁 Script executed:

# Look for union schema handling and default validation
fd . avro_derive -type f -name "*.rs" | xargs rg -l "union\|Union" | head -5

Repository: martin-augment/avro-rs

Length of output: 236


🏁 Script executed:

# Search for union/default validation in apache-avro crate
find . -path ./avro_derive -prune -o -type f -name "*.rs" -print | xargs rg -l "union.*default|default.*union" | head -5

Repository: martin-augment/avro-rs

Length of output: 163


🏁 Script executed:

# Look specifically at the schema module to understand how defaults are validated
rg -B5 -A10 "default.*first\|first.*branch\|union.*default" --type-list | head -100

Repository: martin-augment/avro-rs

Length of output: 455


🏁 Script executed:

# Check the Avro spec reference - search for comments about union defaults
rg -i "avro.*spec|specification" avro/src | grep -i "default\|union" | head -20

Repository: martin-augment/avro-rs

Length of output: 822


🏁 Script executed:

# Search for union schema handling and default validation
rg -B3 -A8 "Union\|union" avro/src/schema.rs | head -150

Repository: martin-augment/avro-rs

Length of output: 124


🏁 Script executed:

# Look at types.rs for union handling
rg -B2 -A5 "Union" avro/src/types.rs | head -100

Repository: martin-augment/avro-rs

Length of output: 3807


🏁 Script executed:

# Check if there's validation of defaults in the schema module
rg "default" avro/src/schema.rs | head -20

Repository: martin-augment/avro-rs

Length of output: 124


🏁 Script executed:

# Find all schema-related files in avro crate
find avro/src -type f -name "*.rs" | head -20

Repository: martin-augment/avro-rs

Length of output: 528


🏁 Script executed:

# Look for mod.rs or lib.rs that might define schema structures
rg "pub.*Union\|struct.*Union" avro/src --type rust | head -20

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Search for where defaults are handled in the schema
rg -B5 -A8 "default.*:.*Value\|RecordField.*default" avro/src --type rust | head -100

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Check schema/mod.rs for union and default handling
head -200 avro/src/schema/mod.rs

Repository: martin-augment/avro-rs

Length of output: 6697


🏁 Script executed:

# Look for RecordField structure
rg -A10 "struct RecordField\|pub struct.*Field" avro/src/schema/mod.rs | head -50

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Search for validation of defaults in union schemas
rg -B5 -A10 "default.*union\|union.*default" avro/src/schema/mod.rs

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# View RecordField definition
head -150 avro/src/schema/record.rs

Repository: martin-augment/avro-rs

Length of output: 150


🏁 Script executed:

# View union.rs for union handling
cat avro/src/schema/union.rs | head -200

Repository: martin-augment/avro-rs

Length of output: 6447


🏁 Script executed:

# Search for Avro spec documentation about union defaults
rg -i "default.*must.*first\|first.*branch.*default\|union.*default.*spec" avro/src

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Search for parser code that handles defaults
rg -B3 -A10 "default" avro/src/schema/parser.rs | head -150

Repository: martin-augment/avro-rs

Length of output: 4679


🏁 Script executed:

# Look for tests that specifically test union defaults
rg -B5 -A10 "union.*default\|option.*default" avro_derive/tests/derive.rs | head -100

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Check the Avro spec comment references for union defaults
rg "https://avro.apache.org" avro/src --type rust | grep -i spec

Repository: martin-augment/avro-rs

Length of output: 3975


🏁 Script executed:

# Search for record parsing and field default handling
rg -B5 -A15 "parse_record\|RecordField.*default" avro/src/schema/parser.rs | head -200

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Look for tests that use Option with defaults
rg -B10 -A10 "Option.*default\|default.*Option" avro/tests | head -150

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Check if there's any validation of defaults against union first variant
rg -B5 -A10 "first.*union\|first.*variant\|union.*first" avro/src --type rust | head -100

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Get more context around the test to see what it asserts
sed -n '2470,2600p' avro_derive/tests/derive.rs

Repository: martin-augment/avro-rs

Length of output: 4421


🏁 Script executed:

# Look for any test that specifically tests invalid defaults
rg -B3 -A10 "invalid.*default\|default.*invalid\|wrong.*default" avro_derive/tests

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Check the test result - view what the expected schema JSON looks like for _r field
sed -n '2470,2550p' avro_derive/tests/derive.rs | grep -A2 -B2 "_r"

Repository: martin-augment/avro-rs

Length of output: 2416


🏁 Script executed:

# Let me check if there are any comments or issues related to this in the codebase
rg -B5 -A5 "FIXME\|TODO\|BUG\|HACK" avro_derive/tests/derive.rs | grep -B5 -A5 "Option\|union\|default" | head -50

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

# Check if this test actually runs successfully
grep -n "avro_rs_476_field_default_provided" avro_derive/tests/derive.rs

Repository: martin-augment/avro-rs

Length of output: 112


🌐 Web query:

Avro union default value must match first type specification

💡 Result:

In Apache Avro, a field default for a union must be a JSON value whose type matches the union’s first branch (the first schema listed in the union). This is required by the Avro spec and is why “nullable” unions are typically written with "null" first. [1]

Example:

{ "name": "x", "type": ["null","string"], "default": null }   // valid
{ "name": "x", "type": ["null","string"], "default": "hi" }   // invalid (default is string, first branch is null)
{ "name": "x", "type": ["string","null"], "default": "hi" }   // valid

Reason (practical): Avro defaults for unions don’t carry the union-branch discriminator, so the reader resolves the default by assuming it is of the first union type. [1]

Source: Apache Avro specification (Unions / defaults). [1]


Option<f64> default is Avro-invalid with current union ordering.

Option<f64> emits ["null","double"], but the code sets default 42.0. Per the Avro specification, union field defaults must match the type of the first union branch. Since null is first, the default must be null.

Minimal spec-compliant adjustment
-        #[avro(default = "42.0")]
+        #[avro(default = "null")]
         _r: Option<f64>,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[avro(default = "42.0")]
_r: Option<f64>,
#[avro(
#[avro(default = "null")]
_r: Option<f64>,
#[avro(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro_derive/tests/derive.rs` around lines 2516 - 2518, The field `_r` is an
Option<f64> annotated with #[avro(default = "42.0")] which is invalid because
Option unions are emitted as ["null","double"] and Avro union defaults must
match the first branch (null); change the attribute on `_r` to use a null
default (e.g., #[avro(default = "null")]) or remove the default entirely, or
alternatively change the union ordering to put double first (so the default 42.0
would be valid) — locate the `_r: Option<f64>` field and update the #[avro(...)]
attribute accordingly.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:annoying; category:bug; feedback: The CodeRabbit AI reviewer is not correct! There is no such requirement in the Avro specification. https://avro.apache.org/docs/1.12.0/specification/#schema-record says "Default values for union fields correspond to the first schema that matches in the union.", so 42.0 will match for "double" schema.

Comment on lines +617 to +621

/// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
fn field_default() -> Option<serde_json::Value> {
None
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Stale doc comment contradicts the implementation.

The doc comment says "If T has a field default, this will return an array of elements with that default" but the method unconditionally returns None. Either update the comment to reflect the current behavior, or implement the described logic.

Proposed fix (update doc to match implementation)
-    /// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
-    fn field_default() -> Option<serde_json::Value> {
-        None
-    }
+    fn field_default() -> Option<serde_json::Value> {
+        None
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
fn field_default() -> Option<serde_json::Value> {
None
}
fn field_default() -> Option<serde_json::Value> {
None
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/serde/derive.rs` around lines 617 - 621, The docstring for fn
field_default() currently claims it "will return an array of elements with that
default" for types with field defaults, but the implementation always returns
None; update the comment on field_default() to accurately reflect the current
behavior (e.g., "Returns None; no field defaults are provided by this derive
implementation") and remove the misleading mention of arrays/field defaults, or
alternatively implement the described logic inside field_default() if you intend
to support returning default arrays for T—refer to the field_default() function
and any surrounding derive logic when making the change.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.

@martin-augment
Copy link
Copy Markdown
Owner Author

Bug: Stale / misleading doc comment on Vec<T>::field_default()

File: avro/src/serde/derive.rs, ~line 618

/// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
fn field_default() -> Option<serde_json::Value> {
    None
}

The comment describes behaviour that is not implemented — the function always returns None. Either implement the described behaviour (constructing a singleton-array JSON value from T::field_default()) or replace the comment with something accurate such as "Returns None; Vec<T> does not provide a default value by default."

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.

@martin-augment
Copy link
Copy Markdown
Owner Author

Bug: Misleading error variant in types.rs (new UUID case)

File: avro/src/types.rs

The new match arm:

(Value::String(ref string), UuidSchema::Fixed(_)) => {
    let bytes = string.as_bytes();
    if bytes.len() != 16 {
        return Err(Details::ConvertFixedToUuid(bytes.len()).into());
    }
    Value::Uuid(Uuid::from_slice(bytes).map_err(Details::ConvertSliceToUuid)?)
}

uses Details::ConvertFixedToUuid when the input is a Value::String, not a Value::Fixed. The error variant name and its message will be confusing to end users since it implies the conversion failed from a fixed value. A new variant (e.g. ConvertStringToUuid) would better describe the failure mode.

Additionally, this conversion is semantically unusual. A standard UUID string is 36 chars (with dashes) or 32 chars (without); 16 bytes is the raw binary representation. The check bytes.len() != 16 means only a string whose UTF-8 byte length is exactly 16 is accepted, which is a narrow and surprising constraint. Please add a test that exercises this new arm (both the happy path and the length-mismatch error path) to confirm the intended semantics.

value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! By specification UUID v4 could contain only alphanumeric characters and the '-'. So, there are no multi-byte characters in it. If the String length is not 16 then the value is invalid.

@martin-augment
Copy link
Copy Markdown
Owner Author

Validation gap: skip_serializing + FieldDefault::Trait with no runtime default

File: avro_derive/src/attributes/mod.rs, lines 321–329

The validation guard was changed from:

&& avro.default.is_none()       // old: error when no default attribute at all

to:

&& avro.default == FieldDefault::Disabled  // new: error only when explicitly false

This means a field annotated with #[serde(skip_serializing)] and no #[avro(default)] attribute now silently passes the compile-time check even when the type's field_default() returns None at runtime. The result is an Avro schema field with skip_serializing but no default — which the Avro spec requires — and no diagnostic. Consider emitting a warning (or staying with a hard error) when FieldDefault::Trait is selected and the type cannot be proven at compile time to provide a non-None default. At minimum, document this caveat clearly in the attribute docs.

value:valid-but-wont-fix; category:bug; feedback: The Claude AI reviewer is correct! There is no way to check that at compile time and there is a gap! But this is not a reason to fail the build or log warnings! The user application may have provided a valid implementation of AvroSchemaComponent::field_default(). The compile time checks are best effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants