477: feat: Allow types to provide default values #55
martin-augment wants to merge 5 commits into main
`AvroSchemaComponent` is extended with a function `field_default`
which will be called when deriving a record to set the default value
for a field. The default implementation is to return `None`, which
means no default.
On the derive side, it is now possible to specify a default for a type
using `#[avro(default = "..")]`. It is also possible to disable setting
a default for a field with `#[avro(default = false)]`.
This enables users to use `#[serde(skip_serializing{_if})]` on most fields
without having to provide a default value.
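As a rough illustration of the mechanism described above, the following dependency-free sketch models how a trait-level default could combine with the field attribute. It is not the crate's code: `SchemaComponent`, `FieldDefault`, and `resolve_default` are hypothetical names, JSON values are modelled as plain `String`s rather than `serde_json::Value`, and the `null` default for `Option<T>` is an assumption based on the review discussion.

```rust
/// Hypothetical stand-in for `AvroSchemaComponent`; JSON is modelled as a `String`.
trait SchemaComponent {
    /// Default used when the type appears as a record field.
    /// `None` means "no default", which is what the trait provides out of the box.
    fn field_default() -> Option<String> {
        None
    }
}

// A plain type simply inherits the `None` default from the trait.
impl SchemaComponent for i32 {}

// Assumption: `Option<T>` supplies a JSON `null` default.
impl<T: SchemaComponent> SchemaComponent for Option<T> {
    fn field_default() -> Option<String> {
        Some("null".to_string())
    }
}

/// Hypothetical model of the three states a field attribute can be in.
enum FieldDefault {
    /// No attribute given: ask the field's type.
    Trait,
    /// `#[avro(default = false)]`: never emit a default.
    Disabled,
    /// `#[avro(default = "..")]`: use the given JSON text.
    Value(String),
}

/// Picks the default that would end up in the record field.
fn resolve_default<T: SchemaComponent>(attr: FieldDefault) -> Option<String> {
    match attr {
        FieldDefault::Trait => T::field_default(),
        FieldDefault::Disabled => None,
        FieldDefault::Value(json) => Some(json),
    }
}
```

In this model, `resolve_default::<Option<i32>>(FieldDefault::Trait)` yields the type's own default, while `FieldDefault::Disabled` suppresses it even for types that provide one.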
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Walkthrough
This pull request adds field-level default value support to Avro schema generation. It introduces a new …
Code Review
This pull request introduces a valuable feature by allowing types to provide default values in Avro schemas. The implementation is comprehensive, with changes across both the avro and avro_derive crates. The new field_default method in the AvroSchemaComponent trait is well-designed, and the derive macro has been effectively updated to support the new #[avro(default)] attribute options. The addition of extensive tests ensures the new functionality is robust. I have one minor suggestion to improve the clarity of the documentation.
        None
    }

    /// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
The documentation for field_default on [T; N] is misleading. It states that it will return an array of default elements if the inner type T has a default, but the implementation always returns None. This could cause confusion. The comment should be updated to accurately reflect that arrays do not have a default value by default.
- /// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
+ /// Arrays do not have a default value by default.
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.
PR Review: feat: Allow types to provide default values
Thanks for this feature! The overall design is clean and the three-tier approach (Trait / Disabled / Value) for …
Bug: Misleading error variant in …
🤖 Augment PR Summary
Summary: This PR extends the Avro derive support to allow types to supply default values that can be reused automatically when those types are used as record fields.
Technical Notes: Default values are validated as JSON during macro expansion, but compatibility with Avro’s encoding rules (e.g., fixed/bytes defaults represented as JSON strings) remains important for correctness and schema round-tripping.
        Value::Uuid(Uuid::from_slice(bytes).map_err(Details::ConvertSliceToUuid)?)
    }
    (Value::String(ref string), UuidSchema::Fixed(_)) => {
        let bytes = string.as_bytes();
For UuidSchema::Fixed, using string.as_bytes() interprets the JSON string as UTF-8 bytes; Avro fixed defaults are effectively “one Unicode code point per byte”, so any \u00XX values (>0x7F) will become multi-byte and either fail the length check or produce the wrong UUID.
Severity: high
value:annoying; category:bug; feedback: The Augment AI reviewer is not correct! By specification UUID v4 could contain only alphanumeric characters and the '-'. So, there are no multi-byte characters in it. If the String length is not 16 then the value is invalid.
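The two readings of a JSON string discussed in this thread can be compared with a small standard-library sketch. The helper names `utf8_bytes` and `code_point_bytes` are hypothetical and are not part of the crate; the sketch only shows that the readings agree for ASCII input and diverge for code points above 0x7F.

```rust
/// Bytes as seen by `str::as_bytes`: the UTF-8 encoding of the string.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

/// Bytes under a "one code point per byte" reading: every `char` must be in
/// the range 0..=255 and maps to exactly one byte; anything larger is rejected.
fn code_point_bytes(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            let n = c as u32;
            if n <= 0xFF { Some(n as u8) } else { None }
        })
        .collect()
}
```

For an ASCII-only string both helpers return the same bytes, so the length check behaves identically. For a string containing `\u{00ff}`, `utf8_bytes` returns two bytes while `code_point_bytes` returns one, which is the case the reviewer's comment describes.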
/// - Array: `#[avro(default = r#"["One", "Two", "Three"]"#)]`.
/// - Object: `#[avro(default = r#"{"One": 1}"#)]`.
///
/// See [the specification](https://avro.apache.org/docs/++version++/specification/#schema-record)
value:incorrect-but-reasonable; category:documentation; feedback: The Augment AI reviewer is not correct! ++version++ is a special placeholder for the next version that is not yet released. It is always valid.
        None
    }

    /// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
The doc comment says the array implementation will derive a default from T’s field_default, but field_default() currently always returns None; could we align the comment with the actual behavior (or vice-versa)?
Severity: low
value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.
Actionable comments posted: 2
🧹 Nitpick comments (1)
avro_derive/src/lib.rs (1)
473-493: `type_to_field_default_expr` duplicates the match structure of `type_to_schema_expr` and `type_to_get_record_fields_expr`.
All three functions share the same match arms and error messages, differing only in the generated expression. Consider extracting the common pattern into a single helper that takes a closure or token template to reduce duplication.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro_derive/src/lib.rs` around lines 473 - 493, The three functions type_to_field_default_expr, type_to_schema_expr, and type_to_get_record_fields_expr duplicate the same match arms and error creation; extract a single helper (e.g., map_type_to_expr or with_type_match) that takes a closure/Fn(&Type) -> TokenStream (or an enum/template selector) and returns Result<TokenStream, Vec<syn::Error>> so the shared match and error messages live once; then replace each of the three functions with a thin wrapper that calls the helper and supplies the closure producing the specific quoted expression (for example the closure returns quote! {<#ty as apache_avro::AvroSchemaComponent>::field_default()}, quote! {<#ty as apache_avro::AvroSchemaComponent>::schema()}, etc.), preserving the Ptr/Tuple/other syn::Error branches exactly.
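The refactoring suggested in this nitpick can be sketched without `syn` or `quote`. Everything below is a simplified assumption: `Ty` is a hypothetical stand-in for `syn::Type`, generated expressions are plain `String`s instead of token streams, and the error messages are invented for illustration. Only the shape (one shared match, thin wrappers that pass a closure) reflects the suggestion.

```rust
/// Hypothetical, simplified stand-in for the `syn::Type` variants being matched.
enum Ty {
    Path(String),
    Ptr,
    Tuple,
}

/// Shared match: supported types are handed to `make_expr`,
/// unsupported ones produce the same error for every caller.
fn map_type_to_expr<F>(ty: &Ty, make_expr: F) -> Result<String, String>
where
    F: Fn(&str) -> String,
{
    match ty {
        Ty::Path(name) => Ok(make_expr(name)),
        // Illustrative messages only; the real macro has its own wording.
        Ty::Ptr => Err("raw pointers are not supported".to_string()),
        Ty::Tuple => Err("tuples are not supported".to_string()),
    }
}

/// Thin wrapper: only the generated expression differs.
fn type_to_schema_expr(ty: &Ty) -> Result<String, String> {
    map_type_to_expr(ty, |name| {
        format!("<{} as AvroSchemaComponent>::schema()", name)
    })
}

/// Thin wrapper: only the generated expression differs.
fn type_to_field_default_expr(ty: &Ty) -> Result<String, String> {
    map_type_to_expr(ty, |name| {
        format!("<{} as AvroSchemaComponent>::field_default()", name)
    })
}
```

With this shape, a change to the unsupported-type handling is made once in `map_type_to_expr` and every wrapper reports the same error.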
📒 Files selected for processing (10)
avro/src/serde/derive.rs
avro/src/types.rs
avro_derive/src/attributes/avro.rs
avro_derive/src/attributes/mod.rs
avro_derive/src/lib.rs
avro_derive/tests/derive.rs
avro_derive/tests/ui/avro_rs_226_skip_serializing.rs
avro_derive/tests/ui/avro_rs_226_skip_serializing.stderr
avro_derive/tests/ui/avro_rs_226_skip_serializing_if.rs
avro_derive/tests/ui/avro_rs_226_skip_serializing_if.stderr
    #[avro(default = "42.0")]
    _r: Option<f64>,
    #[avro(
🌐 Web query:
Avro union default value must match first type specification
💡 Result:
In Apache Avro, a field default for a union must be a JSON value whose type matches the union’s first branch (the first schema listed in the union). This is required by the Avro spec and is why “nullable” unions are typically written with "null" first. [1]
Example:
{ "name": "x", "type": ["null","string"], "default": null } // valid
{ "name": "x", "type": ["null","string"], "default": "hi" } // invalid (default is string, first branch is null)
{ "name": "x", "type": ["string","null"], "default": "hi" } // valid
Reason (practical): Avro defaults for unions don’t carry the union-branch discriminator, so the reader resolves the default by assuming it is of the first union type. [1]
Source: Apache Avro specification (Unions / defaults). [1]
Option<f64> default is Avro-invalid with current union ordering.
Option<f64> emits ["null","double"], but the code sets default 42.0. Per the Avro specification, union field defaults must match the type of the first union branch. Since null is first, the default must be null.
Minimal spec-compliant adjustment
- #[avro(default = "42.0")]
+ #[avro(default = "null")]
  _r: Option<f64>,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro_derive/tests/derive.rs` around lines 2516 - 2518, The field `_r` is an
Option<f64> annotated with #[avro(default = "42.0")] which is invalid because
Option unions are emitted as ["null","double"] and Avro union defaults must
match the first branch (null); change the attribute on `_r` to use a null
default (e.g., #[avro(default = "null")]) or remove the default entirely, or
alternatively change the union ordering to put double first (so the default 42.0
would be valid) — locate the `_r: Option<f64>` field and update the #[avro(...)]
attribute accordingly.
value:annoying; category:bug; feedback: The CodeRabbit AI reviewer is not correct! There is no such requirement in the Avro specification. https://avro.apache.org/docs/1.12.0/specification/#schema-record says "Default values for union fields correspond to the first schema that matches in the union.", so 42.0 will match for "double" schema.
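The disagreement in this thread comes down to two readings of how a union default is resolved. The sketch below models both with hypothetical types (`Branch`, `Json`, `branch_accepts`, `first_matching`, `first_branch_only`); none of these exist in the crate, and the matching rules are deliberately minimal. It assumes the wording quoted above from the 1.12.0 specification for the "first matching" reading.

```rust
/// Hypothetical, minimal model of the branches of an Avro union.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Branch {
    Null,
    Double,
    String,
}

/// Hypothetical, minimal model of a JSON default value.
#[derive(Debug)]
enum Json {
    Null,
    Number(f64),
    Text(&'static str),
}

/// Whether a JSON value is acceptable for a given branch in this toy model.
fn branch_accepts(branch: Branch, value: &Json) -> bool {
    matches!(
        (branch, value),
        (Branch::Null, Json::Null)
            | (Branch::Double, Json::Number(_))
            | (Branch::String, Json::Text(_))
    )
}

/// "First schema that matches" reading: index of the first accepting branch.
fn first_matching(union: &[Branch], default: &Json) -> Option<usize> {
    union.iter().position(|b| branch_accepts(*b, default))
}

/// "Must match the first branch" reading: only the first branch is consulted.
fn first_branch_only(union: &[Branch], default: &Json) -> bool {
    union.first().map_or(false, |b| branch_accepts(*b, default))
}
```

For the union `["null","double"]` and the default `42.0`, `first_matching` selects the `double` branch, whereas `first_branch_only` rejects the default. That difference is exactly what separates the reviewer's finding from the maintainer's reply.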
    /// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
    fn field_default() -> Option<serde_json::Value> {
        None
    }
Stale doc comment contradicts the implementation.
The doc comment says "If T has a field default, this will return an array of elements with that default" but the method unconditionally returns None. Either update the comment to reflect the current behavior, or implement the described logic.
Proposed fix (update doc to match implementation)
- /// If `T` has a field default, this will return an array of elements with that default. Otherwise there is no default.
- fn field_default() -> Option<serde_json::Value> {
- None
- }
+ fn field_default() -> Option<serde_json::Value> {
+ None
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/serde/derive.rs` around lines 617 - 621, The docstring for fn
field_default() currently claims it "will return an array of elements with that
default" for types with field defaults, but the implementation always returns
None; update the comment on field_default() to accurately reflect the current
behavior (e.g., "Returns None; no field defaults are provided by this derive
implementation") and remove the misleading mention of arrays/field defaults, or
alternatively implement the described logic inside field_default() if you intend
to support returning default arrays for T—refer to the field_default() function
and any surrounding derive logic when making the change.
value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.
value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! The docstring is obsolete since now there is no default value by default for all types but std::option::Option. The method could be removed because it does the same as the default implementation in the AvroSchemaComponent trait.
value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! By specification UUID v4 could contain only alphanumeric characters and the '-'. So, there are no multi-byte characters in it. If the String length is not 16 then the value is invalid.
value:valid-but-wont-fix; category:bug; feedback: The Claude AI reviewer is correct! There is no way to check that at compile time and there is a gap! But this is not a reason to fail the build or log warnings! The user application may have provided a valid implementation of AvroSchemaComponent::field_default(). The compile time checks are best effort.
477: To review by AI