467: feat!: Support default field for array and map#48
467: feat!: Support default field for array and map#48martin-augment wants to merge 1 commit intomainfrom
default field for array and map#48Conversation
WalkthroughThe changes refactor schema value handling to distinguish between JSON values and Rust values by introducing a ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Avro schema implementation by adding support for Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
PR Review: Support
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@avro/src/error.rs`:
- Around line 437-447: Update the error strings for MapDefaultWrongType and
MapDefaultWrongInnerType to fix the grammar ("a object" → "an object"); locate
the enum variants MapDefaultWrongType and MapDefaultWrongInnerType in error.rs
and change their #[error(...)] messages to use "an object" (and ensure the rest
of the messages remain unchanged).
In `@avro/src/schema/mod.rs`:
- Around line 3992-3995: Fix the typo in the error message that currently reads
"Default value for a map must be a object!" by changing it to "Default value for
a map must be an object!" in the error variant/constructor in error.rs (the
variant that builds this message for map default value type errors), and update
any tests expecting the old string (e.g., the assert in mod.rs that compares to
r#"Default value for a map must be a object! Got: "invalid""#) to the corrected
message r#"Default value for a map must be an object! Got: "invalid""# so the
message and tests remain consistent.
- Around line 5229-5231: The boolean results of json.contains(...) are currently
discarded, so the tests don't fail when the substring is missing; wrap each of
the four occurrences of json.contains(...) with assert!(...) to assert the
expected JSON content. Locate the occurrences where a Schema variant is
serialized (e.g., serde_json::to_string(&Schema::Array(array))? assigning to
json) and replace json.contains(r#"... "#) with assert!(json.contains(r#"...
"#)) at each of those spots (the four places around the Schema::Array and the
other Schema serializations) so the tests actually validate the JSON output.
In `@avro/src/schema/parser.rs`:
- Around line 711-726: The default validation for array/map inner values fails
when `items` or `values` is a `Schema::Ref` because `v.validate(&items)` builds
an isolated `ResolvedSchema`; update the validation to construct the resolved
schema with the parser's known schemata by calling
`ResolvedSchema::new_with_known_schemata(&items, &self.parsed_schemas)` (or
equivalent) and use that resolved schema's `validate` instead of the current
isolated one; apply the same change in both the array-default branch (where
`items` is checked) and the map-default branch (where `values` is checked) so
refs resolve against `parsed_schemas`.
🧹 Nitpick comments (2)
avro/src/schema/mod.rs (2)
769-772:default.clone()on every serialization is a potential performance concern.Each call to
serializeclones the entire default array/map just to convert it toJsonValue. If schemas are serialized frequently (e.g., in a writer hot path), this could add up.Consider caching the
JsonValuerepresentation or storing the default asJsonValuedirectly to avoid repeated cloning + conversion.
656-662: Consider addingmap_with_default/array_with_defaultconstructors.Since the
defaultfield is only settable by mutating the struct directly, it might be worth providing convenience constructors that accept a default value — especially given this is afeat!(breaking) change and a good time to expand the API surface. This is optional and can be deferred.Also applies to: 674-680
| #[error("Default value for an array must be an array! Got: {0}")] | ||
| ArrayDefaultWrongType(serde_json::Value), | ||
|
|
||
| #[error("Default value for an array must be an array of {0}! Found: {1:?}")] | ||
| ArrayDefaultWrongInnerType(Schema, Value), | ||
|
|
||
| #[error("Default value for a map must be a object! Got: {0}")] | ||
| MapDefaultWrongType(serde_json::Value), | ||
|
|
||
| #[error("Default value for a map must be a object with (String, {0})! Found: (String, {1:?})")] | ||
| MapDefaultWrongInnerType(Schema, Value), |
There was a problem hiding this comment.
Grammar: "a object" → "an object" in map error messages.
Lines 443 and 446 use "a object" which is grammatically incorrect.
✏️ Proposed fix
- #[error("Default value for a map must be a object! Got: {0}")]
+ #[error("Default value for a map must be an object! Got: {0}")]
MapDefaultWrongType(serde_json::Value),
- #[error("Default value for a map must be a object with (String, {0})! Found: (String, {1:?})")]
+ #[error("Default value for a map must be an object with (String, {0})! Found: (String, {1:?})")]
MapDefaultWrongInnerType(Schema, Value),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[error("Default value for an array must be an array! Got: {0}")] | |
| ArrayDefaultWrongType(serde_json::Value), | |
| #[error("Default value for an array must be an array of {0}! Found: {1:?}")] | |
| ArrayDefaultWrongInnerType(Schema, Value), | |
| #[error("Default value for a map must be a object! Got: {0}")] | |
| MapDefaultWrongType(serde_json::Value), | |
| #[error("Default value for a map must be a object with (String, {0})! Found: (String, {1:?})")] | |
| MapDefaultWrongInnerType(Schema, Value), | |
| #[error("Default value for an array must be an array! Got: {0}")] | |
| ArrayDefaultWrongType(serde_json::Value), | |
| #[error("Default value for an array must be an array of {0}! Found: {1:?}")] | |
| ArrayDefaultWrongInnerType(Schema, Value), | |
| #[error("Default value for a map must be an object! Got: {0}")] | |
| MapDefaultWrongType(serde_json::Value), | |
| #[error("Default value for a map must be an object with (String, {0})! Found: (String, {1:?})")] | |
| MapDefaultWrongInnerType(Schema, Value), |
🤖 Prompt for AI Agents
In `@avro/src/error.rs` around lines 437 - 447, Update the error strings for
MapDefaultWrongType and MapDefaultWrongInnerType to fix the grammar ("a object"
→ "an object"); locate the enum variants MapDefaultWrongType and
MapDefaultWrongInnerType in error.rs and change their #[error(...)] messages to
use "an object" (and ensure the rest of the messages remain unchanged).
| assert_eq!( | ||
| r#"Default value for a map must be a object! Got: "invalid""#, | ||
| err | ||
| ); |
There was a problem hiding this comment.
Typo: "a object" should be "an object".
The error message reads "Default value for a map must be a object!" — this should be "an object". This likely originates from the error variant definition in error.rs.
🤖 Prompt for AI Agents
In `@avro/src/schema/mod.rs` around lines 3992 - 3995, Fix the typo in the error
message that currently reads "Default value for a map must be a object!" by
changing it to "Default value for a map must be an object!" in the error
variant/constructor in error.rs (the variant that builds this message for map
default value type errors), and update any tests expecting the old string (e.g.,
the assert in mod.rs that compares to r#"Default value for a map must be a
object! Got: "invalid""#) to the corrected message r#"Default value for a map
must be an object! Got: "invalid""# so the message and tests remain consistent.
|
|
||
| let json = serde_json::to_string(&Schema::Array(array))?; | ||
| json.contains(r#""default":[]"#); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the specific lines mentioned in the review
cat -n avro/src/schema/mod.rs | sed -n '5225,5235p'
cat -n avro/src/schema/mod.rs | sed -n '5255,5265p'
cat -n avro/src/schema/mod.rs | sed -n '5316,5326p'
cat -n avro/src/schema/mod.rs | sed -n '5341,5351p'Repository: martin-augment/avro-rs
Length of output: 1423
🏁 Script executed:
# Search for bare .contains() calls in test code that are not wrapped in assert!
rg -n '^\s+\w+\.contains\(' --type=rust avro/src/schema/mod.rsRepository: martin-augment/avro-rs
Length of output: 275
🏁 Script executed:
# Search the entire rust codebase for this pattern
rg -n '\w+\.contains\(' --type=rust | grep -v 'assert!' | head -20Repository: martin-augment/avro-rs
Length of output: 1152
🏁 Script executed:
# Check the context around line 38 in serde.rs
cat -n avro_derive/tests/serde.rs | sed -n '35,42p'Repository: martin-augment/avro-rs
Length of output: 289
Fix: json.contains() results are unused — wrap all four occurrences with assert!()
These tests silently pass regardless of JSON content because the boolean result from json.contains() is discarded. This occurs at lines 5231, 5260, 5321, and 5346.
Proposed fix
Line 5231:
- json.contains(r#""default":[]"#);
+ assert!(json.contains(r#""default":[]"#));Line 5260:
- json.contains(r#""default":["foo","bar"]"#);
+ assert!(json.contains(r#""default":["foo","bar"]"#));Line 5321:
- json.contains(r#""default":{}"#);
+ assert!(json.contains(r#""default":{}"#));Line 5346:
- json.contains(r#""default":{"foo":"bar"}"#);
+ assert!(json.contains(r#""default":{"foo":"bar"}"#));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let json = serde_json::to_string(&Schema::Array(array))?; | |
| json.contains(r#""default":[]"#); | |
| let json = serde_json::to_string(&Schema::Array(array))?; | |
| assert!(json.contains(r#""default":[]"#)); |
🤖 Prompt for AI Agents
In `@avro/src/schema/mod.rs` around lines 5229 - 5231, The boolean results of
json.contains(...) are currently discarded, so the tests don't fail when the
substring is missing; wrap each of the four occurrences of json.contains(...)
with assert!(...) to assert the expected JSON content. Locate the occurrences
where a Schema variant is serialized (e.g.,
serde_json::to_string(&Schema::Array(array))? assigning to json) and replace
json.contains(r#"... "#) with assert!(json.contains(r#"... "#)) at each of those
spots (the four places around the Schema::Array and the other Schema
serializations) so the tests actually validate the JSON output.
| let default = if let Some(default) = complex.get("default").cloned() { | ||
| if let Value::Array(_) = default { | ||
| let crate::types::Value::Array(array) = crate::types::Value::from(default) else { | ||
| unreachable!("JsonValue::Array can only become a Value::Array") | ||
| }; | ||
| // Check that the default type matches the schema type | ||
| if let Some(value) = array.iter().find(|v| !v.validate(&items)) { | ||
| return Err(Details::ArrayDefaultWrongInnerType(items, value.clone()).into()); | ||
| } | ||
| Some(array) | ||
| } else { | ||
| return Err(Details::ArrayDefaultWrongType(default).into()); | ||
| } | ||
| } else { | ||
| None | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how validate is called - does it handle Refs without external names?
# Also check if parse_array/parse_map could receive a Ref as items/values.
echo "=== Checking if items can be a Ref ==="
rg -n 'Schema::Ref' avro/src/schema/parser.rs -C3
echo ""
echo "=== Checking ResolvedSchema::try_from for single schema ==="
rg -n 'impl.*TryFrom.*Schema' avro/src/schema/ -A 10Repository: martin-augment/avro-rs
Length of output: 3524
🏁 Script executed:
#!/bin/bash
# Check the validate method implementation
echo "=== Checking Value::validate signature and implementation ==="
rg -n 'fn validate' avro/src/ -B 2 -A 5 | head -100Repository: martin-augment/avro-rs
Length of output: 5841
🏁 Script executed:
#!/bin/bash
# Check where items/values come from in parse_array and parse_map
echo "=== Checking parse_array items handling ==="
sed -n '680,730p' avro/src/schema/parser.rs
echo ""
echo "=== Checking parse_map values handling ==="
sed -n '730,780p' avro/src/schema/parser.rsRepository: martin-augment/avro-rs
Length of output: 4019
🏁 Script executed:
#!/bin/bash
# Get the full implementation of Value::validate
echo "=== Full Value::validate implementation ==="
sed -n '364,450p' avro/src/types.rsRepository: martin-augment/avro-rs
Length of output: 4235
🏁 Script executed:
#!/bin/bash
# Check ResolvedSchema::new implementation to see how it handles refs
echo "=== ResolvedSchema::new implementation ==="
sed -n '40,78p' avro/src/schema/resolve.rsRepository: martin-augment/avro-rs
Length of output: 1409
🏁 Script executed:
#!/bin/bash
# Search for how Schema::Ref is handled in validation
echo "=== Searching for Schema::Ref handling in resolve.rs ==="
rg -n 'Schema::Ref' avro/src/schema/resolve.rs -B 2 -A 2Repository: martin-augment/avro-rs
Length of output: 344
Default validation fails for ref-type items or values schemas.
When items or values is a Schema::Ref, calling v.validate(&items) (line 717) creates an isolated ResolvedSchema that includes only the ref itself, not the parser's parsed_schemas context. Validation then fails to resolve the reference and incorrectly rejects valid defaults with "Unresolved schema reference" error. This affects both arrays (line 717) and maps (line 751).
To fix this, pass the parser's parsed_schemas to ResolvedSchema::new_with_known_schemata so refs can be properly resolved during validation.
🤖 Prompt for AI Agents
In `@avro/src/schema/parser.rs` around lines 711 - 726, The default validation for
array/map inner values fails when `items` or `values` is a `Schema::Ref` because
`v.validate(&items)` builds an isolated `ResolvedSchema`; update the validation
to construct the resolved schema with the parser's known schemata by calling
`ResolvedSchema::new_with_known_schemata(&items, &self.parsed_schemas)` (or
equivalent) and use that resolved schema's `validate` instead of the current
isolated one; apply the same change in both the array-default branch (where
`items` is checked) and the map-default branch (where `values` is checked) so
refs resolve against `parsed_schemas`.
There was a problem hiding this comment.
Code Review
This pull request introduces support for default fields in Avro array and map schemas, which is a great feature enhancement. However, the current implementation introduces several Denial of Service (DoS) vulnerabilities. Specifically, the conversion of JSON default values to Avro values can trigger a panic when encountering large integers, and the recursive validation of these default values lacks a depth limit, which can lead to stack overflows. These issues are particularly critical if the library is used to parse schemas from untrusted sources. It is recommended to use safe conversion methods that return errors instead of panicking and to implement recursion depth limits for all recursive parsing and validation logic. Additionally, while the implementation is thorough and includes comprehensive tests, I have a few suggestions to improve the new tests by turning some statements into actual assertions.
| .and_then(|items| self.parse(items, enclosing_namespace))?; | ||
| let default = if let Some(default) = complex.get("default").cloned() { | ||
| if let Value::Array(_) = default { | ||
| let crate::types::Value::Array(array) = crate::types::Value::from(default) else { |
There was a problem hiding this comment.
The conversion of the default value using crate::types::Value::from(default) can trigger a panic if the JSON contains a number that does not fit into a signed 64-bit integer (e.g., a large u64). This is due to a panic! in the From<JsonValue> for Value implementation in avro/src/types.rs (line 286). An attacker can exploit this to cause a Denial of Service by providing a schema with a malicious default value that crashes the application during parsing.
|
|
||
| let default = if let Some(default) = complex.get("default").cloned() { | ||
| if let Value::Object(_) = default { | ||
| let crate::types::Value::Map(map) = crate::types::Value::from(default) else { |
There was a problem hiding this comment.
Similar to the array default handling, converting a map's default value using crate::types::Value::from(default) can trigger a panic if the default value contains a large integer that exceeds the range of a signed 64-bit integer. This allows for a Denial of Service attack via a specially crafted schema that causes the parser to crash.
| unreachable!("JsonValue::Array can only become a Value::Array") | ||
| }; | ||
| // Check that the default type matches the schema type | ||
| if let Some(value) = array.iter().find(|v| !v.validate(&items)) { |
| unreachable!("JsonValue::Object can only become a Value::Map") | ||
| }; | ||
| // Check that the default type matches the schema type | ||
| if let Some(value) = map.values().find(|v| !v.validate(&types)) { |
| assert_eq!(array.default, Some(Vec::new())); | ||
|
|
||
| let json = serde_json::to_string(&Schema::Array(array))?; | ||
| json.contains(r#""default":[]"#); |
| ); | ||
|
|
||
| let json = serde_json::to_string(&Schema::Array(array))?; | ||
| json.contains(r#""default":["foo","bar"]"#); |
There was a problem hiding this comment.
| assert_eq!(map.default, Some(HashMap::new())); | ||
|
|
||
| let json = serde_json::to_string(&Schema::Map(map))?; | ||
| json.contains(r#""default":{}"#); |
| assert_eq!(map.default, Some(hashmap)); | ||
|
|
||
| let json = serde_json::to_string(&Schema::Map(map))?; | ||
| json.contains(r#""default":{"foo":"bar"}"#); |
There was a problem hiding this comment.
🤖 Augment PR SummarySummary: Adds first-class support for Changes:
Technical Notes: Defaults are stored as 🤖 Was this summary useful? React with 👍 or 👎 |
| .and_then(|items| self.parse(items, enclosing_namespace))?; | ||
| let default = if let Some(default) = complex.get("default").cloned() { | ||
| if let Value::Array(_) = default { | ||
| let crate::types::Value::Array(array) = crate::types::Value::from(default) else { |
There was a problem hiding this comment.
crate::types::Value::from(default) can panic for JSON numbers that serde_json stores as u64 (e.g. integers > i64::MAX), which would make schema parsing crash instead of returning an Error for an invalid default.
Severity: high
Other Locations
avro/src/schema/parser.rs:747
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| assert_eq!(array.default, Some(Vec::new())); | ||
|
|
||
| let json = serde_json::to_string(&Schema::Array(array))?; | ||
| json.contains(r#""default":[]"#); |
There was a problem hiding this comment.
The result of json.contains(...) is currently ignored, so these tests don’t actually assert that the serialized schema includes default (and this may also trigger unused_must_use warnings if warnings are denied).
Severity: medium
Other Locations
avro/src/schema/mod.rs:5260avro/src/schema/mod.rs:5321avro/src/schema/mod.rs:5346
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! The apache-avro crate is still in development and it is allowed to make breaking changes before its v1.0.0 release. |
value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! Adding more unit tests with more complex schemas will expose any problems in the current implementation and will prevent from regressions in the future. |
value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! It would be good to make it possible to pass more arguments to the factory method. Using the bon crate builder generating macros will make really nice APIs! |
467: To review by AI