467: feat!: Support default field for array and map#48

Open
martin-augment wants to merge 1 commit into main from pr-467-2026-02-16-16-16-07

Conversation

@martin-augment
Owner

467: To review by AI

@coderabbitai

coderabbitai bot commented Feb 16, 2026

Walkthrough

The changes refactor schema value handling to distinguish between JSON values and Rust values by introducing a JsonValue type for public-facing schema metadata. Error validation is expanded with four new error variants for array and map default value mismatches. Public API signatures are updated to accept JsonValue for attributes and parsing parameters. Schema structs (MapSchema, ArraySchema, EnumSchema, FixedSchema) gain a default field and transition their attributes from Value to JsonValue. Validation logic in the parser is enhanced to check that default array and map values conform to their element and value schemas respectively.

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Avro schema implementation by adding support for default values in array and map schema types. This feature allows schema definitions to specify a default value for arrays and maps, which will be used if no explicit value is provided. The changes include modifications to schema structures, parsing logic, serialization, and robust validation to ensure that default values conform to their respective schema types.

Highlights

  • Default Field Support: Introduced default fields for ArraySchema and MapSchema to allow specifying default values directly within the schema definition.
  • Robust Validation: Implemented comprehensive validation for these new default fields, ensuring type compatibility with the array's items and map's values schemas.
  • New Error Types: Added new error types (ArrayDefaultWrongType, ArrayDefaultWrongInnerType, MapDefaultWrongType, MapDefaultWrongInnerType) to provide specific feedback for invalid default values.
  • Type Clarity Refactoring: Aliased serde_json::Value as JsonValue throughout the internals to prevent naming conflicts with crate::types::Value and improve clarity.
  • Serialization Updates: Updated schema serialization to correctly include the default field when present for array and map schemas.
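
To make the four new error variants concrete, here is a minimal sketch of how they might render. `DefaultError` is a hypothetical stand-in with plain `String` payloads; the variants in the PR live in the crate's own error type and carry `serde_json::Value`, `Schema`, and `Value`. The message wording follows the strings quoted in the review comments, with the "an object" grammar fix applied.

```rust
use std::fmt;

/// Hypothetical stand-in for the four new variants; payloads are plain strings here.
#[derive(Debug, PartialEq)]
enum DefaultError {
    ArrayDefaultWrongType(String),
    ArrayDefaultWrongInnerType(String, String),
    MapDefaultWrongType(String),
    MapDefaultWrongInnerType(String, String),
}

impl fmt::Display for DefaultError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The default was not a JSON array at all.
            DefaultError::ArrayDefaultWrongType(got) => {
                write!(f, "Default value for an array must be an array! Got: {}", got)
            }
            // The default was an array, but one element did not match `items`.
            DefaultError::ArrayDefaultWrongInnerType(schema, found) => write!(
                f,
                "Default value for an array must be an array of {}! Found: {}",
                schema, found
            ),
            // The default was not a JSON object at all.
            DefaultError::MapDefaultWrongType(got) => {
                write!(f, "Default value for a map must be an object! Got: {}", got)
            }
            // The default was an object, but one value did not match `values`.
            DefaultError::MapDefaultWrongInnerType(schema, found) => write!(
                f,
                "Default value for a map must be an object with (String, {})! Found: (String, {})",
                schema, found
            ),
        }
    }
}

impl std::error::Error for DefaultError {}
```

The split between "wrong outer type" and "wrong inner type" lets a caller tell a malformed default apart from a well-formed default with a bad element.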

Changelog

  • avro/src/error.rs
    • Added ArrayDefaultWrongType error for incorrect array default value types.
    • Added ArrayDefaultWrongInnerType error for incorrect inner types within an array default value.
    • Added MapDefaultWrongType error for incorrect map default value types.
    • Added MapDefaultWrongInnerType error for incorrect inner types within a map default value.
    • Deprecated FixedDefaultLenSizeMismatch error variant.
  • avro/src/schema/mod.rs
    • Imported types::Value and aliased serde_json::Value as JsonValue.
    • Added default: Option<Vec<Value>> to ArraySchema.
    • Added default: Option<HashMap<String, Value>> to MapSchema.
    • Updated attributes fields in MapSchema, ArraySchema, EnumSchema, and FixedSchema to use JsonValue.
    • Modified parse_list and parse_list_with_schemata to use JsonValue for schema parsing.
    • Updated Schema::parse and Schema::parse_with_names to accept &JsonValue.
    • Updated Schema::custom_attributes to return &BTreeMap<String, JsonValue>.
    • Modified Schema::map, Schema::map_with_attributes, Schema::array, and Schema::array_with_attributes to initialize default as None.
    • Adjusted Serialize implementation for Schema::Array and Schema::Map to serialize the new default field if present.
    • Updated parsing_canonical_form, pcf_map, and pcf_array functions to use &JsonValue for schema processing.
    • Updated various test cases to use JsonValue for default values and attributes.
    • Added new test cases to verify the new default field functionality and validation for arrays and maps.
  • avro/src/schema/parser.rs
    • Modified parse_array to extract and validate the default field for array schemas, checking its type and inner element types.
    • Modified parse_map to extract and validate the default field for map schemas, checking its type and inner element types.
  • avro/src/types.rs
    • Updated an assertion in a test case to reflect the change in the debug representation of ArraySchema due to the new default field.

Activity

  • The pull request was created with the title '467: feat!: Support default field for array and map' and the description '467: To review by AI'.

@claude

claude bot commented Feb 16, 2026

PR Review: Support default field for array and map schemas

Summary

This PR adds support for default values on array and map schemas in Avro, addressing issue apache#467. The implementation is generally solid with good test coverage, though there are some areas that could be improved.

✅ Strengths

  1. Comprehensive Test Coverage: Excellent test cases covering:

    • Empty defaults for arrays and maps
    • Defaults with actual values
    • Invalid type validation
    • Mixed type validation
    • All tests properly check error messages
  2. Type Safety: The implementation properly validates that default values match the schema type at parse time (avro/src/schema/parser.rs:711-727, 746-766)

  3. Proper Serialization: The Serialize implementation correctly handles the optional default field (avro/src/schema/mod.rs:758-795)

  4. Backward Compatibility: Deprecated error variant properly marked with #[deprecated] attribute

🔍 Issues and Concerns

1. Breaking Change in Public API (High Priority)

Adding new fields to public structs ArraySchema and MapSchema is a breaking change:

pub struct MapSchema {
    pub types: Box<Schema>,
    pub default: Option<HashMap<String, Value>>,  // NEW FIELD
    pub attributes: BTreeMap<String, JsonValue>,
}

Impact: Any code constructing these structs directly will break.

Recommendation:

  • This is marked as feat!: (breaking), which is correct
  • Consider providing builder methods or Default implementations to ease migration
  • Document this breaking change in CHANGELOG.md
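
As a rough illustration of the `Default` suggestion above, the sketch below uses a hypothetical `MapSchemaSketch` in which `String` stands in for `Box<Schema>`, `Value`, and `JsonValue`. Whether the real `Schema` type can sensibly implement `Default` is an open question that this sketch does not answer.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified mirror of the struct shown above.
/// `String` stands in for `Box<Schema>`, `Value`, and `JsonValue`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct MapSchemaSketch {
    pub types: String,
    pub default: Option<HashMap<String, String>>,
    pub attributes: BTreeMap<String, String>,
}

/// Callers that use struct-update syntax only name the fields they care about,
/// so adding another field later does not break them.
pub fn string_map() -> MapSchemaSketch {
    MapSchemaSketch {
        types: "string".to_string(),
        ..Default::default()
    }
}
```

With this shape, downstream code written as `MapSchemaSketch { types, ..Default::default() }` keeps compiling when a field such as `default` is introduced.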

2. Potential Performance Issue (Medium Priority)

In the serialization code (avro/src/schema/mod.rs:758-795), the size hint calculation could be simplified:

let mut map = serializer.serialize_map(Some(
    2 + attributes.len() + if default.is_some() { 1 } else { 0 },
))?;

While correct, this pattern is repeated twice. Consider extracting to a variable for clarity.
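
One possible shape for that extraction is a small helper. `entry_count` is a hypothetical name, and the two fixed entries are assumed to be the `type` key plus `items` or `values`.

```rust
/// Number of entries a serialized array/map schema will contain:
/// two fixed keys (assumed: "type" plus "items" or "values"),
/// every custom attribute, and one more entry when a default is present.
fn entry_count<T>(attributes_len: usize, default: &Option<T>) -> usize {
    // `usize::from(bool)` yields 1 for true and 0 for false.
    2 + attributes_len + usize::from(default.is_some())
}
```

Both the array and the map branch could then pass `Some(entry_count(attributes.len(), default))` to `serialize_map`.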

3. Error Message Consistency (Low Priority)

Error messages have slight inconsistencies:

  • "Default value for an array must be an array! Got: {0}" (uses "an array")
  • "Default value for a map must be a object! Got: {0}" (uses "a object" - should be "an object")

Recommendation: Change to "an object" for grammatical correctness.

4. Potential Validation Gap (Medium Priority)

The validation in parser.rs validates that array/map default values match the schema type, but there's a subtle issue:

if let Some(value) = array.iter().find(|v| !v.validate(&items)) {
    return Err(Details::ArrayDefaultWrongInnerType(items, value.clone()).into());
}

This validates each value individually, which is good. However, for complex nested schemas (arrays of records, maps of unions, etc.), this might not catch all edge cases.

Recommendation: Consider adding integration tests with nested complex types to ensure validation is thorough.

5. Missing Builder Updates (Medium Priority)

The helper functions Schema::array(), Schema::map(), Schema::array_with_attributes(), and Schema::map_with_attributes() all hardcode default: None. This is fine, but consider adding:

  • Schema::array_with_default()
  • Schema::map_with_default()
  • Or updating the existing _with_attributes versions to accept an optional default

This would provide a more ergonomic API for users constructing schemas programmatically.
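
A minimal sketch of what such a constructor could look like, using a hypothetical `ArraySchemaSketch` with `String` stand-ins for the real schema and value types. A real `array_with_default` would presumably also validate the default against `items`, and might therefore return a `Result`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for `ArraySchema`.
#[derive(Debug, Clone, PartialEq)]
pub struct ArraySchemaSketch {
    pub items: String,
    pub default: Option<Vec<String>>,
    pub attributes: BTreeMap<String, String>,
}

impl ArraySchemaSketch {
    /// Mirrors the existing helpers: no default, no custom attributes.
    pub fn array(items: &str) -> Self {
        ArraySchemaSketch {
            items: items.to_string(),
            default: None,
            attributes: BTreeMap::new(),
        }
    }

    /// Proposed convenience constructor: same as `array`, plus a default value.
    pub fn array_with_default(items: &str, default: Vec<String>) -> Self {
        ArraySchemaSketch {
            default: Some(default),
            ..Self::array(items)
        }
    }
}
```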

6. Documentation (Low Priority)

The new fields default in ArraySchema and MapSchema lack documentation comments explaining:

  • What values are valid
  • When validation occurs
  • How this interacts with the Avro specification

Recommendation: Add doc comments similar to those on EnumSchema::default.

🔒 Security Considerations

No security concerns identified. The validation logic properly prevents type confusion attacks.

⚡ Performance Considerations

The implementation is efficient. The validation happens once at parse time, not during serialization/deserialization, which is the correct approach.

📋 Test Coverage Assessment

Excellent test coverage:

  • ✅ Empty defaults
  • ✅ Non-empty defaults
  • ✅ Invalid type detection
  • ✅ Mixed type detection
  • ✅ Serialization round-trip verification
  • ✅ Custom attributes exclusion check

Missing tests:

  • ⚠️ Nested complex types (array of records with defaults, map of unions with defaults)
  • ⚠️ Interaction with schema evolution (reader/writer schemas)
  • ⚠️ Very large default values (performance/memory)

🎯 Recommendations

Must Fix:

  1. Fix grammar: "a object" → "an object" in error message (avro/src/error.rs:444)

Should Fix:

  1. Add documentation comments to the new default fields
  2. Consider adding builder methods for schemas with defaults
  3. Add integration tests with nested complex types

Nice to Have:

  1. Extract size hint calculation to improve readability
  2. Document breaking changes in migration guide

📝 Overall Assessment

This is a solid implementation that correctly implements the feature according to the Avro specification. The test coverage is comprehensive and the validation logic is sound. The main concerns are around API ergonomics and documentation rather than correctness.

Recommendation: Approve with minor revisions (fix grammar, add docs).


🤖 Generated with Claude Code

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@avro/src/error.rs`:
- Around line 437-447: Update the error strings for MapDefaultWrongType and
MapDefaultWrongInnerType to fix the grammar ("a object" → "an object"); locate
the enum variants MapDefaultWrongType and MapDefaultWrongInnerType in error.rs
and change their #[error(...)] messages to use "an object" (and ensure the rest
of the messages remain unchanged).

In `@avro/src/schema/mod.rs`:
- Around line 3992-3995: Fix the typo in the error message that currently reads
"Default value for a map must be a object!" by changing it to "Default value for
a map must be an object!" in the error variant/constructor in error.rs (the
variant that builds this message for map default value type errors), and update
any tests expecting the old string (e.g., the assert in mod.rs that compares to
r#"Default value for a map must be a object! Got: "invalid""#) to the corrected
message r#"Default value for a map must be an object! Got: "invalid""# so the
message and tests remain consistent.
- Around line 5229-5231: The boolean results of json.contains(...) are currently
discarded, so the tests don't fail when the substring is missing; wrap each of
the four occurrences of json.contains(...) with assert!(...) to assert the
expected JSON content. Locate the occurrences where a Schema variant is
serialized (e.g., serde_json::to_string(&Schema::Array(array))? assigning to
json) and replace json.contains(r#"... "#) with assert!(json.contains(r#"...
"#)) at each of those spots (the four places around the Schema::Array and the
other Schema serializations) so the tests actually validate the JSON output.

In `@avro/src/schema/parser.rs`:
- Around line 711-726: The default validation for array/map inner values fails
when `items` or `values` is a `Schema::Ref` because `v.validate(&items)` builds
an isolated `ResolvedSchema`; update the validation to construct the resolved
schema with the parser's known schemata by calling
`ResolvedSchema::new_with_known_schemata(&items, &self.parsed_schemas)` (or
equivalent) and use that resolved schema's `validate` instead of the current
isolated one; apply the same change in both the array-default branch (where
`items` is checked) and the map-default branch (where `values` is checked) so
refs resolve against `parsed_schemas`.

🧹 Nitpick comments (2)
avro/src/schema/mod.rs (2)

769-772: default.clone() on every serialization is a potential performance concern.

Each call to serialize clones the entire default array/map just to convert it to JsonValue. If schemas are serialized frequently (e.g., in a writer hot path), this could add up.

Consider caching the JsonValue representation or storing the default as JsonValue directly to avoid repeated cloning + conversion.


656-662: Consider adding map_with_default / array_with_default constructors.

Since the default field is only settable by mutating the struct directly, it might be worth providing convenience constructors that accept a default value — especially given this is a feat! (breaking) change and a good time to expand the API surface. This is optional and can be deferred.

Also applies to: 674-680

Comment on lines +437 to +447
#[error("Default value for an array must be an array! Got: {0}")]
ArrayDefaultWrongType(serde_json::Value),

#[error("Default value for an array must be an array of {0}! Found: {1:?}")]
ArrayDefaultWrongInnerType(Schema, Value),

#[error("Default value for a map must be a object! Got: {0}")]
MapDefaultWrongType(serde_json::Value),

#[error("Default value for a map must be a object with (String, {0})! Found: (String, {1:?})")]
MapDefaultWrongInnerType(Schema, Value),

⚠️ Potential issue | 🟡 Minor

Grammar: "a object" → "an object" in map error messages.

Lines 443 and 446 use "a object" which is grammatically incorrect.

✏️ Proposed fix
-    #[error("Default value for a map must be a object! Got: {0}")]
+    #[error("Default value for a map must be an object! Got: {0}")]
     MapDefaultWrongType(serde_json::Value),

-    #[error("Default value for a map must be a object with (String, {0})! Found: (String, {1:?})")]
+    #[error("Default value for a map must be an object with (String, {0})! Found: (String, {1:?})")]
     MapDefaultWrongInnerType(Schema, Value),

Comment on lines +3992 to +3995
assert_eq!(
    r#"Default value for a map must be a object! Got: "invalid""#,
    err
);

⚠️ Potential issue | 🟡 Minor

Typo: "a object" should be "an object".

The error message reads "Default value for a map must be a object!" — this should be "an object". This likely originates from the error variant definition in error.rs.

Comment on lines +5229 to +5231

let json = serde_json::to_string(&Schema::Array(array))?;
json.contains(r#""default":[]"#);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the specific lines mentioned in the review
cat -n avro/src/schema/mod.rs | sed -n '5225,5235p'
cat -n avro/src/schema/mod.rs | sed -n '5255,5265p'
cat -n avro/src/schema/mod.rs | sed -n '5316,5326p'
cat -n avro/src/schema/mod.rs | sed -n '5341,5351p'

Repository: martin-augment/avro-rs

Length of output: 1423


🏁 Script executed:

# Search for bare .contains() calls in test code that are not wrapped in assert!
rg -n '^\s+\w+\.contains\(' --type=rust avro/src/schema/mod.rs

Repository: martin-augment/avro-rs

Length of output: 275


🏁 Script executed:

# Search the entire rust codebase for this pattern
rg -n '\w+\.contains\(' --type=rust | grep -v 'assert!' | head -20

Repository: martin-augment/avro-rs

Length of output: 1152


🏁 Script executed:

# Check the context around line 38 in serde.rs
cat -n avro_derive/tests/serde.rs | sed -n '35,42p'

Repository: martin-augment/avro-rs

Length of output: 289


Fix: json.contains() results are unused — wrap all four occurrences with assert!()

These tests silently pass regardless of JSON content because the boolean result from json.contains() is discarded. This occurs at lines 5231, 5260, 5321, and 5346.

Proposed fix

Line 5231:

-        json.contains(r#""default":[]"#);
+        assert!(json.contains(r#""default":[]"#));

Line 5260:

-        json.contains(r#""default":["foo","bar"]"#);
+        assert!(json.contains(r#""default":["foo","bar"]"#));

Line 5321:

-        json.contains(r#""default":{}"#);
+        assert!(json.contains(r#""default":{}"#));

Line 5346:

-        json.contains(r#""default":{"foo":"bar"}"#);
+        assert!(json.contains(r#""default":{"foo":"bar"}"#));
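
If the same check recurs across several tests, a small helper can make the intent explicit and give a readable failure message. `assert_contains` is a hypothetical helper, not something the PR defines.

```rust
/// Hypothetical test helper: panics with a descriptive message when `needle`
/// is missing, so the check cannot be dropped silently like a bare `contains`.
#[track_caller]
fn assert_contains(haystack: &str, needle: &str) {
    assert!(
        haystack.contains(needle),
        "expected {:?} to contain {:?}",
        haystack,
        needle
    );
}
```

A test would then call `assert_contains(&json, r#""default":[]"#);` in place of the bare `json.contains(...)` statement.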

Comment on lines +711 to +726
let default = if let Some(default) = complex.get("default").cloned() {
    if let Value::Array(_) = default {
        let crate::types::Value::Array(array) = crate::types::Value::from(default) else {
            unreachable!("JsonValue::Array can only become a Value::Array")
        };
        // Check that the default type matches the schema type
        if let Some(value) = array.iter().find(|v| !v.validate(&items)) {
            return Err(Details::ArrayDefaultWrongInnerType(items, value.clone()).into());
        }
        Some(array)
    } else {
        return Err(Details::ArrayDefaultWrongType(default).into());
    }
} else {
    None
};

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how validate is called - does it handle Refs without external names?
# Also check if parse_array/parse_map could receive a Ref as items/values.
echo "=== Checking if items can be a Ref ==="
rg -n 'Schema::Ref' avro/src/schema/parser.rs -C3

echo ""
echo "=== Checking ResolvedSchema::try_from for single schema ==="
rg -n 'impl.*TryFrom.*Schema' avro/src/schema/ -A 10

Repository: martin-augment/avro-rs

Length of output: 3524


🏁 Script executed:

#!/bin/bash
# Check the validate method implementation
echo "=== Checking Value::validate signature and implementation ==="
rg -n 'fn validate' avro/src/ -B 2 -A 5 | head -100

Repository: martin-augment/avro-rs

Length of output: 5841


🏁 Script executed:

#!/bin/bash
# Check where items/values come from in parse_array and parse_map
echo "=== Checking parse_array items handling ==="
sed -n '680,730p' avro/src/schema/parser.rs

echo ""
echo "=== Checking parse_map values handling ==="
sed -n '730,780p' avro/src/schema/parser.rs

Repository: martin-augment/avro-rs

Length of output: 4019


🏁 Script executed:

#!/bin/bash
# Get the full implementation of Value::validate
echo "=== Full Value::validate implementation ==="
sed -n '364,450p' avro/src/types.rs

Repository: martin-augment/avro-rs

Length of output: 4235


🏁 Script executed:

#!/bin/bash
# Check ResolvedSchema::new implementation to see how it handles refs
echo "=== ResolvedSchema::new implementation ==="
sed -n '40,78p' avro/src/schema/resolve.rs

Repository: martin-augment/avro-rs

Length of output: 1409


🏁 Script executed:

#!/bin/bash
# Search for how Schema::Ref is handled in validation
echo "=== Searching for Schema::Ref handling in resolve.rs ==="
rg -n 'Schema::Ref' avro/src/schema/resolve.rs -B 2 -A 2

Repository: martin-augment/avro-rs

Length of output: 344


Default validation fails for ref-type items or values schemas.

When items or values is a Schema::Ref, calling v.validate(&items) (line 717) creates an isolated ResolvedSchema that includes only the ref itself, not the parser's parsed_schemas context. Validation then fails to resolve the reference and incorrectly rejects valid defaults with an "Unresolved schema reference" error. This affects both arrays (line 717) and maps (line 751).

To fix this, pass the parser's parsed_schemas to ResolvedSchema::new_with_known_schemata so refs can be properly resolved during validation.
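
The idea can be illustrated with stand-in types. `MiniSchema`, `MiniValue`, and `validate_with_known` below are hypothetical and much simpler than the crate's `Schema`, `Value`, and `ResolvedSchema`; the point is only that a reference can be validated once the set of already-parsed schemas is passed in.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified schema type.
#[derive(Debug, Clone, PartialEq)]
enum MiniSchema {
    Int,
    Str,
    /// Reference to a named schema parsed earlier.
    Ref(String),
}

/// Hypothetical, heavily simplified value type.
#[derive(Debug, Clone, PartialEq)]
enum MiniValue {
    Int(i64),
    Str(String),
}

/// Validate `value` against `schema`, resolving references through `known`.
/// Unresolvable references are rejected. This sketch assumes `known` holds
/// only concrete definitions, so reference chains cannot form a cycle.
fn validate_with_known(
    value: &MiniValue,
    schema: &MiniSchema,
    known: &HashMap<String, MiniSchema>,
) -> bool {
    match (schema, value) {
        (MiniSchema::Ref(name), _) => match known.get(name) {
            Some(target) => validate_with_known(value, target, known),
            None => false,
        },
        (MiniSchema::Int, MiniValue::Int(_)) | (MiniSchema::Str, MiniValue::Str(_)) => true,
        _ => false,
    }
}
```

Validating with an empty `known` map reproduces the reported failure mode: a default that is valid for the referenced schema is rejected because the reference cannot be resolved.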

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for default fields in Avro array and map schemas, which is a great feature enhancement. However, the current implementation introduces several Denial of Service (DoS) vulnerabilities. Specifically, the conversion of JSON default values to Avro values can trigger a panic when encountering large integers, and the recursive validation of these default values lacks a depth limit, which can lead to stack overflows. These issues are particularly critical if the library is used to parse schemas from untrusted sources. It is recommended to use safe conversion methods that return errors instead of panicking and to implement recursion depth limits for all recursive parsing and validation logic. Additionally, while the implementation is thorough and includes comprehensive tests, I have a few suggestions to improve the new tests by turning some statements into actual assertions.

.and_then(|items| self.parse(items, enclosing_namespace))?;
let default = if let Some(default) = complex.get("default").cloned() {
if let Value::Array(_) = default {
let crate::types::Value::Array(array) = crate::types::Value::from(default) else {

security-high high

The conversion of the default value using crate::types::Value::from(default) can trigger a panic if the JSON contains a number that does not fit into a signed 64-bit integer (e.g., a large u64). This is due to a panic! in the From<JsonValue> for Value implementation in avro/src/types.rs (line 286). An attacker can exploit this to cause a Denial of Service by providing a schema with a malicious default value that crashes the application during parsing.
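
A fallible conversion avoids the panic by turning the out-of-range case into an error. The sketch below uses hypothetical stand-in types (`JsonNumber`, `AvroNumber`) and a `String` error; it is not the crate's `From<JsonValue> for Value` implementation, only an illustration of the `i64::try_from` pattern.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a JSON number as a parser might hand it over.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JsonNumber {
    Signed(i64),
    Unsigned(u64),
    Float(f64),
}

/// Hypothetical stand-in for the numeric Avro values.
#[derive(Debug, Clone, PartialEq)]
enum AvroNumber {
    Long(i64),
    Double(f64),
}

/// Convert without panicking: a `u64` above `i64::MAX` becomes an error.
fn convert_number(n: JsonNumber) -> Result<AvroNumber, String> {
    match n {
        JsonNumber::Signed(i) => Ok(AvroNumber::Long(i)),
        JsonNumber::Unsigned(u) => i64::try_from(u)
            .map(AvroNumber::Long)
            .map_err(|_| format!("default value {} does not fit into a signed 64-bit integer", u)),
        JsonNumber::Float(x) => Ok(AvroNumber::Double(x)),
    }
}
```

The parser could then map the error into one of its own error variants instead of crashing on untrusted input.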


let default = if let Some(default) = complex.get("default").cloned() {
if let Value::Object(_) = default {
let crate::types::Value::Map(map) = crate::types::Value::from(default) else {

security-high high

Similar to the array default handling, converting a map's default value using crate::types::Value::from(default) can trigger a panic if the default value contains a large integer that exceeds the range of a signed 64-bit integer. This allows for a Denial of Service attack via a specially crafted schema that causes the parser to crash.

unreachable!("JsonValue::Array can only become a Value::Array")
};
// Check that the default type matches the schema type
if let Some(value) = array.iter().find(|v| !v.validate(&items)) {

security-medium medium

The recursive call to v.validate(&items) lacks a depth limit. A deeply nested default value in the schema can cause a stack overflow, leading to a Denial of Service. It is recommended to implement a recursion depth limit for schema and value validation.
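
One way to bound the recursion is to thread a depth counter through the traversal and bail out past a fixed limit. The sketch below is a stand-alone pre-check over a hypothetical `NestedValue` type; `MAX_DEPTH` is an arbitrary illustrative limit, and the crate's `validate` has a different signature.

```rust
/// Hypothetical stand-in for a nested default value.
#[derive(Debug, Clone, PartialEq)]
enum NestedValue {
    Int(i64),
    Array(Vec<NestedValue>),
}

/// Arbitrary illustrative limit, not a value taken from the PR.
const MAX_DEPTH: usize = 64;

/// Walk the value and fail as soon as nesting exceeds `MAX_DEPTH`,
/// so the recursion depth is bounded regardless of the input.
fn check_depth(value: &NestedValue, depth: usize) -> Result<(), String> {
    if depth > MAX_DEPTH {
        return Err(format!(
            "default value is nested deeper than {} levels",
            MAX_DEPTH
        ));
    }
    match value {
        NestedValue::Int(_) => Ok(()),
        NestedValue::Array(items) => items
            .iter()
            .try_for_each(|item| check_depth(item, depth + 1)),
    }
}
```

Calling `check_depth(&value, 0)` before validation rejects pathological inputs early; the same counter could instead be added to the validation routine itself.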

unreachable!("JsonValue::Object can only become a Value::Map")
};
// Check that the default type matches the schema type
if let Some(value) = map.values().find(|v| !v.validate(&types)) {

security-medium medium

The recursive call to v.validate(&types) for map default values lacks a depth limit, making it vulnerable to stack overflow attacks using deeply nested schemas and default values. Implementing a recursion depth limit is advised to prevent potential crashes.

assert_eq!(array.default, Some(Vec::new()));

let json = serde_json::to_string(&Schema::Array(array))?;
json.contains(r#""default":[]"#);

medium

This check currently has no effect because the boolean result of contains is not used. To make this a functional test assertion, it should be wrapped in assert!.

Suggested change
json.contains(r#""default":[]"#);
assert!(json.contains(r#""default":[]"#));

);

let json = serde_json::to_string(&Schema::Array(array))?;
json.contains(r#""default":["foo","bar"]"#);

medium

This check currently has no effect because the boolean result of contains is not used. To make this a functional test assertion, it should be wrapped in assert!.

Suggested change
json.contains(r#""default":["foo","bar"]"#);
assert!(json.contains(r#""default":["foo","bar"]"#));

assert_eq!(map.default, Some(HashMap::new()));

let json = serde_json::to_string(&Schema::Map(map))?;
json.contains(r#""default":{}"#);

medium

This check currently has no effect because the boolean result of contains is not used. To make this a functional test assertion, it should be wrapped in assert!.

Suggested change
json.contains(r#""default":{}"#);
assert!(json.contains(r#""default":{}"#));

assert_eq!(map.default, Some(hashmap));

let json = serde_json::to_string(&Schema::Map(map))?;
json.contains(r#""default":{"foo":"bar"}"#);

medium

This check currently has no effect because the boolean result of contains is not used. To make this a functional test assertion, it should be wrapped in assert!.

Suggested change
json.contains(r#""default":{"foo":"bar"}"#);
assert!(json.contains(r#""default":{"foo":"bar"}"#));

@augmentcode

augmentcode bot commented Feb 16, 2026

🤖 Augment PR Summary

Summary: Adds first-class support for default values on array and map schema definitions (breaking change).

Changes:

  • Extend ArraySchema/MapSchema with default fields and update constructors accordingly.
  • Update schema parsing to read and validate default for arrays/maps (type + inner element/value validation).
  • Add dedicated error variants for array/map default type mismatches; refine enum default error message.
  • Adjust schema serialization and custom-attribute handling so default is not treated as a custom attribute.
  • Add/adjust tests to cover valid and invalid defaults for arrays and maps.

Technical Notes: Defaults are stored as types::Value and converted back to JSON during schema serialization.

@augmentcode augmentcode bot left a comment

Review completed. 2 suggestions posted.

.and_then(|items| self.parse(items, enclosing_namespace))?;
let default = if let Some(default) = complex.get("default").cloned() {
if let Value::Array(_) = default {
let crate::types::Value::Array(array) = crate::types::Value::from(default) else {

crate::types::Value::from(default) can panic for JSON numbers that serde_json stores as u64 (e.g. integers > i64::MAX), which would make schema parsing crash instead of returning an Error for an invalid default.

Severity: high

Other Locations
  • avro/src/schema/parser.rs:747
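
A minimal sketch of the fallible conversion this comment asks for, assuming the concern is as described. The types `ToyNumber`, `ToyValue` and `ConvertError` and the function `convert_number` are hypothetical stand-ins, not the crate's or serde_json's API. The idea is that an unsigned value above `i64::MAX` produces an error rather than a panic.

```rust
use std::convert::TryFrom;

/// Hypothetical mirror of the three ways a JSON number can be stored.
#[derive(Debug, Clone, Copy)]
enum ToyNumber {
    Signed(i64),
    Unsigned(u64),
    Float(f64),
}

/// Hypothetical target value type with a signed 64-bit integer variant.
#[derive(Debug, PartialEq)]
enum ToyValue {
    Long(i64),
    Double(f64),
}

#[derive(Debug, PartialEq)]
enum ConvertError {
    /// The unsigned input does not fit into an `i64`.
    OutOfRange(u64),
}

/// Converts a number without panicking: out-of-range input becomes an `Err`.
fn convert_number(n: ToyNumber) -> Result<ToyValue, ConvertError> {
    match n {
        ToyNumber::Signed(i) => Ok(ToyValue::Long(i)),
        ToyNumber::Unsigned(u) => i64::try_from(u)
            .map(ToyValue::Long)
            .map_err(|_| ConvertError::OutOfRange(u)),
        ToyNumber::Float(f) => Ok(ToyValue::Double(f)),
    }
}
```

In the parser, such an error could be mapped to the existing "invalid default" error path so that schema parsing fails cleanly.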

assert_eq!(array.default, Some(Vec::new()));

let json = serde_json::to_string(&Schema::Array(array))?;
json.contains(r#""default":[]"#);

The result of json.contains(...) is currently ignored, so these tests don’t actually assert that the serialized schema includes default (and this may also trigger unused_must_use warnings if warnings are denied).

Severity: medium

Other Locations
  • avro/src/schema/mod.rs:5260
  • avro/src/schema/mod.rs:5321
  • avro/src/schema/mod.rs:5346

@martin-augment
Owner Author

1. Breaking Change in Public API (High Priority)

Adding new fields to public structs ArraySchema and MapSchema is a breaking change:

pub struct MapSchema {
    pub types: Box<Schema>,
    pub default: Option<HashMap<String, Value>>,  // NEW FIELD
    pub attributes: BTreeMap<String, JsonValue>,
}

Impact: Any code constructing these structs directly will break.

value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! The apache-avro crate is still in development and it is allowed to make breaking changes before its v1.0.0 release.

@martin-augment
Owner Author

4. Potential Validation Gap (Medium Priority)

The validation in parser.rs validates that array/map default values match the schema type, but there's a subtle issue:

if let Some(value) = array.iter().find(|v| !v.validate(&items)) {
    return Err(Details::ArrayDefaultWrongInnerType(items, value.clone()).into());
}

This validates each value individually, which is good. However, for complex nested schemas (arrays of records, maps of unions, etc.), this might not catch all edge cases.

Recommendation: Consider adding integration tests with nested complex types to ensure validation is thorough.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! Adding more unit tests with more complex schemas will expose any problems in the current implementation and will prevent regressions in the future.

@martin-augment
Owner Author

5. Missing Builder Updates (Medium Priority)

The helper functions Schema::array(), Schema::map(), Schema::array_with_attributes(), and Schema::map_with_attributes() all hardcode default: None. This is fine, but consider adding:

  • Schema::array_with_default()
  • Schema::map_with_default()
  • Or updating the existing _with_attributes versions to accept an optional default

This would provide a more ergonomic API for users constructing schemas programmatically.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! It would be good to make it possible to pass more arguments to the factory methods. The builder-generating macros from the bon crate would make for really nice APIs!
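
A rough sketch of the kind of API being discussed, written as hand-rolled chained setters on a toy struct. `ToyArraySchema` and its methods are hypothetical; this is neither the crate's `ArraySchema` nor what the bon macros would generate, only an illustration of keeping `default: None` as the baseline while allowing an opt-in default.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an array schema (hypothetical field types).
#[derive(Debug, Clone, PartialEq)]
struct ToyArraySchema {
    items: String,
    default: Option<Vec<String>>,
    attributes: BTreeMap<String, String>,
}

impl ToyArraySchema {
    /// Baseline constructor: no default, no custom attributes.
    fn array(items: &str) -> Self {
        ToyArraySchema {
            items: items.to_string(),
            default: None,
            attributes: BTreeMap::new(),
        }
    }

    /// Opt-in default value, consumed and returned for chaining.
    fn with_default(mut self, default: Vec<String>) -> Self {
        self.default = Some(default);
        self
    }

    /// Adds one custom attribute, consumed and returned for chaining.
    fn with_attribute(mut self, key: &str, value: &str) -> Self {
        self.attributes.insert(key.to_string(), value.to_string());
        self
    }
}
```

With this shape, `ToyArraySchema::array("string").with_default(vec!["foo".to_string()])` reads naturally, and existing call sites that only need `array(...)` stay unchanged.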
