Conversation
Walkthrough

This pull request refactors Avro schema parsing and representation. The changes remove the …
Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces stricter validation for Avro schema definitions, primarily by requiring complex types within record fields to be explicitly wrapped in a JSON object under the `type` key. This change ensures closer adherence to the Avro specification, preventing ambiguous or malformed schema definitions. Additionally, error reporting for default value mismatches has been improved with more contextual information, and a new error has been added to validate the presence of the `type` attribute in record fields.
Code Review: PR #54 — Stricter Schema Parsing

Overall, this is a good refactor. Removing …

Bugs / Typos

1. Double "in" in error message:

```rust
#[error("No `type` in in record field")]
GetRecordFieldTypeField,
```

Should be "No `type` in record field".

2. Typo in warning message: "instide" should be "inside".

Code Quality

3. Dead code in `get_field_custom_attributes`. Now that complex-type attributes (`symbols`, `size`, `items`, `values`) live inside the nested `type` object, these match arms are unreachable for well-formed schemas:

```rust
key if key == "symbols" && matches!(schema, Schema::Enum(_)) => continue,
key if key == "size" && matches!(schema, Schema::Fixed(_)) => continue,
key if key == "items" && matches!(schema, Schema::Array(_)) => continue,
key if key == "values" && matches!(schema, Schema::Map(_)) => continue,
```

They should either be removed, or a comment should be left explaining why they're kept (e.g. as a defensive guard). Leaving them silently is confusing.

4. Weak fallback in error reporting:

```rust
field_schema
    .independent_canonical_form(&schemata)
    .unwrap_or("Unknown".to_string()),
```

If …

Spec Compliance Concern

The Avro spec does specify that a record field's … Since this is a … The error message for …

Positives

Minor

In …
Code Review
This pull request introduces stricter Avro schema parsing, aligning the implementation more closely with the specification by enforcing that complex type definitions within record fields are properly nested inside a type object. While this improves compliance, several changes introduce or maintain patterns vulnerable to Denial of Service (DoS) attacks via resource exhaustion. Specifically, the recursive parser lacks a depth limit, error/log messages format potentially large untrusted JSON structures into strings, and the error path for default value resolution performs expensive cloning of the entire schema registry. These issues should be addressed by implementing recursion limits, truncating large values in diagnostic output, and optimizing the error reporting logic. Additionally, a couple of minor typos were found in user-facing messages that should be corrected.
```rust
warn!(
    "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
);
```
This warning message contains a typo: 'instide' should be 'inside'. More critically, logging the logical_type value directly can lead to a Denial of Service (DoS) if it represents an extremely large JSON object. Formatting such large untrusted structures into log messages consumes excessive CPU and memory. Consider truncating logical_type before logging or only logging its type to mitigate this resource exhaustion vulnerability.
```diff
-warn!(
-    "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
-);
+warn!(
+    "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested inside the `type` for the field"
+);
```
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusing readers of the logs.
```rust
#[error("`default`'s value type of field `{0}` in `{1}` must be a `{2:#}`. Got: {3:?}")]
GetDefaultRecordField(String, String, String, serde_json::Value),
```
Including the full serde_json::Value in the GetDefaultRecordField error message can lead to a Denial of Service (DoS) via memory exhaustion. When this error is triggered, the entire untrusted value is formatted into a string using its Debug representation. If an attacker provides a schema with an extremely large or deeply nested invalid default value, the process may consume excessive memory and CPU during error string construction.
Recommendation: Limit the size of the value included in the error message by truncating it or providing only a summary of the value's type and a small portion of its content.
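One possible shape for that mitigation is a small truncation helper. This is an illustrative, self-contained sketch; the function name and the character budget are assumptions, not part of avro-rs:

```rust
// Hypothetical helper (not part of avro-rs): bound an untrusted value before it
// is interpolated into a log or error message. Truncation is done on char
// boundaries so it cannot panic on multi-byte UTF-8 input.
fn truncate_for_log(value: &str, max_chars: usize) -> String {
    let total = value.chars().count();
    if total <= max_chars {
        value.to_string()
    } else {
        let head: String = value.chars().take(max_chars).collect();
        format!("{head}... [truncated, {total} chars total]")
    }
}
```

An error or log site could then interpolate `truncate_for_log(&untrusted.to_string(), 256)` instead of the raw value; note this would require the error variant to carry a `String` rather than the full `serde_json::Value`.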
```diff
-Some(Value::Object(data)) => {
-    self.parse_complex(data, enclosing_namespace, RecordSchemaParseLocation::Root)
-}
+Some(Value::Object(data)) => self.parse_complex(data, enclosing_namespace),
```
The recursive call to parse_complex (and other recursive parsing functions like parse_union and parse_record) lacks a depth limit. An attacker can provide a deeply nested JSON schema (e.g., {"type": {"type": {"type": ...}}}) to trigger a stack overflow, leading to a Denial of Service (DoS) crash.
Recommendation: Implement a maximum recursion depth in the Parser to prevent stack exhaustion when processing maliciously crafted, deeply nested schemas.
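A minimal sketch of such a guard, using a hand-rolled `Json` enum instead of `serde_json::Value` so the example stays self-contained; the `MAX_SCHEMA_DEPTH` constant and function names are illustrative, not avro-rs API:

```rust
// Illustrative depth guard: reject JSON nested deeper than a fixed limit
// before recursively parsing it as a schema.
const MAX_SCHEMA_DEPTH: usize = 128; // assumed default; could be made configurable

enum Json {
    Scalar,
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

fn check_depth(value: &Json, depth: usize) -> Result<(), String> {
    if depth > MAX_SCHEMA_DEPTH {
        return Err(format!("schema nesting exceeds {MAX_SCHEMA_DEPTH} levels"));
    }
    match value {
        Json::Scalar => Ok(()),
        Json::Array(items) => items.iter().try_for_each(|v| check_depth(v, depth + 1)),
        Json::Object(entries) => entries.iter().try_for_each(|(_, v)| check_depth(v, depth + 1)),
    }
}

// Builds {"type": {"type": ...}} nested `n` levels deep, mimicking the attack shape.
fn nested(n: usize) -> Json {
    (0..n).fold(Json::Scalar, |inner, _| Json::Object(vec![("type".to_string(), inner)]))
}
```

The real parser would thread a depth counter through its recursive `parse_*` calls rather than pre-scanning, but the failure mode is the same: a bounded error instead of a stack overflow.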
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Highly recursive schemas could lead to stack overflow during the parsing of the schema. It would be good to have a configurable limit on how deep such schemas can be, so that user applications can protect themselves from this kind of attack.
```rust
if !resolved {
    let schemata = names.values().cloned().collect::<Vec<_>>();
    return Err(Details::GetDefaultRecordField(
```
Cloning all schemas in the names map every time a default value fails to resolve can lead to significant memory pressure and CPU usage, especially in environments with large schema registries. This can be exploited to cause a Denial of Service (DoS) by providing a schema with many fields that have invalid default values.
Recommendation: Avoid cloning the entire schema registry in the error path. Consider passing a reference to the registry or using a more efficient way to generate the diagnostic information.
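One possible shape for that optimization, sketched with stand-in types (`SchemaInfo` and the function name are hypothetical, not the real avro-rs types): borrow the registry and format only a bounded sample when the error actually fires.

```rust
use std::collections::BTreeMap;

// Stand-in for the parsed schema type kept in the `names` registry.
struct SchemaInfo {
    canonical: String,
}

// Builds the diagnostic from a borrowed registry: no clone of every schema,
// and only a small, bounded sample is formatted into the message.
fn default_mismatch_message(field: &str, registry: &BTreeMap<String, SchemaInfo>) -> String {
    let preview: Vec<&str> = registry.values().take(3).map(|s| s.canonical.as_str()).collect();
    format!(
        "invalid `default` for field `{field}`; sample of known schemas: {preview:?} ({} total)",
        registry.len()
    )
}
```

The trade-off is a less exhaustive error message, which is usually acceptable since the full registry is still available to the caller for deeper diagnosis.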
value:valid-but-wont-fix; category:bug; feedback: The Gemini AI reviewer is correct that really huge schemas could lead to a lot of memory consumption, but the chances of this are minimal. The same is true for the non-error path: similarly huge schemas could lead to the same problem.
```rust
#[error("No `type` in complex type")]
GetComplexTypeField,
```

```rust
#[error("No `type` in in record field")]
```
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.
Actionable comments posted: 3
🧹 Nitpick comments (3)
avro/src/schema/record/field.rs (1)
190-197: Dead filter conditions in `get_field_custom_attributes` after the stricter format change.

With the new format, `"items"`, `"values"`, `"symbols"`, and `"size"` will never appear as siblings of `"name"` at the field level — they live inside the nested type object. The four `key if key == ...` arms are now unreachable for well-formed schemas. These can be removed to keep the filter list minimal, or retained as a belt-and-suspenders guard. Either way, a comment explaining the intent would help.
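If the arms are dropped, the remaining field-level filter reduces to the reserved keys named in this review. A tiny sketch of that check (the function name and shape are illustrative, not the actual avro-rs code):

```rust
// Reserved field-level keys after the stricter format change; everything else
// would be collected as a custom attribute.
fn is_reserved_field_key(key: &str) -> bool {
    matches!(
        key,
        "type" | "name" | "doc" | "default" | "order" | "position" | "aliases" | "logicalType"
    )
}
```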
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro/src/schema/record/field.rs` around lines 190-197: The match arms in get_field_custom_attributes that special-case "symbols", "size", "items", and "values" are now dead/unreachable because those keys live inside nested type objects under the stricter schema format; remove those four key-if arms from the match in the get_field_custom_attributes function (leaving the core allowed keys: "type", "name", "doc", "default", "order", "position", "aliases", "logicalType") or if you prefer to keep them for defensive programming, add a clear comment above the match explaining they are retained as a belt-and-suspenders guard for malformed schemas; update the match to only continue on the actual field-level keys and ensure custom_attributes.insert handles all other keys.

avro/tests/avro-3787.rs (1)
271-278: Commented-out assertions reduce test coverage.

Lines 273–277 contain commented-out `assert_eq!` calls with no apparent reason for their omission. These leave the deserialized barParent value unverified. Consider replacing the commented-out assertions with concrete value checks to close the coverage gap, or add a `// TODO(avro-3787)` with a tracked issue.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@avro/tests/avro-3787.rs` around lines 271-278: The test in avro/tests/avro-3787.rs leaves several assertions commented out, reducing coverage; restore or replace those commented checks in the match arm for types::Value::Record(fields) so the record contents are fully verified: assert the expected field names and enum values for fields[0] and fields[1] (originally referencing "barInit"/"barParent" and types::Value::Enum variants), or if uncertain add a // TODO(avro-3787) and a clear check that fields.len() and the concrete types/values (types::Value::Enum with the expected discriminant and string) match the expected deserialized values to close the gap.

avro/src/schema/mod.rs (1)
5405-5425: New test is well-targeted; consider adding a `"fixed"` coverage case for completeness.

`avro_4055` covers `"record"` used directly as a field type, and this test covers `"enum"`. A third case — `"fixed"` specified directly at field level with a sibling `"size"` key — would complete the trio and guard against future regressions for all complex types. The expected error message would follow the same pattern already tested here.

🧪 Suggested additional case inside `avro_rs_476_enum_cannot_be_directly_in_field` (or a sibling test):

```rust
// fixed used directly as a field type (also invalid)
let schema_str_fixed = r#"{
    "type": "record",
    "name": "ExampleFixed",
    "namespace": "com.schema",
    "fields": [
        { "name": "wrong_fixed", "type": "fixed", "size": 16 }
    ]
}"#;
let result = Schema::parse_str(schema_str_fixed).unwrap_err();
assert_eq!(
    result.to_string(),
    "Invalid schema: There is no type called 'fixed', if you meant to define a non-primitive schema, it should be defined inside `type` attribute. Please review the specification"
);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@avro/src/schema/mod.rs` around lines 5405-5425: Add a third assertion checking that a "fixed" complex type used directly as a field (with a sibling "size" key) produces the same invalid-schema error: in the test avro_rs_476_enum_cannot_be_directly_in_field (or a new sibling test) create a schema_str_fixed where the field "type" is "fixed" and includes "size": 16, call Schema::parse_str(schema_str_fixed).unwrap_err(), and assert its to_string() equals "Invalid schema: There is no type called 'fixed', if you meant to define a non-primitive schema, it should be defined inside `type` attribute. Please review the specification" so the suite covers record/enum/fixed cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@avro/src/error.rs`:
- Around line 408-409: Typo in the user-facing error message for the
GetRecordFieldTypeField variant: remove the duplicated "in" so the message reads
"No `type` in record field". Update the #[error(...)] attribute on the
GetRecordFieldTypeField enum variant in avro/src/error.rs to the corrected
string, leaving the variant name and surrounding code unchanged.
In `@avro/src/schema/record/field.rs`:
- Around line 88-89: Fix the typo in the warning string in field.rs by changing
"instide" to "inside" in the message emitted where the code constructs the
warning about logicalType placement (the string containing "Ignored the
{enclosing_record}.logicalType property (`{logical_type}`). It should probably
be nested instide the `type` for the field"). Update that literal so it reads
"inside" to correct the spelling.
- Around line 83-90: The code now drops field-level logicalType by calling
parser.parse(ty, ...) on the extracted "type" value; restore previous behavior
by calling parser.parse_complex(field, &enclosing_record.namespace) (the same
call previously used) so field-level attributes like "logicalType", "precision",
and "scale" are respected by the logical type conversion logic (replace the
parser.parse(...) call and remove the warn! about ignored logicalType), and if
you decide the new silent-drop behavior is intended instead, change the warn! to
error! and add an explicit entry to CHANGELOG.md documenting the breaking change
and migration steps.
---
Nitpick comments:
In `@avro/src/schema/mod.rs`:
- Around line 5405-5425: Add a third assertion checking that a "fixed" complex
type used directly as a field (with a sibling "size" key) produces the same
invalid-schema error: in the test avro_rs_476_enum_cannot_be_directly_in_field
(or a new sibling test) create a schema_str_fixed where the field "type" is
"fixed" and includes "size": 16, call
Schema::parse_str(schema_str_fixed).unwrap_err(), and assert its to_string()
equals "Invalid schema: There is no type called 'fixed', if you meant to define
a non-primitive schema, it should be defined inside `type` attribute. Please
review the specification" so the suite covers record/enum/fixed cases.
In `@avro/src/schema/record/field.rs`:
- Around line 190-197: The match arms in get_field_custom_attributes that
special-case "symbols", "size", "items", and "values" are now dead/unreachable
because those keys live inside nested type objects under the stricter schema
format; remove those four key-if arms from the match in the
get_field_custom_attributes function (leaving the core allowed keys: "type",
"name", "doc", "default", "order", "position", "aliases", "logicalType") or if
you prefer to keep them for defensive programming, add a clear comment above the
match explaining they are retained as a belt-and-suspenders guard for malformed
schemas; update the match to only continue on the actual field-level keys and
ensure custom_attributes.insert handles all other keys.
In `@avro/tests/avro-3787.rs`:
- Around line 271-278: The test in avro/tests/avro-3787.rs leaves several
assertions commented out, reducing coverage; restore or replace those commented
checks in the match arm for types::Value::Record(fields) so the record contents
are fully verified: assert the expected field names and enum values for
fields[0] and fields[1] (originally referencing "barInit"/"barParent" and
types::Value::Enum variants), or if uncertain add a // TODO(avro-3787) and a
clear check that fields.len() and the concrete types/values (types::Value::Enum
with the expected discriminant and string) match the expected deserialized
values to close the gap.
```rust
#[error("No `type` in in record field")]
GetRecordFieldTypeField,
```
Fix typo in error message.
There’s a duplicated “in” in the user-facing error text.
📝 Suggested fix
```diff
-#[error("No `type` in in record field")]
+#[error("No `type` in record field")]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-#[error("No `type` in in record field")]
-GetRecordFieldTypeField,
+#[error("No `type` in record field")]
+GetRecordFieldTypeField,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/error.rs` around lines 408 - 409, Typo in the user-facing error
message for the GetRecordFieldTypeField variant: remove the duplicated "in" so
the message reads "No `type` in record field". Update the #[error(...)]
attribute on the GetRecordFieldTypeField enum variant in avro/src/error.rs to
the corrected string, leaving the variant name and surrounding code unchanged.
value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.
```rust
let ty = field.get("type").ok_or(Details::GetRecordFieldTypeField)?;
let schema = parser.parse(ty, &enclosing_record.namespace)?;

if let Some(logical_type) = field.get("logicalType") {
    warn!(
        "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
    );
}
```
Field-level logicalType is now silently dropped — document this breaking change and consider whether a warning is sufficient.
The previous implementation called parser.parse_complex(field, ...) with the entire field object, which would process field-level logicalType attributes via the logical type conversion logic. The new code calls parser.parse(ty, ...) on only the extracted type value, silently discarding any logicalType at the field level with only a warn!() log message.
Test data in avro_test_helper/src/data.rs (e.g., lines 388–397) shows valid schemas with field-level logicalType and precision/scale attributes. Existing schemas following this pattern will now produce Schema::Long or Schema::Bytes instead of the expected Schema::TimestampMillis, Schema::Decimal, etc., with no hard error to alert users of the silent change.
The CHANGELOG.md does not document this breaking change. Consider either:
- Explicitly documenting this as a breaking change and migration path, or
- Escalating to `error!()` for field-level `logicalType` to make the incompatibility explicit.
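The two schema shapes at issue can be sketched as follows. This is an illustrative Python check, not code from the PR; the field name `amount` and the fixed-type name `Dec` are made up for the example:

```python
import json

# Field-level logical type attributes (previously honored, now silently ignored):
field_level = json.loads(r'''{
    "name": "amount",
    "type": {"type": "fixed", "name": "Dec", "size": 10},
    "logicalType": "decimal",
    "precision": 4,
    "scale": 2
}''')

# The shape the stricter parser expects: logicalType nested inside "type":
nested = json.loads(r'''{
    "name": "amount",
    "type": {
        "type": "fixed",
        "name": "Dec",
        "size": 10,
        "logicalType": "decimal",
        "precision": 4,
        "scale": 2
    }
}''')

# In the first shape the logical type sits beside "type", so a parser that
# only inspects the extracted "type" value never sees it.
assert "logicalType" in field_level and "logicalType" not in field_level["type"]
assert "logicalType" in nested["type"]
```

This illustrates why affected fields now parse to their base type (e.g. `Schema::Bytes` instead of `Schema::Decimal`): the attributes are still valid JSON, just in a position the new code no longer reads.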
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/record/field.rs` around lines 83 - 90, The code now drops
field-level logicalType by calling parser.parse(ty, ...) on the extracted "type"
value; restore previous behavior by calling parser.parse_complex(field,
&enclosing_record.namespace) (the same call previously used) so field-level
attributes like "logicalType", "precision", and "scale" are respected by the
logical type conversion logic (replace the parser.parse(...) call and remove the
warn! about ignored logicalType), and if you decide the new silent-drop behavior
is intended instead, change the warn! to error! and add an explicit entry to
CHANGELOG.md documenting the breaking change and migration steps.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The logical-type attribute is ignored on the schema's "top-level" and due to this schemas which worked before now parse to their base type, e.g. Long instead of TimestampMillis
| "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field" | ||
| ); |
Typo in warning message: "instide" → "inside".
🔧 Proposed fix
- "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
+ "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested inside the `type` for the field"📝 Committable suggestion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/record/field.rs` around lines 88 - 89, Fix the typo in the
warning string in field.rs by changing "instide" to "inside" in the message
emitted where the code constructs the warning about logicalType placement (the
string containing "Ignored the {enclosing_record}.logicalType property
(`{logical_type}`). It should probably be nested instide the `type` for the
field"). Update that literal so it reads "inside" to correct the spelling.
value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusion in the readers of the logs.
🤖 Augment PR SummarySummary: Tightens Avro schema parsing so non-primitive field schemas (and their logical-type metadata) must be expressed as nested schema objects under the field’s Changes:
Technical Notes: This is a breaking behavior change for schemas that previously relied on placing schema-specific keys (e.g., `logicalType`) directly on the field object.
🤖 Was this summary useful? React with 👍 or 👎
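For schemas hit by this behavior change, the migration the reviewers suggest can be sketched as below. This is a hypothetical Python helper, not part of avro-rs; it assumes the only field-level keys that need relocating are `logicalType`, `precision`, and `scale`:

```python
LOGICAL_KEYS = ("logicalType", "precision", "scale")

def nest_logical_type(field: dict) -> dict:
    """Move field-level logical-type attributes under the field's "type" object."""
    ty = field.get("type")
    if not isinstance(ty, dict):
        # Primitive or named-reference types are left untouched in this sketch.
        return field
    new_ty = dict(ty)
    # Drop the logical-type keys from the field itself...
    new_field = {k: v for k, v in field.items() if k not in LOGICAL_KEYS}
    # ...and nest them inside "type", without clobbering keys already there.
    for key in LOGICAL_KEYS:
        if key in field:
            new_ty.setdefault(key, field[key])
    new_field["type"] = new_ty
    return new_field

field = {
    "name": "ts",
    "type": {"type": "long"},
    "logicalType": "timestamp-millis",
}
fixed = nest_logical_type(field)
print(fixed["type"])  # {'type': 'long', 'logicalType': 'timestamp-millis'}
```

Rewriting schemas this way keeps them parsing to the logical types (`Decimal`, `TimestampMillis`, etc.) under the stricter rules instead of silently degrading to the base type.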
| #[error("No `type` in complex type")] | ||
| GetComplexTypeField, | ||
|
|
||
| #[error("No `type` in in record field")] |
value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.
if let Some(logical_type) = field.get("logicalType") {
    warn!(
        "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
This warning string has a typo (“instide”), and the {enclosing_record}.logicalType wording reads like the property is on the record rather than on the field, which could be confusing when debugging input schemas.
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusion in the readers of the logs.
value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusion in the readers of the logs.
value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.