
479: fix!: Stricter schema parsing #54

Open
martin-augment wants to merge 2 commits into main from pr-479-2026-02-20-13-11-58

Conversation

@martin-augment
Owner

479: To review by AI

@coderabbitai

coderabbitai bot commented Feb 20, 2026

Walkthrough

This pull request refactors Avro schema parsing and representation. The changes remove the RecordSchemaParseLocation enum and simplify parsing logic by eliminating location-based branching during complex type processing. Schema field type representations are updated from inline primitive formats to nested object structures (e.g., "type": "bytes" becomes "type": { "type": "bytes" }). Error enum variants are modified to include additional value parameters for improved error diagnostics. Test expectations and schema test data are updated throughout the codebase to reflect the new nested type object representation format.


@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces stricter validation for Avro schema definitions, primarily by requiring complex types within record fields to be explicitly wrapped in a JSON object under the 'type' key. This change ensures closer adherence to the Avro specification, preventing ambiguous or malformed schema definitions. Additionally, error reporting for default value mismatches has been improved with more contextual information, and a new error has been added to validate the presence of the 'type' attribute in record fields.

Highlights

  • Stricter Schema Definition for Complex Types: Enforced that complex Avro types (e.g., enum, fixed, array, map, and types with logicalType) within record fields must now be defined as a JSON object under the 'type' attribute, rather than directly. This aligns with the Avro specification.
  • Improved Error Reporting: Enhanced the 'GetDefaultRecordField' error message to include the actual JSON value that caused the default value mismatch, providing clearer debugging information.
  • New Schema Validation Error: Introduced a new error, 'GetRecordFieldTypeField', to explicitly catch cases where a record field is missing its required 'type' attribute.
  • Simplified Schema Parsing Logic: Removed the 'RecordSchemaParseLocation' enum and its associated logic, streamlining the internal schema parsing mechanism.
Changelog
  • avro/src/bigdecimal.rs
    • Updated test schemas to correctly nest complex types with logical types within a 'type' object.
  • avro/src/error.rs
    • Modified the 'GetDefaultRecordField' error to include the problematic JSON value.
    • Added a new error 'GetRecordFieldTypeField' for missing field types.
  • avro/src/schema/mod.rs
    • Removed an unused import.
    • Updated various test schemas to enforce nesting of complex types within a 'type' object.
    • Added a new test case for direct enum definition in a field.
  • avro/src/schema/parser.rs
    • Refactored 'parse_complex' method by removing the 'RecordSchemaParseLocation' parameter, simplifying schema parsing.
  • avro/src/schema/record/field.rs
    • Removed an unused import.
    • Updated 'RecordField::parse' to explicitly check for the 'type' attribute.
    • Logged warnings for misplaced 'logicalType' attributes.
    • Improved default value error reporting.
  • avro/src/schema/record/mod.rs
    • Removed the export of 'RecordSchemaParseLocation'.
  • avro/src/schema/record/schema.rs
    • Removed the 'RecordSchemaParseLocation' enum definition.
  • avro/src/schema_compatibility.rs
    • Updated test schemas to correctly nest array types within a 'type' object.
  • avro/src/serde/ser_schema.rs
    • Updated a test schema to correctly nest a fixed type with a logical type within a 'type' object.
  • avro/tests/avro-3787.rs
    • Updated test schemas to correctly nest enum types within a 'type' object.
  • avro/tests/schema.rs
    • Updated test schemas to correctly nest array and map types within a 'type' object.
  • avro/tests/validators.rs
    • Updated a test schema to correctly nest an enum type within a 'type' object.
  • avro_derive/tests/derive.rs
    • Updated test schemas to correctly nest array types within a 'type' object.
  • avro_test_helper/src/data.rs
    • Updated test schemas to correctly nest fixed types within a 'type' object.
Activity
  • The pull request description indicates it is 'To review by AI', suggesting no human activity has occurred yet.

@claude

claude bot commented Feb 20, 2026

Code Review: PR #54 — Stricter Schema Parsing

Overall, this is a good refactor. Removing RecordSchemaParseLocation simplifies the parser noticeably, the error messages are meaningfully improved, and the new test covers the motivating issue. A few items worth addressing before merge:


Bugs / Typos

1. Double "in" in error message (avro/src/error.rs)

#[error("No `type` in in record field")]
GetRecordFieldTypeField,

Should be "No `type` in record field".

2. Typo in warning message (avro/src/schema/record/field.rs:88)

"It should probably be nested instide the `type` for the field"

"instide" → "inside".


Code Quality

3. Dead code in get_field_custom_attributes (field.rs:191-198)

Now that complex-type attributes (symbols, size, items, values) are required to live inside the nested "type" object, they will never appear at the field level in valid schemas. These four match arms are unreachable dead code:

key if key == "symbols" && matches!(schema, Schema::Enum(_)) => continue,
key if key == "size" && matches!(schema, Schema::Fixed(_)) => continue,
key if key == "items" && matches!(schema, Schema::Array(_)) => continue,
key if key == "values" && matches!(schema, Schema::Map(_)) => continue,

They should either be removed, or a comment should be left explaining why they're kept (e.g. as a defensive guard). Leaving them in place without explanation is confusing.
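A minimal sketch of the "keep the filter minimal" option, using stand-in types (the key list and function name are taken from the review text and are illustrative, not the crate's exact API):

```rust
use std::collections::BTreeMap;

// Sketch: only the genuinely field-level keys are filtered out; everything
// else is treated as a custom attribute. The four complex-type arms
// ("symbols", "size", "items", "values") are gone because those keys now
// live inside the nested `type` object.
fn field_custom_attributes(field: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    field
        .iter()
        .filter(|(key, _)| {
            !matches!(
                key.as_str(),
                "type" | "name" | "doc" | "default" | "order" | "position" | "aliases"
            )
        })
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut field = BTreeMap::new();
    field.insert("name".to_string(), "f".to_string());
    field.insert("type".to_string(), "bytes".to_string());
    field.insert("x-custom".to_string(), "1".to_string());

    let custom = field_custom_attributes(&field);
    // Only the non-reserved key survives as a custom attribute.
    assert_eq!(custom.len(), 1);
    assert!(custom.contains_key("x-custom"));
    println!("{custom:?}");
}
```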

4. Weak fallback in error reporting (field.rs:170-172)

field_schema
    .independent_canonical_form(&schemata)
    .unwrap_or("Unknown".to_string()),

If `independent_canonical_form` fails, the error message will say "must be a `Unknown`", which is unhelpful. Consider falling back to `field_schema.canonical_form()` (which cannot fail) so the user always gets a meaningful schema description.
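The suggested fallback pattern can be sketched with hypothetical stand-in types (the real crate's method signatures may differ):

```rust
// Hypothetical stand-in for the crate's schema type.
struct FieldSchema;

impl FieldSchema {
    /// Stand-in for `independent_canonical_form(&schemata)`, which can fail.
    fn independent_canonical_form(&self) -> Result<String, String> {
        Err("unresolved reference".to_string()) // simulate the failure path
    }

    /// Stand-in for `canonical_form()`, which cannot fail.
    fn canonical_form(&self) -> String {
        r#"{"type":"enum","name":"Suit"}"#.to_string()
    }
}

fn main() {
    let field_schema = FieldSchema;
    // Before: `.unwrap_or("Unknown".to_string())` discards all schema detail on error.
    // After: fall back to the infallible form so the message stays meaningful.
    let description = field_schema
        .independent_canonical_form()
        .unwrap_or_else(|_| field_schema.canonical_form());
    assert_ne!(description, "Unknown");
    println!("must be a `{description}`");
}
```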


Spec Compliance Concern

The Avro spec does specify that a record field's type attribute is a schema object, so the old flattened form was non-standard. However, the Apache Avro Java reference implementation is permissive and accepts flattened field schemas in practice — meaning real-world schemas produced by other tooling may break.

Since this is a fix! (breaking) change, it would be worth documenting:

  • Which previously-accepted patterns now fail
  • A migration guide or examples showing the old vs new format

The error message for GetRecordFieldTypeField helps, but a CHANGELOG entry or note in the PR description would reduce friction for library consumers.
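As a hedged illustration of the migration (an example constructed here for an array-typed field, not taken from the diff itself), the two forms side by side:

```json
{
  "type": "record",
  "name": "Example",
  "fields": [
    { "name": "rejected_after_this_pr", "type": "array", "items": "long" },
    { "name": "accepted", "type": { "type": "array", "items": "long" } }
  ]
}
```

The first field uses the flattened form that the Java implementation tolerates but this PR now rejects; the second nests the complex type inside the `type` attribute as the spec requires.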


Positives

  • Removing RecordSchemaParseLocation is a nice simplification — the two-path dispatch was confusing.
  • GetDefaultRecordField now includes the actual bad value (Got: {3:?}), which makes debugging schema mismatches much easier.
  • avro_rs_476_enum_cannot_be_directly_in_field is a good regression test.
  • The warning for field-level logicalType is the right call — it degrades gracefully while guiding users toward the correct form.
  • The tests in avro-3787.rs with the "default" inside the type object are now more clearly structured.

Minor

In avro/tests/validators.rs, the previously-broken test field was missing a "name" key entirely — the fix adds "name": "field-name". This silently fixes a pre-existing test bug, worth calling out in the PR description.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces stricter Avro schema parsing, aligning the implementation more closely with the specification by enforcing that complex type definitions within record fields are properly nested inside a type object. While this improves compliance, several changes introduce or maintain patterns vulnerable to Denial of Service (DoS) attacks via resource exhaustion. Specifically, the recursive parser lacks a depth limit, error/log messages format potentially large untrusted JSON structures into strings, and the error path for default value resolution performs expensive cloning of the entire schema registry. These issues should be addressed by implementing recursion limits, truncating large values in diagnostic output, and optimizing the error reporting logic. Additionally, a couple of minor typos were found in user-facing messages that should be corrected.

Comment on lines +87 to +89
warn!(
"Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
);


security-medium medium

This warning message contains a typo: 'instide' should be 'inside'. More critically, logging the logical_type value directly can lead to a Denial of Service (DoS) if it represents an extremely large JSON object. Formatting such large untrusted structures into log messages consumes excessive CPU and memory. Consider truncating logical_type before logging or only logging its type to mitigate this resource exhaustion vulnerability.

Suggested change
warn!(
"Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
);
warn!(
"Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested inside the `type` for the field"
);

Owner Author


value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusing readers of the logs.

Comment on lines +312 to +313
#[error("`default`'s value type of field `{0}` in `{1}` must be a `{2:#}`. Got: {3:?}")]
GetDefaultRecordField(String, String, String, serde_json::Value),


security-medium medium

Including the full serde_json::Value in the GetDefaultRecordField error message can lead to a Denial of Service (DoS) via memory exhaustion. When this error is triggered, the entire untrusted value is formatted into a string using its Debug representation. If an attacker provides a schema with an extremely large or deeply nested invalid default value, the process may consume excessive memory and CPU during error string construction.

Recommendation: Limit the size of the value included in the error message by truncating it or providing only a summary of the value's type and a small portion of its content.
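A minimal sketch of such a truncation helper (a hypothetical function, not part of the crate), which bounds how much of an untrusted value reaches a diagnostic string:

```rust
// Hypothetical helper: cap how many characters of an untrusted JSON value
// are formatted into an error or log message, keeping a size hint.
fn truncate_for_display(value: &str, max_chars: usize) -> String {
    if value.chars().count() <= max_chars {
        value.to_string()
    } else {
        // `chars()` keeps the cut on a UTF-8 character boundary.
        let prefix: String = value.chars().take(max_chars).collect();
        format!("{prefix}… ({} bytes total)", value.len())
    }
}

fn main() {
    let huge = format!("{{\"default\": \"{}\"}}", "x".repeat(100_000));
    let shown = truncate_for_display(&huge, 64);
    // The diagnostic stays small no matter how large the input value is.
    assert!(shown.chars().count() < 100);
    assert_eq!(truncate_for_display("{\"a\":1}", 64), "{\"a\":1}");
    println!("{shown}");
}
```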

Some(Value::Object(data)) => {
self.parse_complex(data, enclosing_namespace, RecordSchemaParseLocation::Root)
}
Some(Value::Object(data)) => self.parse_complex(data, enclosing_namespace),


security-medium medium

The recursive call to parse_complex (and other recursive parsing functions like parse_union and parse_record) lacks a depth limit. An attacker can provide a deeply nested JSON schema (e.g., {"type": {"type": {"type": ...}}}) to trigger a stack overflow, leading to a Denial of Service (DoS) crash.

Recommendation: Implement a maximum recursion depth in the Parser to prevent stack exhaustion when processing maliciously crafted, deeply nested schemas.
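A depth-limited descent can be sketched with a toy model (the `Json` type, `MAX_SCHEMA_DEPTH`, and all names here are illustrative stand-ins, not the crate's API):

```rust
// Illustrative limit; a real parser would likely make this configurable.
const MAX_SCHEMA_DEPTH: usize = 64;

// Toy stand-in for nested JSON: Obj models {"type": <inner>}.
enum Json {
    Str(String),
    Obj(Box<Json>),
}

#[derive(Debug, PartialEq)]
enum ParseErr {
    TooDeep,
}

// Each recursive call carries the current depth and bails out past the limit,
// converting a potential stack overflow into a recoverable error.
fn parse(value: &Json, depth: usize) -> Result<(), ParseErr> {
    if depth >= MAX_SCHEMA_DEPTH {
        return Err(ParseErr::TooDeep);
    }
    match value {
        Json::Str(_) => Ok(()),
        Json::Obj(inner) => parse(inner, depth + 1),
    }
}

fn main() {
    // Build a schema nested deeper than the limit: {"type": {"type": ...}}
    let mut v = Json::Str("string".into());
    for _ in 0..100 {
        v = Json::Obj(Box::new(v));
    }
    assert_eq!(parse(&v, 0), Err(ParseErr::TooDeep));

    let shallow = Json::Obj(Box::new(Json::Str("bytes".into())));
    assert_eq!(parse(&shallow, 0), Ok(()));
    println!("depth limit enforced");
}
```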

Owner Author


value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Highly recursive schemas could lead to stack overflow during the parsing of the schema. It would be good to have a configurable limit on how deep such schemas can be, so that user applications can protect themselves from this kind of attack.


if !resolved {
let schemata = names.values().cloned().collect::<Vec<_>>();
return Err(Details::GetDefaultRecordField(


security-medium medium

Cloning all schemas in the names map every time a default value fails to resolve can lead to significant memory pressure and CPU usage, especially in environments with large schema registries. This can be exploited to cause a Denial of Service (DoS) by providing a schema with many fields that have invalid default values.

Recommendation: Avoid cloning the entire schema registry in the error path. Consider passing a reference to the registry or using a more efficient way to generate the diagnostic information.

Owner Author


value:valid-but-wont-fix; category:bug; feedback: The Gemini AI reviewer is correct that very large schemas could lead to significant memory consumption, but the chances of this are minimal. The same is true for the non-error path: similarly large schemas would cause the same problem.

#[error("No `type` in complex type")]
GetComplexTypeField,

#[error("No `type` in in record field")]


medium

There's a typo in this error message. It should be 'in record field' instead of 'in in record field'.

Suggested change
#[error("No `type` in in record field")]
#[error("No `type` in record field")]

Owner Author


value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.


@coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
avro/src/schema/record/field.rs (1)

190-197: Dead filter conditions in get_field_custom_attributes after the stricter format change.

With the new format, "items", "values", "symbols", and "size" will never appear as siblings of "name" at the field level — they live inside the nested type object. The four key if key == ... arms are now unreachable for well-formed schemas.

These can be removed to keep the filter list minimal, or retained as a belt-and-suspenders guard. Either way, a comment explaining the intent would help.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/record/field.rs` around lines 190 - 197, The match arms in
get_field_custom_attributes that special-case "symbols", "size", "items", and
"values" are now dead/unreachable because those keys live inside nested type
objects under the stricter schema format; remove those four key-if arms from the
match in the get_field_custom_attributes function (leaving the core allowed
keys: "type", "name", "doc", "default", "order", "position", "aliases",
"logicalType") or if you prefer to keep them for defensive programming, add a
clear comment above the match explaining they are retained as a
belt-and-suspenders guard for malformed schemas; update the match to only
continue on the actual field-level keys and ensure custom_attributes.insert
handles all other keys.
avro/tests/avro-3787.rs (1)

271-278: Commented-out assertions reduce test coverage.

Lines 273–277 contain commented-out assert_eq! calls with no apparent reason for their omission. These leave the deserialized barParent value unverified.

Consider replacing the commented-out assertions with concrete value checks to close the coverage gap, or add a // TODO(avro-3787) with a tracked issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/tests/avro-3787.rs` around lines 271 - 278, The test in
avro/tests/avro-3787.rs leaves several assertions commented out, reducing
coverage; restore or replace those commented checks in the match arm for
types::Value::Record(fields) so the record contents are fully verified: assert
the expected field names and enum values for fields[0] and fields[1] (originally
referencing "barInit"/"barParent" and types::Value::Enum variants), or if
uncertain add a // TODO(avro-3787) and a clear check that fields.len() and the
concrete types/values (types::Value::Enum with the expected discriminant and
string) match the expected deserialized values to close the gap.
avro/src/schema/mod.rs (1)

5405-5425: New test is well-targeted; consider adding a "fixed" coverage case for completeness.

avro_4055 covers "record" used directly as a field type, and this test covers "enum". A third case — "fixed" specified directly at field level with a sibling "size" key — would complete the trio and guard against future regressions for all complex types. The expected error message would follow the same pattern already tested here.

🧪 Suggested additional case inside `avro_rs_476_enum_cannot_be_directly_in_field` (or a sibling test)
// fixed used directly as a field type (also invalid)
let schema_str_fixed = r#"{
    "type": "record",
    "name": "ExampleFixed",
    "namespace": "com.schema",
    "fields": [
        {
            "name": "wrong_fixed",
            "type": "fixed",
            "size": 16
        }
    ]
}"#;
let result = Schema::parse_str(schema_str_fixed).unwrap_err();
assert_eq!(
    result.to_string(),
    "Invalid schema: There is no type called 'fixed', if you meant to define a non-primitive schema, it should be defined inside `type` attribute. Please review the specification"
);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` around lines 5405 - 5425, Add a third assertion
checking that a "fixed" complex type used directly as a field (with a sibling
"size" key) produces the same invalid-schema error: in the test
avro_rs_476_enum_cannot_be_directly_in_field (or a new sibling test) create a
schema_str_fixed where the field "type" is "fixed" and includes "size": 16, call
Schema::parse_str(schema_str_fixed).unwrap_err(), and assert its to_string()
equals "Invalid schema: There is no type called 'fixed', if you meant to define
a non-primitive schema, it should be defined inside `type` attribute. Please
review the specification" so the suite covers record/enum/fixed cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro/src/error.rs`:
- Around line 408-409: Typo in the user-facing error message for the
GetRecordFieldTypeField variant: remove the duplicated "in" so the message reads
"No `type` in record field". Update the #[error(...)] attribute on the
GetRecordFieldTypeField enum variant in avro/src/error.rs to the corrected
string, leaving the variant name and surrounding code unchanged.

In `@avro/src/schema/record/field.rs`:
- Around line 88-89: Fix the typo in the warning string in field.rs by changing
"instide" to "inside" in the message emitted where the code constructs the
warning about logicalType placement (the string containing "Ignored the
{enclosing_record}.logicalType property (`{logical_type}`). It should probably
be nested instide the `type` for the field"). Update that literal so it reads
"inside" to correct the spelling.
- Around line 83-90: The code now drops field-level logicalType by calling
parser.parse(ty, ...) on the extracted "type" value; restore previous behavior
by calling parser.parse_complex(field, &enclosing_record.namespace) (the same
call previously used) so field-level attributes like "logicalType", "precision",
and "scale" are respected by the logical type conversion logic (replace the
parser.parse(...) call and remove the warn! about ignored logicalType), and if
you decide the new silent-drop behavior is intended instead, change the warn! to
error! and add an explicit entry to CHANGELOG.md documenting the breaking change
and migration steps.

---

Nitpick comments:
In `@avro/src/schema/mod.rs`:
- Around line 5405-5425: Add a third assertion checking that a "fixed" complex
type used directly as a field (with a sibling "size" key) produces the same
invalid-schema error: in the test avro_rs_476_enum_cannot_be_directly_in_field
(or a new sibling test) create a schema_str_fixed where the field "type" is
"fixed" and includes "size": 16, call
Schema::parse_str(schema_str_fixed).unwrap_err(), and assert its to_string()
equals "Invalid schema: There is no type called 'fixed', if you meant to define
a non-primitive schema, it should be defined inside `type` attribute. Please
review the specification" so the suite covers record/enum/fixed cases.

In `@avro/src/schema/record/field.rs`:
- Around line 190-197: The match arms in get_field_custom_attributes that
special-case "symbols", "size", "items", and "values" are now dead/unreachable
because those keys live inside nested type objects under the stricter schema
format; remove those four key-if arms from the match in the
get_field_custom_attributes function (leaving the core allowed keys: "type",
"name", "doc", "default", "order", "position", "aliases", "logicalType") or if
you prefer to keep them for defensive programming, add a clear comment above the
match explaining they are retained as a belt-and-suspenders guard for malformed
schemas; update the match to only continue on the actual field-level keys and
ensure custom_attributes.insert handles all other keys.

In `@avro/tests/avro-3787.rs`:
- Around line 271-278: The test in avro/tests/avro-3787.rs leaves several
assertions commented out, reducing coverage; restore or replace those commented
checks in the match arm for types::Value::Record(fields) so the record contents
are fully verified: assert the expected field names and enum values for
fields[0] and fields[1] (originally referencing "barInit"/"barParent" and
types::Value::Enum variants), or if uncertain add a // TODO(avro-3787) and a
clear check that fields.len() and the concrete types/values (types::Value::Enum
with the expected discriminant and string) match the expected deserialized
values to close the gap.

Comment on lines +408 to +409
#[error("No `type` in in record field")]
GetRecordFieldTypeField,


⚠️ Potential issue | 🟡 Minor

Fix typo in error message.

There’s a duplicated “in” in the user-facing error text.

📝 Suggested fix
-    #[error("No `type` in in record field")]
+    #[error("No `type` in record field")]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[error("No `type` in in record field")]
GetRecordFieldTypeField,
#[error("No `type` in record field")]
GetRecordFieldTypeField,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/error.rs` around lines 408 - 409, Typo in the user-facing error
message for the GetRecordFieldTypeField variant: remove the duplicated "in" so
the message reads "No `type` in record field". Update the #[error(...)]
attribute on the GetRecordFieldTypeField enum variant in avro/src/error.rs to
the corrected string, leaving the variant name and surrounding code unchanged.

Owner Author


value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.

Comment on lines +83 to +90
let ty = field.get("type").ok_or(Details::GetRecordFieldTypeField)?;
let schema = parser.parse(ty, &enclosing_record.namespace)?;

if let Some(logical_type) = field.get("logicalType") {
warn!(
"Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
);
}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's understand the file structure and locate the file in question
find . -type f -name "field.rs" | head -20

Repository: martin-augment/avro-rs

Length of output: 99


🏁 Script executed:

#!/bin/bash
# Let's examine the specific file and surrounding context
cat -n avro/src/schema/record/field.rs | head -100

Repository: martin-augment/avro-rs

Length of output: 4308


🏁 Script executed:

#!/bin/bash
# Search for logicalType handling throughout the codebase to understand the broader context
rg -n "logicalType" --type rust -B 2 -A 2 | head -100

Repository: martin-augment/avro-rs

Length of output: 6020


🏁 Script executed:

#!/bin/bash
# Search for breaking change documentation or CHANGELOG
find . -type f \( -name "CHANGELOG*" -o -name "BREAKING*" -o -name "MIGRATION*" \) | head -20

Repository: martin-augment/avro-rs

Length of output: 80


🏁 Script executed:

#!/bin/bash
# Look for test cases that might reveal if field-level logicalType is being tested
rg -n '"logicalType"' --type rust -B 5 -A 5 | grep -E "(test|fn |logicalType)" | head -50

Repository: martin-augment/avro-rs

Length of output: 3468


🏁 Script executed:

#!/bin/bash
# Look for the Parser implementation to understand parse() method
rg -n "impl Parser" --type rust -A 20 | head -100

Repository: martin-augment/avro-rs

Length of output: 1327


🏁 Script executed:

#!/bin/bash
# Search for parse method in Parser
rg -n "fn parse" avro/src/schema/ --type rust -B 2 -A 10 | grep -A 10 "fn parse"

Repository: martin-augment/avro-rs

Length of output: 16873


🏁 Script executed:

#!/bin/bash
# Check CHANGELOG for breaking changes related to logicalType
cat CHANGELOG.md | head -200

Repository: martin-augment/avro-rs

Length of output: 4709


🏁 Script executed:

#!/bin/bash
# Look for git history showing changes to field.rs (if available)
git log --oneline avro/src/schema/record/field.rs 2>/dev/null | head -20

Repository: martin-augment/avro-rs

Length of output: 131


🏁 Script executed:

#!/bin/bash
# Search for actual field-level logicalType in test data to see if it was ever a pattern
rg -n "logicalType" avro_test_helper/src/data.rs -B 10 | grep -E "\"name\"|\"type\"|logicalType" | head -60

Repository: martin-augment/avro-rs

Length of output: 3331


🏁 Script executed:

#!/bin/bash
# Let's see how parse_complex handles logicalType
rg -n "parse_complex\|logicalType" avro/src/schema/parser.rs | head -80

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Look at the git commit to understand what changed
git show 561c8cb --stat 2>/dev/null | head -50

Repository: martin-augment/avro-rs

Length of output: 2970


🏁 Script executed:

#!/bin/bash
# Check if there are any tests that specifically test field-level logicalType being processed
rg -n "field.*logicalType\|logicalType.*field" --type rust -B 5 -A 5 | head -100

Repository: martin-augment/avro-rs

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Look at parse_complex to see how it handles logicalType
sed -n '248,330p' avro/src/schema/parser.rs

Repository: martin-augment/avro-rs

Length of output: 3703


🏁 Script executed:

#!/bin/bash
# Let's check the git diff for the actual change to understand what happened
git show 561c8cb avro/src/schema/record/field.rs 2>/dev/null | head -150

Repository: martin-augment/avro-rs

Length of output: 5313


🏁 Script executed:

#!/bin/bash
# Let's check if there's documentation about the breaking change
git show 561c8cb CHANGELOG.md 2>/dev/null | grep -A 10 -B 10 "logicalType\|breaking\|field" | head -80

Repository: martin-augment/avro-rs

Length of output: 1820


🏁 Script executed:

#!/bin/bash
# Let's look for any tests that validate the field-level logicalType behavior
rg -n "field.*logicalType\|Ignored the.*logicalType" --type rust -B 10 -A 5 | head -150


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if the test data with field-level logicalType actually passes or is expected to be ignored
rg -n '"logicalType": "decimal"' avro_test_helper/src/data.rs -B 15 | head -100


Length of output: 2611


🏁 Script executed:

#!/bin/bash
# Let's check if the current code actually processes field-level logicalType in tests
rg -n "DECIMAL_LOGICAL_TYPE\|DATE_LOGICAL_TYPE\|TIMEMILLIS_LOGICAL_TYPE" --type rust -B 2 -A 2


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Look for test code that uses these test constants to understand if field-level logicalType is tested
rg -n "DECIMAL_LOGICAL_TYPE\|DATE_LOGICAL_TYPE" avro/tests --type rust -B 5 -A 10 | head -80


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check the structure of these test data - are they schemas or fields?
rg -n "pub const DECIMAL_LOGICAL_TYPE" avro_test_helper/src/data.rs -A 100 | head -150


Length of output: 2643


🏁 Script executed:

#!/bin/bash
# Let's check what context these test cases are used in - are they record fields or top-level types?
rg -n "DECIMAL_LOGICAL_TYPE" avro/tests --type rust -B 5 -A 5 | head -100


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's look at how these test constants are actually used
fd -e rs -t f avro/tests | xargs grep -l "DECIMAL_LOGICAL_TYPE"


Length of output: 413


🏁 Script executed:

#!/bin/bash
# Check one of the test files to see how DECIMAL_LOGICAL_TYPE is used
rg -n "DECIMAL_LOGICAL_TYPE" avro/tests -A 15 | head -60


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's verify the Avro specification - check if field-level logicalType is valid
# by looking at how the parser currently handles it in parse_complex
sed -n '248,500p' avro/src/schema/parser.rs | head -200


Length of output: 9597


🏁 Script executed:

#!/bin/bash
# Look at the full parse_complex implementation to understand logicalType handling
rg -n "fn parse_complex" avro/src/schema/parser.rs -A 250 | head -300


Length of output: 12975


🏁 Script executed:

#!/bin/bash
# Let's understand the context better - check if this is actually a newly created file or modified
git log --oneline --all -- avro/src/schema/record/field.rs 2>/dev/null | head -5


Length of output: 450


🏁 Script executed:

#!/bin/bash
# Let's check if there's any documentation about breaking changes in the repo
cat CHANGELOG.md | grep -A 20 "Unreleased\|Breaking\|breaking\|logicalType\|field"


Length of output: 1755


🏁 Script executed:

#!/bin/bash
# Check if there are any existing tests that validate field-level logicalType behavior
fd . avro/tests --full-path -type f -name "*.rs" | xargs grep -l "logicalType.*name\|name.*logicalType" | head -5


Length of output: 236


🏁 Script executed:

#!/bin/bash
# Look for the actual test that uses these DECIMAL_LOGICAL_TYPE constants
rg -n "DECIMAL_LOGICAL_TYPE" --type rust -B 3 -A 15 | head -80


Length of output: 2628


🏁 Script executed:

#!/bin/bash
# Let's check how the DECIMAL_LOGICAL_TYPE examples are actually structured - are they field definitions or type definitions?
rg -n "pub const DECIMAL_LOGICAL_TYPE" avro_test_helper/src/data.rs -A 50 | sed -n '1,60p'


Length of output: 1394


🏁 Script executed:

#!/bin/bash
# Check how these test constants are used in actual tests
rg -n "DECIMAL_LOGICAL_TYPE\|DATE_LOGICAL_TYPE" --type rust | head -20


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's look at how these constants are used in the test suite
rg -n "valid_examples\|invalid_examples" avro_test_helper/src/data.rs -B 5 -A 10 | head -50


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Now let's verify: are these test data representing RECORD FIELDS or TOP-LEVEL SCHEMAS?
# Check how valid_examples is used in tests
rg -n "valid_examples\|invalid_examples" --type rust avro/tests -B 3 -A 10 | head -80


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's check if there's a test that uses these examples to parse schemas
rg -n "Schema::parse\|parse_str" --type rust avro/tests | grep -B 3 -A 3 "valid_examples\|invalid_examples" | head -50


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's understand the actual structure - check if these are field definitions by looking at context
rg -n "\"name\": \"" avro_test_helper/src/data.rs -B 3 -A 8 | grep -E "\"name\"|\"type\"|\"logicalType\"|\"precision\"" | head -80


Length of output: 4478


🏁 Script executed:

#!/bin/bash
# Let's check the git diff more carefully - what was the previous implementation before this change?
# Since this is a NEW file, there must have been a previous file
git log --all --full-history --diff-filter=D -- "*field*.rs" 2>/dev/null | head -20


Length of output: 428


🏁 Script executed:

#!/bin/bash
# Let's see if there was a previous record_field file that was renamed or replaced
git log --all --oneline -- "*/record*field*.rs" 2>/dev/null | head -10


Length of output: 621


🏁 Script executed:

#!/bin/bash
# Look at the commit before this one to see what the code looked like
git show a50c8cd:avro/src/schema/record_field.rs 2>/dev/null | head -150


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's verify the Avro specification compliance by checking Apache Avro spec documentation
# Check if there's any comment about spec compliance in the code
rg -n "spec\|SPEC\|Specification\|specification" avro/src/schema/ --type rust | grep -i "logical\|field" | head -20


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's look at the previous implementation before the revert
git show f90ccc6^:avro/src/schema/record_field.rs 2>/dev/null | head -120


Length of output: 4197


🏁 Script executed:

#!/bin/bash
# Check if there's any discussion in comments about field-level logicalType support
rg -n "logicalType.*field\|field.*logicalType" avro/src/schema/ -i -B 3 -A 3 | head -100


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's understand what the code was doing before - check the commit before this one
git show a50c8cd:avro/src/schema/record_field.rs 2>/dev/null | grep -A 30 "fn parse"


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's see if there are any tests that actually use field-level logicalType in a record context
rg -n '"fields".*logicalType\|logicalType.*"fields"' --type rust -B 10 -A 10 | head -100


Length of output: 48


🏁 Script executed:

#!/bin/bash
# Let's verify the exact behavior difference by checking what parse_complex did with field objects
git show a50c8cd:avro/src/schema/parser.rs 2>/dev/null | grep -A 50 "fn parse_complex" | head -70


Length of output: 2084


🏁 Script executed:

#!/bin/bash
# Check if the test data examples actually represent fields (with name, type, logicalType)
# These look like they could be standalone field definitions being tested
python3 << 'EOF'
import json

# Test example from the data
test_json = r'''{
    "type": {
        "type": "fixed",
        "name": "TestDecimal",
        "size": 10
    },
    "logicalType": "decimal",
    "precision": 4,
    "scale": 2
}'''

data = json.loads(test_json)
# This has "type", "logicalType", "precision", "scale" - structure of a field with logicalType
# If this is a field definition in a record, the logicalType would be at field level
print("Keys in test data:", list(data.keys()))
print("Has 'type' key:", 'type' in data)
print("Has 'logicalType' key:", 'logicalType' in data)
print("Has 'precision' key:", 'precision' in data)
print("\nThis structure suggests it's a FIELD definition with field-level logicalType")
EOF


Length of output: 283


🏁 Script executed:

#!/bin/bash
# Let's confirm this is breaking by checking if there's documentation of the expected behavior
# Check if parse_complex would have handled this in the old code
git show a50c8cd:avro/src/schema/parser.rs 2>/dev/null | grep -B 5 -A 100 "logicalType" | head -150


Length of output: 7275


Field-level logicalType is now silently dropped — document this breaking change and consider whether a warning is sufficient.

The previous implementation called parser.parse_complex(field, ...) with the entire field object, which would process field-level logicalType attributes via the logical type conversion logic. The new code calls parser.parse(ty, ...) on only the extracted type value, silently discarding any logicalType at the field level with only a warn!() log message.

Test data in avro_test_helper/src/data.rs (e.g., lines 388–397) shows valid schemas with field-level logicalType and precision/scale attributes. Existing schemas following this pattern will now produce Schema::Long or Schema::Bytes instead of the expected Schema::TimestampMillis, Schema::Decimal, etc., with no hard error to alert users of the silent change.

The CHANGELOG.md does not document this breaking change. Consider either:

  • Explicitly documenting this as a breaking change and migration path, or
  • Escalating to error!() for field-level logicalType to make the incompatibility explicit.
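
To make the migration path concrete, here is a minimal sketch (Python, with a hypothetical field name; not code from this repository) of moving field-level logical-type attributes into the nested `type` object that the stricter parser expects:

```python
import json

def nest_logical_type(field: dict) -> dict:
    """Move field-level logical-type attributes into the nested `type` object.

    A migration sketch for pre-existing schemas; the attribute names
    (`logicalType`, `precision`, `scale`) follow the Avro specification.
    """
    moved = {key: field.pop(key)
             for key in ("logicalType", "precision", "scale") if key in field}
    if moved:
        ty = field["type"]
        if isinstance(ty, str):
            # Wrap a bare primitive like "bytes" into an object first.
            ty = {"type": ty}
        ty.update(moved)
        field["type"] = ty
    return field

old_field = {
    "name": "amount",  # hypothetical field name
    "type": "bytes",
    "logicalType": "decimal",
    "precision": 4,
    "scale": 2,
}
print(json.dumps(nest_logical_type(old_field), indent=2))
```

Rewriting old schemas this way before parsing would preserve the logical types (Decimal, TimestampMillis, etc.) instead of silently falling back to the base type.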
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/record/field.rs` around lines 83 - 90, The code now drops
field-level logicalType by calling parser.parse(ty, ...) on the extracted "type"
value; restore previous behavior by calling parser.parse_complex(field,
&enclosing_record.namespace) (the same call previously used) so field-level
attributes like "logicalType", "precision", and "scale" are respected by the
logical type conversion logic (replace the parser.parse(...) call and remove the
warn! about ignored logicalType), and if you decide the new silent-drop behavior
is intended instead, change the warn! to error! and add an explicit entry to
CHANGELOG.md documenting the breaking change and migration steps.

Owner Author


value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The logical-type attribute is ignored at the schema's "top level", and because of this, schemas that previously worked now parse to their base type, e.g. Long instead of TimestampMillis

Comment on lines +88 to +89
"Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
);


⚠️ Potential issue | 🟡 Minor

Typo in warning message: "instide" → "inside".

🔧 Proposed fix
-            "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"
+            "Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested inside the `type` for the field"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/record/field.rs` around lines 88 - 89, Fix the typo in the
warning string in field.rs by changing "instide" to "inside" in the message
emitted where the code constructs the warning about logicalType placement (the
string containing "Ignored the {enclosing_record}.logicalType property
(`{logical_type}`). It should probably be nested instide the `type` for the
field"). Update that literal so it reads "inside" to correct the spelling.

Owner Author


value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusing readers of the logs.

@augmentcode

augmentcode bot commented Feb 20, 2026

🤖 Augment PR Summary

Summary: Tightens Avro schema parsing so non-primitive field schemas (and their logical-type metadata) must be expressed as nested schema objects under the field’s type attribute.

Changes:

  • Record-field parsing now parses only the value of type, rather than treating the whole field object as a schema.
  • Simplified the parser API by removing RecordSchemaParseLocation and updating Parser::parse_complex call sites.
  • Improved default-value type mismatch reporting for record fields to include the expected (independent) canonical schema form and the actual JSON value received.
  • Added a dedicated error for record fields missing a type attribute.
  • Updated numerous tests/fixtures to nest complex types and logical types within type (arrays, maps, enums, fixed, big-decimal, timestamps, uuid, etc.).
  • Added a regression test ensuring an enum cannot be declared directly via "type": "enum" in a record field.

Technical Notes: This is a breaking behavior change for schemas that previously relied on placing schema-specific keys (e.g., items, values, symbols, size, logicalType) at the record-field level instead of inside the nested schema object.
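
As a sketch of that shape change (illustrative JSON with a hypothetical field name, not taken from the fixtures), an array field must now carry its `items` inside a nested schema object under `type`:

```python
import json

# Previously tolerated: the schema-specific key "items" sitting directly
# on the record-field object next to "name" and "type".
old_field = json.loads('{"name": "tags", "type": "array", "items": "string"}')

# Required by the stricter parser: the complete array schema nested
# under the field's "type" attribute.
new_field = json.loads(
    '{"name": "tags", "type": {"type": "array", "items": "string"}}'
)

# Same information, different nesting.
assert old_field["items"] == new_field["type"]["items"]
print(json.dumps(new_field, indent=2))
```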



@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


Comment augment review to trigger a new review at any time.

#[error("No `type` in complex type")]
GetComplexTypeField,

#[error("No `type` in in record field")]


The GetRecordFieldTypeField error text has a duplicated word (“No type in in record field”), which is user-facing and will show up in schema parse errors.

Severity: low


Owner Author


value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.


if let Some(logical_type) = field.get("logicalType") {
warn!(
"Ignored the {enclosing_record}.logicalType property (`{logical_type}`). It should probably be nested instide the `type` for the field"


This warning string has a typo (“instide”), and the {enclosing_record}.logicalType wording reads like the property is on the record rather than on the field, which could be confusing when debugging input schemas.

Severity: low


Owner Author


value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusing readers of the logs.

@martin-augment
Owner Author

2. Typo in warning message (avro/src/schema/record/field.rs:88)

"It should probably be nested instide the `type` for the field"

instide → inside.

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There is a minor typo in the warning log message that should be fixed to avoid confusing readers of the logs.

@martin-augment
Owner Author

1. Double "in" in error message (avro/src/error.rs)

#[error("No `type` in in record field")]
GetRecordFieldTypeField,

Should be "No `type` in record field".

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There is a minor typo in the error message for GetRecordFieldTypeField. It should be fixed to correct the grammar of this error message.
