Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 24 additions & 7 deletions avro/src/schema/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -398,11 +398,13 @@ impl Schema {
/// Returns the [Parsing Canonical Form] of `self` that is self contained (not dependent on
/// any definitions in `schemata`)
///
/// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo in doc comment: "your require" → "you require"

📝 Fix
-    /// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
+    /// If you require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
/// If you require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` at line 401, Fix the typo in the doc comment for
Schema::denormalize: change "If your require a self contained schema..." to "If
you require a self contained schema..." so the sentence reads correctly and
references Schema::denormalize; update the comment near the Schema::denormalize
documentation in mod.rs accordingly.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:typo; feedback: The CodeRabbit AI reviewer is correct! There is a minor typo in the documentation that should be fixed before merging the pull request.

///
/// [Parsing Canonical Form]:
/// https://avro.apache.org/docs/++version++/specification/#parsing-canonical-form-for-schemas
pub fn independent_canonical_form(&self, schemata: &[Schema]) -> Result<String, Error> {
let mut this = self.clone();
this.denormalize(schemata, &mut HashSet::with_capacity(schemata.len()))?;
this.denormalize(schemata)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

This call to denormalize is vulnerable to a stack overflow if schemata contains a cycle of Schema::Ref objects. While the Avro parser prevents such cycles in schemas it parses from JSON, this function is public and can be called with manually constructed Schema objects that form a cycle of references.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-but-wont-fix; category:bug; feedback: The Gemini AI reviewer is correct! But since this is a manually created Schema instance (in contrast to a parsed one) this is responsibility of the application developer to create a valid and usable Schema.

Ok(this.canonical_form())
}

Expand Down Expand Up @@ -688,7 +690,18 @@ impl Schema {
UnionSchema::new(schemas).map(Schema::Union)
}

fn denormalize(
/// Remove all external references from the schema.
///
/// `schemata` must contain all externally referenced schemas.
///
/// # Errors
/// Will return a [`Details::SchemaResolutionError`] if it fails to find
/// a referenced schema. This will put the schema in a partly denormalized state.
pub fn denormalize(&mut self, schemata: &[Schema]) -> AvroResult<()> {
self.denormalize_inner(schemata, &mut HashSet::new())
}
Comment on lines +693 to +702
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

New public API leaves self in a partially-denormalized state on error

The doc comment mentions this, but exposing it publicly makes it a real footgun: a caller who handles the Err and then continues using self will operate on a partially-inlined schema tree (some Refs replaced, some not). Because this is now part of the public API, atomicity is worth the extra clone:

🛡️ Proposed atomic implementation
 pub fn denormalize(&mut self, schemata: &[Schema]) -> AvroResult<()> {
-    self.denormalize_inner(schemata, &mut HashSet::new())
+    let mut candidate = self.clone();
+    candidate.denormalize_inner(schemata, &mut HashSet::with_capacity(schemata.len()))?;
+    *self = candidate;
+    Ok(())
 }

This also restores the with_capacity(schemata.len()) hint that was dropped when the method was split (minor allocation efficiency).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` around lines 693 - 702, The public method denormalize
currently calls denormalize_inner in-place which can leave self partially
modified on Err; change denormalize to operate on a clone and only replace self
on success: clone self (use a Vec::with_capacity(schemata.len()) where you build
temporary schemata-related structures if applicable), call denormalize_inner on
the cloned instance, and if it returns Ok, swap or assign the cloned,
denormalized value back into self; if Err, return the error and leave the
original self untouched. Ensure you reference and use the existing denormalize,
denormalize_inner, and Schema abstractions when implementing the clone-and-swap
approach.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:valid-but-wont-fix; category:bug; feedback: The CodeRabbit AI reviewer is correct that this will solve the issue with the partially denormalized schema. But a partially denormalised schema is still a valid schema. If the user does not want a partially denormalized schema then they can clone it before attempting to denormalize.


fn denormalize_inner(
&mut self,
schemata: &[Schema],
defined_names: &mut HashSet<Name>,
Expand All @@ -708,7 +721,7 @@ impl Schema {
.find(|s| s.name().map(|n| *n == *name).unwrap_or(false));
if let Some(schema) = replacement_schema {
let mut denorm = schema.clone();
denorm.denormalize(schemata, defined_names)?;
denorm.denormalize_inner(schemata, defined_names)?;
*self = denorm;
} else {
return Err(Details::SchemaResolutionError(name.clone()).into());
Expand All @@ -717,18 +730,22 @@ impl Schema {
Schema::Record(record_schema) => {
defined_names.insert(record_schema.name.clone());
for field in &mut record_schema.fields {
field.schema.denormalize(schemata, defined_names)?;
field.schema.denormalize_inner(schemata, defined_names)?;
}
}
Schema::Array(array_schema) => {
array_schema.items.denormalize(schemata, defined_names)?;
array_schema
.items
.denormalize_inner(schemata, defined_names)?;
}
Schema::Map(map_schema) => {
map_schema.types.denormalize(schemata, defined_names)?;
map_schema
.types
.denormalize_inner(schemata, defined_names)?;
}
Schema::Union(union_schema) => {
for schema in &mut union_schema.schemas {
schema.denormalize(schemata, defined_names)?;
schema.denormalize_inner(schemata, defined_names)?;
}
}
schema if schema.is_named() => {
Expand Down