Skip to content

470: feat: Make Schema::denormalize public#50

Open
martin-augment wants to merge 1 commit intomainfrom
pr-470-2026-02-19-10-08-03
Open

470: feat: Make Schema::denormalize public#50
martin-augment wants to merge 1 commit intomainfrom
pr-470-2026-02-19-10-08-03

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

470: To review by AI

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 19, 2026

Walkthrough

The Schema type's denormalization functionality was refactored to improve the public API. A new public method denormalize was introduced that accepts a schemata array and manages internal state initialization. The previous logic was extracted into a new private helper method denormalize_inner that takes an additional HashSet parameter for tracking defined names. All internal recursive calls throughout the codebase were updated to invoke the new private helper instead of the original method. The independent_canonical_form method was updated to use the new public denormalize signature. Documentation comments were added to both the public method and private helper clarifying their purposes and behavior.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-470-2026-02-19-10-08-03

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the avro crate by exposing the Schema::denormalize method as a public API. This change provides users with direct control over removing external references from Avro schemas, facilitating the creation of self-contained schema definitions. The underlying denormalization logic was encapsulated in a private helper, ensuring a clean public interface while maintaining the integrity of the recursive process.

Highlights

  • Public API Exposure: The Schema::denormalize method has been made public, allowing external consumers to explicitly remove external references from Avro schemas.
  • Internal Refactoring: The core denormalization logic was refactored into a new private helper method, denormalize_inner, to manage internal state like defined_names and provide a cleaner public interface.
  • Documentation Update: Comprehensive documentation was added for the newly public denormalize method, detailing its purpose, parameters, and error handling.
  • Internal Call Adjustments: All internal recursive calls within the schema processing logic, including independent_canonical_form and various Schema enum variants, were updated to utilize the new denormalize_inner helper.
Changelog
  • avro/src/schema/mod.rs
    • Made Schema::denormalize public.
    • Renamed the internal denormalization logic to denormalize_inner.
    • Updated independent_canonical_form to use the new public denormalize method.
    • Added documentation for the public denormalize method.
    • Adjusted internal recursive calls to use denormalize_inner.
Activity
  • The pull request was submitted for review by AI.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Feb 19, 2026

🤖 Augment PR Summary

Summary: This PR makes schema denormalization a public API so callers can inline externally-referenced named schemas and work with a fully self-contained schema.

Changes:

  • Expose Schema::denormalize publicly and keep the recursive implementation in a private denormalize_inner.
  • Update independent_canonical_form to call the new public method.
  • Adjust internal recursive calls to use denormalize_inner to preserve the original behavior.

Technical Notes: Denormalization mutates the schema in place, tracks already-defined names to avoid infinite recursion, and returns a resolution error if any referenced schema is missing (potentially leaving a partially denormalized value).

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

@claude
Copy link
Copy Markdown

claude bot commented Feb 19, 2026

Code Review

This PR makes Schema::denormalize public by introducing a clean split into a public wrapper (denormalize) and a private recursive implementation (denormalize_inner). The approach is sound and the refactoring of independent_canonical_form to delegate to the new public method is correct.

Issues

Typo in doc comment (avro/src/schema/mod.rs, line 401):

If your require a self contained schema...

Should be: "If you require..."

Broken/incorrect rustdoc link in # Errors section (denormalize doc, line 698):

/// Will return a [`Details::SchemaResolutionError`] if it fails to find

Details is not re-exported from the crate root — it only appears via Error::details() and Error::into_details(). The intra-doc link [Details::SchemaResolutionError] will fail to resolve in user-facing documentation. Consider either:

  • Using the full path: [`crate::error::Details::SchemaResolutionError`]
  • Or simplifying to: Will return [Err] if it fails to find a referenced schema (schema resolution error).

Partial mutation on error is a public API footgun:
The doc correctly warns that an error "will put the schema in a partly denormalized state", but for a &mut self public API, leaving self in an unspecified/broken intermediate state on error is surprising behavior. Consider documenting more explicitly what "partly denormalized" means and whether the schema is still valid/usable afterwards, or if callers should discard it. A note like "On error, the schema is left in an indeterminate state and should be discarded" would be clearer.

Minor Notes

Missing doc example: Other public methods in this file include a # Examples section. Adding one to denormalize would improve discoverability and serve as a usage guide for callers who want to use it directly rather than via independent_canonical_form.

Minor performance regression: The original code in independent_canonical_form passed HashSet::with_capacity(schemata.len()). The new denormalize uses HashSet::new() (no pre-allocation). This is unlikely to matter in practice (the set tracks named types, not all of schemata), but it's worth noting the pre-allocation hint is now lost.

Summary

The core approach is correct and the API design (hiding defined_names from callers) is the right call. The main blocking issues are the typo and the broken intra-doc link. The partial-state-on-error behavior deserves clearer documentation before stabilizing as a public API.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request makes the Schema::denormalize function public, refactoring its implementation into a private denormalize_inner function for a cleaner API. However, a potential stack overflow vulnerability was identified in the recursive logic of denormalize_inner when handling cycles of schema references, which could lead to Denial of Service. It is recommended to implement cycle detection or a recursion depth limit. Additionally, a performance improvement is suggested for schema resolution within denormalize by using a HashMap for faster lookups.

Comment thread avro/src/schema/mod.rs
pub fn independent_canonical_form(&self, schemata: &[Schema]) -> Result<String, Error> {
let mut this = self.clone();
this.denormalize(schemata, &mut HashSet::with_capacity(schemata.len()))?;
this.denormalize(schemata)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

This call to denormalize is vulnerable to a stack overflow if schemata contains a cycle of Schema::Ref objects. While the Avro parser prevents such cycles in schemas it parses from JSON, this function is public and can be called with manually constructed Schema objects that form a cycle of references.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-but-wont-fix; category:bug; feedback: The Gemini AI reviewer is correct! But since this is a manually created Schema instance (in contrast to a parsed one) this is responsibility of the application developer to create a valid and usable Schema.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
avro/src/schema/mod.rs (1)

700-702: No direct unit test for the new pub fn denormalize

avro_rs_420_independent_canonical_form exercises denormalize only through independent_canonical_form. The new public method needs its own tests covering at minimum:

  1. Successful resolution of external Refs (asserting no Schema::Ref remain in the output).
  2. Error path: an unresolvable Ref returns Err(Details::SchemaResolutionError(...)).
  3. Mutual recursion (A ↔ B) terminates and produces the correct partially-inlined form.

Do you want me to generate a test suite for Schema::denormalize covering these scenarios?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` around lines 700 - 702, Add direct unit tests for the
new public Schema::denormalize method (not only via independent_canonical_form).
Create tests that (1) construct schemas with external Ref entries and assert
denormalize returns Ok and that no Schema::Ref remain, (2) construct a schema
with an unresolvable Ref and assert denormalize returns
Err(Details::SchemaResolutionError(...)), and (3) create mutually recursive
schemas (A ↔ B) and assert denormalize terminates and produces the expected
partially-inlined structure; use denormalize (and denormalize_inner where
appropriate) to exercise resolution paths and mirror how
independent_canonical_form currently exercises it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro/src/schema/mod.rs`:
- Line 401: Fix the typo in the doc comment for Schema::denormalize: change "If
your require a self contained schema..." to "If you require a self contained
schema..." so the sentence reads correctly and references Schema::denormalize;
update the comment near the Schema::denormalize documentation in mod.rs
accordingly.
- Around line 693-702: The public method denormalize currently calls
denormalize_inner in-place which can leave self partially modified on Err;
change denormalize to operate on a clone and only replace self on success: clone
self (use a Vec::with_capacity(schemata.len()) where you build temporary
schemata-related structures if applicable), call denormalize_inner on the cloned
instance, and if it returns Ok, swap or assign the cloned, denormalized value
back into self; if Err, return the error and leave the original self untouched.
Ensure you reference and use the existing denormalize, denormalize_inner, and
Schema abstractions when implementing the clone-and-swap approach.

---

Nitpick comments:
In `@avro/src/schema/mod.rs`:
- Around line 700-702: Add direct unit tests for the new public
Schema::denormalize method (not only via independent_canonical_form). Create
tests that (1) construct schemas with external Ref entries and assert
denormalize returns Ok and that no Schema::Ref remain, (2) construct a schema
with an unresolvable Ref and assert denormalize returns
Err(Details::SchemaResolutionError(...)), and (3) create mutually recursive
schemas (A ↔ B) and assert denormalize terminates and produces the expected
partially-inlined structure; use denormalize (and denormalize_inner where
appropriate) to exercise resolution paths and mirror how
independent_canonical_form currently exercises it.

Comment thread avro/src/schema/mod.rs
/// Returns the [Parsing Canonical Form] of `self` that is self contained (not dependent on
/// any definitions in `schemata`)
///
/// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo in doc comment: "your require" → "you require"

📝 Fix
-    /// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
+    /// If you require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
/// If you require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` at line 401, Fix the typo in the doc comment for
Schema::denormalize: change "If your require a self contained schema..." to "If
you require a self contained schema..." so the sentence reads correctly and
references Schema::denormalize; update the comment near the Schema::denormalize
documentation in mod.rs accordingly.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:typo; feedback: The CodeRabbit AI reviewer is correct! There is a minor typo in the documentation that should be fixed before merging the pull request.

Comment thread avro/src/schema/mod.rs
Comment on lines +693 to +702
/// Remove all external references from the schema.
///
/// `schemata` must contain all externally referenced schemas.
///
/// # Errors
/// Will return a [`Details::SchemaResolutionError`] if it fails to find
/// a referenced schema. This will put the schema in a partly denormalized state.
pub fn denormalize(&mut self, schemata: &[Schema]) -> AvroResult<()> {
self.denormalize_inner(schemata, &mut HashSet::new())
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

New public API leaves self in a partially-denormalized state on error

The doc comment mentions this, but exposing it publicly makes it a real footgun: a caller who handles the Err and then continues using self will operate on a partially-inlined schema tree (some Refs replaced, some not). Because this is now part of the public API, atomicity is worth the extra clone:

🛡️ Proposed atomic implementation
 pub fn denormalize(&mut self, schemata: &[Schema]) -> AvroResult<()> {
-    self.denormalize_inner(schemata, &mut HashSet::new())
+    let mut candidate = self.clone();
+    candidate.denormalize_inner(schemata, &mut HashSet::with_capacity(schemata.len()))?;
+    *self = candidate;
+    Ok(())
 }

This also restores the with_capacity(schemata.len()) hint that was dropped when the method was split (minor allocation efficiency).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/mod.rs` around lines 693 - 702, The public method denormalize
currently calls denormalize_inner in-place which can leave self partially
modified on Err; change denormalize to operate on a clone and only replace self
on success: clone self (use a Vec::with_capacity(schemata.len()) where you build
temporary schemata-related structures if applicable), call denormalize_inner on
the cloned instance, and if it returns Ok, swap or assign the cloned,
denormalized value back into self; if Err, return the error and leave the
original self untouched. Ensure you reference and use the existing denormalize,
denormalize_inner, and Schema abstractions when implementing the clone-and-swap
approach.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:valid-but-wont-fix; category:bug; feedback: The CodeRabbit AI reviewer is correct that this will solve the issue with the partially denormalized schema. But a partially denormalised schema is still a valid schema. If the user does not want a partially denormalized schema then they can clone it before attempting to denormalize.

@martin-augment
Copy link
Copy Markdown
Owner Author

Typo in doc comment (avro/src/schema/mod.rs, line 401):

If your require a self contained schema...

Should be: "If you require..."

value:good-to-have; category:typo; feedback: The Claude AI reviewer is correct! There is a minor typo in the documentation that should be fixed before merging the pull request.

@martin-augment
Copy link
Copy Markdown
Owner Author

Broken/incorrect rustdoc link in # Errors section (denormalize doc, line 698):

/// Will return a [`Details::SchemaResolutionError`] if it fails to find

Details is not re-exported from the crate root — it only appears via Error::details() and Error::into_details(). The intra-doc link [Details::SchemaResolutionError] will fail to resolve in user-facing documentation. Consider either:

  • Using the full path: [`crate::error::Details::SchemaResolutionError`]
  • Or simplifying to: Will return [Err] if it fails to find a referenced schema (schema resolution error).

value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! The documentation generates just fine! The Details struct is imported in the beginning of the file and this allows to use it in docstrings even if it is not a member of the crate's top-level exports

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants