
493: feat!: Rework Name to be more performant #58

Open
martin-augment wants to merge 1 commit into main from pr-493-2026-03-02-07-24-33

Conversation

@martin-augment
Owner

493: To review by AI

This is achieved by:
- Making `Name` one allocation instead of two
- Returning references as much as possible to reduce cloning
  - `.namespace()` returns a `NamespaceRef<'_>`, which is an `Option<&'_ str>`
  - `fully_qualified_name` returns a `Cow<'_, Name>`, which avoids an allocation
    when the name already has a namespace or there is no enclosing namespace.
- Allow efficient construction of the name when there is an enclosing namespace
- `Display` and `fullname(None)` can now return the full name directly

This required changing the `SchemaNameValidator` trait which now needs to return
the index of the starting byte of the name. Included in this change is providing
default implementations for `*Validator::validate` functions. This allows users to
easily implement the trait by just returning a different regex.
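
The single-allocation layout described above can be sketched roughly as follows. This is a simplified illustration, not the crate's actual code: `NameSketch` and its `parse` function are hypothetical, and splitting on the last `.` is an assumption standing in for the index that `SchemaNameValidator::validate` returns in the real implementation.

```rust
/// Borrowed namespace, mirroring the `NamespaceRef<'_>` alias described above.
type NamespaceRef<'a> = Option<&'a str>;

/// Hypothetical, simplified stand-in for the reworked `Name`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct NameSketch {
    /// Full name, e.g. "com.example.User", held in a single allocation.
    namespace_and_name: String,
    /// Byte offset at which the unqualified name starts.
    index_of_name: usize,
}

impl NameSketch {
    /// Assumption: split on the last '.'; the real crate asks the validator for the index.
    fn parse(full: &str) -> Self {
        let index_of_name = full.rfind('.').map_or(0, |dot| dot + 1);
        NameSketch {
            namespace_and_name: full.to_owned(),
            index_of_name,
        }
    }

    /// The unqualified name, borrowed from the single string.
    fn name(&self) -> &str {
        &self.namespace_and_name[self.index_of_name..]
    }

    /// The namespace, if any, borrowed from the single string.
    fn namespace(&self) -> NamespaceRef<'_> {
        match self.index_of_name {
            0 => None,
            // The byte just before the name is the separating '.', so exclude it.
            i => Some(&self.namespace_and_name[..i - 1]),
        }
    }

    /// The full name needs no formatting or allocation.
    fn fullname(&self) -> &str {
        &self.namespace_and_name
    }
}
```

With this layout, `NameSketch::parse("com.example.User")` yields `"User"` from `name()` and `Some("com.example")` from `namespace()`, both borrowed from the same string.
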
@coderabbitai

coderabbitai bot commented Mar 2, 2026

Walkthrough

This pull request refactors namespace handling across the avro-rs crate by introducing NamespaceRef (defined as Option<&'a str>) and replacing all &Namespace parameters with this new type. The Name struct transitions from public fields to private fields with public accessor methods (name() and namespace()). Updates affect schema parsing, validation, encoding/decoding, and serialization. The schema name validator trait signature changes from returning (String, Namespace) to returning usize. Related changes propagate through derive macros, serde implementations, and all call sites that construct or reference namespace values. No behavioral changes to Avro semantics are introduced.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant refactoring of the Name and Namespace types within the Avro schema handling. The primary goal is to enhance performance by reducing string allocations through a more efficient internal representation of names and by transitioning to borrowed string references for namespaces. These changes are systematically applied across schema parsing, encoding, decoding, validation, and schema derivation components, resulting in a more performant and allocation-efficient Avro implementation.

Highlights

  • Performance Rework of Name and Namespace: The core Name struct has been refactored to store the full name and namespace in a single string, using an index to delineate the name part. This change aims to reduce string allocations and improve performance across the library.
  • Transition to Borrowed Namespace References: The Namespace type has been replaced with NamespaceRef<'a>, shifting from owned Option<String> to borrowed Option<&'a str>. This change is propagated throughout the codebase to minimize cloning and enhance efficiency.
  • Updated Name Resolution Logic: Methods like Name::new, Name::parse, fullname, and fully_qualified_name have been updated to leverage the new internal representation and NamespaceRef, often utilizing Cow for optimized ownership handling.
  • Refined Schema Validation: The SchemaNameValidator trait has been updated to return a usize representing the start index of the name part, aligning with the new Name struct's internal structure and simplifying validation logic.
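
The borrow-when-possible idea behind returning `Cow` from `fully_qualified_name` can be illustrated with a small sketch. It works on plain strings rather than the crate's `Name` type, and `fully_qualified` is a hypothetical helper, so treat it as an approximation of the technique rather than the PR's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical helper: qualifies `name` with `enclosing_namespace` only when needed.
fn fully_qualified<'a>(name: &'a str, enclosing_namespace: Option<&str>) -> Cow<'a, str> {
    match enclosing_namespace {
        // Only an unqualified name inside a non-empty enclosing namespace
        // requires building (and allocating) a new string.
        Some(ns) if !ns.is_empty() && !name.contains('.') => {
            Cow::Owned(format!("{ns}.{name}"))
        }
        // Already qualified, or nothing to prepend: hand back the borrow.
        _ => Cow::Borrowed(name),
    }
}
```

Callers that only need to read the result can use it through `Deref`; only callers that must store an owned value pay for `.into_owned()`.
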
Changelog
  • avro/src/decode.rs
    • Updated decode_internal to use NamespaceRef and name.namespace() for improved namespace handling.
  • avro/src/encode.rs
    • Modified encode_internal to utilize NamespaceRef and name.namespace() for namespace resolution, and into_owned() for error details.
  • avro/src/reader/block.rs
    • Adjusted calls to decode_internal and resolve_names to pass None directly for namespaces.
  • avro/src/reader/mod.rs
    • Updated decode_internal call to pass None for namespaces.
  • avro/src/reader/single_object.rs
    • Modified decode_internal call to pass None for namespaces.
  • avro/src/schema/mod.rs
    • Imported NamespaceRef and updated FixedSchema and Schema serialization and namespace accessors to use the new Name methods.
  • avro/src/schema/name.rs
    • Reworked the internal structure of the Name struct, replacing name and namespace fields with namespace_and_name and index_of_name.
    • Introduced NamespaceRef for borrowed namespace references.
    • Updated new, parse, fullname, and fully_qualified_name methods to reflect the new internal representation and leverage Cow.
    • Added invalid_empty_name for specific internal use cases.
    • Updated Debug and Display implementations for Name.
  • avro/src/schema/parser.rs
    • Updated parse and fetch_schema_ref methods to use NamespaceRef and new Name constructors.
    • Adjusted alias handling to align with the new Name structure.
  • avro/src/schema/record/field.rs
    • Modified RecordField::parse to use name.namespace() and name.into() for field name handling.
  • avro/src/schema/resolve.rs
    • Updated resolve_names and new_with_known_schemata to use NamespaceRef and into_owned() for Name instances.
  • avro/src/schema/union.rs
    • Changed find_schema_with_known_schemata to use NamespaceRef and simplified namespace logic.
  • avro/src/schema_compatibility.rs
    • Updated Checker::compare_named_schemas to use the name() method for comparison.
  • avro/src/schema_equality.rs
    • Modified FixedSchema initialization in tests to use Name::new.
  • avro/src/serde/derive.rs
    • Updated AvroSchemaComponent trait methods and macro implementations to use NamespaceRef and new Name constructors.
  • avro/src/serde/ser_schema/mod.rs
    • Modified SchemaAwareWriteSerializer::get_ref_schema to use fully_qualified_name.
    • Updated fixed schema name checks to use the name() method.
  • avro/src/serde/with.rs
    • Updated get_schema_in_ctxt and get_record_fields_in_ctxt functions to use NamespaceRef and new Name constructors.
  • avro/src/types.rs
    • Updated validate_internal and resolve_internal methods to use NamespaceRef and new Name methods.
  • avro/src/util.rs
    • Changed MapHelper::string to return Option<&str> and doc to map to Into::into.
  • avro/src/validator.rs
    • Reworked SchemaNameValidator trait to return usize for the name's start index and updated its default implementation.
  • avro/src/writer.rs
    • Modified validate_internal and encode_internal calls to use schema.namespace() directly.
  • avro/tests/avro_schema_component.rs
    • Updated get_schema_in_ctxt calls to pass None for namespaces.
  • avro/tests/get_record_fields.rs
    • Updated get_record_fields_in_ctxt and get_schema_in_ctxt calls to pass None for namespaces.
  • avro/tests/schema.rs
    • Updated assertions for name.name() and name.namespace() in tests.
  • avro/tests/serde_human_readable_true.rs
    • Updated error message assertion to reflect name.name() usage.
  • avro/tests/validators.rs
    • Modified CustomValidator to return usize for validate.
  • avro_derive/src/lib.rs
    • Updated macro-generated code for AvroSchemaComponent to use NamespaceRef and Name::new_with_enclosing_namespace.
  • avro_derive/tests/derive.rs
    • Updated tests to reflect changes in Name and Namespace handling, including name.namespace() and Name::new usage.
Activity
  • The pull request was created by martin-augment.
  • The pull request is marked for review by AI.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is an impressive and significant refactoring of the Name and Namespace handling to improve performance. The core changes, such as reworking the Name struct to use a single String with an index, introducing NamespaceRef to avoid string allocations, and using Cow for fully_qualified_name, are excellent optimizations.

The changes have been applied consistently and cleanly across the entire codebase. The refactoring of the validator traits to provide default validate implementations is also a nice improvement to the API.

Overall, this is a high-quality contribution that should bring noticeable performance benefits. I did not find any issues of medium or higher severity.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
avro/src/validator.rs (1)

95-103: Avoid panicking on misconfigured custom name-validator regex.

The default implementation can crash if a custom regex matches but omits the name capture. Returning an error is safer than panicking.

♻️ Proposed fix
     fn validate(&self, schema_name: &str) -> AvroResult<usize> {
         let regex = SchemaNameValidator::regex(self);
         let caps = regex
             .captures(schema_name)
             .ok_or_else(|| Details::InvalidSchemaName(schema_name.to_string(), regex.as_str()))?;
-        Ok(caps
-            .name("name")
-            .expect("Regex has no group named `name`")
-            .start())
+        caps.name("name")
+            .map(|m| m.start())
+            .ok_or_else(|| Details::InvalidSchemaName(schema_name.to_string(), regex.as_str()).into())
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/validator.rs` around lines 95 - 103, The validate method currently
panics if a custom regex matches but lacks the "name" capture; change it to
return an AvroResult error instead of calling expect. After obtaining caps from
SchemaNameValidator::regex(self), check caps.name("name") for Some; if None
return an appropriate Details::InvalidSchemaName (or create a new Details
variant like InvalidSchemaNameCapture) with the regex.as_str() and schema_name
to surface the misconfigured regex, otherwise proceed to use start() from the
capture; update the validate function to propagate this error rather than
panicking.
ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 682a3e3 and 814c712.

📒 Files selected for processing (27)
  • avro/src/decode.rs
  • avro/src/encode.rs
  • avro/src/reader/block.rs
  • avro/src/reader/mod.rs
  • avro/src/reader/single_object.rs
  • avro/src/schema/mod.rs
  • avro/src/schema/name.rs
  • avro/src/schema/parser.rs
  • avro/src/schema/record/field.rs
  • avro/src/schema/resolve.rs
  • avro/src/schema/union.rs
  • avro/src/schema_compatibility.rs
  • avro/src/schema_equality.rs
  • avro/src/serde/derive.rs
  • avro/src/serde/ser_schema/mod.rs
  • avro/src/serde/with.rs
  • avro/src/types.rs
  • avro/src/util.rs
  • avro/src/validator.rs
  • avro/src/writer.rs
  • avro/tests/avro_schema_component.rs
  • avro/tests/get_record_fields.rs
  • avro/tests/schema.rs
  • avro/tests/serde_human_readable_true.rs
  • avro/tests/validators.rs
  • avro_derive/src/lib.rs
  • avro_derive/tests/derive.rs

Comment on lines +175 to +179
         match fully_qualified_name.name() {
             "record" | "enum" | "fixed" => {
-                return Err(Details::InvalidSchemaRecord(name.to_string()).into());
+                return Err(
+                    Details::InvalidSchemaRecord(fully_qualified_name.name().to_string()).into(),
+                );

⚠️ Potential issue | 🔴 Critical

Reserved-type guard now rejects valid qualified names ending in record|enum|fixed.

At Line 175, the check uses fully_qualified_name.name(), so references like foo.record are treated as invalid keywords before lookup in input_schemas (Line 184). This is a functional regression for valid named-type references.

🔧 Proposed fix
-        // For good error reporting we add this check
-        match fully_qualified_name.name() {
-            "record" | "enum" | "fixed" => {
-                return Err(
-                    Details::InvalidSchemaRecord(fully_qualified_name.name().to_string()).into(),
-                );
-            }
-            _ => (),
-        }
-
         let value = self
             .input_schemas
             .remove(&fully_qualified_name)
             // TODO make a better descriptive error message here that conveys that a named schema cannot be found
             .ok_or_else(|| {
+                if matches!(name, "record" | "enum" | "fixed") {
+                    return Details::InvalidSchemaRecord(name.to_string());
+                }
                 let full_name = fully_qualified_name.fullname(None);
                 if full_name == "bool" {
                     Details::ParsePrimitiveSimilar(full_name, "boolean")
                 } else {
                     Details::ParsePrimitive(full_name)
                 }
             })?;

Also applies to: 184-195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@avro/src/schema/parser.rs` around lines 175 - 179, The current reserved-type
guard erroneously rejects qualified names like `foo.record` because it matches
on fully_qualified_name.name() alone; update the check in the match that returns
InvalidSchemaRecord (and the similar guards around lines 184-195) to only treat
a name as the reserved keyword when it is unqualified (i.e., when
fully_qualified_name.namespace().is_none()) and fully_qualified_name.name() is
"record" | "enum" | "fixed"; keep the normal lookup against input_schemas (used
later) unchanged so qualified references are allowed.
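
The guard suggested in this comment amounts to treating a name as a reserved keyword only when it carries no namespace. A minimal sketch of that condition, using a hypothetical free function instead of the parser's actual code path:

```rust
/// Hypothetical predicate: true only for an unqualified reserved type keyword.
fn is_reserved_unqualified(namespace: Option<&str>, name: &str) -> bool {
    // A qualified reference such as `foo.record` has a namespace and is let through.
    namespace.is_none() && matches!(name, "record" | "enum" | "fixed")
}
```

Under this predicate, a bare `record` is still rejected early with a descriptive error, while `foo.record` proceeds to the normal lookup.
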

@claude

claude bot commented Mar 2, 2026

PR Review: feat!: Rework Name to be more performant

Overview

Well-motivated performance refactor that consolidates the Name struct from two separate strings into a single string with a byte offset, plus a new NamespaceRef type alias. This eliminates many heap allocations in hot paths. The approach is sound and the implementation is thorough.

Issues

1. invalid_empty_name() is a silent panic footgun

Using index_of_name: usize::MAX as a sentinel means any call to .name(), .namespace(), or Debug::fmt() produces a hard-to-diagnose index-out-of-bounds panic. Even deriving Debug on a containing type like FixedSchema will silently trigger this in tests or error messages.

The only current use is FixedSchema::copy_only_size(). Safer alternatives: make that field Option, use a private marker type, or add a debug_assert!() in name() and namespace() that fires with a descriptive message rather than a raw slice panic.
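
As one possible shape for the "make that field Option" alternative, the sketch below replaces the sentinel with `None`. `FixedSketch` and `display_name` are hypothetical and much simpler than the crate's `FixedSchema`; the point is only that formatting or reading the name can no longer panic.

```rust
/// Hypothetical stand-in for a fixed schema whose name may be absent.
#[derive(Debug, Clone, PartialEq)]
struct FixedSketch {
    /// `None` takes the place of the `usize::MAX` sentinel.
    name: Option<String>,
    size: usize,
}

impl FixedSketch {
    /// Copies only the size; the missing name is explicit in the type.
    fn copy_only_size(&self) -> Self {
        FixedSketch {
            name: None,
            size: self.size,
        }
    }

    /// Reading the name is total: no slicing, so no out-of-bounds panic.
    fn display_name(&self) -> &str {
        self.name.as_deref().unwrap_or("<unnamed>")
    }
}
```

The trade-off is that every reader of the field has to handle the `None` case, which is a larger API change than a `debug_assert!`.
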

2. Test accesses private fields directly

test_namespace_from_name_with_empty_value accesses name.namespace_and_name and name.index_of_name directly. Prefer testing through the public API (name.name(), name.namespace()) for resilience to future refactoring.

3. SchemaNameValidator::validate breaking change needs stronger docs

The signature changed from a required method returning AvroResult<(String, Namespace)> to a default method returning AvroResult<usize>. Custom validator implementors will get compile errors, so this deserves mention in the CHANGELOG.

More critically, the new default implementation calls .expect(), panicking at runtime for custom regexes without a name capture group. Since SchemaNameValidator is part of the public API, this should return an AvroResult error rather than panic.

4. Silent behavioral change: removal of AVRO-3430 workaround

Name::parse previously had a FIXME block that read the name from the nested type.name field as a workaround for AVRO-3430. The new code removes this without mention. While removing known-incorrect behavior is good, it should be called out explicitly in the PR description and CHANGELOG so users can audit their schemas.

5. Minor: fully_qualified_name doc example is noisier

The required .into_owned() call is a usability regression in the doc example. A brief explanation of why Cow is returned would help readers understand the trade-off.

Positive aspects

  • Single-allocation Name: The dual-bound Into<String> plus AsRef<str> trick and inline comment are appreciated.
  • NamespaceRef: Systematic elimination of clones across all call sites is impressively thorough.
  • Cow in fully_qualified_name: Borrow-when-possible is exactly right.
  • MapHelper::string() returns Option<&str>: Zero-cost, avoids transient allocations during JSON parsing.
  • Custom Debug impl: finish_non_exhaustive() is more future-proof; cleaner output in error messages.
  • Validator refactor: Default method plus empty impl block for SpecificationValidator is a nice simplification.

Summary

Main asks before merging:

  1. Address the invalid_empty_name panic footgun.
  2. Change the expect in the default validate to return an AvroResult error.
  3. Document the removal of the AVRO-3430 type_name workaround in the CHANGELOG.
  4. Test test_namespace_from_name_with_empty_value through the public API rather than private fields.

@augmentcode

augmentcode bot commented Mar 2, 2026

🤖 Augment PR Summary

Summary: Refactors schema naming to reduce allocations and speed up namespace handling by storing full names more compactly and borrowing namespaces where possible.

Changes:

  • Reworks schema::Name to store a single namespace_and_name string plus an index, exposing name()/namespace() accessors.
  • Introduces NamespaceRef = Option<&str> and threads it through parsing, resolution, encode/decode, reader/writer, and serde helpers.
  • Updates schema resolution to use Cow<Name> for fully-qualified references to avoid unnecessary cloning.
  • Adjusts validation APIs so SchemaNameValidator::validate now returns the byte offset of the name portion of a full name.
  • Updates derive macros, serde "with" modules, and tests to the new Name API.

Technical Notes: This is a breaking change (feat!) affecting public APIs that previously used owned Namespace values; callers and custom validators may need updates to the new borrowed namespace/index-based model.

@augmentcode augmentcode bot left a comment

Review completed. 3 suggestions posted.

-                name: String::new(),
-                namespace: None,
-            },
+            name: Name::invalid_empty_name(),

Using Name::invalid_empty_name() here creates a FixedSchema that will panic if it’s ever formatted or serialized, since FixedSchema’s Debug/Serialize paths call Name::name()/namespace(). If copy_only_size() leaks into an error/logging/serialization path, that could turn a recoverable failure into a hard panic.

Severity: medium

Owner Author

value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! Using such a special Name instance solely for FixedSchema::copy_only_size() is confusing and may lead to panics or application crashes through misuse. It would be good to rework this in a safer way that can never lead to a crash.

))
Ok(caps
.name("name")
.expect("Regex has no group named `name`")

The default SchemaNameValidator::validate will panic via expect(...) if a custom validator overrides regex() but forgets to define a name capture group. Since this is user-extensible API, consider returning a regular validation error instead of panicking on misconfiguration.

Severity: medium

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! Validation should not panic; it should return an error and give the user application the chance to handle it. This prevents an application crash caused by an invalid schema name.

// doesn't have namespace and `enclosing_namespace` is `Some`. Having `AsRef<str>` allows
// skipping that allocation.
let name_ref = name.as_ref();
let index_of_name = validate_schema_name(name_ref)?;

Because validate_schema_name is user-configurable and now returns a raw byte index, a buggy custom validator could return an out-of-bounds or non-char-boundary index and cause slicing panics later in Name::name()/namespace(). It may be worth defensively sanity-checking index_of_name against name_ref before storing it.

Severity: low
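
A defensive check along the lines suggested here could look like the sketch below. `checked_name_index` is a hypothetical helper and the `String` error type is a placeholder for whatever error variant the crate would use; only `str::len` and `str::is_char_boundary` from the standard library are relied on.

```rust
/// Hypothetical check: rejects indices that would make `&full[index..]` panic
/// or that would leave an empty name.
fn checked_name_index(full: &str, index: usize) -> Result<usize, String> {
    // `index < full.len()` rules out out-of-bounds values and an empty name;
    // `is_char_boundary` rules out offsets in the middle of a UTF-8 sequence.
    if index < full.len() && full.is_char_boundary(index) {
        Ok(index)
    } else {
        Err(format!(
            "validator returned invalid name index {index} for {full:?}"
        ))
    }
}
```

Running this once at construction time would keep later slicing in `name()` and `namespace()` panic-free, at the cost of one extra bounds check per parsed name.
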

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! A user application may register a custom implementation of SchemaNameValidator that returns an index leading to an out-of-bounds error. It would be good to check that the index is within bounds and return an appropriate error here.

@martin-augment
Owner Author

1. invalid_empty_name() is a silent panic footgun

Using index_of_name: usize::MAX as a sentinel means any call to .name(), .namespace(), or Debug::fmt() produces a hard-to-diagnose index-out-of-bounds panic. Even deriving Debug on a containing type like FixedSchema will silently trigger this in tests or error messages.

The only current use is FixedSchema::copy_only_size(). Safer alternatives: make that field Option, use a private marker type, or add a debug_assert!() in name() and namespace() that fires with a descriptive message rather than a raw slice panic.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! Using such a special Name instance solely for FixedSchema::copy_only_size() is confusing and may lead to panics or application crashes through misuse. It would be good to rework this in a safer way that can never lead to a crash.

@martin-augment
Owner Author

95-103: Avoid panicking on misconfigured custom name-validator regex.

The default implementation can crash if a custom regex matches but omits the name capture. Returning an error is safer than panicking.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! Validation should not panic; it should return an error and give the user application the chance to handle it. This prevents an application crash caused by an invalid schema name.

2 participants