493: feat!: Rework Name to be more performant#58
Conversation
This is achieved by:
- Making `Name` one allocation instead of two
- Returning references as much as possible to reduce cloning
- `.namespace()` returns a `NamespaceRef<'_>` which is a `Option<&'_ str>`
- `fully_qualified_name` returns a `Cow<'_, Name>` which reduces allocations
when the name already has a namespace or there is enclosing namespace.
- Allow efficient construction of the name when there is an enclosing namespace
- `Display` and `fullname(None)` can now return the full name directly
This required changing the `SchemaNameValidator` trait which now needs to return
the index of the starting byte of the name. Included in this change is providing
default implementations for `*Validator::validate` functions. This allows users to
easily implement the trait by just returning a different regex.
WalkthroughThis pull request refactors namespace handling across the avro-rs crate by introducing ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant refactoring of the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is an impressive and significant refactoring of the Name and Namespace handling to improve performance. The core changes, such as reworking the Name struct to use a single String with an index, introducing NamespaceRef to avoid string allocations, and using Cow for fully_qualified_name, are excellent optimizations.
The changes have been applied consistently and cleanly across the entire codebase. The refactoring of the validator traits to provide default validate implementations is also a nice improvement to the API.
Overall, this is a high-quality contribution that should bring noticeable performance benefits. I did not find any issues of medium or higher severity.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
avro/src/validator.rs (1)
95-103: Avoid panicking on misconfigured custom name-validator regex.The default implementation can crash if a custom regex matches but omits the
namecapture. Returning an error is safer than panicking.♻️ Proposed fix
fn validate(&self, schema_name: &str) -> AvroResult<usize> { let regex = SchemaNameValidator::regex(self); let caps = regex .captures(schema_name) .ok_or_else(|| Details::InvalidSchemaName(schema_name.to_string(), regex.as_str()))?; - Ok(caps - .name("name") - .expect("Regex has no group named `name`") - .start()) + caps.name("name") + .map(|m| m.start()) + .ok_or_else(|| Details::InvalidSchemaName(schema_name.to_string(), regex.as_str()).into()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro/src/validator.rs` around lines 95 - 103, The validate method currently panics if a custom regex matches but lacks the "name" capture; change it to return an AvroResult error instead of calling expect. After obtaining caps from SchemaNameValidator::regex(self), check caps.name("name") for Some; if None return an appropriate Details::InvalidSchemaName (or create a new Details variant like InvalidSchemaNameCapture) with the regex.as_str() and schema_name to surface the misconfigured regex, otherwise proceed to use start() from the capture; update the validate function to propagate this error rather than panicking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@avro/src/schema/parser.rs`:
- Around line 175-179: The current reserved-type guard erroneously rejects
qualified names like `foo.record` because it matches on
fully_qualified_name.name() alone; update the check in the match that returns
InvalidSchemaRecord (and the similar guards around lines 184-195) to only treat
a name as the reserved keyword when it is unqualified (i.e., when
fully_qualified_name.namespace().is_none()) and fully_qualified_name.name() is
"record" | "enum" | "fixed"; keep the normal lookup against input_schemas (used
later) unchanged so qualified references are allowed.
---
Nitpick comments:
In `@avro/src/validator.rs`:
- Around line 95-103: The validate method currently panics if a custom regex
matches but lacks the "name" capture; change it to return an AvroResult error
instead of calling expect. After obtaining caps from
SchemaNameValidator::regex(self), check caps.name("name") for Some; if None
return an appropriate Details::InvalidSchemaName (or create a new Details
variant like InvalidSchemaNameCapture) with the regex.as_str() and schema_name
to surface the misconfigured regex, otherwise proceed to use start() from the
capture; update the validate function to propagate this error rather than
panicking.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
avro/src/decode.rsavro/src/encode.rsavro/src/reader/block.rsavro/src/reader/mod.rsavro/src/reader/single_object.rsavro/src/schema/mod.rsavro/src/schema/name.rsavro/src/schema/parser.rsavro/src/schema/record/field.rsavro/src/schema/resolve.rsavro/src/schema/union.rsavro/src/schema_compatibility.rsavro/src/schema_equality.rsavro/src/serde/derive.rsavro/src/serde/ser_schema/mod.rsavro/src/serde/with.rsavro/src/types.rsavro/src/util.rsavro/src/validator.rsavro/src/writer.rsavro/tests/avro_schema_component.rsavro/tests/get_record_fields.rsavro/tests/schema.rsavro/tests/serde_human_readable_true.rsavro/tests/validators.rsavro_derive/src/lib.rsavro_derive/tests/derive.rs
| match fully_qualified_name.name() { | ||
| "record" | "enum" | "fixed" => { | ||
| return Err(Details::InvalidSchemaRecord(name.to_string()).into()); | ||
| return Err( | ||
| Details::InvalidSchemaRecord(fully_qualified_name.name().to_string()).into(), | ||
| ); |
There was a problem hiding this comment.
Reserved-type guard now rejects valid qualified names ending in record|enum|fixed.
At Line 175, the check uses fully_qualified_name.name(), so references like foo.record are treated as invalid keywords before lookup in input_schemas (Line 184). This is a functional regression for valid named-type references.
🔧 Proposed fix
- // For good error reporting we add this check
- match fully_qualified_name.name() {
- "record" | "enum" | "fixed" => {
- return Err(
- Details::InvalidSchemaRecord(fully_qualified_name.name().to_string()).into(),
- );
- }
- _ => (),
- }
-
let value = self
.input_schemas
.remove(&fully_qualified_name)
// TODO make a better descriptive error message here that conveys that a named schema cannot be found
.ok_or_else(|| {
+ if matches!(name, "record" | "enum" | "fixed") {
+ return Details::InvalidSchemaRecord(name.to_string());
+ }
let full_name = fully_qualified_name.fullname(None);
if full_name == "bool" {
Details::ParsePrimitiveSimilar(full_name, "boolean")
} else {
Details::ParsePrimitive(full_name)
}
})?;Also applies to: 184-195
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/parser.rs` around lines 175 - 179, The current reserved-type
guard erroneously rejects qualified names like `foo.record` because it matches
on fully_qualified_name.name() alone; update the check in the match that returns
InvalidSchemaRecord (and the similar guards around lines 184-195) to only treat
a name as the reserved keyword when it is unqualified (i.e., when
fully_qualified_name.namespace().is_none()) and fully_qualified_name.name() is
"record" | "enum" | "fixed"; keep the normal lookup against input_schemas (used
later) unchanged so qualified references are allowed.
PR Review: feat!: Rework Name to be more performantOverviewWell-motivated performance refactor that consolidates the Name struct from two separate strings into a single string with a byte offset, plus a new NamespaceRef type alias. This eliminates many heap allocations in hot paths. The approach is sound and the implementation is thorough. Issues1. invalid_empty_name() is a silent panic footgun Using index_of_name: usize::MAX as a sentinel means any call to .name(), .namespace(), or Debug::fmt() produces a hard-to-diagnose index-out-of-bounds panic. Even deriving Debug on a containing type like FixedSchema will silently trigger this in tests or error messages. The only current use is FixedSchema::copy_only_size(). Safer alternatives: make that field Option, use a private marker type, or add a debug_assert!() in name() and namespace() that fires with a descriptive message rather than a raw slice panic. 2. Test accesses private fields directly test_namespace_from_name_with_empty_value accesses name.namespace_and_name and name.index_of_name directly. Prefer testing through the public API (name.name(), name.namespace()) for resilience to future refactoring. 3. SchemaNameValidator::validate breaking change needs stronger docs The signature changed from a required method returning AvroResult to a default method returning AvroResult. Custom validator implementors will get compile errors - this deserves mention in the CHANGELOG. More critically, the new default implementation calls .expect(), panicking at runtime for custom regexes without a name capture group. Since SchemaNameValidator is part of the public API, this should return an AvroResult error rather than panic. 4. Silent behavioral change: removal of AVRO-3430 workaround Name::parse previously had a FIXME block that read the name from the nested type.name field as a workaround for AVRO-3430. The new code removes this without mention. While removing known-incorrect behavior is good, it should be called out explicitly in the PR description and CHANGELOG so users can audit their schemas. 5. Minor: fully_qualified_name doc example is noisier The required .into_owned() call is a usability regression in the doc example. A brief explanation of why Cow is returned would help readers understand the trade-off. Positive aspects
SummaryMain asks before merging:
|
🤖 Augment PR SummarySummary: Refactors schema naming to reduce allocations and speed up namespace handling by storing full names more compactly and borrowing namespaces where possible. Changes:
Technical Notes: This is a breaking change ( 🤖 Was this summary useful? React with 👍 or 👎 |
| name: String::new(), | ||
| namespace: None, | ||
| }, | ||
| name: Name::invalid_empty_name(), |
There was a problem hiding this comment.
Using Name::invalid_empty_name() here creates a FixedSchema that will panic if it’s ever formatted or serialized, since FixedSchema’s Debug/Serialize paths call Name::name()/namespace(). If copy_only_size() leaks into an error/logging/serialization path, that could turn a recoverable failure into a hard panic.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! Using such special Name instance just for the need of FixedSchema::copy_only_size() is confusing and may lead to panics/application crash due to misusage. It would be good to rework this to a safer way that would never lead to a crash
| )) | ||
| Ok(caps | ||
| .name("name") | ||
| .expect("Regex has no group named `name`") |
There was a problem hiding this comment.
The default SchemaNameValidator::validate will panic via expect(...) if a custom validator overrides regex() but forgets to define a name capture group. Since this is user-extensible API, consider returning a regular validation error instead of panicking on misconfiguration.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The validation should not lead to a panic but it should return an error and give the chance to the user application to handle it. Prevents an application crash due to invalid schema name.
| // doesn't have namespace and `enclosing_namespace` is `Some`. Having `AsRef<str>` allows | ||
| // skipping that allocation. | ||
| let name_ref = name.as_ref(); | ||
| let index_of_name = validate_schema_name(name_ref)?; |
There was a problem hiding this comment.
Because validate_schema_name is user-configurable and now returns a raw byte index, a buggy custom validator could return an out-of-bounds or non-char-boundary index and cause slicing panics later in Name::name()/namespace(). It may be worth defensively sanity-checking index_of_name against name_ref before storing it.
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Augment AI reviewer is correct! A user application may register a custom implementation of SchemaNameValidator that may return an index that could lead to an out of bounds error. It would be good to add a check that the index is within the boundaries and return an appropriate error here.
value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! Using such special Name instance just for the need of FixedSchema::copy_only_size() is confusing and may lead to panics/application crash due to misusage. It would be good to rework this to a safer way that would never lead to a crash |
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The validation should not lead to a panic but it should return an error and give the chance to the user application to handle it. Prevents an application crash due to invalid schema name. |
493: To review by AI