471: feat: Use HashSet instead of HashMap for AvroSchemaComponent#51
471: feat: Use HashSet instead of HashMap for AvroSchemaComponent#51martin-augment wants to merge 1 commit intomainfrom
HashSet instead of HashMap for AvroSchemaComponent#51Conversation
WalkthroughThe changes refactor the schema tracking mechanism by replacing ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the internal mechanism for tracking named schemas within the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
PR Review: Use
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
avro/src/serde/derive.rs (1)
310-335:⚠️ Potential issue | 🟠 MajorPublic trait signature change — semver-breaking change without major version bump.
The
AvroSchemaComponenttrait is public and its method signatures changed from&mut Names(HashMap<Name, Schema>) to&mut HashSet<Name>. This is a semver-breaking change for any external crate implementing this trait. However, the version was bumped to 0.22.0 (minor), which violates semantic versioning for breaking API changes—a major version bump is required. Additionally, the CHANGELOG contains no documentation of this breaking change or migration guidance for external implementors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro/src/serde/derive.rs` around lines 310 - 335, The public trait AvroSchemaComponent broke its external API by changing parameter types from &mut Names (HashMap<Name, Schema>) to &mut HashSet<Name> for get_schema_in_ctxt and get_record_fields_in_ctxt; restore backward compatibility by reintroducing the original method signatures (using &mut Names / HashMap<Name, Schema>) as either the primary methods or as deprecated wrappers that forward to the new HashSet-based implementations (e.g., add deprecated get_schema_in_ctxt(&mut Names, enclosing_namespace: &Namespace) and get_record_fields_in_ctxt(first_field_position, &mut Names, enclosing_namespace) that convert/derive the needed HashSet<Name> and call the new impl), update CHANGELOG with migration guidance, and ensure the package version is bumped to a major version if you keep the breaking change instead of restoring compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@avro/src/serde/derive.rs`:
- Around line 310-335: The public trait AvroSchemaComponent broke its external
API by changing parameter types from &mut Names (HashMap<Name, Schema>) to &mut
HashSet<Name> for get_schema_in_ctxt and get_record_fields_in_ctxt; restore
backward compatibility by reintroducing the original method signatures (using
&mut Names / HashMap<Name, Schema>) as either the primary methods or as
deprecated wrappers that forward to the new HashSet-based implementations (e.g.,
add deprecated get_schema_in_ctxt(&mut Names, enclosing_namespace: &Namespace)
and get_record_fields_in_ctxt(first_field_position, &mut Names,
enclosing_namespace) that convert/derive the needed HashSet<Name> and call the
new impl), update CHANGELOG with migration guidance, and ensure the package
version is bumped to a major version if you keep the breaking change instead of
restoring compatibility.
🤖 Augment PR SummarySummary: This PR changes the schema-derivation context for Changes:
Technical Notes: The context now represents “names already defined/seen” for reference emission and recursion prevention, rather than a map used to cache/lookup full schema definitions. 🤖 Was this summary useful? React with 👍 or 👎 |
| "Name '{name}' should exist in `named_schemas` otherwise Ref is invalid: {named_schemas:?}" | ||
| ); | ||
| // Get the schema | ||
| let schema = schema_fn(named_schemas, enclosing_namespace); |
There was a problem hiding this comment.
In this Schema::Ref branch, calling schema_fn after temporarily removing name can add additional entries to named_schemas (since schema_fn may insert other names while rebuilding the schema). Is that intended given the helper’s goal of not “polluting” named_schemas / avoiding duplicate-name side effects?
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:good-but-wont-fix; category:bug; feedback: The Augment AI reviewer is correct! The Name of the Schema is inserted twice now but the first insertion happens in the default implementation of the trait method and there is no guarantee that the application developer will not provide a custom implementation of this method. It is better to be on the safe side and call the .insert() method twice than to leave this responsibility to the application developer and receive bug reports when they forget to do it.
There was a problem hiding this comment.
Code Review
This pull request refactors the AvroSchemaComponent trait and its implementations to use HashSet<Name> instead of HashMap<Name, Schema> for tracking named schemas during schema generation. This change is a significant improvement, as it employs a more semantically appropriate data structure for tracking seen schema names to handle recursive types. The value part of the HashMap was not essential for this purpose, and using a HashSet makes the intent clearer. The changes are consistently applied across the avro and avro_derive crates, including trait definitions, implementations for various types, derive macros, and tests. This refactoring also leads to cleaner code, for instance by removing #[expect(clippy::map_entry)] attributes. The implementation appears correct and well-executed.
value:annoying; category:bug; feedback: The CodeRabbit AI reviewer is not correct! The apache-avro crate has not reached v1.0.0 yet, so such API breaks are allowed without waiting for a new major release. |
value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! The apache-avro crate has not reached v1.0.0 yet, so such API breaks are allowed without waiting for a new major release. |
value:good-but-wont-fix; category:bug; feedback: The Claude AI reviewer is correct! The Name of the Schema is inserted twice now but the first insertion happens in the default implementation of the trait method and there is no guarantee that the application developer will not provide a custom implementation of this method. It is better to be on the safe side and call the .insert() method twice than to leave this responsibility to the application developer and receive bug reports when they forget to do it. |
471: To review by AI