447: feat: Add possibility to append values to writer without validating #43

Open
martin-augment wants to merge 1 commit into main from pr-447-2026-01-27-22-09-41

Conversation

@martin-augment
Owner

447: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Avro Writer functionality by introducing new methods that allow users to append data without schema validation. This provides greater control over the writing process, potentially improving performance in scenarios where validation is handled externally or is not required. Concurrently, the existing append methods have been clarified to explicitly perform validation, maintaining data integrity for standard operations.

Highlights

  • Unvalidated Append Methods: Introduced unvalidated_append_value and unvalidated_append_value_ref to allow appending Avro Values to a Writer without schema validation, offering flexibility for advanced use cases where validation is handled externally or not required.
  • Explicit Validation in Existing Methods: The append and append_value_ref methods now explicitly perform schema validation, ensuring data integrity by default when using these standard write paths.
  • Code Refactoring: Simplified the internal writing logic by removing the write_value_ref_resolved helper function and integrating its validation and encoding steps directly into append_value_ref and the new unvalidated methods.
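
The split between the validated and unvalidated paths can be pictured with a toy model. This is a minimal sketch, not the crate's code: `Schema`, `Value`, `Writer`, and the byte encoding are hypothetical stand-ins, and only the method names `append_value_ref` and `unvalidated_append_value_ref` are taken from the PR.

```rust
/// Hypothetical two-type schema, standing in for a real Avro schema.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Schema {
    Int,
    Str,
}

/// Hypothetical value type, standing in for the crate's `Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    Str(String),
}

impl Value {
    /// Returns true when the value's variant matches the schema.
    pub fn validate(&self, schema: Schema) -> bool {
        matches!(
            (self, schema),
            (Value::Int(_), Schema::Int) | (Value::Str(_), Schema::Str)
        )
    }
}

pub struct Writer {
    schema: Schema,
    pub buffer: Vec<u8>,
}

impl Writer {
    pub fn new(schema: Schema) -> Self {
        Writer { schema, buffer: Vec::new() }
    }

    /// Validated path: check the value, then delegate to the unvalidated path.
    pub fn append_value_ref(&mut self, value: &Value) -> Result<usize, String> {
        if !value.validate(self.schema) {
            return Err(format!("{:?} does not match schema {:?}", value, self.schema));
        }
        self.unvalidated_append_value_ref(value)
    }

    /// Unvalidated path: encode whatever is given (toy encoding, not Avro's).
    pub fn unvalidated_append_value_ref(&mut self, value: &Value) -> Result<usize, String> {
        let before = self.buffer.len();
        match value {
            Value::Int(i) => self.buffer.extend_from_slice(&i.to_le_bytes()),
            Value::Str(s) => self.buffer.extend_from_slice(s.as_bytes()),
        }
        Ok(self.buffer.len() - before)
    }
}
```

In this model a mismatched value is rejected by `append_value_ref` but is written as-is by `unvalidated_append_value_ref`, which is the corruption risk the doc comments warn about.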

@coderabbitai

coderabbitai bot commented Jan 27, 2026

Walkthrough

The Writer struct in avro/src/writer.rs has been refactored to separate validated and unvalidated write paths. The append method now delegates directly to append_value_ref without pre-writing headers. Two new public methods were added: unvalidated_append_value and unvalidated_append_value_ref, which bypass validation during encoding. The append_value_ref method was modified to perform per-value schema validation and return ValidationWithReason errors on validation failure. The internal helper write_value_ref_resolved was removed, and a test was added to validate unvalidated append behavior and error messaging. The net change accounts for 80 lines added and 29 lines removed.


@claude

claude bot commented Jan 27, 2026

PR Review: Add unvalidated append methods to Writer

Summary

This PR adds unvalidated_append_value and unvalidated_append_value_ref methods to allow writing values without schema validation, providing a performance optimization for scenarios where validation has already been performed externally.

Code Quality & Best Practices ✅

Strengths:

  • Good separation of concerns: validated methods now call the unvalidated ones after validation
  • Consistent API design: both value and value_ref variants provided
  • Comprehensive documentation with clear warnings about data corruption risks
  • Follows existing code patterns and style conventions

Concerns:

  1. Bug in append method (avro/src/writer.rs:182-185): The refactored append method no longer calls maybe_write_header(). This was previously called before validation, and now the header writing is deferred to unvalidated_append_value_ref. While this still works, it changes the behavior subtly - the header write now happens AFTER validation instead of before. This could be intentional but should be verified.

  2. Documentation inconsistency (avro/src/writer.rs:214, 230): The docs reference Writer::append_value and Writer::append_value_ref but should reference append and append_value_ref (no underscore in first method name based on line 182).

Performance Considerations ⚡

Positive:

  • Eliminating redundant validation is the right approach for performance-critical code
  • The refactoring actually reduces duplicate validation logic (removed write_value_ref_resolved)
  • Clear path for users who've already validated their data

Question:

  • What's the expected use case? If users are validating before calling this, they need to construct a ResolvedSchema themselves. Consider documenting the pattern:
// Example usage pattern could be helpful in docs
let value = ...; 
if value.validate(schema) {
    writer.unvalidated_append_value(value)?;
}
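
One use case that fits the question is writing the same value many times after a single check. The following is a sketch under stated assumptions: `ToyWriter`, its "non-negative numbers are valid" rule, and `append_repeated` are hypothetical, and only the method name `unvalidated_append_value` comes from the PR.

```rust
/// Hypothetical stand-in for a writer that offers an unvalidated append.
pub struct ToyWriter {
    pub out: Vec<i64>,
    /// Counts how often validation ran, to make the saving visible.
    pub validations: usize,
}

impl ToyWriter {
    pub fn new() -> Self {
        ToyWriter { out: Vec::new(), validations: 0 }
    }

    /// Toy schema check: only non-negative numbers count as valid.
    pub fn validate(&mut self, value: i64) -> bool {
        self.validations += 1;
        value >= 0
    }

    /// Appends without checking anything.
    pub fn unvalidated_append_value(&mut self, value: i64) {
        self.out.push(value);
    }
}

/// Validates `value` once, then writes it `copies` times via the unvalidated path.
/// Returns an error, and writes nothing, if the single validation fails.
pub fn append_repeated(w: &mut ToyWriter, value: i64, copies: usize) -> Result<usize, String> {
    if !w.validate(value) {
        return Err(format!("value {} failed validation", value));
    }
    for _ in 0..copies {
        w.unvalidated_append_value(value);
    }
    Ok(copies)
}
```

Unlike the `if` without an `else` in the snippet above, this sketch surfaces a failed validation as an error rather than silently skipping the write.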

Security Concerns 🔒

Major concern - Insufficient safety documentation:

The methods correctly document that invalid data will cause corruption, but should be more explicit about the security implications:

  1. Data integrity: Corrupted Avro files can cause downstream systems to fail unpredictably
  2. Attack surface: If these methods are used with untrusted input, it creates an injection-like vulnerability where malformed data can be written
  3. Silent failures: Validation errors won't be caught, potentially masking bugs

Recommendations:

  • Consider adding # Safety documentation sections (even though these aren't unsafe in Rust terms)
  • Add examples of INCORRECT usage in docs
  • Consider whether these should have more restrictive names like append_value_unchecked_ref to signal danger
  • Document that Value::validate() should be used before calling these methods

Test Coverage ✅

Good:

  • Test covers both unvalidated_append_value and unvalidated_append_value_ref
  • Verifies that validation is actually skipped (corrupted data is written)
  • Confirms that normal append methods still validate correctly

Suggestions:

  1. Test name: avro_rs_310_append_unvalidated_value - the "310" is unclear. If this references an issue number, consider test_append_unvalidated_value_skips_validation
  2. Test assertion (avro/src/writer.rs:1842): The hardcoded buffer index check &buffer[buffer.len() - 18..buffer.len() - 16] is fragile. Consider:
    • Adding a comment explaining what these offsets represent
    • Or better: read back the corrupted file and verify it contains the wrong value
  3. Missing test cases:
    • Performance test showing the speedup (optional but valuable)
    • Test with valid data to ensure it still works correctly
    • Test with complex nested schemas
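
One way to make the offset check in suggestion 2 less fragile is to name what the offsets mean. The sketch below assumes, which the PR does not confirm, that the test inspects the last two bytes of encoded data just before the 16-byte sync marker that the Avro object container format places at the end of each block. `SYNC_MARKER_LEN` and `payload_tail` are hypothetical names.

```rust
/// Length of the sync marker that ends each Avro object container block.
pub const SYNC_MARKER_LEN: usize = 16;

/// Returns the last `n` bytes before the trailing sync marker,
/// or `None` if the buffer is too short to contain them.
pub fn payload_tail(buffer: &[u8], n: usize) -> Option<&[u8]> {
    let end = buffer.len().checked_sub(SYNC_MARKER_LEN)?;
    let start = end.checked_sub(n)?;
    Some(&buffer[start..end])
}
```

With such a helper, `payload_tail(&buffer, 2)` selects the same bytes as `&buffer[buffer.len() - 18..buffer.len() - 16]` but cannot panic on a short buffer and documents the intent.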

Additional Issues

  1. Return value inconsistency: append previously included header bytes in the return value (m + n), now it doesn't. The unvalidated_append_value_ref does include it. Verify this is intentional.

  2. Code duplication: unvalidated_append_value and unvalidated_append_value_ref have nearly identical docs. Consider using a doc macro or referencing one from the other.

  3. Missing from docs: Should document that encode_internal is being called directly, and what assumptions it makes.

Recommendations

Must fix:

  • Verify the header writing behavior change in append is intentional
  • Fix documentation references to method names

Should consider:

  • Enhance security warnings in documentation
  • Improve test coverage for edge cases
  • Add usage examples to documentation
  • Consider more explicit naming (e.g., append_unchecked_*)

Nice to have:

  • Add benchmark demonstrating the performance benefit
  • Document the common pattern for using these methods safely

Overall Assessment

This is a solid PR that addresses a legitimate performance concern. The code changes are clean and well-structured. The main areas for improvement are:

  1. Documentation completeness around security implications
  2. Test robustness and coverage
  3. Verification of the behavioral change in header writing

The feature itself is valuable for performance-critical applications that can validate data once and write many times.


Recommendation: Approve with minor revisions to documentation and test assertions.

🤖 Generated with Claude Code


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces new functionality to append values to a Writer without performing schema validation, which can be useful for performance-critical scenarios where validation is handled upstream. The existing append and append_value_ref methods now delegate to these new unvalidated methods after performing validation, ensuring backward compatibility and a clear separation of concerns. A new test case has been added to cover the new functionality and verify the behavior of existing methods.

/// Append a compatible value to a `Writer`.
///
/// This function does **not** validate that the provided value matches the schema. If it does
/// not match, the file will contain corrupt data. Use [`Writer::append_value`] to have the

medium

The documentation here refers to Writer::append_value, but this function does not exist. It should likely refer to Writer::append for the validated counterpart.

Suggested change
-    /// not match, the file will contain corrupt data. Use [`Writer::append_value`] to have the
+    /// not match, the file will contain corrupt data. Use [`Writer::append`] to have the

Owner Author

value:useful; category:documentation; feedback:The Gemini AI reviewer is correct: there is no method named append_value(). For consistency, though, append() should be deprecated and should delegate to a new append_value(), so that all related methods (unvalidated_*, *_ref) are named consistently.
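
The renaming proposed in this feedback could look roughly like the following. It is a sketch only: this `Writer` is a toy that stores one byte per value, `append_value` does not exist yet in the PR, and `legacy_caller` is a hypothetical helper; the `#[deprecated]` attribute itself is standard Rust.

```rust
/// Toy writer; the real one encodes Avro values.
pub struct Writer {
    pub buffer: Vec<u8>,
}

impl Writer {
    /// Hypothetical new name proposed in the feedback.
    /// Returns the number of bytes written.
    pub fn append_value(&mut self, value: u8) -> usize {
        self.buffer.push(value);
        1
    }

    /// Old name kept for compatibility; it only forwards to `append_value`.
    #[deprecated(note = "use `append_value` instead")]
    pub fn append(&mut self, value: u8) -> usize {
        self.append_value(value)
    }
}

/// Existing callers still compile, but get a deprecation warning unless
/// they opt out, as done here.
#[allow(deprecated)]
pub fn legacy_caller(w: &mut Writer) -> usize {
    w.append(42)
}
```

Keeping `append` as a thin forwarder means behavior stays identical for existing users while the compiler steers them toward the consistently named method.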

/// Append a compatible value to a `Writer`.
///
/// This function does **not** validate that the provided value matches the schema. If it does
/// not match, the file will contain corrupt data. Use [`Writer::append_value_ref`] to have the

medium

Similar to the previous comment, the documentation refers to Writer::append_value_ref which is correct, but it also mentions Writer::append_value which does not exist. It should refer to Writer::append if the intent is to mention the top-level validated append function.

Suggested change
/// not match, the file will contain corrupt data. Use [`Writer::append_value_ref`] to have the
/// not match, the file will contain corrupt data. Use [`Writer::append_value_ref`] to have the


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@avro/src/writer.rs`:
- Around line 210-224: The docstring for unvalidated_append_value incorrectly
references Writer::append_value; update the documentation to reference the
actual validated APIs (e.g., Writer::append and/or Writer::append_value_ref) and
any other relevant symbols mentioned (Value::validate, Writer::flush,
Writer::unvalidated_append_value_ref) so callers are not misled — replace the
incorrect `Writer::append_value` occurrences with `Writer::append` (or note
`Writer::append_value_ref`) and ensure the cross-links point to the real public
methods.

Comment on lines +210 to +224
/// Append a compatible value to a `Writer`.
///
/// This function does **not** validate that the provided value matches the schema. If it does
/// not match, the file will contain corrupt data. Use [`Writer::append_value`] to have the
/// value validated during write or use [`Value::validate`] to validate the value.
///
/// Returns the number of bytes written (it might be 0, see below).
///
/// **NOTE**: This function is not guaranteed to perform any actual write, since it relies on
/// internal buffering for performance reasons. If you want to be sure the value has been
/// written, then call [`flush`](Writer::flush).
pub fn unvalidated_append_value<T: Into<Value>>(&mut self, value: T) -> AvroResult<usize> {
    let value = value.into();
    self.unvalidated_append_value_ref(&value)
}

⚠️ Potential issue | 🟡 Minor

Fix doc reference to the validated API.

The docs mention Writer::append_value, but the public validated method is Writer::append (or Writer::append_value_ref). This will mislead callers.

✏️ Proposed doc fix
-    /// not match, the file will contain corrupt data. Use [`Writer::append_value`] to have the
+    /// not match, the file will contain corrupt data. Use [`Writer::append`] (or
+    /// [`Writer::append_value_ref`]) to have the

Owner Author

value:useful; category:documentation; feedback:The CodeRabbit AI reviewer is correct: there is no method named append_value(). For consistency, though, append() should be deprecated and should delegate to a new append_value(), so that all related methods (unvalidated_*, *_ref) are named consistently.

@augmentcode

augmentcode bot commented Jan 27, 2026

🤖 Augment PR Summary

Summary: Adds an opt-in way to append values to an Avro Writer without schema validation.

Changes:

  • Moves schema validation into append_value_ref and delegates encoding to a new unvalidated path
  • Adds unvalidated_append_value and unvalidated_append_value_ref to bypass validation and write directly via encode_internal
  • Simplifies append to delegate to append_value_ref and removes the now-unused helper that combined validation+encoding
  • Adds a regression test covering unvalidated writes and validating that the normal append path still errors on schema mismatch

Technical Notes: The new unvalidated APIs can produce corrupt Avro data if values don’t match the schema; validated appends now avoid writing headers when validation fails.


@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

/// Append a compatible value to a `Writer`.
///
/// This function does **not** validate that the provided value matches the schema. If it does
/// not match, the file will contain corrupt data. Use [`Writer::append_value`] to have the

The docs reference [Writer::append_value], but there doesn't appear to be an append_value method on Writer (likely meant Writer::append). As written this looks like a broken intra-doc link and could confuse users.

Owner Author

value:useful; category:documentation; feedback:The Augment AI reviewer is correct: there is no method named append_value(). For consistency, though, append() should be deprecated and should delegate to a new append_value(), so that all related methods (unvalidated_*, *_ref) are named consistently.

@martin-augment
Owner Author

2. Documentation inconsistency (avro/src/writer.rs:214, 230): The docs reference Writer::append_value and Writer::append_value_ref but should reference append and append_value_ref (no underscore in first method name based on line 182).

value:useful; category:documentation; feedback:The Claude AI reviewer is correct: there is no method named append_value(). For consistency, though, append() should be deprecated and should delegate to a new append_value(), so that all related methods (unvalidated_*, *_ref) are named consistently.

@martin-augment
Owner Author

  1. Bug in append method (avro/src/writer.rs:182-185): The refactored append method no longer calls maybe_write_header(). This was previously called before validation, and now the header writing is deferred to unvalidated_append_value_ref. While this still works, it changes the behavior subtly - the header write now happens AFTER validation instead of before. This could be intentional but should be verified.

value:incorrect-but-reasonable; category:bug; feedback:The Claude AI reviewer is not correct. The proposed change is an improvement: the writer now validates the passed data first and writes the header only if the data is valid, which prevents header information from being written when the data that follows is invalid.
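
The ordering discussed here can be illustrated with a toy writer. This is a sketch, not the crate's implementation: the four-byte `HEADER`, the "only even numbers are valid" rule, and the one-byte encoding are hypothetical, while the method names follow those mentioned in the review.

```rust
/// Toy header; a real Avro header also carries metadata and a sync marker.
const HEADER: &[u8] = b"Obj\x01";

pub struct Writer {
    pub buffer: Vec<u8>,
    has_header: bool,
}

impl Writer {
    pub fn new() -> Self {
        Writer { buffer: Vec::new(), has_header: false }
    }

    /// Writes the header on the first call only; returns the bytes written.
    fn maybe_write_header(&mut self) -> usize {
        if self.has_header {
            return 0;
        }
        self.buffer.extend_from_slice(HEADER);
        self.has_header = true;
        HEADER.len()
    }

    /// Validates first, so a rejected value never triggers the header write.
    pub fn append_value_ref(&mut self, value: &u8) -> Result<usize, String> {
        if *value % 2 != 0 {
            return Err(format!("value {} failed validation", value));
        }
        self.unvalidated_append_value_ref(value)
    }

    /// Writes the header if it is still pending, then the value.
    /// The returned count covers both.
    pub fn unvalidated_append_value_ref(&mut self, value: &u8) -> Result<usize, String> {
        let header_bytes = self.maybe_write_header();
        self.buffer.push(*value);
        Ok(header_bytes + 1)
    }
}
```

In this model a failed first append leaves the buffer empty, and because the validated path delegates to the unvalidated one, the first successful append still reports the header bytes in its return value.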
