469: feat: Add support for resetting a Writer back by a clearable buffer#49

Open
martin-augment wants to merge 1 commit into main from pr-469-2026-02-19-07-22-30

Conversation

@martin-augment
Owner

469: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the Writer component by enabling its efficient reuse. By adding a Clearable trait and a reset method, users can now clear the writer's internal state and underlying buffer without needing to re-instantiate it, which can improve performance and resource management when writing multiple Avro data blocks sequentially.

Highlights

  • New Clearable trait: Introduced a Clearable trait to define a contract for buffers that can be reset, with an implementation for Vec<u8>.
  • Writer::reset method: Added a reset method to the Writer struct, allowing the underlying buffer, internal state, and user metadata to be cleared for reuse.
  • Exported Clearable: The new Clearable trait is now publicly exported from the avro crate.
  • Test coverage: Included a new test case to verify the correct functionality of the Writer::reset method.

Changelog

  • avro/src/lib.rs
    • Exported the new Clearable trait.
  • avro/src/writer.rs
    • Defined the Clearable trait for types that can be reset.
    • Implemented Clearable for Vec<u8>.
    • Added the reset method to Writer, which clears the buffer, internal state, and metadata.
    • Included an example for Writer::reset usage.
    • Added a unit test to validate the reset functionality.

Activity

  • The pull request was created and is awaiting review.

@coderabbitai

coderabbitai bot commented Feb 19, 2026

Walkthrough

The changes introduce a new Clearable trait that defines a method to reset buffers while preserving their capacity. An implementation of this trait is provided for Vec<u8>. The Writer struct is extended with a public reset method that clears its internal buffer, underlying writer, header flag, value count, and metadata. The new Clearable trait is exported from the crate's public API through the re-export list in lib.rs. The Drop implementation is adjusted to handle cleanup operations as no-ops when errors occur.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a reset method for the Writer, allowing it to be reused without creating a new instance. This is achieved by adding a Clearable trait. The changes include the trait definition, an implementation for Vec<u8>, the reset method on Writer, and a corresponding test case.

My review has identified a critical issue: the implementation of Clearable for Vec<u8> contains a recursive call that will cause a stack overflow. I've also pointed out the use of a deprecated method in a new documentation example. Addressing these points will improve the correctness and quality of the code.

Comment on lines +552 to +554
fn clear(&mut self) {
    self.clear();
}

critical

The clear method in the impl Clearable for Vec<u8> is calling itself, which will lead to infinite recursion and a stack overflow. You should call the clear method from Vec<u8> instead to avoid this.

Suggested change
 fn clear(&mut self) {
-    self.clear();
+    Vec::clear(self);
 }

Owner Author


value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct! The code works fine because the Rust compiler prefers inherent methods over trait methods. The code does look confusing, so using Vec::clear(self) would improve the maintainability!
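
The method-resolution point in this thread can be checked in isolation. The block below is a minimal sketch, not the PR's code: `Clearable` is redeclared locally, and the `String` impl and the `Scratch` type are hypothetical additions used only to contrast the cases (the PR itself implements the trait for `Vec<u8>` only).

```rust
/// Local stand-in for the trait added in the PR.
pub trait Clearable {
    fn clear(&mut self);
}

impl Clearable for Vec<u8> {
    fn clear(&mut self) {
        // Path-qualified spelling: names the inherent `Vec::clear` directly.
        Vec::clear(self);
    }
}

// Illustration only: the original spelling, shown on `String`.
impl Clearable for String {
    fn clear(&mut self) {
        // Not recursive: `String` has an inherent `clear`, and inherent
        // methods take priority over trait methods during method lookup.
        self.clear();
    }
}

/// Hypothetical type that has no inherent `clear` method.
pub struct Scratch {
    pub bytes: Vec<u8>,
}

impl Clearable for Scratch {
    fn clear(&mut self) {
        // Writing `self.clear()` here would resolve to `Clearable::clear`
        // and recurse forever, so the field is cleared explicitly instead.
        self.bytes.clear();
    }
}
```

Calling `Clearable::clear(&mut value)` on any of the three types returns normally. Only a type without an inherent `clear` would recurse with the `self.clear()` spelling, which is the maintenance hazard the other reviewers describe.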

///
/// // Write some values again
/// for value in values {
/// writer.append(value)?;

medium

The example for the new reset method uses writer.append(), which is deprecated. Please use writer.append_value() for consistency with the rest of the example and to avoid promoting the use of deprecated APIs.

Suggested change
-/// writer.append(value)?;
+/// writer.append_value(value)?;

Owner Author

@martin-augment martin-augment Feb 22, 2026


value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! The Writer::append() method is deprecated, so the docstring should use its replacement - Writer::append_value().

@augmentcode

augmentcode bot commented Feb 19, 2026

🤖 Augment PR Summary

Summary: Adds support for resetting an Avro Writer when the underlying output is a clearable buffer.
Key changes:

  • Introduces a public Clearable trait (with a Vec<u8> implementation) and re-exports it from the crate root
  • Adds Writer::reset() for W: Write + Clearable to clear the output, internal buffer, header state, value count, and user metadata
  • Adds a regression test to ensure the same bytes are produced after a reset/rewrite cycle



@augmentcode augmentcode bot left a comment


Review completed. 1 suggestion posted.


///
/// // Write some values again
/// for value in values {
/// writer.append(value)?;

The example uses writer.append(value)?;, but append is deprecated in favor of append_value (which the snippet already uses earlier). Consider using append_value here as well to keep the example consistent and avoid deprecation warnings in doctests.

Severity: low


Owner Author

@martin-augment martin-augment Feb 22, 2026


value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The Writer::append() method is deprecated, so the docstring should use its replacement - Writer::append_value().

@claude

claude bot commented Feb 19, 2026

Code Review: Add Clearable trait and Writer::reset()

Overall this is a useful feature — being able to reset and reuse a Writer without re-allocation is a common pattern. The design is mostly sound, but there are a few issues worth addressing before merging.


Issues

1. Silent data loss on reset() — missing documentation warning (medium)

If reset() is called while there is unflushed data in buffer (i.e., between append_* calls and flush()), that data is silently discarded. The current doc comment does not warn about this.

pub fn reset(&mut self) {
    self.buffer.clear();   // unflushed records are silently dropped
    self.writer.clear();   // already-flushed-but-not-consumed bytes are dropped
    ...
}

Suggestion: Add a doc warning:

/// # Warning
/// Any data that has been appended but not yet flushed will be silently
/// discarded. Call [`flush`](Writer::flush) before `reset()` if you need
/// to preserve in-flight records.
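
To make the data-loss scenario concrete, here is a minimal sketch using a toy writer. `SketchWriter` and its methods are hypothetical and only mimic the staging-buffer shape described above; this is not the real Avro `Writer`, and no header or block encoding is modelled.

```rust
use std::io::{self, Write};

/// Local stand-in for the trait added in the PR.
pub trait Clearable {
    fn clear(&mut self);
}

impl Clearable for Vec<u8> {
    fn clear(&mut self) {
        Vec::clear(self);
    }
}

/// Toy writer: records are staged in `buffer` and only reach `writer`
/// when `flush` is called.
pub struct SketchWriter<W: Write + Clearable> {
    buffer: Vec<u8>,
    writer: W,
    num_values: usize,
}

impl<W: Write + Clearable> SketchWriter<W> {
    pub fn new(writer: W) -> Self {
        Self { buffer: Vec::new(), writer, num_values: 0 }
    }

    /// Stage a record; nothing is written to the sink yet.
    pub fn append(&mut self, record: &[u8]) {
        self.buffer.extend_from_slice(record);
        self.num_values += 1;
    }

    /// Move all staged bytes to the sink.
    pub fn flush(&mut self) -> io::Result<()> {
        self.writer.write_all(&self.buffer)?;
        self.buffer.clear();
        self.num_values = 0;
        self.writer.flush()
    }

    /// Drop staged records and everything already in the sink.
    pub fn reset(&mut self) {
        self.buffer.clear(); // unflushed records are discarded here
        self.writer.clear(); // flushed-but-unconsumed bytes are discarded here
        self.num_values = 0;
    }

    pub fn get_ref(&self) -> &W {
        &self.writer
    }
}
```

With this sketch, appending a record and calling `reset()` before `flush()` leaves the sink empty, and a later `flush()` writes nothing. The safe order is flush, consume the sink's bytes, then reset.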

2. marker is not regenerated on reset (medium)

After reset(), has_header is set to false so the next write emits a fresh Avro header — but self.marker is reused unchanged. The Avro spec sync marker is embedded in every block and in the header, and it is the primary mechanism consumers use to identify block boundaries. If downstream readers treat each reset-flush cycle as a separate, independent Avro file, reusing the same marker is technically incorrect and could confuse readers.

Suggestion: Either:

  • Regenerate the marker in reset(): self.marker = generate_sync_marker();
  • Or explicitly document that resets produce a continuation of the same logical Avro stream (same schema + same marker), and that the two halves cannot be read independently.
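
The two options can be sketched side by side. Everything below is hypothetical: `MarkerState`, its methods, and this `generate_sync_marker` body are illustrative names, and the marker is derived from the standard library's `RandomState` only to keep the sketch dependency-free. A real implementation would use whatever random source the crate already relies on.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical helper producing a 16-byte marker (the Avro sync marker size).
/// Not a cryptographic generator; it is enough to show regeneration.
fn generate_sync_marker() -> [u8; 16] {
    let mut marker = [0u8; 16];
    for chunk in marker.chunks_mut(8) {
        // Each `RandomState::new()` carries different hashing keys, so the
        // hash of the empty input differs from call to call.
        let word = RandomState::new().build_hasher().finish();
        chunk.copy_from_slice(&word.to_le_bytes());
    }
    marker
}

/// Only the two fields relevant to this discussion.
pub struct MarkerState {
    marker: [u8; 16],
    has_header: bool,
}

impl MarkerState {
    pub fn new() -> Self {
        Self { marker: generate_sync_marker(), has_header: false }
    }

    /// Option documented in the second bullet: the header is re-emitted,
    /// and the marker is reused.
    pub fn reset_keep_marker(&mut self) {
        self.has_header = false;
    }

    /// Option from the first bullet: each reset starts an independent file.
    pub fn reset_new_marker(&mut self) {
        self.has_header = false;
        self.marker = generate_sync_marker();
    }

    pub fn marker(&self) -> [u8; 16] {
        self.marker
    }

    pub fn has_header(&self) -> bool {
        self.has_header
    }
}
```

Whichever variant is chosen, the test suggested under Test Coverage can compare `marker()` before and after the reset to pin the behaviour down.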

3. impl Clearable for Vec<u8> looks like infinite recursion (low)

impl Clearable for Vec<u8> {
    fn clear(&mut self) {
        self.clear()   // calls Vec::clear() (inherent method), not Clearable::clear()
    }
}

This is correct — Rust's method resolution favours inherent methods over trait methods — but it is visually alarming and has caused confusion in other codebases. A clearer spelling avoids any doubt:

impl Clearable for Vec<u8> {
    fn clear(&mut self) {
        Vec::clear(self)
    }
}

4. Doc example inconsistently uses append and append_value (low)

The reset() doc example uses append_value in the first loop and append in the second loop without explanation:

// first loop
writer.append_value(value)?;

// second loop (after reset)
writer.append(value)?;

Both methods exist, but they have different semantics (append_value validates against the schema; append does not). Using different methods in the same example without comment may mislead readers into thinking the reset changes which append variant should be used. Pick one and use it consistently (or add a comment explaining the difference).


5. Clearable trait doc overpromises (low)

The trait-level doc says:

/// A buffer that can be cleared.
pub trait Clearable {
    /// Clear the buffer, keeping the capacity.
    fn clear(&mut self);
}

"Keeping the capacity" is a property of Vec::clear() specifically — it is not a contract that the trait enforces. A future implementor (e.g., wrapping a BufWriter) has no obligation to preserve capacity. Either:

  • Move the capacity note to the Vec<u8> impl, or
  • Strengthen the trait doc to clearly state this is advisory, not required.
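
A sketch of the first option, with the capacity note moved onto the `Vec<u8>` impl. The `Cursor<Vec<u8>>` implementation and the `write_then_clear` helper are hypothetical and not part of the PR; they are included only to show an implementor for which "keeping the capacity" does not hold.

```rust
use std::io::{self, Cursor, Write};

/// A buffer that can be cleared.
pub trait Clearable {
    /// Discard all content. Whether allocated capacity is kept is left
    /// to the implementation.
    fn clear(&mut self);
}

impl Clearable for Vec<u8> {
    /// Keeps the allocated capacity, as `Vec::clear` does.
    fn clear(&mut self) {
        Vec::clear(self);
    }
}

/// Hypothetical second implementor (not in the PR).
impl Clearable for Cursor<Vec<u8>> {
    fn clear(&mut self) {
        // Replace the inner vector, releasing its allocation, and rewind
        // the write position so the next write starts at offset 0.
        *self.get_mut() = Vec::new();
        self.set_position(0);
    }
}

/// Hypothetical helper: write `bytes` into any clearable sink, then clear it.
pub fn write_then_clear<W: Write + Clearable>(sink: &mut W, bytes: &[u8]) -> io::Result<()> {
    sink.write_all(bytes)?;
    sink.clear();
    Ok(())
}
```

Both implementors satisfy the trait-level contract (the content is gone), while only the `Vec<u8>` one preserves capacity. That is why the capacity wording reads better on the impl than on the trait.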

Test Coverage

The test avro_rs_469_reset_writer covers the core happy path well (write → flush → reset → write again → same bytes). Consider adding:

  • Reset with unflushed data: call reset() without a prior flush() and verify no panic and correct behaviour afterwards.
  • Metadata is cleared: call add_user_metadata(...), reset, and verify the metadata does not appear in the second file's header.
  • Marker consistency / independence: verify the sync marker in the header is (or is not, depending on the design decision above) the same across resets.

Minor Nit

The re-export in lib.rs is correctly alphabetised — no issue there.


In summary: the two medium-severity items (data loss documentation and marker regeneration) should be resolved before merge. The rest are low-priority polish items.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
avro/src/writer.rs (1)

558-596: reset() doc should note: (a) unflushed data is silently discarded, and (b) the sync marker is preserved.

Both are design choices that could surprise callers:

  • If reset() is called without a prior flush(), any buffered but un-flushed records are silently dropped. The example already shows the correct pattern (flush before reset), but a doc-comment # Panics / note on data loss would make it explicit.
  • After reset(), subsequent writes reuse the same self.marker. Two consecutive "sessions" therefore produce Avro files with identical sync markers. This is fine for buffer-reuse scenarios but could cause confusion if the output is treated as independent Avro files.
📝 Suggested documentation addition
     /// Reset the writer.
     ///
     /// This will clear the underlying writer and the internal buffer.
     /// It will also clear any user metadata added.
     ///
+    /// # Data loss
+    /// Any values appended since the last [`flush`](Writer::flush) call will be discarded.
+    /// Callers should [`flush`](Writer::flush) before calling `reset` if the buffered data
+    /// needs to be preserved.
+    ///
+    /// # Sync marker
+    /// The sync marker is **not** regenerated; the next write cycle will produce an Avro file
+    /// with the same marker as the previous one.
+    ///

Comment on lines +551 to +554
impl Clearable for Vec<u8> {
    fn clear(&mut self) {
        self.clear();
    }

⚠️ Potential issue | 🟡 Minor

self.clear() in impl Clearable for Vec<u8> looks like infinite recursion.

This is not actually recursive: Rust's method resolution gives inherent methods priority over trait methods, so self.clear() dispatches to Vec::clear(). It is, however, visually indistinguishable from a recursive call and is a maintenance hazard. If the pattern is copied naively to a type that has no inherent clear method, or if a type's inherent clear is later removed, the call resolves to Clearable::clear and recurses infinitely.

Use the fully-qualified inherent call to make the intent unambiguous:

🐛 Proposed fix
 impl Clearable for Vec<u8> {
     fn clear(&mut self) {
-        self.clear();
+        Vec::clear(self);
     }
 }

Owner Author


value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! The code does look confusing, so using Vec::clear(self) would improve the maintainability!

///
/// // Write some values again
/// for value in values {
/// writer.append(value)?;

⚠️ Potential issue | 🟡 Minor

Doc example calls deprecated writer.append() — use writer.append_value() instead.

Writer::append has been deprecated since 0.22.0 in favour of append_value. Using it in a doc test will produce a deprecation warning (potentially an error if the crate's lint config escalates it).

📝 Proposed fix
-    ///     writer.append(value)?;
+    ///     writer.append_value(value)?;

Owner Author


value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! The Writer::append() method is deprecated, so the docstring should use its replacement - Writer::append_value().

@martin-augment
Owner Author

3. impl Clearable for Vec<u8> looks like infinite recursion (low)

impl Clearable for Vec<u8> {
    fn clear(&mut self) {
        self.clear()   // calls Vec::clear() (inherent method), not Clearable::clear()
    }
}

This is correct — Rust's method resolution favours inherent methods over trait methods — but it is visually alarming and has caused confusion in other codebases. A clearer spelling avoids any doubt:

impl Clearable for Vec<u8> {
    fn clear(&mut self) {
        Vec::clear(self)
    }
}

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! The code does look confusing, so using Vec::clear(self) would improve the maintainability!

@martin-augment
Owner Author

1. Silent data loss on reset() — missing documentation warning (medium)

If reset() is called while there is unflushed data in buffer (i.e., between append_* calls and flush()), that data is silently discarded. The current doc comment does not warn about this.

pub fn reset(&mut self) {
    self.buffer.clear();   // unflushed records are silently dropped
    self.writer.clear();   // already-flushed-but-not-consumed bytes are dropped
    ...
}

Suggestion: Add a doc warning:

/// # Warning
/// Any data that has been appended but not yet flushed will be silently
/// discarded. Call [`flush`](Writer::flush) before `reset()` if you need
/// to preserve in-flight records.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! The documentation of the new reset() method should mention that it will delete any non-flushed data, so it should be used with caution.

@martin-augment
Owner Author

martin-augment commented Feb 22, 2026

4. Doc example inconsistently uses append and append_value (low)

The reset() doc example uses append_value in the first loop and append in the second loop without explanation:

// first loop
writer.append_value(value)?;

// second loop (after reset)
writer.append(value)?;

Both methods exist, but they have different semantics (append_value validates against the schema; append does not). Using different methods in the same example without comment may mislead readers into thinking the reset changes which append variant should be used. Pick one and use it consistently (or add a comment explaining the difference).

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! The Writer::append() method is deprecated, so the docstring should use its replacement - Writer::append_value().

@martin-augment
Owner Author

5. Clearable trait doc overpromises (low)

The trait-level doc says:

/// A buffer that can be cleared.
pub trait Clearable {
    /// Clear the buffer, keeping the capacity.
    fn clear(&mut self);
}

"Keeping the capacity" is a property of Vec::clear() specifically — it is not a contract that the trait enforces. A future implementor (e.g., wrapping a BufWriter) has no obligation to preserve capacity. Either:

  • Move the capacity note to the Vec<u8> impl, or
  • Strengthen the trait doc to clearly state this is advisory, not required.

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! The capacity is not important/required by the Clearable trait, so removing this part of the documentation would make it more clear for any future implementations.
