
445: feat!: Rework SpecificSingleObjectWriter #42

Open
martin-augment wants to merge 2 commits into main from pr-445-2026-01-27-15-42-44

Conversation

@martin-augment
Owner

445: To review by AI

`ResolvedOwnedSchema` now uses a self-referential struct when resolving the schema, so it does not need to clone all named schemas contained in it.

This also fixes a bug in `reader::Block::read_writer_schema` where it unnecessarily parses the schemata twice.
It now resolves the schema and caches it. It also no longer uses `GenericSingleObjectWriter` for the header and does it directly. This removes the need for a buffer.

This is a breaking change:
- `SpecificSingleObjectWriter::with_capacity` has been removed.
- `write_avro_datum_ref` now takes an additional `names: &NamesRef` argument.
@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a substantial refactoring of the SpecificSingleObjectWriter component, aiming to enhance its performance and simplify its internal architecture. By adopting a self-referencing schema resolution pattern, the changes reduce memory overhead and streamline the process of writing single Avro objects. The update also modernizes the writer's API, making it more intuitive for consumers.

Highlights

  • Reworked SpecificSingleObjectWriter: The SpecificSingleObjectWriter has been significantly refactored to improve its internal design and efficiency. It now directly manages a ResolvedOwnedSchema and the Avro header, simplifying its usage and removing the need for an internal GenericSingleObjectWriter.
  • Optimized Schema Resolution with Ouroboros: The ResolvedOwnedSchema now leverages the ouroboros crate to create a self-referencing structure. This change eliminates unnecessary cloning of schemas during resolution, leading to more efficient memory usage and potentially better performance.
  • Simplified Writer API: The SpecificSingleObjectWriter API has been streamlined. The with_capacity constructor has been replaced by a simpler new() method, and a new_with_header_builder() method has been added for custom header generation. The write_value, write_ref, and write methods now operate on &self instead of &mut self, indicating that the writer itself is immutable after creation.
  • Enhanced Schema Resolution Functions: The resolve_names and resolve_names_with_schemata functions have been updated to accept known_schemata as an argument. This allows for resolving schemas against a set of external schemas without adding them to the current resolution context, providing more flexibility.
  • Dependency Updates: New dependencies, including ouroboros, aliasable, heck (version 0.4.1), proc-macro2-diagnostics, and static_assertions, have been added to support the new self-referencing schema resolution mechanism.
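The call shape described above can be sketched with the standard library alone. `SketchWriter`, the header contents, and the payload encoding below are illustrative stand-ins, not the crate's actual types: the point is only that the header is computed once at construction and that writing needs just `&self` plus any `Write` sink.

```rust
use std::io::Write;

// Illustrative stand-in for the reworked writer. The 8 fingerprint bytes are
// placeholders, not a real Rabin fingerprint of a schema.
struct SketchWriter {
    header: Vec<u8>,
}

impl SketchWriter {
    fn new() -> Self {
        let mut header = vec![0xC3, 0x01]; // Avro single-object marker bytes
        header.extend_from_slice(&[0u8; 8]); // placeholder schema fingerprint
        SketchWriter { header }
    }

    // &self, not &mut self: no per-call state is kept in the writer.
    fn write_ref<W: Write>(&self, payload: &[u8], writer: &mut W) -> std::io::Result<usize> {
        writer.write_all(&self.header)?;
        writer.write_all(payload)?;
        Ok(self.header.len() + payload.len())
    }
}

fn main() {
    let w = SketchWriter::new();
    let mut out = Vec::new();
    let n = w.write_ref(b"datum", &mut out).expect("write failed");
    assert_eq!(n, 15); // 10-byte header + 5-byte payload
    assert_eq!(out[..2], [0xC3, 0x01]);
    println!("wrote {n} bytes");
}
```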

@coderabbitai

coderabbitai bot commented Jan 27, 2026

Walkthrough

This pull request refactors the Avro schema resolution and single-object writer implementations. The changes introduce the ouroboros dependency to support self-referencing data structures. The ResolvedOwnedSchema is restructured to use an inner self-referencing struct. Schema resolution functions are updated to accept and propagate a known_schemata parameter. The SpecificSingleObjectWriter is simplified with a new new() constructor replacing with_capacity(), internal structure reorganized to use ResolvedOwnedSchema, and write methods converted from &mut self to &self. Corresponding updates are made to examples and tests.


@claude

claude bot commented Jan 27, 2026

Pull Request Review: #42 - Rework SpecificSingleObjectWriter

Summary

This PR refactors the SpecificSingleObjectWriter to eliminate the internal buffer and improve the API design. The changes also introduce the ouroboros crate to handle self-referential structures in ResolvedOwnedSchema.

Code Quality & Best Practices ✅

Strengths:

  1. Better API Design: Removing with_capacity() in favor of new() is cleaner and more intuitive. The writer no longer needs to maintain internal state between calls.

  2. Improved Immutability: Changing methods from &mut self to &self in SpecificSingleObjectWriter is excellent - makes the API safer and allows sharing the writer across threads more easily.

  3. Self-Referential Structure: Using ouroboros crate to handle the self-referential ResolvedOwnedSchema is a proper solution to avoid cloning schemas.

  4. Consistent Naming: The refactored code maintains consistent function signatures across write(), write_ref(), and write_value().

Potential Issues & Concerns 🔍

1. Breaking API Change ⚠️

The PR removes with_capacity() and changes method signatures from &mut self to &self.

Issue: This is marked as feat!: (breaking change), which is appropriate. However, consider:

  • Are there migration notes or changelog entries for users?
  • Should there be a deprecation period instead of immediate removal?

2. Performance Consideration 🐌

Previous design: Buffer allocated once, reused across calls
New design: Header is written on every call to write_ref()

Impact: The new design is cleaner but might have a small performance overhead for high-throughput scenarios. However, this is likely negligible since:

  • Headers are small (10-20 bytes)
  • The elimination of the internal buffer state is worth the tradeoff
  • Each message is independent (single-object encoding requirement)

3. Schema Resolution Changes 🔄

In resolve.rs, the refactoring changes from cloning schemas to storing references.

Concern: The lifetime management is more complex now. The changes look correct, but edge cases with nested schemas should be thoroughly tested.

4. New Dependency: ouroboros 📦

Adding ouroboros = "0.18.5" introduces a new dependency for self-referential structs.

Pros:

  • Eliminates the need for manual unsafe code
  • Well-tested crate for this specific use case

Cons:

  • Adds dependency weight (~3 transitive deps)
  • Generated code can be harder to debug

5. Reader.rs Changes 📖

Lines 166-180 in reader.rs appear to reintroduce the cloning that the PR aimed to eliminate.

Security Concerns 🔒

Assessment: No security vulnerabilities introduced. The ouroboros macro generates safe code, and there are no changes to input validation or credential handling.

Test Coverage 📊

Positive:

  • New test added: avro_rs_439_specific_single_object_writer_ref (tests recursive schema handling) ✅
  • All existing tests updated to use new API ✅

Gaps:

  1. No explicit performance benchmarks - Given the architectural change, benchmark comparisons would be valuable
  2. Error handling paths - Are all failure modes of the new builder pattern tested?
  3. Concurrent access - Since methods are now &self, consider adding tests for concurrent usage
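A stdlib-only sketch of what such a concurrent-usage test could look like (`SketchWriter` is a hypothetical stand-in that mirrors the `&self` write shape, not the crate's actual type):

```rust
use std::io::Write;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in mirroring the reworked `&self` write API.
struct SketchWriter {
    header: Vec<u8>,
}

impl SketchWriter {
    fn write_ref<W: Write>(&self, payload: &[u8], writer: &mut W) -> std::io::Result<usize> {
        writer.write_all(&self.header)?;
        writer.write_all(payload)?;
        Ok(self.header.len() + payload.len())
    }
}

fn main() {
    // Because write_ref takes &self, a single writer can be shared via Arc;
    // each thread supplies its own output sink.
    let w = Arc::new(SketchWriter { header: vec![0xC3, 0x01] });
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let w = Arc::clone(&w);
            thread::spawn(move || {
                let mut out = Vec::new();
                w.write_ref(format!("msg-{i}").as_bytes(), &mut out).unwrap();
                out
            })
        })
        .collect();
    for h in handles {
        let out = h.join().unwrap();
        assert_eq!(out[..2], [0xC3, 0x01]); // every message carries the header
    }
    println!("all messages written");
}
```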

Performance Considerations ⚡

Improvements:

  • Eliminates buffer management overhead
  • Reduces memory footprint (no persistent buffer per writer)
  • Avoids schema cloning in most cases

Potential Regressions:

  • Header written on every call (previously buffered)
  • Builder pattern may have overhead

Recommendation: Run benchmarks comparing before/after for high-throughput scenarios and schema resolution with deeply nested types.

Suggestions for Improvement 💡

  1. Documentation: Add migration guide in CHANGELOG showing the API changes

  2. Consider backward compatibility: Add deprecated with_capacity() as a no-op

  3. Reader cloning: Investigate if the cloning in reader.rs:180 can be eliminated

  4. Benchmarks: Add performance tests to ensure no regressions

  5. Error messages: Ensure error messages from the new builder are user-friendly

Conclusion

This is a solid refactoring that improves the API design and reduces complexity. The breaking changes are justified and well-executed.

Overall Assessment: 7.5/10 - Good refactoring with clear benefits, but needs documentation and performance validation.

Main recommendations:

  1. ✅ Approve with minor suggestions
  2. 📝 Add migration documentation
  3. 🔍 Review the reader.rs cloning necessity
  4. ⚡ Run performance benchmarks
  5. 📊 Consider additional test coverage for edge cases

Review generated by Claude Code


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the ouroboros crate as a dependency and modifies the SpecificSingleObjectWriter to use ResolvedOwnedSchema for schema resolution. The with_capacity function is replaced with new and new_with_header_builder functions, and the write functions are updated to use the resolved schema and header. The changes also include updates to the resolve_names function to handle known schemata and use NamesRef.

@augmentcode

augmentcode bot commented Jan 27, 2026

🤖 Augment PR Summary

Summary: This PR reworks SpecificSingleObjectWriter to support schemas containing Schema::Ref (e.g., recursive types) without requiring expensive schema cloning.

Changes:

  • Replaces SpecificSingleObjectWriter::with_capacity with new() and new_with_header_builder(), caching the single-object header and a resolved, owned schema.
  • Introduces a self-referencing ResolvedOwnedSchema (via ouroboros) so the names table can borrow from the owned root schema.
  • Refactors schema name resolution APIs to work with NamesRef and support resolving against a set of “known schemata”.
  • Updates reader header parsing and writer serialization to thread the correct names map into encoding/decoding paths.
  • Updates examples/tests to the new writer API and adds a regression test covering recursive schema writing.

Technical Notes: Adds an ouroboros dependency and changes write_avro_datum_ref to accept an explicit names map for Schema::Ref resolution.

🤖 Was this summary useful? React with 👍 or 👎


@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


Comment augment review to trigger a new review at any time.

```rust
            &None,
            &HashMap::new(),
        )?;
        self.names_refs = names.into_iter().map(|(n, s)| (n, s.clone())).collect();
```


names_refs is a Names (owned Schema values), but here names is a NamesRef (&Schema values), so (n, s.clone()) looks like it would keep references rather than cloning the underlying Schema (and may not type-check). This matters because decode_internal later relies on names_refs to resolve Schema::Ref during decoding.

Other Locations
  • avro/src/reader.rs:242



Owner Author


value:annoying; category:bug; feedback:The Augment AI reviewer is not correct! There is no need to dereference a variable to clone it. Calling the method on the reference will still do the right thing and return an owned clone of that object.
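For background, Rust's method resolution makes `.clone()` on a `&T` return an owned `T` whenever `T: Clone` — the reference itself is only cloned when no such impl is found or when fully-qualified syntax forces it. A minimal stdlib illustration of the owner's point, using `String` as a stand-in for `Schema`:

```rust
fn main() {
    let schema_like = String::from("record"); // stand-in for an owned Schema
    let s: &String = &schema_like; // stand-in for the &Schema stored in NamesRef

    // Method resolution finds <String as Clone>::clone first, so this yields
    // an owned String, not a copy of the reference.
    let owned: String = s.clone();
    assert_eq!(owned, "record");

    // Only fully-qualified syntax clones the reference itself.
    let still_ref: &String = <&String as Clone>::clone(&s);
    assert_eq!(still_ref, &schema_like);

    println!("clone through a reference returns an owned value");
}
```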

```rust
/// if you don't know what you are doing, instead.
pub fn write_avro_datum_ref<T: Serialize, W: Write>(
    schema: &Schema,
    names: &NamesRef,
```


Now that write_avro_datum_ref takes an explicit names map, it seems worth documenting that callers must provide all named schemas needed to resolve any Schema::Ref (e.g., recursive types), otherwise serialization can fail at runtime. Without refs, callers can still pass an empty map.



Owner Author


value:good-to-have; category:documentation; feedback:The Augment AI reviewer is correct! The changes proposed in this pull request add a new parameter to the write_avro_datum_ref() function, so it has to be documented in the function's docstring. This prevents confusion for developers trying to use the method and figure out what the parameter is for.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@avro/src/reader.rs`:
- Around line 228-243: When schemata is non-empty the code only fills names_refs
from the external schemata and skips named types declared inside the parsed
writer_schema; update the if-branch after calling
Schema::parse_with_names(&json, self.names_refs.clone())? to also call
resolve_names(&self.writer_schema, &mut names, &None, &HashMap::new())? (or a
merge variant) and then merge those discovered names into self.names_refs
(ensuring existing schemata entries are preserved or overridden as intended), so
resolve_names_with_schemata, self.writer_schema, names_refs, resolve_names and
Schema::parse_with_names are all involved in the merge step.

In `@avro/src/writer.rs`:
- Around line 633-671: Both write_value
(SpecificSingleObjectWriter::write_value) and write_ref
(SpecificSingleObjectWriter::write_ref) currently write self.header to the
writer before attempting validation/serialization, which can leave a partial
message on error; change each to first serialize/validate into a temporary
buffer (e.g., Vec<u8>) or validate the Value against self.resolved before
writing the header, using write_value_ref_owned_resolved and
write_avro_datum_ref to produce the payload in-memory, and only after successful
serialization write the header and then write the payload to the provided writer
so errors are returned without mutating the output stream.
🧹 Nitpick comments (1)
avro/src/writer.rs (1)

753-761: Document the new names requirement.

The doc comment should state that names must include all named types referenced by schema (e.g., from ResolvedSchema::get_names()), otherwise resolution can fail for refs.

✏️ Suggested doc tweak

```diff
 /// Write the referenced [Serialize]able object to the provided [Write] object.
 /// Returns a result with the number of bytes written.
+/// `names` must contain all named types referenced by `schema`
+/// (e.g., from `ResolvedSchema::get_names()`).
```

Comment on lines 228 to 243

```diff
 if !self.schemata.is_empty() {
-    let rs = ResolvedSchema::try_from(self.schemata.clone())?;
-    let names: Names = rs
-        .get_names()
-        .iter()
-        .map(|(name, schema)| (name.clone(), (*schema).clone()))
-        .collect();
-    self.writer_schema = Schema::parse_with_names(&json, names)?;
+    let mut names = HashMap::new();
+    resolve_names_with_schemata(
+        self.schemata.iter().copied(),
-        &mut self.names_refs,
+        &mut names,
+        &None,
+        &HashMap::new(),
+    )?;
+    self.names_refs = names.into_iter().map(|(n, s)| (n, s.clone())).collect();
+    self.writer_schema = Schema::parse_with_names(&json, self.names_refs.clone())?;
 } else {
     self.writer_schema = Schema::parse(&json)?;
-    resolve_names(&self.writer_schema, &mut self.names_refs, &None)?;
+    let mut names = HashMap::new();
+    resolve_names(&self.writer_schema, &mut names, &None, &HashMap::new())?;
+    self.names_refs = names.into_iter().map(|(n, s)| (n, s.clone())).collect();
 }
```


⚠️ Potential issue | 🟠 Major

Writer schema names are not merged when schemata is provided.

In the non-empty schemata branch, names_refs is populated only from the provided schemata (Line 229+). Named types defined inside the writer schema itself won’t be added, so decode_internal can fail to resolve Schema::Ref (e.g., recursive records). Consider resolving names from self.writer_schema after parsing and merging them with the known schemata.

🛠️ Suggested merge of writer schema names

```diff
-            let mut names = HashMap::new();
+            let mut known = HashMap::new();
             resolve_names_with_schemata(
                 self.schemata.iter().copied(),
-                &mut names,
+                &mut known,
                 &None,
                 &HashMap::new(),
             )?;
-            self.names_refs = names.into_iter().map(|(n, s)| (n, s.clone())).collect();
-            self.writer_schema = Schema::parse_with_names(&json, self.names_refs.clone())?;
+            let mut names_refs: Names =
+                known.iter().map(|(n, s)| (n.clone(), (*s).clone())).collect();
+            self.writer_schema = Schema::parse_with_names(&json, names_refs.clone())?;
+            let mut writer_names = HashMap::new();
+            resolve_names(&self.writer_schema, &mut writer_names, &None, &known)?;
+            names_refs.extend(writer_names.into_iter().map(|(n, s)| (n, s.clone())));
+            self.names_refs = names_refs;
```

Owner Author


value:incorrect-but-reasonable; category:bug; feedback:The CodeRabbit AI reviewer is not correct! The names of the parsed schemas are added to an internal data structure of the parser itself, so they are also visible to any schema that is parsed later.

Comment on lines +633 to +671

```diff
 /// Write the value to the writer
 ///
 /// Returns the number of bytes written.
 ///
 /// Each call writes a complete single-object encoded message (header + data),
 /// making each message independently decodable.
 pub fn write_value<W: Write>(&self, data: T, writer: &mut W) -> AvroResult<usize> {
     writer
         .write_all(&self.header)
         .map_err(Details::WriteBytes)?;
     let value: Value = data.into();
     let bytes = write_value_ref_owned_resolved(&self.resolved, &value, writer)?;
     Ok(bytes + self.header.len())
 }
 }

 impl<T> SpecificSingleObjectWriter<T>
 where
     T: AvroSchema + Serialize,
 {
-    /// Write the referenced `Serialize` object to the provided `Write` object.
+    /// Write the object to the writer.
     ///
     /// Returns the number of bytes written.
     ///
+    /// Each call writes a complete single-object encoded message (header + data),
+    /// making each message independently decodable.
-    pub fn write_ref<W: Write>(&mut self, data: &T, writer: &mut W) -> AvroResult<usize> {
-        // Always write the header for each message (single object encoding requires
-        // each message to be independently decodable)
+    pub fn write_ref<W: Write>(&self, data: &T, writer: &mut W) -> AvroResult<usize> {
         writer
-            .write_all(self.inner.buffer.as_slice())
+            .write_all(&self.header)
             .map_err(Details::WriteBytes)?;

-        let bytes_written =
-            self.inner.buffer.len() + write_avro_datum_ref(&self.schema, data, writer)?;
+        let bytes = write_avro_datum_ref(
+            self.resolved.get_root_schema(),
+            self.resolved.get_names(),
+            data,
+            writer,
+        )?;

-        Ok(bytes_written)
+        Ok(bytes + self.header.len())
```


⚠️ Potential issue | 🟠 Major

Avoid partial writes on validation/serialization failure.

Line 639/661 writes the header before validation/serialization. If validation fails (Value path) or serialization errors (serde path), the stream is left with a header-only or partially written message. Previously the buffer avoided this. Consider buffering per write (or at least validating before header for Value).

🛠️ One way to preserve atomic writes

```diff
 pub fn write_value<W: Write>(&self, data: T, writer: &mut W) -> AvroResult<usize> {
-    writer
-        .write_all(&self.header)
-        .map_err(Details::WriteBytes)?;
     let value: Value = data.into();
-    let bytes = write_value_ref_owned_resolved(&self.resolved, &value, writer)?;
-    Ok(bytes + self.header.len())
+    let mut buffer = Vec::new();
+    let bytes = write_value_ref_owned_resolved(&self.resolved, &value, &mut buffer)?;
+    writer.write_all(&self.header).map_err(Details::WriteBytes)?;
+    writer.write_all(&buffer).map_err(Details::WriteBytes)?;
+    Ok(bytes + self.header.len())
 }
 ...
 pub fn write_ref<W: Write>(&self, data: &T, writer: &mut W) -> AvroResult<usize> {
-    writer
-        .write_all(&self.header)
-        .map_err(Details::WriteBytes)?;
-
-    let bytes = write_avro_datum_ref(
-        self.resolved.get_root_schema(),
-        self.resolved.get_names(),
-        data,
-        writer,
-    )?;
+    let mut buffer = Vec::new();
+    let bytes = write_avro_datum_ref(
+        self.resolved.get_root_schema(),
+        self.resolved.get_names(),
+        data,
+        &mut buffer,
+    )?;
+    writer.write_all(&self.header).map_err(Details::WriteBytes)?;
+    writer.write_all(&buffer).map_err(Details::WriteBytes)?;

     Ok(bytes + self.header.len())
 }
```

Owner Author


value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The header bytes write could be postponed until the object is validated and serialized and then both could be written to make the operation more atomic. It could still fail while writing to the writer but there is nothing that could be done about this.
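The pattern agreed on here — serialize into a temporary buffer first and touch the output stream only after serialization succeeds — can be sketched with the stdlib alone. `try_encode` below is a made-up fallible serializer standing in for Avro value serialization, not a crate API:

```rust
use std::io::{self, Write};

// Made-up fallible serializer standing in for Avro value serialization.
fn try_encode(payload: &str) -> io::Result<Vec<u8>> {
    if payload.is_empty() {
        Err(io::Error::new(io::ErrorKind::InvalidData, "nothing to encode"))
    } else {
        Ok(payload.as_bytes().to_vec())
    }
}

// Serialize into a temporary buffer first; header and payload reach the sink
// only after serialization succeeds, so a failure leaves `writer` untouched.
fn write_message<W: Write>(header: &[u8], payload: &str, writer: &mut W) -> io::Result<usize> {
    let body = try_encode(payload)?; // fail here => nothing written yet
    writer.write_all(header)?;
    writer.write_all(&body)?;
    Ok(header.len() + body.len())
}

fn main() {
    let header = [0xC3, 0x01];
    let mut out = Vec::new();

    // Failure case: the sink stays empty instead of holding a dangling header.
    assert!(write_message(&header, "", &mut out).is_err());
    assert!(out.is_empty());

    // Success case: header and body are written together.
    let n = write_message(&header, "datum", &mut out).unwrap();
    assert_eq!(n, 7);
    assert_eq!(out.len(), 7);
    println!("atomic write ok");
}
```

As the owner notes, the final writes to the sink can still fail midway; buffering only guarantees that validation and serialization errors leave the stream untouched.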

@martin-augment
Owner Author

753-761: Document the new names requirement.

The doc comment should state that names must include all named types referenced by schema (e.g., from ResolvedSchema::get_names()), otherwise resolution can fail for refs.

value:good-to-have; category:documentation; feedback:The CodeRabbit AI reviewer is correct! The changes proposed in this pull request add a new parameter to the write_avro_datum_ref() function, so it has to be documented in the function's docstring. This prevents confusion for developers trying to use the method and figure out what the parameter is for.
