feat: Replace to_datum* functions with GenericDatumWriter #60
martin-augment wants to merge 2 commits into main from

Conversation
Walkthrough: This change refactors the Avro writing API by introducing a new public
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly refactors the Avro data writing mechanism by introducing a new

Highlights
Code Review
This pull request is a great feature enhancement, replacing the standalone to_datum* functions with a more idiomatic and performant GenericDatumWriter. The introduction of new benchmarks to showcase the performance improvements is also a valuable addition. I've found a few potential issues related to namespace handling during serialization which could cause problems with schemas that use relative name resolution. My comments below provide specific suggestions to address these.
Note: Security Review did not run due to the size of the PR.
```rust
pub fn write_value_ref<W: Write>(&self, writer: &mut W, value: &Value) -> AvroResult<usize> {
    if self.validate
        && self.resolved.get_schemata().iter().all(|s| {
            value
                .validate_internal(s, self.resolved.get_names(), None)
                .is_some()
        })
    {
        return Err(Details::Validation.into());
    }
    encode_internal(value, self.schema, self.resolved.get_names(), None, writer)
}
```
The calls to validate_internal and encode_internal are missing the enclosing namespace, which is hardcoded to None. This can lead to errors when resolving nested schemas that use relative names. The namespace of the schema being used should be passed to ensure correct name resolution.
Suggested change:

```diff
 pub fn write_value_ref<W: Write>(&self, writer: &mut W, value: &Value) -> AvroResult<usize> {
     if self.validate
         && self.resolved.get_schemata().iter().all(|s| {
             value
-                .validate_internal(s, self.resolved.get_names(), None)
+                .validate_internal(s, self.resolved.get_names(), s.namespace())
                 .is_some()
         })
     {
         return Err(Details::Validation.into());
     }
-    encode_internal(value, self.schema, self.resolved.get_names(), None, writer)
+    encode_internal(value, self.schema, self.resolved.get_names(), self.schema.namespace(), writer)
 }
```
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Instead of passing None as the enclosing namespace, the main schema's namespace should be used. This will help resolve any schema references used in record fields.
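To make the stakes concrete, Avro's name-resolution rule can be sketched outside the library: an unqualified type name inherits the enclosing namespace, so passing None for that namespace makes a relative reference resolve to the wrong fullname. The helper below is a hypothetical standalone sketch of the rule, not apache-avro's actual implementation:

```rust
/// Hypothetical sketch of Avro's name-resolution rule: a name containing a
/// dot is already fully qualified; otherwise it inherits the enclosing
/// namespace (when one is known).
fn resolve_fullname(name: &str, enclosing_namespace: Option<&str>) -> String {
    if name.contains('.') {
        name.to_string()
    } else {
        match enclosing_namespace {
            Some(ns) if !ns.is_empty() => format!("{ns}.{name}"),
            _ => name.to_string(),
        }
    }
}

fn main() {
    // With the schema's namespace, the relative reference resolves correctly.
    assert_eq!(resolve_fullname("Child", Some("com.example")), "com.example.Child");
    // With None (the bug), the lookup searches for the bare name and misses
    // a type registered as "com.example.Child".
    assert_eq!(resolve_fullname("Child", None), "Child");
    // Fully qualified names are unaffected either way.
    assert_eq!(resolve_fullname("other.ns.Full", None), "other.ns.Full");
}
```

This is why the fix threads `self.schema.namespace()` through instead of `None`: the names table is keyed by fullnames, and a bare name only matches once the enclosing namespace is applied.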
```rust
let mut serializer =
    SchemaAwareWriteSerializer::new(writer, self.schema, self.resolved.get_names(), None);
value.serialize(&mut serializer)
```
The enclosing_namespace parameter to SchemaAwareWriteSerializer::new is hardcoded to None. It should be self.schema.namespace() to correctly resolve relative names in schemas.
Suggested change:

```diff
 let mut serializer =
-    SchemaAwareWriteSerializer::new(writer, self.schema, self.resolved.get_names(), None);
+    SchemaAwareWriteSerializer::new(writer, self.schema, self.resolved.get_names(), self.schema.namespace());
 value.serialize(&mut serializer)
```
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Instead of passing None as the enclosing namespace, the main schema's namespace should be used. This will help resolve any schema references used in record fields.
```diff
 let mut serializer = SchemaAwareWriteSerializer::new(
     writer,
     self.resolved.get_root_schema(),
     self.resolved.get_names(),
-    data,
-    writer,
-)?;
+    None,
+);
 let bytes = data.serialize(&mut serializer)?;
```
The enclosing_namespace parameter to SchemaAwareWriteSerializer::new is hardcoded to None. It should be self.resolved.get_root_schema().namespace() to correctly resolve relative names in schemas.
Suggested change:

```diff
 let mut serializer = SchemaAwareWriteSerializer::new(
     writer,
     self.resolved.get_root_schema(),
     self.resolved.get_names(),
-    None,
+    self.resolved.get_root_schema().namespace(),
 );
 let bytes = data.serialize(&mut serializer)?;
```
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Instead of passing None as the enclosing namespace, the main schema's namespace should be used. This will help resolve any schema references used in record fields.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```rust
        })
    {
        return Err(Details::Validation.into());
    }
```
Validation checks all schemata instead of primary schema
Medium Severity
The write_value_ref validation iterates over self.resolved.get_schemata() (all schemata) with .all(), meaning it only rejects a value if it's invalid for every schema. The old write_avro_datum_schemata validated only against the primary schema (self.schema). When auxiliary schemata are provided (via .schemata()), a value that's invalid for the primary schema but coincidentally valid for an auxiliary schema will bypass validation and be encoded with the wrong schema, potentially producing corrupt data. The check needs to validate against self.schema, not all resolved schemata.
value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The value should be validated against the main schema, using the names of all additional schemata, not against all schemata. This returns the correct error reason when there is a problem and avoids extra work validating against unrelated schemata.
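The logic error is easy to reproduce in isolation: with validators that return Some(reason) on failure, an `.all(...)` guard only fires when every schema rejects the value. The toy model below (hypothetical predicate "schemas", not the crate's real types) shows a value failing the primary schema yet slipping past the guard:

```rust
// Toy model of the guard: each "schema" is a validator that returns
// Some(reason) when the value is invalid and None when it is valid,
// mirroring validate_internal's convention.
fn validate_even(v: i32) -> Option<&'static str> {
    if v % 2 == 0 { None } else { Some("not even") }
}
fn validate_positive(v: i32) -> Option<&'static str> {
    if v > 0 { None } else { Some("not positive") }
}

fn main() {
    let schemata: Vec<fn(i32) -> Option<&'static str>> =
        vec![validate_even, validate_positive];
    // Invalid for the "primary" schema (not even), valid for the auxiliary one.
    let value = 3;

    // The reviewed guard: rejects only if the value is invalid for EVERY schema.
    let all_guard_rejects = schemata.iter().all(|s| s(value).is_some());
    assert!(!all_guard_rejects); // 3 slips through despite failing the primary schema

    // The fix: validate against the primary schema only.
    let primary_rejects = schemata[0](value).is_some();
    assert!(primary_rejects);
}
```

The same shape explains the wrong-error-reason complaint: even when the `.all(...)` guard does fire, the reason it could report comes from whichever schema was checked, not necessarily the writer schema the value is about to be encoded with.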
PR Review: Replace to_datum* functions with GenericDatumWriter
| Category | Issue |
|---|---|
| 🐛 Bug | enclosing_namespace not passed to encode_internal/validate_internal |
| 🐛 Bug | Validation may skip the primary schema when using .schemata() |
| 📉 Regression | Details::ValidationWithReason dropped, only Details::Validation returned |
| 📝 Style | Missing doc comments on 3 public methods |
| 💡 Minor | GenericDatumWriter not re-exported from crate root, inconsistent with other writer types |
The core abstraction is well-designed and the refactor is otherwise clean. Fixing the namespace and primary-schema validation issues is the main blocker before this is safe to merge.
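The builder-style API exercised in the PR's benchmarks (`GenericDatumWriter::builder(&schema).validate(false).build()`) follows the standard Rust builder pattern. The standalone sketch below mirrors that call shape with hypothetical types; it is an illustration of the pattern, not the crate's actual implementation:

```rust
// Hypothetical mini-builder mirroring the usage shown in the benchmarks:
// GenericDatumWriter::builder(&schema).validate(false).build()
struct DatumWriterBuilder<'s> {
    schema: &'s str, // stand-in for &Schema
    validate: bool,
}

struct DatumWriter<'s> {
    schema: &'s str,
    validate: bool,
}

impl<'s> DatumWriter<'s> {
    fn builder(schema: &'s str) -> DatumWriterBuilder<'s> {
        // Validation is on by default; callers opt out explicitly.
        DatumWriterBuilder { schema, validate: true }
    }
}

impl<'s> DatumWriterBuilder<'s> {
    fn validate(mut self, validate: bool) -> Self {
        self.validate = validate;
        self
    }
    fn build(self) -> DatumWriter<'s> {
        DatumWriter { schema: self.schema, validate: self.validate }
    }
}

fn main() {
    let default_writer = DatumWriter::builder("schema").build();
    assert!(default_writer.validate);

    let fast_writer = DatumWriter::builder("schema").validate(false).build();
    assert!(!fast_writer.validate);
    assert_eq!(fast_writer.schema, "schema");
}
```

The design choice the reviewers implicitly endorse: expensive setup (schema resolution) happens once in `build()`, so a writer can be constructed once and reused across many `write_value_ref` calls.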
Actionable comments posted: 1
🧹 Nitpick comments (1)
avro/benches/single.rs (1)
199-234: Reduce allocation noise in reused-writer benchmarks. The reused/no-validation paths still allocate a fresh Vec each iteration, which can dominate the measurement and blur writer-reuse gains. Reusing a buffer per benchmark gives cleaner writer-focused numbers.

♻️ Suggested benchmark adjustment

```diff
 fn bench_small_schema_write_record_reuse_datum_writer(c: &mut Criterion) {
     let (schema, record) = make_small_record().unwrap();
     let writer = GenericDatumWriter::builder(&schema).build().unwrap();
+    let mut buf = Vec::new();
     c.bench_function("small record (reused writer)", |b| {
-        b.iter(|| writer.write_value_ref(&mut Vec::new(), &record))
+        b.iter(|| {
+            buf.clear();
+            writer.write_value_ref(&mut buf, &record).unwrap()
+        })
     });
 }

 fn bench_big_schema_write_record_reuse_datum_writer(c: &mut Criterion) {
     let (schema, record) = make_big_record().unwrap();
     let writer = GenericDatumWriter::builder(&schema).build().unwrap();
+    let mut buf = Vec::new();
     c.bench_function("big record (reused writer)", |b| {
-        b.iter(|| writer.write_value_ref(&mut Vec::new(), &record))
+        b.iter(|| {
+            buf.clear();
+            writer.write_value_ref(&mut buf, &record).unwrap()
+        })
     });
 }

 fn bench_small_schema_write_record_no_validation(c: &mut Criterion) {
     let (schema, record) = make_small_record().unwrap();
     let writer = GenericDatumWriter::builder(&schema)
         .validate(false)
         .build()
         .unwrap();
+    let mut buf = Vec::new();
     c.bench_function("small record (no validation)", |b| {
-        b.iter(|| writer.write_value_ref(&mut Vec::new(), &record))
+        b.iter(|| {
+            buf.clear();
+            writer.write_value_ref(&mut buf, &record).unwrap()
+        })
     });
 }

 fn bench_big_schema_write_record_no_validation(c: &mut Criterion) {
     let (schema, record) = make_big_record().unwrap();
     let writer = GenericDatumWriter::builder(&schema)
         .validate(false)
         .build()
         .unwrap();
+    let mut buf = Vec::new();
     c.bench_function("big record (no validation)", |b| {
-        b.iter(|| writer.write_value_ref(&mut Vec::new(), &record))
+        b.iter(|| {
+            buf.clear();
+            writer.write_value_ref(&mut buf, &record).unwrap()
+        })
     });
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro/benches/single.rs` around lines 199 - 234, Benchmarks allocate a fresh Vec in each iteration which adds allocation noise; for the four functions bench_small_schema_write_record_reuse_datum_writer, bench_big_schema_write_record_reuse_datum_writer, bench_small_schema_write_record_no_validation, and bench_big_schema_write_record_no_validation, create a single mutable buffer (e.g., let mut buf = Vec::new()) outside the b.iter closure and inside the closure call buf.clear() then pass &mut buf to writer.write_value_ref so the buffer is reused across iterations and removes allocation noise from the measurement.
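The buffer-reuse pattern in the suggestion generalizes beyond criterion: hoist the Vec out of the hot loop and clear() it each iteration, so the allocation's capacity is kept and the loop measures the writer rather than the allocator. A minimal standalone sketch, where write_record is a hypothetical stand-in for write_value_ref:

```rust
// Sketch of buffer reuse: Vec::clear() resets the length but keeps the
// allocated capacity, so after the first iteration no reallocation occurs.
fn write_record(buf: &mut Vec<u8>, payload: &[u8]) -> usize {
    buf.extend_from_slice(payload);
    buf.len()
}

fn main() {
    let payload = [1u8, 2, 3, 4];
    let mut buf = Vec::new();
    let mut cap_after_first = 0;
    for i in 0..1_000 {
        buf.clear(); // keeps capacity, unlike allocating Vec::new() each time
        let written = write_record(&mut buf, &payload);
        assert_eq!(written, payload.len());
        if i == 0 {
            cap_after_first = buf.capacity();
        } else {
            // Capacity is stable: no per-iteration allocation.
            assert_eq!(buf.capacity(), cap_after_first);
        }
    }
}
```

With `&mut Vec::new()` in the closure, every iteration pays an allocation and a drop, which is exactly the noise the nitpick wants out of the writer-reuse measurement.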
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@avro/src/writer/datum.rs`:
- Around line 88-99: The validation in write_value_ref is using
self.resolved.get_schemata().iter().all(...) which lets values pass if any
auxiliary schema accepts them; instead validate only against the writer schema.
Replace the current all(...) check with a direct call to
value.validate_internal(self.schema, self.resolved.get_names(), None) and, if
that returns Some(...), return the Validation error; leave the subsequent call
to encode_internal(value, self.schema, ...) unchanged. This targets
write_value_ref, self.schema, self.resolved.get_schemata(), and
value.validate_internal to ensure validation is performed against the writer
schema only.
---
Nitpick comments:
In `@avro/benches/single.rs`:
- Around line 199-234: Benchmarks allocate a fresh Vec in each iteration which
adds allocation noise; for the four functions
bench_small_schema_write_record_reuse_datum_writer,
bench_big_schema_write_record_reuse_datum_writer,
bench_small_schema_write_record_no_validation, and
bench_big_schema_write_record_no_validation, create a single mutable buffer
(e.g., let mut buf = Vec::new()) outside the b.iter closure and inside the
closure call buf.clear() then pass &mut buf to writer.write_value_ref so the
buffer is reused across iterations and removes allocation noise from the
measurement.
ℹ️ Review info
Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5827b7eb-13bc-4b90-a71d-ff126deedf23
📒 Files selected for processing (11)
avro/benches/single.rs
avro/src/lib.rs
avro/src/schema/mod.rs
avro/src/serde/de.rs
avro/src/writer/datum.rs
avro/src/writer/single_object.rs
avro/tests/avro-3786.rs
avro/tests/avro-3787.rs
avro/tests/io.rs
avro/tests/schema.rs
avro/tests/to_from_avro_datum_schemata.rs
```rust
pub fn write_value_ref<W: Write>(&self, writer: &mut W, value: &Value) -> AvroResult<usize> {
    if self.validate
        && self.resolved.get_schemata().iter().all(|s| {
            value
                .validate_internal(s, self.resolved.get_names(), None)
                .is_some()
        })
    {
        return Err(Details::Validation.into());
    }
    encode_internal(value, self.schema, self.resolved.get_names(), None, writer)
}
```
Validate against the writer schema only; current all(...) check can bypass validation.
Line 90 currently checks every resolved schema and only fails if all validations fail. That means a value valid for any auxiliary schema can pass, even when invalid for self.schema, then Line 98 encodes against self.schema anyway.
🐛 Suggested fix

```diff
 pub fn write_value_ref<W: Write>(&self, writer: &mut W, value: &Value) -> AvroResult<usize> {
-    if self.validate
-        && self.resolved.get_schemata().iter().all(|s| {
-            value
-                .validate_internal(s, self.resolved.get_names(), None)
-                .is_some()
-        })
-    {
+    if self.validate
+        && value
+            .validate_internal(
+                self.schema,
+                self.resolved.get_names(),
+                self.schema.namespace(),
+            )
+            .is_some()
+    {
         return Err(Details::Validation.into());
     }
-    encode_internal(value, self.schema, self.resolved.get_names(), None, writer)
+    encode_internal(
+        value,
+        self.schema,
+        self.resolved.get_names(),
+        self.schema.namespace(),
+        writer,
+    )
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
// Original
pub fn write_value_ref<W: Write>(&self, writer: &mut W, value: &Value) -> AvroResult<usize> {
    if self.validate
        && self.resolved.get_schemata().iter().all(|s| {
            value
                .validate_internal(s, self.resolved.get_names(), None)
                .is_some()
        })
    {
        return Err(Details::Validation.into());
    }
    encode_internal(value, self.schema, self.resolved.get_names(), None, writer)
}
```

```rust
// Suggested
pub fn write_value_ref<W: Write>(&self, writer: &mut W, value: &Value) -> AvroResult<usize> {
    if self.validate
        && value
            .validate_internal(
                self.schema,
                self.resolved.get_names(),
                self.schema.namespace(),
            )
            .is_some()
    {
        return Err(Details::Validation.into());
    }
    encode_internal(
        value,
        self.schema,
        self.resolved.get_names(),
        self.schema.namespace(),
        writer,
    )
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/writer/datum.rs` around lines 88 - 99, The validation in
write_value_ref is using self.resolved.get_schemata().iter().all(...) which lets
values pass if any auxiliary schema accepts them; instead validate only against
the writer schema. Replace the current all(...) check with a direct call to
value.validate_internal(self.schema, self.resolved.get_names(), None) and, if
that returns Some(...), return the Validation error; leave the subsequent call
to encode_internal(value, self.schema, ...) unchanged. This targets
write_value_ref, self.schema, self.resolved.get_schemata(), and
value.validate_internal to ensure validation is performed against the writer
schema only.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The value should be validated against the main schema, using the names of all additional schemata, not against all schemata. This returns the correct error reason when there is a problem and avoids extra work validating against unrelated schemata.
🤖 Augment PR Summary

Summary: This PR introduces
Changes:
Technical Notes: The
```rust
}

pub fn write_value_ref<W: Write>(&self, writer: &mut W, value: &Value) -> AvroResult<usize> {
    if self.validate
```
Value::validate_internal returns Some(reason) when the value is invalid, but this check only errors when all schemata return Some (and it validates against every schema in ResolvedSchema, including dependencies). That seems likely to let values that are invalid for the writer's root schema slip through validation (or validate against the wrong schema).
Severity: high
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The value should be validated against the main schema, using the names of all additional schemata, not against all schemata. This returns the correct error reason when there is a problem and avoids extra work validating against unrelated schemata.

value:useful; category:bug; feedback: The Claude AI reviewer is correct! The value should be validated against the main schema, using the names of all additional schemata, not against all schemata. This returns the correct error reason when there is a problem and avoids extra work validating against unrelated schemata.


499: To review by AI