Skip to content

351: fix: Don't depend on Serde to provide fields in the right order#9

Open
martin-augment wants to merge 1 commit intomainfrom
pr-351-2025-12-08-06-55-31
Open

351: fix: Don't depend on Serde to provide fields in the right order#9
martin-augment wants to merge 1 commit intomainfrom
pr-351-2025-12-08-06-55-31

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

351: To review by AI

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Dec 8, 2025

Walkthrough

This pull request modifies Avro serialization to handle out-of-order struct fields and enforce default values for skipped fields. Changes include updating the SerializeRecordFieldWithSchema error variant to use owned String field names instead of &'static str, and introducing a new MissingDefaultForSkippedField error variant. The SchemaAwareWriteSerializeStruct now buffers out-of-order field data using field_cache and next_field tracking. The end-of-struct logic validates that all fields are either written or have defaults, returning appropriate errors when defaults are missing. Tests are updated to annotate skipped fields with #[avro(default)] attributes and verify error handling for missing defaults.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-351-2025-12-08-06-55-31

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36c861a and a186161.

📒 Files selected for processing (3)
  • avro/src/error.rs (1 hunks)
  • avro/src/ser_schema.rs (8 hunks)
  • avro/tests/avro-rs-226.rs (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
avro/src/ser_schema.rs (4)
avro/src/error.rs (1)
  • new (38-42)
avro/src/writer.rs (2)
  • new (95-97)
  • schema (171-173)
avro/src/ser.rs (13)
  • new (56-63)
  • new (67-77)
  • new (81-88)
  • new (92-96)
  • new (100-106)
  • end (312-314)
  • end (328-330)
  • end (344-346)
  • end (364-372)
  • end (386-388)
  • end (418-427)
  • end (445-447)
  • end (465-476)
avro/src/schema.rs (5)
  • new (253-259)
  • new (375-377)
  • name (379-381)
  • name (1208-1216)
  • schema (7114-7114)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: clippy_check (stable)
  • GitHub Check: ci (1.86.0, ubuntu-24.04, x86_64-unknown-linux-gnu)
  • GitHub Check: ci (stable, ubuntu-24.04-arm, aarch64-unknown-linux-gnu)
  • GitHub Check: ci (beta, ubuntu-24.04-arm, wasm32-unknown-unknown)
  • GitHub Check: claude-review
  • GitHub Check: ci (stable, ubuntu-24.04, wasm32-unknown-unknown)
  • GitHub Check: interop
  • GitHub Check: web-assembly
🔇 Additional comments (8)
avro/src/error.rs (1)

517-525: LGTM! Error variant changes are well-structured.

The change from &'static str to String for field_name properly supports dynamic field names from schema lookups. The new MissingDefaultForSkippedField variant provides clear error messaging for skipped fields without defaults.

avro/src/ser_schema.rs (4)

252-256: LGTM! Out-of-order field buffering design.

The field_cache and next_field tracking enables proper handling of fields that arrive out of schema order. The approach is straightforward and maintains correctness.


272-313: Well-implemented in-order vs out-of-order field handling.

The logic correctly:

  • Writes fields directly when in-order (fast path)
  • Buffers out-of-order fields for later emission
  • Flushes buffered fields as soon as their position is reached

The linear search in field_cache is acceptable for typical record sizes.


390-417: LGTM! skip_field correctly delegates to serialize_next_field.

The implementation properly:

  • Looks up the skipped field in the schema
  • Serializes the default value through serialize_next_field (maintaining field order)
  • Returns MissingDefaultForSkippedField when no default is available

2982-3026: Good test coverage for out-of-order field serialization.

The test correctly validates the core feature: a Rust struct with fields (a, b) serialized against a schema with fields (b, a). The roundtrip through Writer/Reader confirms correct behavior.

avro/tests/avro-rs-226.rs (3)

47-49: LGTM! Added required default for skip_serializing_if field.

The #[avro(default = "null")] annotation is necessary because the new implementation requires defaults for fields that may be skipped during serialization.


108-122: LGTM! Selective default annotations are correctly applied.

The defaults are appropriately added only to fields that may be skipped during serialization:

  • skip_serializing and skip_serializing_if: need #[avro(default = "null")]
  • skip_deserializing: correctly has no default (it's still serialized)
  • skip: uses #[avro(skip)] to exclude from schema entirely

137-201: LGTM! Comprehensive panic tests for missing defaults.

The three #[should_panic] tests properly validate error handling for:

  1. skip_serializing_if without a default
  2. skip without a default
  3. skip_serializing without a default

The expected messages correctly match the MissingDefaultForSkippedField error format.

Comment on lines +347 to +351
// Check if all fields were written
if self.next_field != self.record_schema.fields.len() {
let name = self.record_schema.fields[self.next_field].name.clone();
return Err(Details::GetField(name).into());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Dead code: This check can never trigger.

The while loop at line 317 only exits when self.next_field == self.record_schema.fields.len(). Therefore, the condition at line 348 is always false and this block is unreachable.

Remove the redundant check:

         }
 
-        // Check if all fields were written
-        if self.next_field != self.record_schema.fields.len() {
-            let name = self.record_schema.fields[self.next_field].name.clone();
-            return Err(Details::GetField(name).into());
-        }
         assert!(
             self.field_cache.is_empty(),
             "There should be no more unwritten fields at this point"
         );
         Ok(self.bytes_written)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Check if all fields were written
if self.next_field != self.record_schema.fields.len() {
let name = self.record_schema.fields[self.next_field].name.clone();
return Err(Details::GetField(name).into());
}
}
assert!(
self.field_cache.is_empty(),
"There should be no more unwritten fields at this point"
);
Ok(self.bytes_written)
🤖 Prompt for AI Agents
In avro/src/ser_schema.rs around lines 347 to 351, the final check that returns
Err(Details::GetField(...)) is dead code because the preceding while loop always
exits only when self.next_field == self.record_schema.fields.len(); remove this
redundant if-block entirely (delete the lines that fetch the field name and
return the error) and ensure any trailing code and formatting remain consistent
after its removal.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! There is no need to check that the tracked position and the number of fields are the same because this is already checked in the earlier whilez condition. Prevents adding unused code.

@claude
Copy link
Copy Markdown

claude bot commented Dec 8, 2025

Summary: This PR fixes issue apache#351 by handling cases where Serde serializes struct fields in a different order than the Avro schema expects.

Positive Aspects:

  • Well-designed core algorithm with minimal performance impact for in-order fields
  • Comprehensive test coverage including edge cases and error scenarios
  • Good error handling with new MissingDefaultForSkippedField error
  • Clear documentation

Issues Found:

  1. CRITICAL - Dead Code (ser_schema.rs:347-350): The check "if self.next_field != self.record_schema.fields.len()" is unreachable because the while loop above already ensures this condition. Remove this redundant check.

  2. Performance Issue (ser_schema.rs:287-290, 319-322): Linear search O(n) on field_cache for every field could cause O(n²) complexity with many out-of-order fields. Recommend using HashMap<usize, Vec> instead of Vec<(usize, Vec)> for O(1) lookups.

  3. Production Assert (ser_schema.rs:352-355): Using assert! in library code can panic. Replace with debug_assert! or proper error handling.

  4. Breaking Change (error.rs:518): Changing field_name from &static str to String is a breaking change. Document in release notes if public API.

  5. Memory Concern: Unbounded buffering for out-of-order fields could consume significant memory with large records.

Performance Analysis:

  • Best case (in-order fields): Excellent - zero overhead
  • Worst case (reverse order): O(n²) complexity
  • Memory: Unbounded growth for out-of-order fields

Recommendations:
MUST FIX: Remove dead code check, replace assert with debug_assert or error
SHOULD FIX: Use HashMap for O(n) performance, add performance documentation
NICE TO HAVE: More edge case tests, performance benchmarks

Overall: Solid implementation with sound algorithm. Ready to merge after fixing the assert and dead code. HashMap optimization can be follow-up work.

@martin-augment
Copy link
Copy Markdown
Owner Author

2. Performance Issue (ser_schema.rs:287-290, 319-322): Linear search O(n) on field_cache for every field could cause O(n²) complexity with many out-of-order fields. Recommend using HashMap<usize, Vec> instead of Vec<(usize, Vec)> for O(1) lookups.

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Using a HashMap will be both more ergonomic and succinct. There won't be a need to iterate over the items to find but it could use a map lookup directly. Prevents suboptimal performance

@martin-augment
Copy link
Copy Markdown
Owner Author

  1. CRITICAL - Dead Code (ser_schema.rs:347-350): The check "if self.next_field != self.record_schema.fields.len()" is unreachable because the while loop above already ensures this condition. Remove this redundant check.

value:useful; category:bug; feedback:The Claude AI reviewer is correct! There is no need to check that the tracked position and the number of fields are the same because this is already checked in the earlier whilez condition. Prevents adding unused code.

@martin-augment
Copy link
Copy Markdown
Owner Author

3. Production Assert (ser_schema.rs:352-355): Using assert! in library code can panic. Replace with debug_assert! or proper error handling.

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! A software library should avoid panicking if possible. In the current case it should return an Err like for any other problem earlier or to use debug_assertion!() macro that will report the problem in dev mode only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants