Skip to content

3317: [ruby] Keep all union record errors and tag them accordingly#11

Open
martin-augment wants to merge 4 commits intomainfrom
pr-3317-2025-10-23-10-26-42
Open

3317: [ruby] Keep all union record errors and tag them accordingly#11
martin-augment wants to merge 4 commits intomainfrom
pr-3317-2025-10-23-10-26-42

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

@martin-augment martin-augment commented Oct 23, 2025

3317: To review by AI

Summary by CodeRabbit

  • Bug Fixes

    • Improved error reporting for schema validation failures. Union type mismatches now display comprehensive error messages with member type details and nested validation failures, providing clearer context about validation errors.
  • Tests

    • Updated test expectations to validate enhanced error reporting. Added test coverage for complex union scenarios with multiple schema types.

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestions posted.

error
end.join("; ")
schema_name_prefix = "#{failed_complex_type[:schema_name]}: " if failed_complex_type[:schema_name]
complex_type_errors << "#{schema_name_prefix}#{error_msg}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_name_prefix is assigned only when schema_name is present; on iterations without a name, the variable can retain the previous iteration’s value and incorrectly prefix errors. Consider resetting the prefix each loop iteration to avoid leaking a prior schema name.

🤖 React with 👍 or 👎 to let us know if the comment was useful.

Copy link
Copy Markdown
Owner Author

@martin-augment martin-augment Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:annoying; category:bug; feedback:the model wrongly assumes that the local variable would leak to the next iteration. This could happen with "for loop" in Ruby but the code here uses an iterator.

Here is a small test showing that temp does not survive to the next loop:

#!/usr/bin/env ruby

i = 0
collection = ['a', 'b', 'c']
collection.each do |x|
  temp = "#{x}" if i % 2 == 0
  print "#{i} #{x}: #{temp}\n"
  i = i + 1
end

It prints:

0 a: a
1 b:
2 c: c

@github-actions github-actions bot added the Ruby label Oct 23, 2025
@claude
Copy link
Copy Markdown

claude bot commented Oct 23, 2025

Code Review for PR #11: Keep all union record errors and tag them accordingly

Summary

This PR improves error reporting for union type validation failures by collecting and displaying all complex type errors with schema names, rather than just showing the first one. This is a valuable enhancement for debugging validation issues.

Code Quality & Best Practices

Strengths

Improved error messages: The new error format provides much better context by showing all failed union member validation errors
Schema name tagging: Adding schema names to complex type errors makes debugging significantly easier
Good test coverage: Two new comprehensive tests cover the new functionality well
Backward compatible: The changes enhance existing behavior without breaking the API

Issues & Suggestions

1. Code Style - Unnecessary mapping (line 205-207)

The .map operation is a no-op and should be simplified to:
error_msg = failed_complex_type[:result].errors.join("; ")

Location: lang/ruby/lib/avro/schema_validator.rb:205-207

2. Code Style - Trailing whitespace (line 203, 208)

Line 203 has double space after = and line 208 has trailing space

Location: lang/ruby/lib/avro/schema_validator.rb:203,208

3. Code Style - String literal choice (line 215)

Using %Q for a simple string with interpolation is unconventional. Standard double-quoted string is more idiomatic.

Location: lang/ruby/lib/avro/schema_validator.rb:215

4. Code Style - Redundant conditional blocks (line 222-224)

The original one-liner was more concise and Ruby-idiomatic. Consider reverting to:
next datum.nil? if schema.type_sym == :null

Location: lang/ruby/lib/avro/schema_validator.rb:222-224

Potential Bugs

⚠️ Enum types not included in complex type errors

Looking at the test test_validate_union_complex_and_simple_types, it includes an enum type (person) but the error output doesn't show enum-specific errors. The COMPLEX_TYPES constant is missing :enum. Consider whether :enum should be added to COMPLEX_TYPES for consistency.

Location: lang/ruby/lib/avro/schema_validator.rb:24

⚠️ Empty line gap (line 200-201)

There's an unnecessary blank line that was added.

Location: lang/ruby/lib/avro/schema_validator.rb:200-201

Performance Considerations

Minimal performance impact: The changes add a modest amount of string processing when validation fails, which is acceptable since this is an error path
No changes to happy path: Successful validations aren't affected
⚠️ Slight increase in memory: Collecting all errors instead of just the first complex type error will use marginally more memory, but this is negligible

Security Concerns

No security issues identified: The changes are purely related to error message formatting
No user input directly embedded: Error messages properly use string interpolation

Test Coverage

Excellent test additions: Two new comprehensive tests cover the new functionality
Tests verify exact error output: Using heredoc for multi-line expected errors is good practice
Edge cases covered: Tests cover unions with multiple record types and enum types

Suggestion

Consider adding a test case that specifically validates behavior when :enum types fail in a union to clarify whether the current behavior is intentional.

Recommendations

Must Fix:

  1. Remove unnecessary .map on line 205-207
  2. Fix trailing/extra whitespace on lines 203 and 208

Should Fix:
3. Consider whether :enum should be added to COMPLEX_TYPES
4. Replace %Q with standard double-quoted string on line 215
5. Revert to one-liner style for the nil check (lines 222-224)
6. Remove extra blank line (line 200-201)

Nice to Have:
7. Add a test case for enum validation failures in unions

Conclusion

This is a valuable improvement to error reporting that will help developers debug validation issues more effectively. The implementation is solid but needs some minor style cleanup. Once the code style issues are addressed, this will be ready to merge.

Overall Assessment: ✅ Approve with minor changes requested

@github-actions
Copy link
Copy Markdown

  • lang/ruby/lib/avro/schema_validator.rb:205 – The new branch only adds a single aggregated string to result.add_error, whereas we previously appended each complex-branch failure individually via result << error. Any caller that iterates ValidationError#result.errors now receives one multiline element instead of discrete field-level entries, which breaks the public API. Could we keep pushing each failing branch’s messages (perhaps with the new prefixes) and optionally add the summary as an extra entry so downstream code is not forced to split strings?

  • lang/ruby/lib/avro/schema_validator.rb:213 (and the matching schema_name capture at line 230) – We’re surfacing schema.name, which is just the short name. If a union contains records from different namespaces that share a short name, the output will show identical tags for different branches. Using schema.fullname (with a fallback to name when fullname is absent) would keep the error report unambiguous.

@martin-augment
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 24, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 24, 2025

Walkthrough

The schema validator now aggregates errors from all failed union member types instead of stopping at the first failure, providing consolidated error messages that include schema names and structured error blocks for improved debugging.

Changes

Cohort / File(s) Change Summary
Union Validation Error Aggregation
lang/ruby/lib/avro/schema_validator.rb
Modified union validation logic to collect per-member error messages from all complex member types, optionally prefix errors with schema names, and append a consolidated "Union type specific errors" block. Enhanced error display to show union member types with names instead of simple type symbols.
Test Updates
lang/ruby/test/test_schema_validator.rb
Updated test error expectations in test_validate_union_of_nil_and_record_inside_array and test_validate_union_extra_fields to match new multi-line union-aware error format. Added new test test_validate_union_complex_and_simple_types to validate error reporting for complex unions with multiple record and enum types.

Sequence Diagram

sequenceDiagram
    participant Validator as Validator
    participant Union as Union Handler
    participant Members as Schema Members
    participant ErrorLog as Error Aggregator

    rect rgb(200, 220, 240)
    Note over Validator,ErrorLog: Old Flow: Stop at First Error
    Validator->>Union: validate union with [null, record1, record2]
    Union->>Members: check member 1 (null)
    Members-->>Union: OK
    Union->>Members: check member 2 (record1)
    Members-->>Union: FAIL
    Union-->>Validator: return first error only
    end

    rect rgb(220, 240, 200)
    Note over Validator,ErrorLog: New Flow: Aggregate All Errors
    Validator->>Union: validate union with [null, record1, record2]
    Union->>ErrorLog: initialize aggregator
    Union->>Members: check member 1 (null)
    Members-->>Union: OK
    Union->>Members: check member 2 (record1)
    Members-->>Union: FAIL → with schema name
    ErrorLog->>ErrorLog: record error for record1
    Union->>Members: check member 3 (record2)
    Members-->>Union: FAIL → with schema name
    ErrorLog->>ErrorLog: record error for record2
    Union->>ErrorLog: build consolidated message
    ErrorLog-->>Validator: return aggregated error block
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes involve logic modifications to error aggregation and reporting within union validation, paired with multiple test expectation updates across different test cases. The core logic change is contained but requires careful verification of the error consolidation behavior and message formatting. Test updates are consistent in pattern but span multiple test methods with varying union compositions.

Poem

🐰 The validator now gathers all tales,
Not stopping when first check fails,
Each union member's error told,
With schema names, both new and bold,
Better debugging, worth its weight in gold! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "Keep all union record errors and tag them accordingly" clearly and accurately reflects the main changes in the changeset. The modifications to the Ruby schema validator now aggregate all errors from failed union member types instead of stopping at the first failure, and enhance error reporting by tagging errors with schema names and appending a consolidated "Union type specific errors" block. The title is concise, specific, and free of vague terminology or noise. It effectively communicates the primary intent of the changeset to someone reviewing the commit history.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-3317-2025-10-23-10-26-42

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 RuboCop (1.81.1)
lang/ruby/test/test_schema_validator.rb

Error: The Gemspec/DateAssignment cop has been removed. Please use Gemspec/DeprecatedAttributeAssignment instead.
(obsolete configuration found in lang/ruby/.rubocop.yml, please update it)

lang/ruby/lib/avro/schema_validator.rb

Error: The Gemspec/DateAssignment cop has been removed. Please use Gemspec/DeprecatedAttributeAssignment instead.
(obsolete configuration found in lang/ruby/.rubocop.yml, please update it)


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
lang/ruby/lib/avro/schema_validator.rb (3)

205-207: Remove redundant map operation.

The .map do |error| error end is an identity function that returns each error unchanged. You can simplify this by directly calling .join("; ") on the errors array.

Apply this diff:

-        error_msg = failed_complex_type[:result].errors.map do |error|
-          error
-        end.join("; ")
+        error_msg = failed_complex_type[:result].errors.join("; ")

208-208: Remove trailing whitespace.

There's trailing whitespace after the string interpolation and before the if keyword.

Apply this diff:

-        schema_name_prefix = "#{failed_complex_type[:schema_name]}: " if failed_complex_type[:schema_name] 
+        schema_name_prefix = "#{failed_complex_type[:schema_name]}: " if failed_complex_type[:schema_name]

215-215: Consider using standard string interpolation.

The %Q{} syntax is unconventional here. Standard string interpolation with double quotes would be more idiomatic and readable.

Apply this diff:

-      type_mismatches = %Q{\nUnion type specific errors:\n#{complex_type_errors.join("\n")}} if complex_type_errors.any?
+      type_mismatches = "\nUnion type specific errors:\n#{complex_type_errors.join("\n")}" if complex_type_errors.any?
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a80fc3f and 6379691.

📒 Files selected for processing (2)
  • lang/ruby/lib/avro/schema_validator.rb (1 hunks)
  • lang/ruby/test/test_schema_validator.rb (2 hunks)
🔇 Additional comments (6)
lang/ruby/lib/avro/schema_validator.rb (3)

222-224: Verify the use of next within find block.

Using next inside a find block to return a boolean value is unconventional. The standard pattern would be to simply return the boolean expression directly. However, this may be intentional to skip to the next iteration when the datum is nil and the schema is null (indicating a match).

Please verify this logic is correct. The next datum.nil? will return true from the find block if datum is nil (causing find to return the schema), or false if datum is not nil (causing find to continue to the next schema). This appears correct for null compatibility checking.


201-217: Enhanced union error reporting looks good.

The implementation successfully aggregates errors from all complex union member types and presents them in a structured, debuggable format. The conditional inclusion of schema names provides valuable context when available, and the separation of complex type errors into a dedicated block improves readability.


228-232: Proper failure context tracking.

The enhancement to include schema_name in the failure hash when available provides valuable debugging context. The conditional assignment ensures backward compatibility when schemas don't have names.

lang/ruby/test/test_schema_validator.rb (3)

334-348: Test expectations correctly updated.

The expected error message properly reflects the new union error format with the "Union type specific errors" block. The multi-line format using heredoc improves readability.


564-569: Enhanced error message validation.

The updated expectation correctly validates the new format that includes both the union type list (with named types) and the specific per-member error details.


572-606: Excellent test coverage for complex unions.

This new test provides comprehensive coverage for a complex union scenario with multiple record types and an enum. It validates that:

  • All complex member types are evaluated
  • Per-member errors are collected and tagged with schema names
  • Extra field validation works across multiple union branches
  • Error messages are properly formatted and aggregated

The test data with intentionally missing required fields (name, species) and extra fields (namo, color) effectively exercises the error aggregation logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants