fix: accept JSON string defaults for decimal fields in union#533
Open
valdo404 wants to merge 1 commit intoapache:mainfrom
Open
fix: accept JSON string defaults for decimal fields in union#533valdo404 wants to merge 1 commit intoapache:mainfrom
valdo404 wants to merge 1 commit intoapache:mainfrom
Conversation
6fd67a7 to
5219763
Compare
Per Avro 1.12.0 Specification, §"Complex Types / Records", the JSON
encoding of a `bytes` field's default value is a string whose codepoints
0-255 map to byte values 0-255 (e.g. `"\u00FF"`). The same section
specifies that a union-typed field's default must correspond to the
first schema that matches in the union.
`decimal` is defined as a logical type over `bytes`, so this rule
transitively applies: a nullable decimal field expressed as
`[{bytes, logicalType: decimal}, null]` with a JSON string default
requires `resolve_decimal` to accept `Value::String` when validating
defaults at schema parse time. Before this change the parser rejected
such schemas with `GetDefaultUnion(Decimal, String)`, even though
Java and Python Avro accept them.
The added arm walks the string's codepoints, rejecting any above
0xFF, and returns a `Value::Decimal`. The precision check is skipped
because the spec does not require a default's byte length to cover
the declared precision — it only requires a valid `bytes` value.
Wire-level decoded records always reach `resolve_decimal` as
`Value::Bytes`, so this arm is exclusively a default-validation path.
Tests cover `\u0000`, a full 0..=255 round-trip, codepoints > 0xFF
being rejected, and end-to-end parsing of a nullable decimal record
schema.
5219763 to
5bb5c4b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Schema parse fails when a nullable
decimalfield is expressed as[{bytes, logicalType: decimal}, null]with a JSON string default(e.g.
default: "\u0000") —resolve_default_valuereportsGetDefaultUnion(Decimal, String)becauseresolve_decimalhas noValue::Stringarm. Java, Python and Ruby Avro accept theseschemas; the Rust implementation should too.
Related bugs already declared & fixed in other bindings
This is the Rust counterpart of two tickets that have already been
triaged in the Apache Avro tracker:
default" (AVRO-3773: [ruby] fix validator for decimal default avro#2275, resolved in 1.11.2 / 1.12.0). Same
exact schema shape (
[{bytes, logicalType: decimal}, null]with aJSON string default), same root cause: the validator was evaluating
the default against the union's logical type instead of its
underlying
bytestype. Fixed in Ruby; the Rust binding neverreceived the equivalent.
for Union type field" (AVRO-3847: [Rust] Support default value of pre-defined name for Union type field avro#2468, closed 2023-08-31). Same
error message (
"One union type X must match the default's value type Y"), for a different union variant (Refto a pre-definednamed record). The fix added the missing resolver path for that
variant. This PR extends the same pattern to the
Decimallogicaltype.
Spec citations
From Avro 1.12.0 Specification, §"Complex Types / Records":
The "field default values" table in the same section lists
byteswith json type
stringand example"\u00FF", andfixedsimilarlywith
"\u00ff".decimalis a logical type defined on top ofbytes(orfixed) in§"Logical Types / Decimal", so the rule transitively applies to
decimal defaults: when a decimal field sits inside a union whose
first branch is
{bytes, logicalType: decimal}, its JSON default isa string and needs to validate against the decimal schema at parse
time.
Change
Added a
Value::String(s)arm toresolve_decimalthat walks thestring's codepoints, rejects any value > 0xFF, and collects the rest
as bytes wrapped in
Value::Decimal. The precision check isdeliberately not applied here because the spec does not require a
default's byte length to cover the declared precision — only that
the value be a valid member of the underlying
bytestype.Wire-level decoded records always arrive as
Value::Bytes, so thisarm is exclusively a default-validation path and does not affect
record decoding.
Test plan
types::tests::resolve_decimal_from_string_default—"\u0000"default, full 0..=255 round-trip, and codepoint > 0xFF rejection
types::tests::parse_schema_with_nullable_decimal_string_default— end-to-end
Schema::parse_strof a record with a nullabledecimal field using a JSON string default
cargo test -p apache-avro --lib— 559 lib tests passschema.rs,union_schema.rs,big_decimal.rs,avro-rs-285-bytes_deserialization.rset al.all still pass