fix: accept JSON string defaults for decimal fields in union #533

Open
valdo404 wants to merge 1 commit into apache:main from valdo404:fix/decimal-union-string-default
valdo404 commented Apr 11, 2026

Summary

Schema parsing fails when a nullable decimal field is expressed as
[{bytes, logicalType: decimal}, null] with a JSON string default
(e.g. default: "\u0000"): resolve_default_value reports
GetDefaultUnion(Decimal, String) because resolve_decimal has no
Value::String arm. Java, Python and Ruby Avro accept these
schemas; the Rust implementation should too.

Related bugs already declared & fixed in other bindings

This is the Rust counterpart of two tickets that have already been
triaged in the Apache Avro tracker:

  • AVRO-3773: "[Ruby] Decimal logical type fail to validate
    default" (avro#2275, "AVRO-3773: [ruby] fix validator for decimal
    default", resolved in 1.11.2 / 1.12.0). Same schema shape
    ([{bytes, logicalType: decimal}, null] with a JSON string
    default) and same root cause: the validator was evaluating
    the default against the union's logical type instead of its
    underlying bytes type. Fixed in Ruby; the Rust binding never
    received the equivalent.
  • AVRO-3847: "[Rust] Support default value of pre-defined name
    for Union type field" (avro#2468, "AVRO-3847: [Rust] Support
    default value of pre-defined name for Union type field", closed
    2023-08-31). Same error message ("One union type X must match the
    default's value type Y"), for a different union variant (Ref to a
    pre-defined named record). The fix added the missing resolver path
    for that variant. This PR extends the same pattern to the Decimal
    logical type.

Spec citations

From Avro 1.12.0 Specification, §"Complex Types / Records":

"Default values for bytes and fixed fields are JSON strings, where
Unicode code points 0-255 are mapped to unsigned 8-bit byte values
0-255."

"Default values for union fields correspond to the first schema
that matches in the union."

The "field default values" table in the same section lists bytes
with json type string and example "\u00FF", and fixed similarly
with "\u00ff".

decimal is a logical type defined on top of bytes (or fixed) in
§"Logical Types / Decimal", so the rule transitively applies to
decimal defaults: when a decimal field sits inside a union whose
first branch is {bytes, logicalType: decimal}, its JSON default is
a string and needs to validate against the decimal schema at parse
time.
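
As an illustration of that mapping (not code from this PR), the sketch below goes in the writing direction: given the raw bytes of a decimal's unscaled value, it builds the string a schema author would put in the default. The names default_string_for_bytes and unscaled_to_bytes are hypothetical, and the two-byte width is an assumption chosen only to keep the example small.

```rust
/// Hypothetical helper: build the string form of a `bytes`/`fixed` default.
/// Each byte becomes the Unicode code point with the same numeric value
/// (U+0000..=U+00FF), following the spec's code-point-to-byte mapping.
fn default_string_for_bytes(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Big-endian two's-complement bytes of a small unscaled decimal value.
/// Fixed at two bytes here purely to keep the example short.
fn unscaled_to_bytes(unscaled: i16) -> [u8; 2] {
    unscaled.to_be_bytes()
}
```

For instance, the decimal 12.34 at scale 2 has the unscaled value 1234 = 0x04D2, so its default would be the two-character JSON string "\u0004\u00D2". The payload is defined by code points, not by the string's UTF-8 encoding: U+00FF takes two bytes in UTF-8 but stands for the single byte 0xFF.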

Change

Added a Value::String(s) arm to resolve_decimal that walks the
string's codepoints, rejects any value > 0xFF, and collects the rest
as bytes wrapped in Value::Decimal. The precision check is
deliberately not applied here because the spec does not require a
default's byte length to cover the declared precision — only that
the value be a valid member of the underlying bytes type.

Wire-level decoded records always arrive as Value::Bytes, so this
arm is exclusively a default-validation path and does not affect
record decoding.
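
A minimal sketch of the shape of the new arm, for readers who do not want to open the diff. It is not the actual patch: Value here is a reduced stand-in enum, resolve_decimal_sketch is a hypothetical name, errors are plain strings, and the checks the real function applies on the bytes path are omitted.

```rust
/// Reduced stand-in for the crate's `Value` type (assumption: only the
/// variants needed for this illustration are modelled).
#[derive(Debug, PartialEq)]
enum Value {
    Bytes(Vec<u8>),
    String(String),
    Decimal(Vec<u8>),
}

/// Hypothetical, simplified version of the decimal resolution step.
fn resolve_decimal_sketch(value: Value) -> Result<Value, String> {
    match value {
        // Pre-existing path: wire-decoded data arrives as raw bytes.
        // (The real function performs further checks that are left out here.)
        Value::Bytes(bytes) => Ok(Value::Decimal(bytes)),
        // New path: a JSON string default. Each code point must fit in one
        // byte; anything above 0xFF is rejected. No precision check is made.
        Value::String(s) => {
            let mut bytes = Vec::with_capacity(s.len());
            for c in s.chars() {
                let cp = c as u32;
                if cp > 0xFF {
                    return Err(format!("code point U+{:04X} is above 0xFF", cp));
                }
                bytes.push(cp as u8);
            }
            Ok(Value::Decimal(bytes))
        }
        // Anything else is not a valid decimal input in this sketch.
        other => Err(format!("cannot resolve {:?} as decimal", other)),
    }
}
```

Under these assumptions, the default "\u0000" resolves to a decimal backed by the single byte 0x00, while a string containing U+0100 is rejected.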

Test plan

  • types::tests::resolve_decimal_from_string_default: "\u0000"
    default, full 0..=255 round-trip, and codepoint > 0xFF rejection
  • types::tests::parse_schema_with_nullable_decimal_string_default:
    end-to-end Schema::parse_str of a record with a nullable
    decimal field using a JSON string default
  • Full cargo test -p apache-avro --lib — 559 lib tests pass
  • Integration test files schema.rs, union_schema.rs,
    big_decimal.rs, avro-rs-285-bytes_deserialization.rs et al.
    all still pass

valdo404 force-pushed the fix/decimal-union-string-default branch from 6fd67a7 to 5219763 on April 11, 2026 14:47

Per Avro 1.12.0 Specification, §"Complex Types / Records", the JSON
encoding of a `bytes` field's default value is a string whose codepoints
0-255 map to byte values 0-255 (e.g. `"\u00FF"`). The same section
specifies that a union-typed field's default must correspond to the
first schema that matches in the union.

`decimal` is defined as a logical type over `bytes`, so this rule
transitively applies: a nullable decimal field expressed as
`[{bytes, logicalType: decimal}, null]` with a JSON string default
requires `resolve_decimal` to accept `Value::String` when validating
defaults at schema parse time. Before this change the parser rejected
such schemas with `GetDefaultUnion(Decimal, String)`, even though
Java and Python Avro accept them.

The added arm walks the string's codepoints, rejecting any above
0xFF, and returns a `Value::Decimal`. The precision check is skipped
because the spec does not require a default's byte length to cover
the declared precision — it only requires a valid `bytes` value.
Wire-level decoded records always reach `resolve_decimal` as
`Value::Bytes`, so this arm is exclusively a default-validation path.

Tests cover `\u0000`, a full 0..=255 round-trip, codepoints > 0xFF
being rejected, and end-to-end parsing of a nullable decimal record
schema.

valdo404 force-pushed the fix/decimal-union-string-default branch from 5219763 to 5bb5c4b on April 11, 2026 14:54