Commit 6fd67a7

Author: Laurent Valdes
fix: accept JSON string defaults for decimal fields in union
Per Avro 1.12.0 Specification, §"Complex Types / Records", the JSON encoding of a `bytes` field's default value is a string whose codepoints 0-255 map to byte values 0-255 (e.g. `"\u00FF"`). The same section specifies that a union-typed field's default must correspond to the first schema that matches in the union.

`decimal` is defined as a logical type over `bytes`, so this rule transitively applies: a nullable decimal field expressed as `[{bytes, logicalType: decimal}, null]` with a JSON string default requires `resolve_decimal` to accept `Value::String` when validating defaults at schema parse time. Before this change the parser rejected such schemas with `GetDefaultUnion(Decimal, String)`, even though Java and Python Avro accept them.

The added arm walks the string's codepoints, rejecting any above 0xFF, and returns a `Value::Decimal`. The precision check is skipped because the spec does not require a default's byte length to cover the declared precision — it only requires a valid `bytes` value. Wire-level decoded records always reach `resolve_decimal` as `Value::Bytes`, so this arm is exclusively a default-validation path.

Tests cover `\u0000`, a full 0..=255 round-trip, codepoints > 0xFF being rejected, and end-to-end parsing of a nullable decimal record schema.
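The codepoint-to-byte rule described above can be sketched in isolation. This is a minimal, standalone illustration and not the crate's code: the function name `default_string_to_bytes` and its `Result<Vec<u8>, char>` return type are assumptions made for the example, whereas the actual change returns a `Value::Decimal` or a `Details::ResolveDecimal` error.

```rust
/// Hypothetical helper (not part of the crate's API): maps a JSON string
/// default to raw bytes, one Unicode code point per byte.
///
/// Returns the first offending character if any code point is above 0xFF,
/// since such a character has no single-byte counterpart.
fn default_string_to_bytes(s: &str) -> Result<Vec<u8>, char> {
    // `s.len()` would count UTF-8 bytes; `chars().count()` is the exact
    // number of output bytes because each code point yields one byte.
    let mut bytes = Vec::with_capacity(s.chars().count());
    for c in s.chars() {
        let cp = c as u32;
        if cp > 0xFF {
            return Err(c);
        }
        // Safe narrowing: the check above guarantees cp fits in a u8.
        bytes.push(cp as u8);
    }
    Ok(bytes)
}
```

Under these assumptions, `default_string_to_bytes("\u{0000}")` yields a single zero byte, while a string containing `\u{0100}` is rejected. Iterating over `chars()` rather than the string's UTF-8 bytes matters here: code points 0x80 through 0xFF occupy two bytes in UTF-8, so reading the raw bytes would produce the wrong values.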
1 parent 0470799 commit 6fd67a7

1 file changed

avro/src/types.rs

Lines changed: 129 additions & 0 deletions
@@ -787,6 +787,45 @@ impl Value {
                     Ok(Value::Decimal(Decimal::from(bytes)))
                 }
             }
+            // The Avro 1.12.0 Specification, §"Complex Types / Records",
+            // defines the JSON encoding of field default values. The
+            // "field default values" table lists `bytes` and `fixed` with
+            // json-type `string` (example `"\u00FF"`), and the surrounding
+            // prose states:
+            //
+            // "Default values for bytes and fixed fields are JSON
+            // strings, where Unicode code points 0-255 are mapped to
+            // unsigned 8-bit byte values 0-255."
+            //
+            // The same section also states:
+            //
+            // "Default values for union fields correspond to the first
+            // schema that matches in the union."
+            //
+            // `decimal` is a logical type whose underlying representation
+            // is `bytes` (or `fixed`) — see §"Logical Types / Decimal" —
+            // so the rule above transitively applies to decimal defaults.
+            // When a nullable decimal field is expressed as a union whose
+            // first branch is `{bytes, logicalType: decimal}`, its JSON
+            // default is therefore a string, which reaches this function
+            // as `Value::String`. Wire-level decoded records always
+            // arrive as `Value::Bytes`, so this arm is exclusively a
+            // default-value validation path. The precision check is
+            // skipped because the spec does not require a default's byte
+            // length to cover the full declared precision — it only
+            // requires the value to be a valid member of the underlying
+            // `bytes` type.
+            Value::String(s) => {
+                let mut bytes = Vec::with_capacity(s.len());
+                for c in s.chars() {
+                    let cp = c as u32;
+                    if cp > 0xFF {
+                        return Err(Details::ResolveDecimal(Value::String(s)).into());
+                    }
+                    bytes.push(cp as u8);
+                }
+                Ok(Value::Decimal(Decimal::from(bytes)))
+            }
             other => Err(Details::ResolveDecimal(other).into()),
         }
     }
@@ -1723,6 +1762,96 @@ Field with name '"b"' is not a member of the map items"#,
         Ok(())
     }

+    #[test]
+    fn resolve_decimal_from_string_default() -> TestResult {
+        // Avro 1.12.0 Specification, §"Complex Types / Records":
+        //
+        // "Default values for bytes and fixed fields are JSON strings,
+        // where Unicode code points 0-255 are mapped to unsigned 8-bit
+        // byte values 0-255."
+        //
+        // The `decimal` logical type (§"Logical Types / Decimal") is
+        // defined on top of `bytes`, so the same encoding rule applies to
+        // JSON-default-valued decimal fields, which reach `resolve_decimal`
+        // as `Value::String`.
+        let value = Value::String("\u{0000}".to_string());
+        let resolved = value.resolve(&Schema::Decimal(DecimalSchema {
+            precision: 10,
+            scale: 4,
+            inner: Box::new(Schema::Bytes),
+        }))?;
+        assert_eq!(resolved, Value::Decimal(Decimal::from(vec![0u8])));
+
+        // Full 0..=255 round-trip: every byte value is representable.
+        let mut all_bytes_str = String::new();
+        for b in 0u8..=255u8 {
+            all_bytes_str.push(char::from_u32(b as u32).unwrap());
+        }
+        let resolved = Value::String(all_bytes_str).resolve(&Schema::Decimal(DecimalSchema {
+            precision: 10,
+            scale: 0,
+            inner: Box::new(Schema::Bytes),
+        }))?;
+        assert_eq!(
+            resolved,
+            Value::Decimal(Decimal::from((0u8..=255u8).collect::<Vec<_>>()))
+        );
+
+        // Code points > 0xFF are not valid bytes defaults.
+        let value = Value::String("\u{0100}".to_string());
+        assert!(
+            value
+                .resolve(&Schema::Decimal(DecimalSchema {
+                    precision: 10,
+                    scale: 4,
+                    inner: Box::new(Schema::Bytes),
+                }))
+                .is_err()
+        );
+
+        Ok(())
+    }
+
+    /// A nullable `decimal` field expressed as `[{bytes, logicalType:
+    /// decimal}, null]` with a JSON string default. Per Avro 1.12.0
+    /// Specification, §"Complex Types / Records":
+    ///
+    /// "Default values for bytes and fixed fields are JSON strings,
+    /// where Unicode code points 0-255 are mapped to unsigned 8-bit
+    /// byte values 0-255."
+    ///
+    /// and
+    ///
+    /// "Default values for union fields correspond to the first schema
+    /// that matches in the union."
+    ///
+    /// Without the `Value::String` arm in `resolve_decimal`, schema
+    /// parsing fails with `GetDefaultUnion(Decimal, String)`.
+    #[test]
+    fn parse_schema_with_nullable_decimal_string_default() -> TestResult {
+        let schema_json = r#"{
+            "type": "record",
+            "name": "NullableDecimal",
+            "fields": [
+                {
+                    "name": "amount",
+                    "type": [
+                        {
+                            "type": "bytes",
+                            "scale": 4,
+                            "precision": 10,
+                            "logicalType": "decimal"
+                        },
+                        "null"
+                    ],
+                    "default": "\u0000"
+                }
+            ]
+        }"#;
+        Schema::parse_str(schema_json)?;
+        Ok(())
+    }
+
     #[test]
     fn resolve_decimal_invalid_scale() {
         let value = Value::Decimal(Decimal::from(vec![1, 2]));
