Commit 5219763

Author: Laurent Valdes

fix: accept JSON string defaults for decimal fields in union

Per Avro 1.12.0 Specification, §"Complex Types / Records", the JSON encoding of a `bytes` field's default value is a string whose codepoints 0-255 map to byte values 0-255 (e.g. `"\u00FF"`). The same section specifies that a union-typed field's default must correspond to the first schema that matches in the union.

`decimal` is defined as a logical type over `bytes`, so this rule transitively applies: a nullable decimal field expressed as `[{bytes, logicalType: decimal}, null]` with a JSON string default requires `resolve_decimal` to accept `Value::String` when validating defaults at schema parse time. Before this change the parser rejected such schemas with `GetDefaultUnion(Decimal, String)`, even though Java and Python Avro accept them.

The added arm walks the string's codepoints, rejecting any above 0xFF, and returns a `Value::Decimal`. The precision check is skipped because the spec does not require a default's byte length to cover the declared precision; it only requires a valid `bytes` value. Wire-level decoded records always reach `resolve_decimal` as `Value::Bytes`, so this arm is exclusively a default-validation path.

Tests cover `\u0000`, a full 0..=255 round-trip, codepoints > 0xFF being rejected, and end-to-end parsing of a nullable decimal record schema.
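
The code-point rule described in the commit message can be sketched in isolation. The following is a minimal standalone version using only the standard library; `default_string_to_bytes` is a hypothetical name chosen for illustration and is not a function in the crate, and its error type (the offending `char`) is likewise an assumption rather than the crate's `Details::ResolveDecimal`.

```rust
/// Hypothetical helper, not part of the commit: converts an Avro JSON
/// default string for a `bytes`-backed field into raw bytes.
/// Each Unicode code point in 0..=255 becomes the byte with the same value;
/// any code point above 0xFF is rejected and returned as the error.
fn default_string_to_bytes(s: &str) -> Result<Vec<u8>, char> {
    // `s.len()` is the UTF-8 byte length, which is an upper bound on the
    // number of code points, so this never under-allocates.
    let mut bytes = Vec::with_capacity(s.len());
    for c in s.chars() {
        let cp = c as u32;
        if cp > 0xFF {
            return Err(c);
        }
        bytes.push(cp as u8);
    }
    Ok(bytes)
}
```

The mapping works on code points rather than on the string's UTF-8 encoding. For example, `"\u{00FF}"` yields the single byte `0xFF`, whereas `"\u{00FF}".as_bytes()` would yield the two bytes `0xC3 0xBF`, which is not what the spec's rule describes.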
1 parent 016d0f6 commit 5219763

1 file changed: +129 -0 lines changed


avro/src/types.rs

Lines changed: 129 additions & 0 deletions
@@ -846,6 +846,45 @@ impl Value {
                     Ok(Value::Decimal(Decimal::from(bytes)))
                 }
             }
+            // The Avro 1.12.0 Specification, §"Complex Types / Records",
+            // defines the JSON encoding of field default values. The
+            // "field default values" table lists `bytes` and `fixed` with
+            // json-type `string` (example `"\u00FF"`), and the surrounding
+            // prose states:
+            //
+            // "Default values for bytes and fixed fields are JSON
+            // strings, where Unicode code points 0-255 are mapped to
+            // unsigned 8-bit byte values 0-255."
+            //
+            // The same section also states:
+            //
+            // "Default values for union fields correspond to the first
+            // schema that matches in the union."
+            //
+            // `decimal` is a logical type whose underlying representation
+            // is `bytes` (or `fixed`) — see §"Logical Types / Decimal" —
+            // so the rule above transitively applies to decimal defaults.
+            // When a nullable decimal field is expressed as a union whose
+            // first branch is `{bytes, logicalType: decimal}`, its JSON
+            // default is therefore a string, which reaches this function
+            // as `Value::String`. Wire-level decoded records always
+            // arrive as `Value::Bytes`, so this arm is exclusively a
+            // default-value validation path. The precision check is
+            // skipped because the spec does not require a default's byte
+            // length to cover the full declared precision — it only
+            // requires the value to be a valid member of the underlying
+            // `bytes` type.
+            Value::String(s) => {
+                let mut bytes = Vec::with_capacity(s.len());
+                for c in s.chars() {
+                    let cp = c as u32;
+                    if cp > 0xFF {
+                        return Err(Details::ResolveDecimal(Value::String(s)).into());
+                    }
+                    bytes.push(cp as u8);
+                }
+                Ok(Value::Decimal(Decimal::from(bytes)))
+            }
             other => Err(Details::ResolveDecimal(other).into()),
         }
     }
@@ -1776,6 +1815,96 @@ Field with name '"b"' is not a member of the map items"#,
         Ok(())
     }
 
+    #[test]
+    fn resolve_decimal_from_string_default() -> TestResult {
+        // Avro 1.12.0 Specification, §"Complex Types / Records":
+        //
+        // "Default values for bytes and fixed fields are JSON strings,
+        // where Unicode code points 0-255 are mapped to unsigned 8-bit
+        // byte values 0-255."
+        //
+        // The `decimal` logical type (§"Logical Types / Decimal") is
+        // defined on top of `bytes`, so the same encoding rule applies to
+        // JSON-default-valued decimal fields, which reach `resolve_decimal`
+        // as `Value::String`.
+        let value = Value::String("\u{0000}".to_string());
+        let resolved = value.resolve(&Schema::Decimal(DecimalSchema {
+            precision: 10,
+            scale: 4,
+            inner: InnerDecimalSchema::Bytes,
+        }))?;
+        assert_eq!(resolved, Value::Decimal(Decimal::from(vec![0u8])));
+
+        // Full 0..=255 round-trip: every byte value is representable.
+        let mut all_bytes_str = String::new();
+        for b in 0u8..=255u8 {
+            all_bytes_str.push(char::from_u32(b as u32).unwrap());
+        }
+        let resolved = Value::String(all_bytes_str).resolve(&Schema::Decimal(DecimalSchema {
+            precision: 10,
+            scale: 0,
+            inner: InnerDecimalSchema::Bytes,
+        }))?;
+        assert_eq!(
+            resolved,
+            Value::Decimal(Decimal::from((0u8..=255u8).collect::<Vec<_>>()))
+        );
+
+        // Code points > 0xFF are not valid bytes defaults.
+        let value = Value::String("\u{0100}".to_string());
+        assert!(
+            value
+                .resolve(&Schema::Decimal(DecimalSchema {
+                    precision: 10,
+                    scale: 4,
+                    inner: InnerDecimalSchema::Bytes,
+                }))
+                .is_err()
+        );
+
+        Ok(())
+    }
+
+    /// A nullable `decimal` field expressed as `[{bytes, logicalType:
+    /// decimal}, null]` with a JSON string default. Per Avro 1.12.0
+    /// Specification, §"Complex Types / Records":
+    ///
+    /// "Default values for bytes and fixed fields are JSON strings,
+    /// where Unicode code points 0-255 are mapped to unsigned 8-bit
+    /// byte values 0-255."
+    ///
+    /// and
+    ///
+    /// "Default values for union fields correspond to the first schema
+    /// that matches in the union."
+    ///
+    /// Without the `Value::String` arm in `resolve_decimal`, schema
+    /// parsing fails with `GetDefaultUnion(Decimal, String)`.
+    #[test]
+    fn parse_schema_with_nullable_decimal_string_default() -> TestResult {
+        let schema_json = r#"{
+            "type": "record",
+            "name": "NullableDecimal",
+            "fields": [
+                {
+                    "name": "amount",
+                    "type": [
+                        {
+                            "type": "bytes",
+                            "scale": 4,
+                            "precision": 10,
+                            "logicalType": "decimal"
+                        },
+                        "null"
+                    ],
+                    "default": "\u0000"
+                }
+            ]
+        }"#;
+        Schema::parse_str(schema_json)?;
+        Ok(())
+    }
+
     #[test]
     fn resolve_decimal_invalid_scale() {
         let value = Value::Decimal(Decimal::from(vec![1, 2]));
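
The test schema uses `"\u0000"` as its default, which decodes to the single byte `0x00`. The Avro decimal logical type stores the unscaled integer as big-endian two's complement, so that default is the value zero. To illustrate how other defaults could be written, here is a rough sketch going the opposite way, from an unscaled integer to a default string. Both function names are hypothetical, and restricting the input to `i64` is a simplifying assumption: real decimals may need more bytes than that.

```rust
/// Hypothetical helper: shortest big-endian two's-complement encoding
/// of an unscaled decimal value (limited to i64 for this sketch).
fn unscaled_to_bytes(unscaled: i64) -> Vec<u8> {
    let be = unscaled.to_be_bytes();
    // Skip leading bytes that are pure sign extension of the next byte,
    // always keeping at least one byte.
    let mut start = 0;
    while start < be.len() - 1 {
        let sign_ext = if be[start + 1] & 0x80 != 0 { 0xFF } else { 0x00 };
        if be[start] != sign_ext {
            break;
        }
        start += 1;
    }
    be[start..].to_vec()
}

/// Hypothetical helper: the string whose code points equal those bytes,
/// i.e. the form a JSON `default` for a bytes-backed decimal takes.
fn decimal_default_string(unscaled: i64) -> String {
    unscaled_to_bytes(unscaled)
        .into_iter()
        .map(char::from) // u8 -> char maps byte n to code point U+00nn
        .collect()
}
```

Under these assumptions, `1.2345` at scale 4 has unscaled value `12345` (`0x3039`), giving the two-character default `"09"`. The value `128` needs a leading zero byte to stay positive, so it would be written in JSON as `"\u0000\u0080"`.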
