Enum serialization doesn't seem to work correctly. I've prepared a test that demonstrates the issue, but in short, I couldn't find a way to serialize an enum to match the following schema other than manually building a `Value`, and even then, schema resolution fails to find the correct union variant:
```json
{
  "name": "Root",
  "type": "record",
  "fields": [
    {"name": "field_union", "type": [
      {
        "name": "A",
        "type": "record",
        "fields": []
      },
      {
        "name": "B",
        "type": "record",
        "fields": []
      },
      {
        "name": "C",
        "type": "record",
        "fields": [
          {"name": "field_a", "type": "long"},
          {"name": "field_b", "type": ["null", "string"]}
        ]
      },
      {
        "name": "D",
        "type": "record",
        "fields": [
          {"name": "field_a", "type": "float"},
          {"name": "field_b", "type": "int"}
        ]
      }
    ]},
    {"name": "field_f", "type": "string"}
  ]
}
```
Here's the enum definition that's supposed to work with this schema, but it doesn't currently:
```rust
use serde::Serialize;

#[derive(Serialize)]
struct Root {
    field_union: Enum,
    field_f: String,
}

#[derive(Serialize)]
enum Enum {
    A {},
    B {},
    C {
        field_a: i64,
        field_b: Option<String>,
    },
    D {
        field_a: f32,
        field_b: i32,
    },
}
```
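For reference, here is a sketch of the bytes the schema above should produce for a `Root` whose `field_union` is `Enum::C`, hand-encoded according to the Avro binary encoding rules: ints and longs are zigzag varints, a union is its branch index (as a long) followed by the branch value, a string is its length followed by its UTF-8 bytes, and a record is simply its fields in schema order. It does not use the apache_avro crate; `write_long`, `write_string` and `encode_root_with_c` are hypothetical helpers for illustration only. The point is that the output starts with the union branch index (2 for `C`) and contains no separate discriminator byte.

```rust
/// Zigzag-encodes a signed integer and writes it as a base-128 varint,
/// which is how the Avro binary encoding writes both `int` and `long`.
fn write_long(buf: &mut Vec<u8>, n: i64) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        // Low 7 bits with the continuation bit set.
        buf.push((z & 0x7F) as u8 | 0x80);
        z >>= 7;
    }
    buf.push(z as u8);
}

/// Avro strings: the byte length as a long, then the UTF-8 bytes.
fn write_string(buf: &mut Vec<u8>, s: &str) {
    write_long(buf, s.len() as i64);
    buf.extend_from_slice(s.as_bytes());
}

/// Expected encoding of `Root { field_union: Enum::C { field_a, field_b }, field_f }`
/// under the schema above (hypothetical helper, written by hand from the Avro spec).
fn encode_root_with_c(field_a: i64, field_b: Option<&str>, field_f: &str) -> Vec<u8> {
    let mut buf = Vec::new();
    // field_union: branch index first. C is the third branch, so its index is 2.
    write_long(&mut buf, 2);
    // Record C is just its fields in schema order; no "type"/"value" wrapper.
    write_long(&mut buf, field_a); // C.field_a: long
    // C.field_b: ["null", "string"]; branch 0 is null, branch 1 is string.
    match field_b {
        None => write_long(&mut buf, 0),
        Some(s) => {
            write_long(&mut buf, 1);
            write_string(&mut buf, s);
        }
    }
    write_string(&mut buf, field_f); // Root.field_f: string
    buf
}

fn main() {
    let bytes = encode_root_with_c(1, Some("x"), "f");
    println!("{:?}", bytes); // prints [4, 2, 2, 2, 120, 2, 102]
}
```

Reading the output: `4` is branch index 2 (zigzag), `2` is `field_a` = 1, the next `2` is branch index 1 of `["null", "string"]`, then `2, 120` is the string `"x"` (length 1, then the byte for `x`), and `2, 102` is `field_f` = `"f"`.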
So I've looked into the implementation, and found two issues:
- For some reason, enum variants get serialized as records with two fields:
  - `"type"`, an Avro enum acting as the discriminator, and
  - `"value"`, a union variant containing the fields,

  instead of the union variant being serialized directly. Because of this, the encoded bytes differ from what's expected (there's an additional enum byte for `"type"`), and schema resolution also fails, because it expects to find the actual fields at the top level, while they are nested under `"value"`.
- Separately from the first issue, even if you call `resolve()` on a manually built `Value`, schema resolution ignores the variant index carried by `Value::Union` and instead tries to find a schema by simply matching the fields. Depending on the order in which the variants are listed in the schema, this can cause serialization to lose field data. In the example above, `A` is tested first, and since it has no fields, it matches any value.
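The second issue can be illustrated with a simplified model. Everything below (`Value`, `RecordSchema`, `resolve_by_fields`, `resolve_by_index`) is hypothetical and is not the apache_avro types or functions; it only contrasts resolving a union by matching field names, as described above, with selecting the branch by the index carried in the union value.

```rust
// Simplified, hypothetical model of Avro values; NOT the apache_avro types.
#[allow(dead_code)]
enum Value {
    Long(i64),
    String(String),
    Record(Vec<(String, Value)>),
    // Branch index into the union's list of schemas, plus the branch's value.
    Union(usize, Box<Value>),
}

struct RecordSchema {
    name: &'static str,
    fields: Vec<&'static str>,
}

/// Field-matching strategy: ignore the union index and take the first
/// record schema whose fields are all present in the value.
fn resolve_by_fields<'a>(value: &Value, variants: &'a [RecordSchema]) -> Option<(usize, &'a RecordSchema)> {
    let inner = match value {
        Value::Union(_, v) => &**v,
        other => other,
    };
    let present: Vec<&str> = match inner {
        Value::Record(fields) => fields.iter().map(|(name, _)| name.as_str()).collect(),
        _ => return None,
    };
    variants
        .iter()
        .enumerate()
        // A record with no fields satisfies `all` trivially, so it matches anything.
        .find(|(_, s)| s.fields.iter().all(|f| present.contains(f)))
}

/// Index-honouring strategy: the branch number in the union value selects the schema.
fn resolve_by_index<'a>(value: &Value, variants: &'a [RecordSchema]) -> Option<(usize, &'a RecordSchema)> {
    match value {
        Value::Union(i, _) => variants.get(*i).map(|s| (*i, s)),
        _ => None,
    }
}

/// The four branches of `field_union`, reduced to their field names.
fn field_union_variants() -> Vec<RecordSchema> {
    vec![
        RecordSchema { name: "A", fields: vec![] },
        RecordSchema { name: "B", fields: vec![] },
        RecordSchema { name: "C", fields: vec!["field_a", "field_b"] },
        RecordSchema { name: "D", fields: vec!["field_a", "field_b"] },
    ]
}

/// `Enum::C { field_a: 1, field_b: Some("x") }` as a union value with branch index 2.
fn c_value() -> Value {
    Value::Union(
        2,
        Box::new(Value::Record(vec![
            ("field_a".to_string(), Value::Long(1)),
            ("field_b".to_string(), Value::Union(1, Box::new(Value::String("x".to_string())))),
        ])),
    )
}

fn main() {
    let variants = field_union_variants();
    let value = c_value();
    let by_fields = resolve_by_fields(&value, &variants).map(|(_, s)| s.name);
    let by_index = resolve_by_index(&value, &variants).map(|(_, s)| s.name);
    println!("by fields: {:?}, by index: {:?}", by_fields, by_index);
}
```

This prints `by fields: Some("A"), by index: Some("C")`. In this model, `C` and `D` also share the field names `field_a` and `field_b`, so name matching alone could not tell them apart even if `A` and `B` were absent; the union index is the only unambiguous signal.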