I noticed another inconsistent behavior btw while working through schema aware serialization, related to #512
Is it intended to supporting serializing non-union like structs using a union schema (if they are a variant of the union) by doing lookup in the union for the correct branch?
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
use apache_avro::Schema;
use apache_avro::writer::datum::GenericDatumWriter;
use apache_avro_test_helper::TestResult;
use pretty_assertions::assert_eq;
use serde::Serialize;
fn serialize<T: Serialize>(value: &T, schema: &Schema) -> Result<Vec<u8>, apache_avro::Error> {
GenericDatumWriter::builder(schema)
.build()?
.write_ser_to_vec(value)
}
fn union_schema() -> Schema {
Schema::parse_str(
r#"
[
"int",
{
"type": "record",
"name": "TestRecord",
"fields": [
{"name": "a", "type": "int"}
]
},
{
"type": "enum",
"name": "TestEnum",
"symbols": ["A", "B"]
}
]
"#,
)
.unwrap()
}
#[derive(Debug, Serialize)]
struct TestRecord {
a: i32,
}
#[allow(dead_code)]
#[derive(Debug, Serialize)]
enum TestEnum {
A,
B,
}
#[test]
fn primitive_can_serialize_with_union_schema() -> TestResult {
let serialized = serialize(&42_i32, &union_schema())?;
assert_eq!(serialized, vec![0, 84]);
Ok(())
}
#[test]
fn struct_can_serialize_with_union_schema() -> TestResult {
let serialized = serialize(&TestRecord { a: 27 }, &union_schema())?;
assert_eq!(serialized, vec![2, 54]);
Ok(())
}
#[test]
#[should_panic(expected = "Expected Schema::Null | Schema::Record(name: A, fields: []) at index 0 in the union")]
fn enum_cannot_serialize_with_union_schema() {
let serialized= serialize(&TestEnum::A, &union_schema()).unwrap();
assert_eq!(serialized, vec![4, 0]);
}
In the example, i32 and TestRecord can be serialized using a UnionSchema because its one of the variants of the union
|
Schema::Union(union) => { |
If this is intended, then that means theres a bug for enum serialization
|
Schema::Union(union) => match self.get_resolved_union_variant(union, variant_index)? { |
In serialize_newtype_variant. It assumes the Rust Enum being serialized is a Rust enum representing an avro union holding different branches instead of a Rust enum representing an avro enum, which causes it to assume that a unit variant should either be a null or record.
To fix this, there would need to be some edge case handling, specifically by using the name to lookup and check if we are serializing a (rust enum thats representing an avro enum), or a (rust enum thats representing an avro union). (Its finicky because rust doesn't different these at the type level).
My preference would just to disallow serializing Non union rust representations using a union schema that contains that representation, but that's also a breaking change because I believe previously it was supported, although also sometimes unreliable).
If we want to support it, I can create a separate PR to try to support this, although I'm not even sure if it's possible because it would be impossible to tell when in serialize unit variant to identify if we are a rust enum representing enum or rust enum representing union... which is why my preference would be to disallow this behavior entirely since right now it is a halfway there implementation which could be confusing
I noticed another inconsistent behavior btw while working through schema aware serialization, related to #512
Is it intended to supporting serializing non-union like structs using a union schema (if they are a variant of the union) by doing lookup in the union for the correct branch?
In the example,
i32andTestRecordcan be serialized using aUnionSchemabecause its one of the variants of the unionavro-rs/avro/src/serde/ser_schema/mod.rs
Line 627 in 016d0f6
If this is intended, then that means theres a bug for enum serialization
avro-rs/avro/src/serde/ser_schema/mod.rs
Line 429 in 016d0f6
In
serialize_newtype_variant. It assumes the Rust Enum being serialized is a Rust enum representing an avro union holding different branches instead of a Rust enum representing an avro enum, which causes it to assume that a unit variant should either be a null or record.To fix this, there would need to be some edge case handling, specifically by using the name to lookup and check if we are serializing a (rust enum thats representing an avro enum), or a (rust enum thats representing an avro union). (Its finicky because rust doesn't different these at the type level).
My preference would just to disallow serializing Non union rust representations using a union schema that contains that representation, but that's also a breaking change because I believe previously it was supported, although also sometimes unreliable).
If we want to support it, I can create a separate PR to try to support this, although I'm not even sure if it's possible because it would be impossible to tell when in serialize unit variant to identify if we are a rust enum representing enum or rust enum representing union... which is why my preference would be to disallow this behavior entirely since right now it is a halfway there implementation which could be confusing