
Avro serialization #1

@romain-gilles-ultra

Description

Our entire data stack is based on Avro, but for now dkafka only supports the JSON serialization format.
To remove the intermediary component in our stack that transforms JSON into Avro, we must add the Avro format to dkafka.

By doing so, we will reduce development time and eliminate the mapping development, which is itself error-prone. It will also remove an intermediary transformation stage from our processing pipeline and improve latency.

Schema management

Avro is a schema-first serialization format: the reader must know the schema before it can deserialize a message.
To exchange the schema, you can either embed it at the beginning of each message or use a schema registry.

Schema registry

We will use a schema registry.
In this approach, the serialization of the message must follow the Confluent wire format. See the Java reference implementation in AbstractKafkaAvroSerializer.java for an example.

Schema generation

Avro schemas are declared in JSON, which makes them easier to generate programmatically than other IDL approaches.
You can find the format specification here. At the time of this documentation we support the 1.x line; the current version is 1.10.2.

dkafka evolution

  • it must support another serialization format, in this case Avro
  • it must support the Confluent wire format
  • it must be able to generate an Avro schema based on the smart contract action
  • it must collaborate with the schema registry to register the generated schema and serialize the corresponding schema id into the message
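The last point maps to the schema registry's REST API: a schema is registered with `POST /subjects/{subject}/versions`, and the registry responds with the schema id to embed in each message. A sketch of building that request, assuming a local registry URL and subject name chosen for illustration:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// registerSchemaRequest builds the POST request used by the Confluent Schema
// Registry REST API to register a schema under a subject. The schema JSON is
// wrapped in a {"schema": "..."} envelope, as the API expects.
func registerSchemaRequest(baseURL, subject, schemaJSON string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"schema": schemaJSON})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		fmt.Sprintf("%s/subjects/%s/versions", baseURL, subject),
		bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/vnd.schemaregistry.v1+json")
	return req, nil
}

func main() {
	// Hypothetical registry URL and subject; the request is only built here,
	// not sent.
	req, err := registerSchemaRequest("http://localhost:8081", "transfer-value",
		`{"type":"record","name":"Transfer","fields":[{"name":"from","type":"string"}]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

On success the registry returns `{"id": <schema id>}`; that id is what gets written into the wire-format header of every message serialized with this schema.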
