feat(avro-to-json): add wireFormat configuration parameter for deterministic Avro parsing #337

@katriendg

Summary

Application: src/500-application/512-avro-to-json
WASM Operator for Azure IoT Operations

Add an optional wireFormat configuration parameter to the avro-to-json operator that lets users declare the wire format of incoming Avro messages. This replaces the current implicit try/fallback approach with deterministic, format-specific parsing when the format is known, while preserving backward-compatible auto-detection as the default.

Motivation

The current parse_with_schema function attempts raw Avro parsing first, and on failure retries after stripping a 5-byte Confluent Schema Registry prefix. This has two problems:

  1. Ambiguity: strip_confluent_header triggers on any payload longer than 5 bytes whose first byte is 0x00. If raw Avro data legitimately starts with 0x00 and fails to parse for an unrelated reason (schema mismatch, truncation), the retry runs against the stripped payload and reports a misleading error instead of the original one.
  2. Extensibility: other schema registries (Apicurio, Karapace) use different header layouts. Adding more fallback branches creates a fragile chain of guesses.

A wireFormat parameter makes parsing deterministic when the format is known, and keeps the auto fallback for users who prefer zero-config.
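To make the ambiguity concrete, here is a minimal sketch of the header check as the Motivation section describes it. The function body is a reconstruction from that description, not the operator's actual code, and the sample payload is hypothetical. In Avro binary encoding a long with value 0 (or a union branch index of 0) is written as the single byte 0x00, so a raw record can begin with the same byte as the Confluent magic byte.

```rust
/// Reconstruction of the check described above (an assumption, not the
/// operator's source): a payload longer than 5 bytes that starts with 0x00
/// is treated as Confluent-framed and loses its first 5 bytes.
fn strip_confluent_header(data: &[u8]) -> Option<&[u8]> {
    if data.len() > 5 && data[0] == 0x00 {
        Some(&data[5..])
    } else {
        None
    }
}

/// Hypothetical raw Avro payload for a record whose first field is a long
/// with value 0, followed by the string "foo" and one more byte.
/// 0x00 = long 0, 0x06 = string length 3 (zig-zag), then 'f' 'o' 'o'.
const RAW_RECORD: [u8; 6] = [0x00, 0x06, 0x66, 0x6f, 0x6f, 0x02];
```

Passing `RAW_RECORD` to `strip_confluent_header` returns the last byte only, even though the payload carries no Confluent prefix. If the first raw parse fails for an unrelated reason, the retry then decodes that leftover byte and the reported error no longer points at the real cause.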

Configuration Values

| Value | Behavior |
| --- | --- |
| auto | Default. Current try-raw-then-strip-and-retry logic (backward-compatible). |
| confluent | Always strip 5-byte Confluent prefix (0x00 + 4-byte schema ID) before parsing. |
| raw | Parse directly with no prefix handling. Fail immediately if parsing fails. |
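The three values could map onto an enum along these lines. This is a sketch only: the helper name `wire_format_from_property` is hypothetical, and both the case-insensitive matching and the choice to reject unknown values (rather than fall back to auto) are assumptions the issue does not settle.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum WireFormat {
    /// Backward-compatible default: try raw, then strip and retry.
    #[default]
    Auto,
    Confluent,
    Raw,
}

impl FromStr for WireFormat {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        // Assumption: values are matched case-insensitively after trimming.
        match value.trim().to_ascii_lowercase().as_str() {
            "auto" => Ok(WireFormat::Auto),
            "confluent" => Ok(WireFormat::Confluent),
            "raw" => Ok(WireFormat::Raw),
            other => Err(format!(
                "unsupported wireFormat '{other}'; expected auto, confluent, or raw"
            )),
        }
    }
}

/// Hypothetical helper: resolves the optional property value,
/// using Auto when the property is absent.
fn wire_format_from_property(value: Option<&str>) -> Result<WireFormat, String> {
    match value {
        Some(raw_value) => raw_value.parse(),
        None => Ok(WireFormat::default()),
    }
}
```

Failing fast on an unknown value surfaces typos at init time; silently treating them as auto would be the more lenient alternative.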

Implementation Scope

Rust Code (operators/avro-to-json/src/lib.rs)

  • Add WireFormat enum (Auto, Confluent, Raw) and a WIRE_FORMAT OnceLock static
  • Parse wireFormat from configuration properties in avro_init
  • Refactor parse_with_schema to dispatch on WireFormat instead of implicit try/fallback
  • Extract parse_with_schema_inner(data, schema, wire_format) for testability
  • Keep strip_confluent_header unchanged
  • Add ~7 unit tests covering each variant
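The refactor listed above might be shaped roughly as follows. To keep the sketch dependency-free, a `decode` closure stands in for the real Avro decoder and schema argument, so the signature differs from the proposed parse_with_schema_inner(data, schema, wire_format). The error type, the behavior of confluent when no prefix is present, and the body of strip_confluent_header are all assumptions.

```rust
use std::sync::OnceLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WireFormat {
    Auto,
    Confluent,
    Raw,
}

// Would be set once during init (avro_init in the proposal).
static WIRE_FORMAT: OnceLock<WireFormat> = OnceLock::new();

// Assumed shape of the existing helper, reconstructed from the issue text.
fn strip_confluent_header(data: &[u8]) -> Option<&[u8]> {
    if data.len() > 5 && data[0] == 0x00 {
        Some(&data[5..])
    } else {
        None
    }
}

/// Dispatches on the wire format; takes it as an argument so tests can
/// exercise every variant without touching the static.
fn parse_with_schema_inner<T, F>(
    data: &[u8],
    wire_format: WireFormat,
    decode: F,
) -> Result<T, String>
where
    F: Fn(&[u8]) -> Result<T, String>,
{
    match wire_format {
        // No prefix handling: a failure is reported as-is.
        WireFormat::Raw => decode(data),
        // Assumption: a missing prefix is an error rather than a fallback.
        WireFormat::Confluent => {
            let body = strip_confluent_header(data)
                .ok_or_else(|| "payload has no Confluent prefix".to_string())?;
            decode(body)
        }
        // Try raw first, then retry on the stripped payload if one exists.
        WireFormat::Auto => decode(data).or_else(|raw_err| {
            match strip_confluent_header(data) {
                Some(body) => decode(body),
                None => Err(raw_err),
            }
        }),
    }
}

/// Thin wrapper that reads the configured format, defaulting to Auto.
fn parse_with_schema<T, F>(data: &[u8], decode: F) -> Result<T, String>
where
    F: Fn(&[u8]) -> Result<T, String>,
{
    let wire_format = WIRE_FORMAT.get().copied().unwrap_or(WireFormat::Auto);
    parse_with_schema_inner(data, wire_format, decode)
}
```

Keeping the static out of the inner function is what makes the per-variant unit tests straightforward: each test passes the variant directly and a stub decoder, with no shared global state between tests.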

Graph YAML

  • Add wireFormat parameter to moduleConfigurations in resources/graphs/graph-avro-to-json.yaml

Documentation

  • Update configuration table in src/500-application/512-avro-to-json/README.md
  • Add "Wire Format" section explaining the three options
  • Update Avro Format Detection priority table
  • Add troubleshooting entry for Confluent prefix errors
  • Update blueprints/full-single-node-cluster/terraform/dataflow-graphs-avro-json.tfvars.example

Files to Modify

| File | Change |
| --- | --- |
| operators/avro-to-json/src/lib.rs | Add WireFormat enum, static, init parsing, refactor parse_with_schema, add tests |
| resources/graphs/graph-avro-to-json.yaml | Add wireFormat parameter definition |
| blueprints/full-single-node-cluster/terraform/dataflow-graphs-avro-json.tfvars.example | Add wireFormat configuration entry |
| src/500-application/512-avro-to-json/README.md | Config table, Wire Format section, detection table, troubleshooting |

Version Impact

Non-breaking, additive change. Default auto behavior matches the current implementation exactly. Existing deployments require no configuration changes. Bump minor version in Cargo.toml.

Labels: enhancement (New feature or request)
