-
Notifications
You must be signed in to change notification settings - Fork 33
feat(avro-to-json): add wireFormat configuration parameter for deterministic Avro parsing #337
Description
Summary
Application: src/500-application/512-avro-to-json
WASM Operator for Azure IoT Operations
Add an optional wireFormat configuration parameter to the avro-to-json operator that lets users declare the wire format of incoming Avro messages. This replaces the current implicit try/fallback approach with deterministic, format-specific parsing when the format is known, while preserving backward-compatible auto-detection as the default.
Motivation
The current parse_with_schema function attempts raw Avro parsing first, and on failure retries after stripping a 5-byte Confluent Schema Registry prefix. This has two problems:
- Ambiguity —
strip_confluent_headertriggers on any payload starting with0x00longer than 5 bytes. If raw Avro data legitimately starts with0x00and fails for an unrelated reason (schema mismatch, truncation), the retry produces a misleading error. - Extensibility — other schema registries (Apicurio, Karapace) use different header layouts. Adding more fallback branches creates a fragile chain of guesses.
A wireFormat parameter makes parsing deterministic when the format is known, and keeps the auto fallback for users who prefer zero-config.
Configuration Values
| Value | Behavior |
|---|---|
auto |
Default. Current try-raw-then-strip-and-retry logic (backward-compatible). |
confluent |
Always strip 5-byte Confluent prefix (0x00 + 4-byte schema ID) before parsing. |
raw |
Parse directly with no prefix handling. Fail immediately if parsing fails. |
Implementation Scope
Rust Code (operators/avro-to-json/src/lib.rs)
- Add
WireFormatenum (Auto,Confluent,Raw) and aWIRE_FORMATOnceLockstatic - Parse
wireFormatfrom configuration properties inavro_init - Refactor
parse_with_schemato dispatch onWireFormatinstead of implicit try/fallback - Extract
parse_with_schema_inner(data, schema, wire_format)for testability - Keep
strip_confluent_headerunchanged - Add ~7 unit tests covering each variant
Graph YAML
- Add
wireFormatparameter tomoduleConfigurationsinresources/graphs/graph-avro-to-json.yaml
Documentation
- Update configuration table in
src/500-application/512-avro-to-json/README.md - Add "Wire Format" section explaining the three options
- Update Avro Format Detection priority table
- Add troubleshooting entry for Confluent prefix errors
- Update
blueprints/full-single-node-cluster/terraform/dataflow-graphs-avro-json.tfvars.example
Files to Modify
| File | Change |
|---|---|
operators/avro-to-json/src/lib.rs |
Add WireFormat enum, static, init parsing, refactor parse_with_schema, add tests |
resources/graphs/graph-avro-to-json.yaml |
Add wireFormat parameter definition |
blueprints/full-single-node-cluster/terraform/dataflow-graphs-avro-json.tfvars.example |
Add wireFormat configuration entry |
src/500-application/512-avro-to-json/README.md |
Config table, Wire Format section, detection table, troubleshooting |
Version Impact
Non-breaking, additive change. Default auto behavior matches the current implementation exactly. Existing deployments require no configuration changes. Bump minor version in Cargo.toml.