fix(application): avro-to-json handle JSON-string-encoded schema and Confluent wire format prefix#336
Open
…coded schemas
- implement unwrap_schema_value to handle JSON string wrapping
- add tests for unwrap_schema_value covering various cases
- ensure schema parsing works correctly with unwrapped values
… in Avro parser
- add strip_confluent_header to detect and remove 5-byte Confluent prefix
- retry Avro parsing after stripping prefix when initial parse fails
- add tests for Confluent header stripping and prefixed record parsing
🔧 - Generated by Copilot
WilliamBerryiii approved these changes on Apr 3, 2026
Description
The `avro-to-json` operator had two encoding-related failures preventing it from processing Avro messages in production.
JSON-string-encoded schema configuration
The operator failed to initialize when the AIO pipeline runtime delivered the `avroSchema` configuration value as a JSON-string-wrapped object rather than a raw JSON object. The operator logged `Failed to parse provided Avro schema: Invalid schema name` and returned `false` from `avro_init`, preventing the dataflow graph from loading.
A new `unwrap_schema_value` function detects and transparently unwraps JSON-string-encoded configuration values before passing them to the schema parser.
Confluent Schema Registry wire format prefix
After the schema initialization fix, the operator failed at runtime with `Cannot convert i64 to usize: -56` when parsing Avro messages from Kafka. The messages included a 5-byte Confluent Schema Registry wire format prefix (magic byte `0x00` + 4-byte schema ID) before the actual Avro payload. The Avro parser interpreted the prefix bytes as part of the record data, producing a negative zigzag-decoded length.
A new `strip_confluent_header` function detects the prefix, and `parse_with_schema` now retries parsing after stripping the prefix when the initial attempt fails.
Related Issue
Fixes #335
Type of Change
Implementation Details
All changes are in src/500-application/512-avro-to-json/operators/avro-to-json/src/lib.rs.
Schema configuration unwrapping
Extracted `unwrap_schema_value(raw: &str) -> Cow<'_, str>`, which:
- parses the raw value with `serde_json::from_str::<serde_json::Value>`
- unwraps the inner string when the result is a `Value::String` (JSON-string-encoded schema)
- returns `Cow<'_, str>` to avoid heap allocation when no unwrapping is needed
The `avro_init` function calls `unwrap_schema_value(v)` before `Schema::parse_str()`.
Confluent wire format handling
Added `strip_confluent_header(data: &[u8]) -> Option<&[u8]>`, which detects the Confluent Schema Registry wire format prefix (magic byte `0x00` + 4-byte big-endian schema ID) and returns the payload after the 5-byte header.
Rewrote `parse_with_schema` to attempt raw Avro parsing first. On failure, if a Confluent header is detected, it retries parsing with the prefix stripped. This handles both raw Avro and Confluent-prefixed messages transparently.
Testing Performed
Thirteen tests were added across both fixes:
- `unwrap_schema_value` unit tests (4): raw JSON object passthrough, JSON-string-encoded unwrap, primitive schema unwrap, non-JSON passthrough
- `schema_parse_json_string_encoded_fails_without_unwrap` confirms `Schema::parse_str` fails on double-encoded input without the fix
All 41 tests pass (28 existing + 13 new) via `cargo test --target x86_64-unknown-linux-gnu`.
Validation Steps
- `cd src/500-application/512-avro-to-json/operators/avro-to-json && cargo test --target x86_64-unknown-linux-gnu`
- `cargo build --release` to confirm the WASM binary compiles cleanly
- `avroSchema` configuration to confirm end-to-end processing succeeds
Checklist
- `terraform fmt` on all Terraform code
- `terraform validate` on all Terraform code
- `az bicep format` on all Bicep code
- `az bicep build` to validate all Bicep code
Security Review
Additional Notes
`serde_json` was already present in Cargo.toml
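For reviewers who want the shape of the Confluent handling without opening lib.rs, here is a minimal sketch of the prefix detection and the parse-then-retry flow described under Implementation Details. It is not the code in this PR. Only the `strip_confluent_header` signature comes from the description; the constants, `confluent_schema_id`, `parse_with_fallback`, and the generic `parse` closure (standing in for the real Avro parser call inside `parse_with_schema`) are hypothetical names. The requirement of at least one payload byte after the header is also an assumption.

```rust
/// Confluent wire format: one magic byte (0x00) followed by a
/// 4-byte big-endian schema ID, then the Avro payload.
const CONFLUENT_MAGIC: u8 = 0x00; // hypothetical constant name
const CONFLUENT_HEADER_LEN: usize = 5; // hypothetical constant name

/// Returns the bytes after the 5-byte header when `data` looks
/// Confluent-prefixed, or `None` otherwise.
/// Assumption: a prefixed message carries at least one payload byte.
fn strip_confluent_header(data: &[u8]) -> Option<&[u8]> {
    if data.len() > CONFLUENT_HEADER_LEN && data[0] == CONFLUENT_MAGIC {
        Some(&data[CONFLUENT_HEADER_LEN..])
    } else {
        None
    }
}

/// Hypothetical helper (not in the PR description): reads the
/// big-endian schema ID out of a prefixed message.
fn confluent_schema_id(data: &[u8]) -> Option<u32> {
    if data.len() >= CONFLUENT_HEADER_LEN && data[0] == CONFLUENT_MAGIC {
        Some(u32::from_be_bytes([data[1], data[2], data[3], data[4]]))
    } else {
        None
    }
}

/// Hypothetical stand-in for the retry logic: try the raw bytes first,
/// and only on failure retry with the header stripped. If no header is
/// detected, the original error is returned unchanged.
fn parse_with_fallback<T, E>(
    data: &[u8],
    parse: impl Fn(&[u8]) -> Result<T, E>,
) -> Result<T, E> {
    match parse(data) {
        Ok(value) => Ok(value),
        Err(first_err) => match strip_confluent_header(data) {
            Some(payload) => parse(payload),
            None => Err(first_err),
        },
    }
}
```

Trying the raw bytes first matters because a valid unprefixed Avro record can itself begin with `0x00`, so stripping unconditionally on the magic byte would corrupt such messages.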