Describe the bug
There are two separate issues:
Bug 1 – Reader nullable union wrapping breaks decoding of plain writer fields
When a writer produces Avro records with plain (non-nullable) field types, but the reader schema wraps those same fields in ["null", T] unions, the decoder misreads the data. The writer never emits a union branch index for a plain field, but the decoder expects one, so it falls out of sync with the byte stream. The result is garbage field values for every record after the first affected one.
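To illustrate the misalignment, here is a minimal Python sketch of Avro's binary long encoding (zigzag plus varint). It does not use the real decoder; `encode_long`, `read_long`, `read_nullable_long_unresolved`, and `decode_record` are hypothetical helpers, and the two-field record of longs is an assumed example schema.

```python
def encode_long(n):
    """Avro long: zigzag, then base-128 varint (valid for 64-bit values)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf, pos):
    """Decode one Avro long at pos; return (value, next position)."""
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise EOFError("unexpected EOF reading bytes")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (result >> 1) ^ -(result & 1), pos
        shift += 7


def read_nullable_long_unresolved(buf, pos):
    """Hypothetical buggy path: assumes the writer wrote a ["null", "long"]
    union, so it consumes a branch index the writer never emitted."""
    branch, pos = read_long(buf, pos)
    if branch == 0:
        return None, pos
    return read_long(buf, pos)


def decode_record(buf, read_field, field_count=2):
    """Decode field_count consecutive fields with the given field reader."""
    pos, values = 0, []
    for _ in range(field_count):
        value, pos = read_field(buf, pos)
        values.append(value)
    return values


# Assumed writer schema: {a: long, b: long}, plain fields, no branch index.
DATA = encode_long(1) + encode_long(7)

# Resolved path: the writer type is plain, so the value is read directly.
resolved = decode_record(DATA, read_long)  # [1, 7]

# Unresolved path: a's value (1) is consumed as a branch index, so b's
# bytes are returned as the value of a.
first_value, next_pos = read_nullable_long_unresolved(DATA, 0)  # (7, 2)
```

In this sketch, decoding the second field on the unresolved path runs past the end of the buffer and raises `EOFError`, which is the same kind of failure as the EOF error reported in this issue.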
Bug 2 – Skipper omits writer-only fields when the writer schema uses named type references
When the writer schema uses Avro named type references (e.g., "type": "Timestamp" after Timestamp has been defined once), and the reader schema requests fewer fields than the writer wrote (either by narrowing a nested record or omitting a field entirely), the Skipper uses the wrong field list. It builds its skip plan from the reader's narrowed view of the type rather than the writer's full definition. As a result, it does not consume all the bytes the writer emitted for those fields, leaving the buffer out of sync. Every subsequent record is then decoded from the wrong byte offset, producing corrupted values.
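The effect of skipping with the wrong field list can be sketched in Python as follows. This is a simplified model, not the actual Skipper: the `Timestamp` record with `seconds` and `nanos` fields, the trailing `id` field, and the helpers `encode_long`, `read_long`, and `skip_record` are all assumptions made for illustration.

```python
def encode_long(n):
    """Avro long: zigzag, then base-128 varint (valid for 64-bit values)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf, pos):
    """Decode one Avro long at pos; return (value, next position)."""
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise EOFError("unexpected EOF reading bytes")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (result >> 1) ^ -(result & 1), pos
        shift += 7


def skip_record(buf, pos, field_names):
    """Skip one long per listed field; return the position after the record."""
    for _ in field_names:
        _, pos = read_long(buf, pos)
    return pos


# Assumed definitions of the named type "Timestamp" on each side.
WRITER_TIMESTAMP = ["seconds", "nanos"]  # what the writer actually wrote
READER_TIMESTAMP = ["seconds"]           # the reader's narrowed view

# The writer emits `updated: Timestamp` (a named reference) and then `id`.
# The reader omits `updated`, so it has to be skipped before reading `id`.
DATA = encode_long(100) + encode_long(5) + encode_long(42)

# Skip plan built from the writer's definition: lands exactly on `id`.
good_id, good_end = read_long(DATA, skip_record(DATA, 0, WRITER_TIMESTAMP))

# Skip plan built from the reader's view: `nanos` is left in the buffer
# and gets decoded as `id`.
bad_id, bad_end = read_long(DATA, skip_record(DATA, 0, READER_TIMESTAMP))

leftover = len(DATA) - bad_end  # unread bytes that shift the next record
```

Here `good_id` is 42 while `bad_id` is 5, and one byte is left unread, so the next record in the same buffer would start at the wrong offset.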
Errors reported:
READ ERROR after 0 rows: Avro error: Parser error: offset overflow reading avro bytes
READ ERROR after 0 rows: Avro error: EOF: Unexpected EOF reading bytes
To Reproduce
See the unit tests from #9605
Expected behavior
Both schema combinations decode correctly, with no read errors and no corrupted values.