-
Notifications
You must be signed in to change notification settings - Fork 125
Concurrent Decoder #589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Concurrent Decoder #589
Conversation
… related reader modifications.
This reverts commit 2bffcc1.
| writer := avro.NewWriter(w, 512, avro.WithWriterConfig(cfg.EncodingConfig)) | ||
| writer := avro.NewWriter(w, 512, avro.WithWriterConfig(avro.DefaultConfig)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix bug #590
| writer := avro.NewWriter(w, 512, avro.WithWriterConfig(cfg.EncodingConfig)) | ||
| writer := avro.NewWriter(w, 512, avro.WithWriterConfig(avro.DefaultConfig)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same above
| func newDecoderConfig(opts ...DecoderFunc) *decoderConfig { | ||
| cfg := decoderConfig{ | ||
| DecoderConfig: avro.DefaultConfig, | ||
| SchemaCache: avro.DefaultSchemaCache, | ||
| CodecOptions: codecOptions{ | ||
| DeflateCompressionLevel: flate.DefaultCompression, | ||
| }, | ||
| } | ||
| for _, opt := range opts { | ||
| opt(&cfg) | ||
| } | ||
| return &cfg | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that this change helped anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoiding duplicate code.
| need := min(r.tail-r.head, tokenLen-1) | ||
|
|
||
| // Construct boundary window: stash + beginning of new buffer | ||
| boundary := make([]byte, len(stash)+need) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a known size, allocate once and reuse instead of constantly re-allocating.
| copy(boundary, stash) | ||
| copy(boundary[len(stash):], r.buf[r.head:r.head+need]) | ||
|
|
||
| if idx := bytes.Index(boundary, token); idx >= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, surely the reader has advanced too far, as the start of the token is no longer in the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a case when the token extrapolate the buffer, só a bigger buffer is needed
| data: []byte{0x38, 0x36}, | ||
| }, | ||
| { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto format
| data: []byte{0x38, 0x36}, | ||
| }, | ||
| { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto format
Goal of this PR
Expose Reader Offset and OCF Block Status for Concurrent Decoder
Description
This PR enhances the
ReaderandOCF Decoderby exposing internal state information that is critical for advanced processing scenarios, such as progress tracking, concurrently splitting, and debugging.Key Changes
Reader.InputOffset(): Adds a method to theReaderto retrieve the current input offset. This allows consumers to know exactly where in the underlying stream the reader is currently positioned.OCF Decoder.BlockStatus(): Introduces aBlockStatus()method (and corresponding struct) to the OCF Decoder. This provides a snapshot of the current block being processed, including:Current: The index of the current record within the block.Count: The total number of records in the current block.Size: The size (in bytes) of the current block.Offset: The input offset provided by the underlying reader.Motivation
Currently, the
avropackage abstracts away the underlying stream position and block details. While this is fine for simple reading, it limits users who need to:Use Case Example
A data processing pipeline can now use
BlockStatus()to log precise progress or checkpoint processing at specific block offsets, improving reliability and observability.