feat(bgzf): add position tracking to MultithreadedWriter #371
Draft
nh13 wants to merge 2 commits into zaeleus:master from
Add block-level position tracking to enable building BAM indexes during multi-threaded BGZF compression. This follows the htslib pattern of tracking block positions as they are written.

New public API:

- BlockInfo: block completion info (block_number, compressed_start, compressed_size, uncompressed_size)
- block_info_receiver(): get a receiver for block completion notifications
- current_block_number(): get the next block number to be written
- blocks_written(): get the number of blocks fully written
- position(): get the current compressed file position
- buffer_offset(): get the current uncompressed buffer offset

After each block is written, the writer thread sends a BlockInfo through an unbounded channel. Callers can then build indexes with accurate virtual positions without requiring per-record flushes.
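The following is a minimal, std-only sketch of the mechanism described above. A writer thread reports one BlockInfo per block over an unbounded std::sync::mpsc channel. A consumer then turns records noted as (block_number, uncompressed_offset) into BGZF virtual positions. It does not use this PR's code. BlockInfo here is a stand-in that only mirrors the field names listed above, and its u64 field types are assumptions. spawn_writer and resolve are hypothetical helpers, not noodles API. The simulated writer does no actual compression. Only the packing of a virtual position is fixed by the BGZF format: the compressed block offset goes in the upper 48 bits and the offset within the uncompressed block goes in the lower 16 bits.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the PR's BlockInfo; field types are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockInfo {
    pub block_number: u64,
    pub compressed_start: u64,
    pub compressed_size: u64,
    pub uncompressed_size: u64,
}

/// Packs a BGZF virtual position: compressed block start in the upper
/// 48 bits, offset within the uncompressed block in the lower 16 bits.
pub fn virtual_position(compressed_start: u64, uncompressed_offset: u16) -> u64 {
    (compressed_start << 16) | u64::from(uncompressed_offset)
}

/// Hypothetical writer thread: takes (compressed_size, uncompressed_size)
/// for each block in write order and sends one BlockInfo per block after
/// "writing" it, tracking the running compressed file position.
pub fn spawn_writer(blocks: Vec<(u64, u64)>) -> mpsc::Receiver<BlockInfo> {
    // std's mpsc::channel() is unbounded, so the writer never blocks on send.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut position = 0;
        for (i, (compressed_size, uncompressed_size)) in blocks.into_iter().enumerate() {
            // A real writer would write the compressed bytes to the sink here.
            let info = BlockInfo {
                block_number: i as u64,
                compressed_start: position,
                compressed_size,
                uncompressed_size,
            };
            position += compressed_size;
            if tx.send(info).is_err() {
                break; // receiver was dropped; nobody is listening
            }
        }
        // `tx` is dropped here, which ends the receiver's iterator.
    });
    rx
}

/// Hypothetical consumer: maps records recorded as
/// (block_number, uncompressed_offset) to virtual positions.
pub fn resolve(records: &[(u64, u16)], rx: mpsc::Receiver<BlockInfo>) -> Vec<u64> {
    // Blocks until the writer finishes; a streaming indexer could drain
    // notifications incrementally with try_recv() instead.
    let starts: HashMap<u64, u64> = rx
        .iter()
        .map(|b| (b.block_number, b.compressed_start))
        .collect();
    records
        .iter()
        .map(|&(block, offset)| virtual_position(starts[&block], offset))
        .collect()
}

fn main() {
    // Three blocks with compressed sizes 100, 80 and 40 bytes.
    let rx = spawn_writer(vec![(100, 65280), (80, 65280), (40, 1000)]);
    let records: [(u64, u16); 4] = [(0, 0), (0, 512), (1, 10), (2, 0)];
    for vp in resolve(&records, rx) {
        println!("{vp:#x}");
    }
}
```

Keying notifications by block number lets records be noted before their block's compressed offset is known. This matches the deferred resolution that the commit message describes as avoiding per-record flushes.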
Summary
Adds position tracking to MultithreadedWriter to enable building BAM indexes during multi-threaded compression.

Motivation
When using MultithreadedWriter for parallel BAM compression, there is currently no way to determine the compressed file positions needed for BAI/CSI index construction. The standard Writer allows position tracking through its synchronous API. MultithreadedWriter, by contrast, compresses and writes blocks asynchronously, which makes it difficult to correlate each record with the compressed position of the block that contains it.

This change enables building indexes during multi-threaded writes by:
Use Case
This supports the parallel BAM processing pipeline I'm building (related to #364). The workflow:
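- Record (block_number, uncompressed_offset) for each record as it is written
- Receive BlockInfo notifications when blocks are written
- Map each recorded position to (compressed_position, uncompressed_offset)
- Build the index from the resulting virtual offsets

New Public API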
- BlockInfo: struct with block_number, compressed_start, compressed_size, uncompressed_size
- BlockInfoRx: type alias for the receiver channel
- block_info_receiver(): get the notification receiver
- current_block_number(): next block number to be assigned
- blocks_written(): count of blocks fully written
- position(): current compressed file position
- buffer_offset(): bytes in the staging buffer since the last flush

Alternatives Considered
MultithreadedWriter provides

Test Plan