perf(fasta,fastq): high-throughput parsing optimizations #385
Draft
nh13 wants to merge 3 commits into zaeleus:master
Add benchmarks measuring record iteration throughput with synthetically generated data. The benchmarks compare noodles (`read_record`, `records` iterator, builder) against needletail and helicase across short reads (150 bp) and long reads (10 kbp) for FASTQ, and multi-line/single-line sequences for FASTA.
Add bulk record parsing to read_record(): try_read_record_bulk() finds all 4 newlines in the current BufRead buffer in a single pass. When the full record fits, this avoids 4 separate BufRead operations. Falls back to the existing line-by-line path when records span the buffer boundary. Add a Reader Builder with 256 KiB default buffer capacity, ensuring most records fit in a single buffer fill for the bulk path. Add an internal ChunkReader for zero-copy access, processing data in large blocks and returning borrowed slices into the internal buffer.
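The bulk path described above can be sketched in a few lines. This is a hypothetical std-only illustration (the PR uses `memchr`; the function name `find_record_newlines` is invented here): locate all four `\n` terminators of a FASTQ record in one pass over the buffered bytes, returning `None` when the record spans the buffer boundary so the caller can fall back to line-by-line reads.

```rust
// Sketch of the bulk FASTQ path: find the byte offsets of the four
// newline terminators (header, sequence, plus line, quality) in a single
// scan of the BufRead buffer. Returns None if fewer than four newlines
// are present, signalling the line-by-line fallback.
fn find_record_newlines(buf: &[u8]) -> Option<[usize; 4]> {
    let mut offsets = [0usize; 4];
    let mut start = 0;
    for slot in offsets.iter_mut() {
        // The real code uses memchr here; position() shows the same idea.
        let rel = buf[start..].iter().position(|&b| b == b'\n')?;
        *slot = start + rel;
        start += rel + 1;
    }
    Some(offsets)
}

fn main() {
    let record = b"@r1\nACGT\n+\nIIII\n";
    // One pass yields all four line boundaries.
    assert_eq!(find_record_newlines(record), Some([3, 8, 10, 15]));
    // A record truncated by the buffer boundary forces the fallback path.
    assert_eq!(find_record_newlines(b"@r1\nACG"), None);
    println!("ok");
}
```

With a 256 KiB buffer, most short-read records fit entirely in one fill, so this single scan replaces four separate `BufRead` line reads.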
Reuse sequence buffer across Records iterations: use std::mem::replace to move the filled buffer into Sequence and replace it with a pre-allocated Vec, eliminating repeated reallocations for records of similar length. Optimize consume_empty_lines() to scan contiguous newline bytes in bulk with a single fill_buf() call, instead of two calls per newline. Add an internal ChunkReader for zero-copy access, processing data in large blocks and finding record boundaries with memchr.
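The buffer-reuse trick can be sketched as follows. This is an illustrative reduction, not noodles' actual types (`Sequence`, `RecordsIter`, and `next_record` are invented names): move the filled buffer out with `std::mem::replace` and leave behind a `Vec` pre-allocated to the just-seen length, so similar-length records avoid repeated growth from zero.

```rust
// Each yielded record takes ownership of the filled buffer; the iterator
// keeps a replacement Vec sized to the previous record's length.
struct Sequence(Vec<u8>);

struct RecordsIter {
    buf: Vec<u8>, // scratch buffer reused across iterations
}

impl RecordsIter {
    fn next_record(&mut self, line: &[u8]) -> Sequence {
        self.buf.clear();
        self.buf.extend_from_slice(line);
        // Pre-allocate the replacement to the just-seen length, so the
        // next record's extend_from_slice typically needs no reallocation.
        let replacement = Vec::with_capacity(self.buf.len());
        Sequence(std::mem::replace(&mut self.buf, replacement))
    }
}

fn main() {
    let mut it = RecordsIter { buf: Vec::new() };
    let s1 = it.next_record(b"ACGTACGT");
    assert_eq!(s1.0, b"ACGTACGT");
    // The scratch buffer now has capacity for an 8-byte sequence.
    assert!(it.buf.capacity() >= 8);
    let s2 = it.next_record(b"TTTTAAAA");
    assert_eq!(s2.0, b"TTTTAAAA");
    println!("ok");
}
```

The record still owns its data (no lifetime ties to the reader), but the allocator is only hit when sequence lengths grow.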
Summary
Optimize FASTA and FASTQ record parsing throughput with zero new dependencies.
FASTQ:

- `read_record()`: find all 4 newlines in a single `memchr` pass when the record fits in the `BufRead` buffer, avoiding 4 separate `BufRead` operations per record. Falls back to the existing line-by-line path for records spanning buffer boundaries.
- `Reader` `Builder` with 256 KiB default buffer capacity to maximize the bulk parsing hit rate.

FASTA:

- Reuse the sequence buffer across `Records` iterations via `std::mem::replace`, eliminating repeated `Vec` growth from zero for similar-length records.
- Optimize `consume_empty_lines()` to count contiguous `\r`/`\n` bytes in a single `fill_buf()` call instead of two calls per newline character.

Benchmarks
Real data, from disk:
In-memory throughput (Criterion):
Test plan
- Unit tests (`cargo test -p noodles-fastq -p noodles-fasta`)
- Benchmarks (`cargo bench -p noodles-fastq --bench reader` / `cargo bench -p noodles-fasta --bench reader`)
- `Reader::new()`, `read_record()`, and `records()` work identically
- Small buffer capacity (`BufReader::with_capacity(8, ...)`)
- Edge cases: `@`/`+` lines, empty sequences, missing trailing newline
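The `consume_empty_lines()` optimization described in the summary can be sketched with std only. This is an illustration, not the PR's actual code (`skip_empty_lines` is an invented name): count the run of leading `\r`/`\n` bytes in the buffered slice once, then consume them in a single call, instead of issuing a `fill_buf()`/`consume()` pair per newline.

```rust
use std::io::{BufRead, BufReader, Cursor};

// Consume all leading '\r'/'\n' bytes, one fill_buf()/consume() pair per
// buffer fill rather than per newline. Returns the number of bytes skipped.
fn skip_empty_lines<R: BufRead>(reader: &mut R) -> std::io::Result<usize> {
    let mut total = 0;
    loop {
        let (n, len) = {
            let buf = reader.fill_buf()?;
            // Count contiguous newline bytes at the front of the buffer.
            let n = buf.iter().take_while(|&&b| b == b'\n' || b == b'\r').count();
            (n, buf.len())
        };
        reader.consume(n);
        total += n;
        // Stop at EOF (empty buffer) or at the first non-newline byte.
        if len == 0 || n < len {
            return Ok(total);
        }
    }
}

fn main() -> std::io::Result<()> {
    let data = b"\n\r\n\nACGT";
    let mut reader = BufReader::new(Cursor::new(&data[..]));
    let skipped = skip_empty_lines(&mut reader)?;
    assert_eq!(skipped, 4);
    let mut rest = String::new();
    reader.read_line(&mut rest)?;
    assert_eq!(rest, "ACGT");
    println!("ok");
    Ok(())
}
```

The loop only repeats when an entire buffer fill was newline bytes, so the common case costs exactly one `fill_buf()` call.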