
perf(fasta,fastq): high-throughput parsing optimizations #385

Draft
nh13 wants to merge 3 commits into zaeleus:master from nh13:nh/fasta-fastq-benchmarks

Conversation

@nh13 (Contributor) commented Mar 23, 2026

Summary

Optimize FASTA and FASTQ record parsing throughput with zero new dependencies.

FASTQ:

  • Add bulk newline scanning to read_record(): find all 4 newlines in a single memchr pass when the record fits in the BufRead buffer, avoiding 4 separate BufRead operations per record. Falls back to the existing line-by-line path for records spanning buffer boundaries.
  • Add a Reader Builder with 256 KiB default buffer capacity to maximize the bulk parsing hit rate.
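
The bulk path can be sketched roughly as follows. This is a std-only illustration with a hypothetical function name (`find_record_newlines`); the PR itself scans with memchr inside `read_record()`:

```rust
// Hypothetical sketch of the bulk path: locate all four newline
// positions of a FASTQ record in one scan over the buffered bytes.
// Returning None (fewer than four newlines in the buffer) signals a
// fall back to the line-by-line reader.
fn find_record_newlines(buf: &[u8]) -> Option<[usize; 4]> {
    let mut positions = [0; 4];
    let mut start = 0;

    for slot in &mut positions {
        let offset = buf[start..].iter().position(|&b| b == b'\n')?;
        *slot = start + offset;
        start += offset + 1;
    }

    Some(positions)
}

fn main() {
    let record = b"@read1\nACGT\n+\nIIII\n";
    let nl = find_record_newlines(record).unwrap();

    // All four line slices are now available without further I/O.
    assert_eq!(&record[..nl[0]], b"@read1");
    assert_eq!(&record[nl[0] + 1..nl[1]], b"ACGT");
    assert_eq!(&record[nl[2] + 1..nl[3]], b"IIII");
}
```

A record whose fourth newline falls outside the buffer simply yields `None`, which is exactly where the existing per-line path takes over.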

FASTA:

  • Reuse sequence buffer across Records iterations via std::mem::replace, eliminating repeated Vec growth from zero for similar-length records.
  • Optimize consume_empty_lines() to count contiguous \r/\n bytes in a single fill_buf() call instead of two calls per newline character.
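
The buffer-reuse pattern can be illustrated with a minimal sketch (hypothetical, simplified struct; the actual change lives inside the `Records` iterator and `Sequence` type):

```rust
use std::mem;

// Hypothetical, simplified sketch of the reuse pattern: move the filled
// buffer into the yielded record and leave behind a Vec pre-allocated to
// the same capacity, so similar-length records trigger no regrowth.
struct RecordsIter {
    seq_buf: Vec<u8>,
}

impl RecordsIter {
    fn take_sequence(&mut self) -> Vec<u8> {
        let next = Vec::with_capacity(self.seq_buf.len());
        // Hand the filled buffer to the caller; keep the fresh one.
        mem::replace(&mut self.seq_buf, next)
    }
}

fn main() {
    let mut it = RecordsIter {
        seq_buf: b"ACGTACGT".to_vec(),
    };

    let seq = it.take_sequence();
    assert_eq!(seq, b"ACGTACGT");

    // The internal buffer is empty but already sized for the next record.
    assert!(it.seq_buf.is_empty());
    assert!(it.seq_buf.capacity() >= 8);
}
```

The key point is that `mem::replace` moves the data out without a copy, unlike `clone()` followed by `clear()`.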

Benchmarks

Real data, from disk:

| Test | Before | After | Speedup |
| --- | --- | --- | --- |
| FASTQ, 176 MB short reads (default 8 KiB buffer) | 88 ms | 75 ms | 1.17x |
| FASTQ, 176 MB short reads (Builder, 256 KiB buffer) | 88 ms | 63 ms | 1.40x |
| FASTA, 3.0 GB GRCh38 reference | 1.83 s | 1.55 s | 1.18x |

In-memory throughput (Criterion):

| Test | Before | After |
| --- | --- | --- |
| FASTQ read_record, 150 bp × 100K | 4.4 GiB/s | 6.0 GiB/s |
| FASTQ Builder, 150 bp × 100K | n/a (new) | 6.1 GiB/s |
| FASTA records(), 1 Kbp × 10K, w80 | 1.6 GiB/s | 2.0 GiB/s |
| FASTA records(), single-line, 1 Kbp × 10K | 2.0 GiB/s | 6.0 GiB/s |

Test plan

  • All existing tests pass (cargo test -p noodles-fastq -p noodles-fasta)
  • Benchmark suite runs (cargo bench -p noodles-fastq --bench reader / cargo bench -p noodles-fasta --bench reader)
  • No public API changes to existing types — Reader::new(), read_record(), and records() work identically
  • Bulk path correctly falls back for records spanning buffer boundaries (tested with BufReader::with_capacity(8, ...))
  • Edge cases: CRLF, quality containing @/+, empty sequences, missing trailing newline

nh13 added 3 commits March 22, 2026 22:12

Add benchmarks measuring record iteration throughput with synthetically
generated data. Compares noodles (read_record, records iterator, builder)
against needletail and helicase across short reads (150bp) and long reads
(10Kbp) for FASTQ, and multi-line/single-line sequences for FASTA.

Add bulk record parsing to read_record(): try_read_record_bulk() finds
all 4 newlines in the current BufRead buffer in a single pass. When the
full record fits, this avoids 4 separate BufRead operations. Falls back
to the existing line-by-line path when records span the buffer boundary.

Add a Reader Builder with 256 KiB default buffer capacity, ensuring
most records fit in a single buffer fill for the bulk path.
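
As a std-only illustration of the capacity effect (not the new Builder API itself, whose exact surface isn't shown here), a larger BufRead capacity makes it far more likely that whole records sit in one fill_buf() call:

```rust
use std::io::{BufRead, BufReader, Cursor};

fn main() -> std::io::Result<()> {
    // Synthetic FASTQ-like data; each record spans four lines.
    let data = b"@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n".repeat(10);

    // A tiny buffer (scaled down here for illustration) cuts records
    // across fills, forcing a line-by-line fallback on every record.
    let mut small = BufReader::with_capacity(8, Cursor::new(data.clone()));
    let chunk = small.fill_buf()?;
    assert!(chunk.len() <= 8); // a whole record cannot fit

    // A large capacity (the Builder defaults to 256 KiB) exposes many
    // complete records per fill, so the bulk path almost always hits.
    let mut large = BufReader::with_capacity(256 * 1024, Cursor::new(data));
    let chunk = large.fill_buf()?;
    assert!(chunk.iter().filter(|&&b| b == b'\n').count() >= 4);

    Ok(())
}
```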

Add an internal ChunkReader for zero-copy access, processing data in
large blocks and returning borrowed slices into the internal buffer.

Reuse sequence buffer across Records iterations: use std::mem::replace
to move the filled buffer into Sequence and replace it with a
pre-allocated Vec, eliminating repeated reallocations for records of
similar length.

Optimize consume_empty_lines() to scan contiguous newline bytes in bulk
with a single fill_buf() call, instead of two calls per newline.

Add an internal ChunkReader for zero-copy access, processing data in
large blocks and finding record boundaries with memchr.
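
The boundary handling can be sketched as a split helper (hypothetical name; the real ChunkReader finds boundaries with memchr over an internal buffer):

```rust
/// Hypothetical helper: split a filled chunk into (complete, tail).
/// `complete` ends at the last newline and holds only whole lines;
/// `tail` is a partial record carried over to the next fill.
fn split_at_last_newline(buf: &[u8]) -> (&[u8], &[u8]) {
    match buf.iter().rposition(|&b| b == b'\n') {
        Some(i) => buf.split_at(i + 1),
        None => (&buf[..0], buf),
    }
}

fn main() {
    // A read that ended mid-record: ">chr2\nAC" is incomplete.
    let chunk = b">chr1\nACGT\n>chr2\nAC";
    let (complete, tail) = split_at_last_newline(chunk);

    assert_eq!(complete, b">chr1\nACGT\n>chr2\n");
    assert_eq!(tail, b"AC"); // shifted to the buffer front next fill
}
```

Parsing borrowed slices out of `complete` is what makes the path zero-copy: no per-record allocation happens until a record is actually materialized.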
@nh13 nh13 marked this pull request as draft March 26, 2026 20:09
