
perf(fasta,fastq): high-throughput parsing optimizations #385

Draft
nh13 wants to merge 3 commits into zaeleus:master from nh13:nh/fasta-fastq-benchmarks

Conversation

@nh13 (Contributor) commented Mar 23, 2026

Summary

Optimize FASTA and FASTQ record parsing throughput with zero new dependencies.

FASTQ:

  • Add bulk newline scanning to read_record(): find all 4 newlines in a single memchr pass when the record fits in the BufRead buffer, avoiding 4 separate BufRead operations per record. Falls back to the existing line-by-line path for records spanning buffer boundaries.
  • Add a Reader Builder with 256 KiB default buffer capacity to maximize the bulk parsing hit rate.
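
The bulk path can be sketched roughly as follows. This is a std-only illustration with a hypothetical function name (`find_record_newlines`); the PR itself scans with memchr inside `read_record()`:

```rust
// Hypothetical sketch of the bulk path: locate all four newline
// positions of a FASTQ record in one scan over the buffered bytes.
// Returning None (fewer than four newlines in the buffer) signals a
// fall back to the line-by-line reader.
fn find_record_newlines(buf: &[u8]) -> Option<[usize; 4]> {
    let mut positions = [0; 4];
    let mut start = 0;

    for slot in &mut positions {
        let offset = buf[start..].iter().position(|&b| b == b'\n')?;
        *slot = start + offset;
        start += offset + 1;
    }

    Some(positions)
}

fn main() {
    let record = b"@read1\nACGT\n+\nIIII\n";
    let nl = find_record_newlines(record).unwrap();

    // All four line slices are now available without further I/O.
    assert_eq!(&record[..nl[0]], b"@read1");
    assert_eq!(&record[nl[0] + 1..nl[1]], b"ACGT");
    assert_eq!(&record[nl[2] + 1..nl[3]], b"IIII");
}
```

A record whose fourth newline falls outside the buffer simply yields `None`, which is exactly where the existing per-line path takes over.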

FASTA:

  • Reuse sequence buffer across Records iterations via std::mem::replace, eliminating repeated Vec growth from zero for similar-length records.
  • Optimize consume_empty_lines() to count contiguous \r/\n bytes in a single fill_buf() call instead of two calls per newline character.
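
The buffer-reuse pattern can be illustrated with a minimal sketch (hypothetical, simplified struct; the actual change lives inside the `Records` iterator and `Sequence` type):

```rust
use std::mem;

// Hypothetical, simplified sketch of the reuse pattern: move the filled
// buffer into the yielded record and leave behind a Vec pre-allocated to
// the same capacity, so similar-length records trigger no regrowth.
struct RecordsIter {
    seq_buf: Vec<u8>,
}

impl RecordsIter {
    fn take_sequence(&mut self) -> Vec<u8> {
        let next = Vec::with_capacity(self.seq_buf.len());
        // Hand the filled buffer to the caller; keep the fresh one.
        mem::replace(&mut self.seq_buf, next)
    }
}

fn main() {
    let mut it = RecordsIter {
        seq_buf: b"ACGTACGT".to_vec(),
    };

    let seq = it.take_sequence();
    assert_eq!(seq, b"ACGTACGT");

    // The internal buffer is empty but already sized for the next record.
    assert!(it.seq_buf.is_empty());
    assert!(it.seq_buf.capacity() >= 8);
}
```

The key point is that `mem::replace` moves the data out without a copy, unlike `clone()` followed by `clear()`.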

Benchmarks

Real data, from disk:

| Test | Before | After | Speedup |
| --- | --- | --- | --- |
| FASTQ, 176 MB short reads (default 8 KiB buffer) | 88 ms | 75 ms | 1.17x |
| FASTQ, 176 MB short reads (Builder, 256 KiB buffer) | 88 ms | 63 ms | 1.40x |
| FASTA, 3.0 GB GRCh38 reference | 1.83 s | 1.55 s | 1.18x |

In-memory throughput (Criterion):

| Test | Before | After |
| --- | --- | --- |
| FASTQ read_record, 150 bp × 100K | 4.4 GiB/s | 6.0 GiB/s |
| FASTQ Builder, 150 bp × 100K | n/a (new) | 6.1 GiB/s |
| FASTA records(), 1 Kbp × 10K, w80 | 1.6 GiB/s | 2.0 GiB/s |
| FASTA records(), single-line, 1 Kbp × 10K | 2.0 GiB/s | 6.0 GiB/s |

Test plan

  • All existing tests pass (cargo test -p noodles-fastq -p noodles-fasta)
  • Benchmark suite runs (cargo bench -p noodles-fastq --bench reader / cargo bench -p noodles-fasta --bench reader)
  • No public API changes to existing types — Reader::new(), read_record(), and records() work identically
  • Bulk path correctly falls back for records spanning buffer boundaries (tested with BufReader::with_capacity(8, ...))
  • Edge cases: CRLF, quality containing @/+, empty sequences, missing trailing newline

nh13 added 3 commits March 22, 2026 22:12

Add benchmarks measuring record iteration throughput with synthetically
generated data. Compares noodles (read_record, records iterator, builder)
against needletail and helicase across short reads (150bp) and long reads
(10Kbp) for FASTQ, and multi-line/single-line sequences for FASTA.

Add bulk record parsing to read_record(): try_read_record_bulk() finds
all 4 newlines in the current BufRead buffer in a single pass. When the
full record fits, this avoids 4 separate BufRead operations. Falls back
to the existing line-by-line path when records span the buffer boundary.

Add a Reader Builder with 256 KiB default buffer capacity, ensuring
most records fit in a single buffer fill for the bulk path.
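
As a std-only illustration of the capacity effect (not the new Builder API itself, whose exact surface isn't shown here), a larger BufRead capacity makes it far more likely that whole records sit in one fill_buf() call:

```rust
use std::io::{BufRead, BufReader, Cursor};

fn main() -> std::io::Result<()> {
    // Synthetic FASTQ-like data; each record spans four lines.
    let data = b"@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n".repeat(10);

    // A tiny buffer (scaled down here for illustration) cuts records
    // across fills, forcing a line-by-line fallback on every record.
    let mut small = BufReader::with_capacity(8, Cursor::new(data.clone()));
    let chunk = small.fill_buf()?;
    assert!(chunk.len() <= 8); // a whole record cannot fit

    // A large capacity (the Builder defaults to 256 KiB) exposes many
    // complete records per fill, so the bulk path almost always hits.
    let mut large = BufReader::with_capacity(256 * 1024, Cursor::new(data));
    let chunk = large.fill_buf()?;
    assert!(chunk.iter().filter(|&&b| b == b'\n').count() >= 4);

    Ok(())
}
```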

Add an internal ChunkReader for zero-copy access, processing data in
large blocks and returning borrowed slices into the internal buffer.

Reuse sequence buffer across Records iterations: use std::mem::replace
to move the filled buffer into Sequence and replace it with a
pre-allocated Vec, eliminating repeated reallocations for records of
similar length.

Optimize consume_empty_lines() to scan contiguous newline bytes in bulk
with a single fill_buf() call, instead of two calls per newline.

Add an internal ChunkReader for zero-copy access, processing data in
large blocks and finding record boundaries with memchr.
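
The boundary handling can be sketched as a split helper (hypothetical name; the real ChunkReader finds boundaries with memchr over an internal buffer):

```rust
/// Hypothetical helper: split a filled chunk into (complete, tail).
/// `complete` ends at the last newline and holds only whole lines;
/// `tail` is a partial record carried over to the next fill.
fn split_at_last_newline(buf: &[u8]) -> (&[u8], &[u8]) {
    match buf.iter().rposition(|&b| b == b'\n') {
        Some(i) => buf.split_at(i + 1),
        None => (&buf[..0], buf),
    }
}

fn main() {
    // A read that ended mid-record: ">chr2\nAC" is incomplete.
    let chunk = b">chr1\nACGT\n>chr2\nAC";
    let (complete, tail) = split_at_last_newline(chunk);

    assert_eq!(complete, b">chr1\nACGT\n>chr2\n");
    assert_eq!(tail, b"AC"); // shifted to the buffer front next fill
}
```

Parsing borrowed slices out of `complete` is what makes the path zero-copy: no per-record allocation happens until a record is actually materialized.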
@nh13 nh13 marked this pull request as draft March 26, 2026 20:09
