cram: Bring reader and writer to htslib parity for CRAM 2.0–4.0 #380

Draft
nh13 wants to merge 1 commit into zaeleus:master from nh13:nh/cram-version-parity

nh13 commented Mar 2, 2026

Summary

Full read/write support for CRAM versions 2.0, 2.1, 3.0, 3.1, and 4.0, reaching feature parity with htslib/samtools. See #374 for upstream context.

Reader

  • Legacy codec support (beta, gamma, Golomb, Huffman, subexp) for v2.x
  • CRAM 4.0 EOF detection, slice header encoding, and codec offsets
  • Async reader parity across all versions
  • Overflow protection for VLQ integer readers
  • Fix ref_seq_id encoding and codec argument signedness
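
To illustrate the "overflow protection for VLQ integer readers" item, here is a minimal sketch of a checked decoder. It is not the code from this PR: the function name `read_uint7` is hypothetical, and the layout (7 payload bits per byte, most significant group first, high bit as continuation flag) is an assumption about the encoding being read. The point is only that the shift is checked before it can silently drop bits.

```rust
use std::io::{self, Read};

/// Reads an unsigned variable-length integer: 7 payload bits per byte,
/// most significant group first, high bit set on every byte but the last.
/// Hypothetical helper for illustration; not the noodles API.
fn read_uint7<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut n: u32 = 0;

    loop {
        let mut buf = [0u8; 1];
        reader.read_exact(&mut buf)?;
        let b = buf[0];

        // If any of the top 7 bits are already set, shifting left by 7
        // would discard them, so reject the input instead of wrapping.
        if n >> (u32::BITS - 7) != 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "VLQ integer overflows u32",
            ));
        }

        n = (n << 7) | u32::from(b & 0x7f);

        // A clear high bit marks the final byte.
        if b & 0x80 == 0 {
            return Ok(n);
        }
    }
}
```

A malformed stream with too many significant continuation bytes then surfaces as an `InvalidData` error rather than a wrapped value.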

Writer

  • Version-gated CRC32 checksums and qs_seq_orient encoding
  • fqzcomp HAVE_QMAP, HAVE_QTAB, MULTI_PARAM encoder features
  • Owned tag data and reference-free mode support
  • Name tokenizer tok_dup and fqzcomp DO_REV support
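
As a rough sketch of what "version-gated CRC32 checksums" could look like, the snippet below appends a CRC-32 trailer only when the target version is 3.0 or later (CRAM 2.x predates these checksums). The `Version` type, `has_crc32`, and `finish_block` are hypothetical names for illustration, the little-endian trailer is an assumption of this sketch, and the bitwise CRC-32 is a stand-in for whatever checksum implementation the writer actually uses.

```rust
/// Hypothetical version type; field order gives (major, minor) ordering.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u8,
    minor: u8,
}

impl Version {
    /// Assumption for this sketch: checksums exist from CRAM 3.0 onward.
    fn has_crc32(self) -> bool {
        self >= Version { major: 3, minor: 0 }
    }
}

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xffff_ffffu32;

    for &byte in data {
        crc ^= u32::from(byte);

        for _ in 0..8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xedb8_8320 & mask);
        }
    }

    !crc
}

/// Appends a little-endian CRC-32 trailer only for versions that carry one.
fn finish_block(version: Version, mut payload: Vec<u8>) -> Vec<u8> {
    if version.has_crc32() {
        let checksum = crc32(&payload);
        payload.extend_from_slice(&checksum.to_le_bytes());
    }

    payload
}
```

Keeping the gate in one predicate means the reader and writer can share the same version check instead of scattering `major >= 3` comparisons.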

API additions

  • Reader::query_unmapped() and Reader::for_each_record()
  • IndexedReader delegation of the above methods

Test infrastructure

  • Programmatic in-memory test data creation (no on-disk test files)
  • Integration tests for round-trip, reader methods, query APIs, and writer options across all CRAM versions
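
The in-memory approach can be sketched with the standard library alone. The example below round-trips just the 26-byte CRAM file definition (magic `CRAM`, major and minor version bytes, 20-byte file ID) through a `Vec<u8>` and a `Cursor` for each version listed above. The helper names are hypothetical and this is far smaller than the PR's actual test fixtures; it only shows the pattern of building test data programmatically instead of reading files from disk.

```rust
use std::io::{self, Cursor, Read, Write};

const MAGIC: &[u8; 4] = b"CRAM";
const VERSIONS: [(u8, u8); 5] = [(2, 0), (2, 1), (3, 0), (3, 1), (4, 0)];

/// Writes the file definition: magic, major, minor, 20-byte file ID.
fn write_file_definition<W: Write>(
    writer: &mut W,
    version: (u8, u8),
    file_id: &[u8; 20],
) -> io::Result<()> {
    writer.write_all(MAGIC)?;
    writer.write_all(&[version.0, version.1])?;
    writer.write_all(file_id)
}

/// Reads the file definition back, rejecting a wrong magic number.
fn read_file_definition<R: Read>(reader: &mut R) -> io::Result<((u8, u8), [u8; 20])> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;

    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid magic number"));
    }

    let mut version = [0u8; 2];
    reader.read_exact(&mut version)?;

    let mut file_id = [0u8; 20];
    reader.read_exact(&mut file_id)?;

    Ok(((version[0], version[1]), file_id))
}

/// Builds the data in memory and reads it back for every version.
fn round_trip_all_versions() -> io::Result<()> {
    for version in VERSIONS {
        let mut buf = Vec::new();
        write_file_definition(&mut buf, version, &[0; 20])?;

        let mut reader = Cursor::new(buf);
        let (actual, _) = read_file_definition(&mut reader)?;
        assert_eq!(actual, version);
    }

    Ok(())
}
```

Looping over a version table like `VERSIONS` keeps each integration test identical across 2.0 through 4.0, so a version-specific regression shows up as a single failing iteration.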

Files changed

97 files, +8285 / -1351 lines

nh13 commented Mar 2, 2026

Moved from nh13#1.

nh13 force-pushed the nh/cram-version-parity branch 2 times, most recently from a5c042b to d60c67f on March 2, 2026 00:18

Add full read/write support for CRAM versions 2.0, 2.1, 3.0, 3.1, and
4.0, reaching feature parity with htslib/samtools. This includes:

Reader changes:
- Legacy codec support (beta, gamma, Golomb, Huffman, subexp) for v2.x
- CRAM 4.0 EOF detection, slice header encoding, and codec offsets
- Async reader parity across all versions
- Overflow protection for VLQ integer readers
- Fix ref_seq_id encoding and codec argument signedness

Writer changes:
- Version-gated CRC32 checksums and qs_seq_orient encoding
- fqzcomp HAVE_QMAP, HAVE_QTAB, MULTI_PARAM encoder features
- Owned tag data and reference-free mode support
- Name tokenizer tok_dup and fqzcomp DO_REV support

API additions:
- Reader::query_unmapped() and Reader::for_each_record()
- IndexedReader delegation of the above methods

Test infrastructure:
- Programmatic in-memory test data creation (no on-disk test files)
- Integration tests for round-trip, reader methods, query APIs, and
  writer options across all CRAM versions

nh13 force-pushed the nh/cram-version-parity branch from d60c67f to 2155f74 on March 2, 2026 16:08
nh13 marked this pull request as ready for review on March 2, 2026 16:25
nh13 marked this pull request as draft on March 26, 2026 20:09