feat(bam): add raw byte access for bam::Record #373

Draft
nh13 wants to merge 1 commit into zaeleus:master from nh13:feat/bam-record-raw-bytes

nh13 (Contributor) commented Feb 11, 2026

Summary

Add AsRef<[u8]>, TryFrom<Vec<u8>>, and into_inner() to bam::Record, enabling direct
access to the underlying BAM record buffer (without the leading 4-byte block size).

  • AsRef<[u8]> — borrow the raw BAM bytes
  • TryFrom<Vec<u8>> — construct a Record from raw bytes (validates and indexes the buffer)
  • into_inner(self) -> Vec<u8> — consume the record and extract the byte buffer
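
As a rough illustration of the shape of this API, the three entry points can be sketched on a stand-in type. This is not the code in this PR: `RawRecord`, `RawRecordError`, and the bounds check are hypothetical, and the field offsets assume the fixed-length layout from the BAM specification (32 bytes once the block size is dropped). The real `bam::Record` also builds its field index during conversion, which is omitted here.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a record backed by raw BAM bytes (not the noodles type).
#[derive(Debug, PartialEq, Eq)]
pub struct RawRecord(Vec<u8>);

/// Hypothetical validation errors for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum RawRecordError {
    /// Fewer than the 32 bytes of fixed-length fields.
    TooShort,
    /// The declared variable-length fields extend past the end of the buffer.
    Truncated,
}

/// Size of the fixed-length fields (refID through tlen), excluding block_size.
const FIXED_LEN: usize = 32;

impl TryFrom<Vec<u8>> for RawRecord {
    type Error = RawRecordError;

    fn try_from(buf: Vec<u8>) -> Result<Self, Self::Error> {
        if buf.len() < FIXED_LEN {
            return Err(RawRecordError::TooShort);
        }

        // Offsets assume the BAM layout: l_read_name at 8, n_cigar_op at 12, l_seq at 16.
        let l_read_name = u64::from(buf[8]);
        let n_cigar_op = u64::from(u16::from_le_bytes([buf[12], buf[13]]));
        let l_seq = u64::from(u32::from_le_bytes([buf[16], buf[17], buf[18], buf[19]]));

        // Name + CIGAR (4 bytes per op) + packed bases (2 per byte) + qualities.
        let min_len = FIXED_LEN as u64 + l_read_name + 4 * n_cigar_op + (l_seq + 1) / 2 + l_seq;
        if (buf.len() as u64) < min_len {
            return Err(RawRecordError::Truncated);
        }

        Ok(Self(buf))
    }
}

impl AsRef<[u8]> for RawRecord {
    /// Borrows the raw bytes without copying.
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

impl RawRecord {
    /// Consumes the record and returns the underlying buffer.
    pub fn into_inner(self) -> Vec<u8> {
        self.0
    }
}
```

Validating in `TryFrom` rather than offering an unchecked constructor keeps field accessors free of repeated bounds checks, at the cost of one pass over the header fields per conversion.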

Motivation

Tools like fgumi need to work with raw BAM record
bytes for performance-critical pipelines (e.g. parallel BGZF block processing, custom
serialization) while still being able to use the alignment::Record trait for field access.
Currently there is no public way to get bytes out of or put bytes into a bam::Record.

This pairs well with the codec encode/decode functions (#364) to give a complete raw-bytes
workflow:

// Encode any Record to raw BAM bytes
let mut buf = Vec::new();
bam::record::codec::encode(&mut buf, &header, &record)?;

// Wrap in a bam::Record — get alignment::Record trait for free
let bam_record = bam::Record::try_from(buf)?;

// Borrow or extract the bytes
let bytes: &[u8] = bam_record.as_ref();
let owned: Vec<u8> = bam_record.into_inner();
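
Because the buffer excludes the leading 4-byte block size, a caller that writes the bytes back into a BAM stream has to restore that prefix itself. A minimal sketch of that framing, assuming the little-endian `block_size` field from the BAM specification; `write_framed` and `read_framed` are hypothetical helpers, not noodles functions:

```rust
use std::convert::TryFrom;
use std::io::{self, Read, Write};

/// Hypothetical helper: writes `record` preceded by its little-endian block size.
pub fn write_framed<W: Write>(writer: &mut W, record: &[u8]) -> io::Result<()> {
    let block_size = u32::try_from(record.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "record too large"))?;
    writer.write_all(&block_size.to_le_bytes())?;
    writer.write_all(record)
}

/// Hypothetical helper: reads one length-prefixed record and returns the bytes
/// without the block size, i.e. the form the raw-bytes API works with.
pub fn read_framed<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let block_size = u32::from_le_bytes(prefix) as usize;

    let mut record = vec![0u8; block_size];
    reader.read_exact(&mut record)?;
    Ok(record)
}
```

In the workflow above, `owned` would be the slice passed to `write_framed`, and the output of `read_framed` would be the input to `bam::Record::try_from`.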

Why raw byte access rather than upstreaming all operations?

Tools like fgumi perform three categories of operations on BAM record bytes
that go beyond noodles' current (and appropriate) scope:

  1. In-place mutation (tag append/update/remove, base/quality modification,
    flag updates) — bam::Record is intentionally read-only, and a full
    mutation API would be a significant design change
  2. Application-specific algorithms (overlap detection, template-coordinate
    sorting, virtual hard clipping) — domain logic that doesn't belong in a
    format library
  3. Batched/hot-path accessors (multi-tag extraction in a single pass,
    zero-allocation comparators) — performance-tuned for specific workflows
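
To make the first category concrete, here is a sketch of an in-place flag update on a raw record buffer. It is illustrative only and is not taken from fgumi or noodles: the helper names are hypothetical, and the offset assumes the BAM layout with the block size already stripped, where the 16-bit little-endian flag field sits at bytes 14 and 15.

```rust
/// Byte offset of the flag field within a record that has no block_size prefix
/// (refID 0..4, pos 4..8, l_read_name 8, mapq 9, bin 10..12, n_cigar_op 12..14).
const FLAG_OFFSET: usize = 14;

/// SAM flag bit marking a PCR or optical duplicate.
pub const DUPLICATE: u16 = 0x400;

/// Reads the flags, or returns `None` if the buffer is too short.
pub fn flags(record: &[u8]) -> Option<u16> {
    let bytes = record.get(FLAG_OFFSET..FLAG_OFFSET + 2)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

/// Sets the given bits in place and returns the updated flags,
/// or `None` (leaving the buffer untouched) if it is too short.
pub fn set_flag_bits(record: &mut [u8], bits: u16) -> Option<u16> {
    let updated = flags(record)? | bits;
    record[FLAG_OFFSET..FLAG_OFFSET + 2].copy_from_slice(&updated.to_le_bytes());
    Some(updated)
}
```

A tool could obtain the buffer with `into_inner()`, apply an update like this, and wrap the result again with `TryFrom<Vec<u8>>`, so the mutation never needs to go through a read-only record type.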

Raw byte access via AsRef/TryFrom/into_inner is the minimal API that
lets tools like fgumi leverage noodles for I/O and the alignment::Record
trait while performing these operations directly on the buffer.

Test plan

  • All existing noodles-bam tests pass (128 unit + 67 doc-tests)
  • Doc-tests cover into_inner() and TryFrom<Vec<u8>> round-trip

Add AsRef<[u8]>, TryFrom<Vec<u8>>, and into_inner() to bam::Record,
enabling direct access to the underlying BAM record bytes without the
leading 4-byte block size. This supports tools like fgumi that need
efficient raw byte manipulation while retaining the alignment::Record
trait implementation.
@zaeleus zaeleus added the bam label Feb 13, 2026
@nh13 nh13 marked this pull request as draft March 26, 2026 20:09