feat: add read_tabix and read_bam_references convenience functions #171
pkerpedjiev wants to merge 5 commits into abdenlab:main
- read_tabix: queries a BGZF tabix-indexed file for records in a genomic region, returning Arrow IPC bytes with chrom/start/end/raw columns. Accepts file paths and file-like objects; the index can be a .tbi/.csi file path or a file-like object.
- read_bam_references: reads reference sequence names and lengths from a BAM file header, returning Arrow IPC bytes with name/length columns. Useful for building chromsizes without scanning the full file.

These functions maintain backward compatibility with clients that relied on the same API in earlier oxbow versions (e.g. HiGlass/clodius).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
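To illustrate why reference names and lengths can be read without scanning the full file, here is a minimal sketch of parsing the BAM header directly, following the layout in the SAM/BAM specification. It uses only the Python standard library and is not the code in this PR: `bam_references_sketch` and `make_header_only_bam` are hypothetical names, and the sketch returns a plain list of tuples rather than Arrow IPC bytes.

```python
import gzip
import io
import struct


def bam_references_sketch(fileobj):
    """Return [(name, length), ...] from the header of a BAM stream.

    Hypothetical helper for illustration only; not part of oxbow.
    """
    # BGZF is a series of gzip members, so the gzip module can decompress it.
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as bam:
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<I", bam.read(4))
        bam.read(l_text)  # skip the plain-text SAM header
        (n_ref,) = struct.unpack("<I", bam.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", bam.read(4))
            # The name is NUL-terminated and l_name counts the terminator.
            name = bam.read(l_name).rstrip(b"\x00").decode("ascii")
            (l_ref,) = struct.unpack("<I", bam.read(4))
            refs.append((name, l_ref))
        return refs


def make_header_only_bam(refs):
    """Build an in-memory, header-only BAM for demonstration (hypothetical)."""
    payload = b"BAM\x01" + struct.pack("<I", 0) + struct.pack("<I", len(refs))
    for name, length in refs:
        raw = name.encode("ascii") + b"\x00"
        payload += struct.pack("<I", len(raw)) + raw + struct.pack("<I", length)
    # A single gzip member is enough for the reader above to decode it.
    return io.BytesIO(gzip.compress(payload))


demo = make_header_only_bam([("chr1", 1000), ("chr2", 500)])
references = bam_references_sketch(demo)
```

Only the header fields are read; the alignment records that follow are never touched, so the cost does not grow with the number of records.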
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks, @pkerpedjiev! For the BAM references, doesn't this do the job already? The same property exists for all data sources:

`chromsizes = ox.from_bam(...).chrom_sizes`

For tabix/CSI-indexed files, the BED datasource already does what you need. DataSources are the preferred API for oxbow, returning iterators that expose record batches from Rust to Python with zero copy.

If I were to add something, I do think that we need a more generic BED-like TSV reader for the tabix use case where chrom, start, end are not the first 3 fields, as tabix allows this.
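To make the "chrom, start, end are not the first 3 fields" case concrete, here is a rough sketch of a region filter over tab-separated lines with configurable column positions. It is a plain linear scan in standard-library Python, not an indexed query and not an oxbow API. The name `filter_region` and its parameters are hypothetical; the 1-based column numbers are modelled on tabix's sequence/begin/end column options, and coordinates are assumed to be 0-based, half-open as in BED.

```python
def filter_region(lines, chrom, start, end, seq_col=1, begin_col=2, end_col=3):
    """Return records overlapping chrom:start-end as chrom/start/end/raw dicts.

    Hypothetical helper for illustration only. Column numbers are 1-based;
    coordinates are assumed to be 0-based, half-open.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        rec_chrom = fields[seq_col - 1]
        rec_start = int(fields[begin_col - 1])
        rec_end = int(fields[end_col - 1])
        # Half-open intervals overlap when each starts before the other ends.
        if rec_chrom == chrom and rec_start < end and rec_end > start:
            rows.append(
                {"chrom": rec_chrom, "start": rec_start, "end": rec_end, "raw": line}
            )
    return rows


# Example: the name comes first, so chrom/start/end are columns 2, 3 and 4.
example_lines = [
    "#name\tchrom\tstart\tend\n",
    "gene1\tchr1\t100\t200\n",
    "gene2\tchr1\t500\t600\n",
    "gene3\tchr2\t100\t200\n",
]
hits = filter_region(example_lines, "chr1", 150, 550, seq_col=2, begin_col=3, end_col=4)
```

A real reader would use the tabix/CSI index to seek to the relevant BGZF blocks instead of scanning every line; the column handling is the part this sketch is meant to show.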
Yeah, I think I can use both of those. I'll try to update the clodius PR with those changes and reopen and modify this if it doesn't work.