Release py-oxbow@v0.7.0 · abdenlab/oxbow

New features

New selection semantics (None, list, "*") in #172

All DataSource constructors now accept the value "*" for every field declaration parameter, in addition to a list or None. "*" selects everything of that kind (all standard fields, all info/format fields in a header, all samples in a header, etc.), and None now means "omit entirely". Previously, None served as the "all fields" sentinel, which was ambiguous. Parameter defaults have been updated to reflect these new semantics while keeping the same default behavior, except for those listed below.
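
The three cases can be modeled with a small, library-independent sketch. The helper `resolve_selection` below is hypothetical and is not part of oxbow; it only mirrors the semantics described above (rejecting other string values is a choice made for this sketch).

```python
from typing import List, Optional, Sequence, Union

Selection = Union[None, str, Sequence[str]]


def resolve_selection(selection: Selection, available: Sequence[str]) -> Optional[List[str]]:
    """Hypothetical helper modeling the None / list / "*" semantics.

    This is not oxbow's implementation, only an illustration of the rules
    stated in the release notes.
    """
    if selection is None:
        # None means "omit entirely": no fields, no column.
        return None
    if isinstance(selection, str):
        if selection == "*":
            # "*" means "everything that is available".
            return list(available)
        # Assumption of this sketch: any other bare string is rejected.
        raise ValueError(f"unsupported selection string: {selection!r}")
    # An explicit list selects exactly the named fields, in the given order.
    return list(selection)


header_fields = ["GT", "DP", "GQ"]
omitted = resolve_selection(None, header_fields)
everything = resolve_selection("*", header_fields)
subset = resolve_selection(["DP"], header_fields)
```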

Customizable BED schemas for BED and BBI files in #169

Fully custom BED schemas (field name and type definitions) can now be specified as a tuple of (str, dict[str, str]). The first item is a bed3-12 string specifier for the initial standard fields; the second is a dictionary mapping field names to type names for the remaining fields, parsed using an autoSql-inspired type system with additional Rust numeric type aliases. This enables programmatic schema construction for formats like narrowPeak. The autoSql-inspired type system is now shared and harmonized across the BED and BBI models for extended BED fields, but standard BED fields are interpreted using format-native (BigBed) or spec-compliant (BED) types.
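
As an illustration, a narrowPeak schema could be assembled as such a tuple. This is a sketch under assumptions: the specifier spelling "bed6" and the type names "float" and "int" follow the ENCODE narrowPeak autoSql definition and have not been checked against oxbow's accepted names. `check_bed_schema` is a hypothetical helper, not an oxbow function. The constructor parameter that receives the tuple is not named in these notes, so the call itself is left out.

```python
import re

# narrowPeak is BED6 plus four extra fields (signalValue, pValue, qValue, peak).
# Assumption: oxbow accepts these autoSql-style type names as written.
narrowpeak_schema = (
    "bed6",
    {
        "signalValue": "float",
        "pValue": "float",
        "qValue": "float",
        "peak": "int",
    },
)


def check_bed_schema(schema):
    """Hypothetical sanity check for a (str, dict[str, str]) schema tuple.

    Returns the total number of fields (standard + extra).
    """
    specifier, extra_fields = schema
    match = re.fullmatch(r"bed(\d+)", specifier)
    if match is None or not 3 <= int(match.group(1)) <= 12:
        raise ValueError(f"expected a specifier from bed3 to bed12, got {specifier!r}")
    for name, type_name in extra_fields.items():
        if not isinstance(name, str) or not isinstance(type_name, str):
            raise TypeError("extra fields must map str field names to str type names")
    return int(match.group(1)) + len(extra_fields)


total_fields = check_bed_schema(narrowpeak_schema)
```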

Nested samples table in VCF/BCF DataSources in #170

VcfFile/from_vcf and BcfFile/from_bcf gain a samples_nested boolean parameter. When True, all sample genotype data is emitted as a single "samples" struct column rather than as N top-level per-sample or per-field columns. This makes it straightforward to treat genotype data as an atomic projection unit. The default is False, preserving existing behavior. Resolves #167.
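
A minimal sketch of passing the parameter directly to the constructor follows. It assumes the package is imported as `import oxbow as ox` and that `samples="*"` is needed alongside `samples_nested=True`, since these notes state that `samples` now defaults to None. The wrapper `open_vcf_nested` is a hypothetical name used only for illustration.

```python
def open_vcf_nested(path):
    """Open a VCF with all samples projected into one nested "samples" column.

    Hypothetical wrapper; oxbow is imported lazily so that defining the
    function does not require the package to be installed.
    """
    import oxbow as ox  # assumed import alias

    # samples="*" selects all samples in the header (new selection semantics);
    # samples_nested=True emits them as a single struct column.
    return ox.from_vcf(path, samples="*", samples_nested=True)
```

Calling `open_vcf_nested("sample.vcf.gz")` should then return a data source whose genotype data sits under one "samples" column.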

API changes

Tag and attribute discovery is no longer automatic (breaking)

Previously, alignment and annotation file constructors would scan an initial number of records to discover tag/attribute definitions and include them in the schema by default. This auto-discovery has been removed. Tag and attribute definition and discovery is now opt-in.

tag_scan_rows parameter removed from SamFile/from_sam, BamFile/from_bam, CramFile/from_cram.
attribute_scan_rows parameter removed from GtfFile/from_gtf, GffFile/from_gff.
tag_defs and attribute_defs now default to None, which omits the "tags" / "attributes" column entirely.
Use the new with_tags() and with_attributes() builder methods (below) to opt in. (Recommended)

Sample genotype data is no longer projected by default (breaking)

from_vcf and from_bcf previously defaulted to projecting all samples defined in the header, including all sample genotype columns. The default is now samples=None, omitting genotype data entirely.
Use the new with_samples() builder method (below) to opt in. (Recommended)

New builder methods for tags, attributes and samples

`with_tags()` — opt-in tag discovery for alignment files

df = ox.from_bam("sample.bam").with_tags().pl()

Call with_tags() on any SamFile, BamFile, or CramFile to discover tag definitions by scanning an initial number of records. Pass explicit definitions to skip discovery:

ds = ox.from_bam("sample.bam").with_tags([("NM", "i"), ("MD", "Z")])

The scan_rows keyword argument controls how many records are scanned (default: 1024; pass -1 to scan the whole file).
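
A short sketch of the `scan_rows` keyword in use, assuming the package is imported as `import oxbow as ox`. The wrapper name `load_bam_with_tags` is hypothetical.

```python
def load_bam_with_tags(path, scan_rows=1024):
    """Load a BAM file into a Polars DataFrame with discovered tags.

    Hypothetical wrapper; oxbow is imported lazily so that defining the
    function does not require the package to be installed.
    """
    import oxbow as ox  # assumed import alias

    # scan_rows controls how many records are scanned for tag definitions;
    # per the release notes, -1 scans the whole file.
    return ox.from_bam(path).with_tags(scan_rows=scan_rows).pl()
```

For example, `load_bam_with_tags("sample.bam", scan_rows=-1)` would scan every record before building the schema, trading startup time for complete tag discovery.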

`with_attributes()` — opt-in attribute discovery for annotation files

df = ox.from_gff("sample.gff").with_attributes().pl()

Same pattern as with_tags(), for GtfFile and GffFile. The scan_rows keyword argument is also supported.

`with_samples()` — nested sample genotype data for variant files

Calling with_samples() on a VcfFile or BcfFile includes all sample genotype data nested under a single "samples" struct column. Accepts optional samples, genotype_fields, and group_by arguments:

df = ox.from_vcf("sample.vcf.gz").with_samples().pl()
df.unnest("samples")

ds = (
    ox.from_vcf("sample.vcf.gz")
    .with_samples(["NA12891", "NA12892"], genotype_fields=["GT", "DP"], group_by="field")
)

Full Changelog: https://github.com/abdenlab/oxbow/compare/py-oxbow@v0.6.0...py-oxbow@v0.7.0
