suggested API #5

@endrebak asked for suggestions on a possible API to make this a more general purpose read-my-bam tool.

My thought is a `read_bam` function that returns a `pandas.DataFrame` (similar to what is already here) but that:
1. Reads all alignments and all fields by default (including unmapped reads).
2. Supports subselecting the fields (columns) being read, for efficiency, using a parameter, say, `fields`. For example, `fields=["Chromosome", "Start", "End", "Strand"]` would read only the specified columns and return a DataFrame with only those columns. This is similar to `usecols` in [pandas.read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).
3. Supports subselecting the alignments (rows) being read to specified regions (and uses the BAM index for doing this). E.g. `regions=[("chr1", 100, 10000)]` would subselect to `chr1:100-10000`.
4. Supports subselecting the alignments (rows) being read according to the BAM record flags. I think adding a dedicated parameter for each of these would be the most user-friendly. E.g. `only_mapped=True` would be the equivalent of passing `-F 4` to samtools. I think it is really helpful to use named parameters here rather than making the user do bit arithmetic with binary flag codes. Basically, implement [this](https://broadinstitute.github.io/picard/explain-flags.html) as named arguments.
5. Has a `max_alignments` argument so the user can read just the first 10 records by passing `max_alignments=10`.
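To make point 4 concrete, here is a rough sketch of how named arguments could be translated into the two bitmasks that samtools exposes as `-f` (required bits) and `-F` (excluded bits). The bit values come from the SAM specification; the function names and every parameter other than `only_mapped` are hypothetical, made up for illustration, and not part of any existing library.

```python
# SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "reverse": 0x10,
    "mate_reverse": 0x20,
    "read1": 0x40,
    "read2": 0x80,
    "secondary": 0x100,
    "qc_fail": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def flag_masks(only_mapped=False, only_proper_pairs=False,
               only_primary=False, exclude_duplicates=False):
    """Translate named arguments into (require, exclude) bitmasks.

    Hypothetical helper: only `only_mapped` appears in the proposal,
    the other parameter names are assumptions.
    """
    require = 0  # bits that must all be set (samtools -f)
    exclude = 0  # bits that must all be unset (samtools -F)
    if only_mapped:
        exclude |= SAM_FLAGS["unmapped"]
    if only_proper_pairs:
        require |= SAM_FLAGS["proper_pair"]
    if only_primary:
        exclude |= SAM_FLAGS["secondary"] | SAM_FLAGS["supplementary"]
    if exclude_duplicates:
        exclude |= SAM_FLAGS["duplicate"]
    return require, exclude


def passes(flag, require, exclude):
    """Return True if a record's flag satisfies both masks."""
    return (flag & require) == require and (flag & exclude) == 0
```

With this, `flag_masks(only_mapped=True)` yields an exclude mask of 4, matching `-F 4`, and the user never has to touch the bit values directly.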

I think one function that implements this would handle the majority of my use cases for reading BAMs in Python, and would provide a much simpler API to get started with and use than pysam.
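As a rough illustration of the intended semantics of `fields`, `regions` and `max_alignments`, here is a sketch that works on records already decoded into plain dicts. Everything in it is an assumption made for illustration: the helper names, the `ALL_FIELDS` column set, and the half-open interval convention for regions. A real `read_bam` would return a `pandas.DataFrame` and use the BAM index for region lookups instead of scanning every record.

```python
from itertools import islice

# Hypothetical default column set, for illustration only.
ALL_FIELDS = ("Name", "Flag", "Chromosome", "Start", "End", "Strand")


def overlaps(record, regions):
    """True if the record overlaps any (chrom, start, end) region.

    Intervals are treated as half-open, which is an assumption.
    Unmapped records (Chromosome is None) never match a region.
    """
    return any(
        record["Chromosome"] == chrom
        and record["Start"] < end
        and record["End"] > start
        for chrom, start, end in regions
    )


def select_alignments(records, fields=None, regions=None, max_alignments=None):
    """Apply row and column selection to an iterable of record dicts."""
    if regions is not None:
        records = (r for r in records if overlaps(r, regions))
    # islice with a stop of None imposes no limit.
    records = islice(records, max_alignments)
    keep = ALL_FIELDS if fields is None else fields
    return [{name: r[name] for name in keep} for r in records]
```

Called with no optional arguments, this returns every record with every field, including unmapped reads, which matches the default behaviour proposed in point 1.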

