suggested API #5

@timodonnell

@endrebak asked for suggestions on a possible API to make this a more general-purpose read-my-bam tool.

My thought is a read_bam function that returns a pandas.DataFrame (similar to what is already here) but that:

  1. Reads all alignments and all fields by default (including unmapped reads)
  2. Supports selecting the fields (columns) to read, for efficiency, through a parameter, say, fields. For example, fields=["Chromosome", "Start", "End", "Strand"] would read only the specified columns and return a DataFrame with only those columns, similar to usecols in pandas.read_csv.
  3. Supports subselecting the alignments (rows) being read to specified regions (and uses the BAM index for doing this). E.g. regions=[("chr1", 100, 10000)] would subselect to chr1:100-10000.
  4. Supports subselecting the alignments (rows) being read according to the BAM record flags. I think adding a separate named parameter for each flag would be the most user friendly. E.g. only_mapped=True would be the equivalent of passing -F 4 to samtools. I think it is really helpful to use named parameters here rather than making the user do bit arithmetic with binary flag codes. Basically implement this as named arguments.
  5. Has a max_alignments argument so the user can read just the first 10 records by passing max_alignments=10
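
To illustrate point 4, here is a minimal sketch of how named parameters could be translated into flag bitmasks internally, so the user never touches bit arithmetic. Only only_mapped comes from the suggestion above; the other parameter names (only_paired, only_primary, skip_duplicates, skip_qc_fail) and the helper names flag_masks and keep_alignment are hypothetical, chosen for illustration. The flag values are the ones defined in the SAM specification.

```python
# SAM flag bits, as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def flag_masks(only_mapped=False, only_paired=False, only_primary=False,
               skip_duplicates=False, skip_qc_fail=False):
    """Translate named options into (required, excluded) flag bitmasks.

    The pair plays the same role as samtools' -f (required) and
    -F (excluded) options. All parameter names except only_mapped are
    hypothetical.
    """
    required = 0
    excluded = 0
    if only_paired:
        required |= FLAG_PAIRED
    if only_mapped:
        excluded |= FLAG_UNMAPPED  # same effect as -F 4
    if only_primary:
        excluded |= FLAG_SECONDARY | FLAG_SUPPLEMENTARY
    if skip_duplicates:
        excluded |= FLAG_DUPLICATE
    if skip_qc_fail:
        excluded |= FLAG_QC_FAIL
    return required, excluded


def keep_alignment(flag, required, excluded):
    """Return True if a record's flag has every required bit and no excluded bit."""
    return (flag & required) == required and (flag & excluded) == 0
```

With this in place, read_bam(..., only_mapped=True) could compute the masks once and apply keep_alignment to each record's flag while reading.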

I think one function that implements this would handle the majority of my use cases for reading BAMs in Python, and would provide a much simpler API to get started with and use than pysam.
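
As a rough sketch of the column and row limits (fields and max_alignments), the following operates on any iterable of already-decoded records, each a dict keyed by field name. It deliberately leaves out BAM parsing, region lookup, and flag filtering; the function name select_alignments and the dict-per-record representation are assumptions made for illustration, not part of the existing code.

```python
from itertools import islice


def select_alignments(records, fields=None, max_alignments=None):
    """Collect selected fields from at most max_alignments records.

    records: iterable of dicts, one per alignment (a stand-in for decoded
        BAM records; hypothetical representation).
    fields: list of field names to keep, or None to keep all fields.
    max_alignments: stop after this many records, or None for no limit.

    Returns a dict mapping each field name to a list of values.
    """
    # islice with stop=None iterates over everything, so the
    # default reads all records.
    limited = islice(records, max_alignments)
    columns = None
    for record in limited:
        if columns is None:
            # Decide the column set from the first record when no
            # explicit fields were requested.
            names = list(record) if fields is None else list(fields)
            columns = {name: [] for name in names}
        for name in columns:
            columns[name].append(record[name])
    if columns is None:
        # No records at all: return empty columns for the requested fields.
        columns = {name: [] for name in (fields or [])}
    return columns
```

The returned dict of lists can be passed directly to pandas.DataFrame to get the DataFrame described above. Because the input is consumed lazily, max_alignments=10 stops after ten records instead of reading the whole file.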
