Skip to content

Incorrect 3' end coordinates imported using bamUtils::read.bam #11

@mlweilert

Description

@mlweilert

To our knowledge, bamUtils doesn't correctly import the 3' of a .bam file due to 0-based .bam files not correctly being converted to a 1-based GRanges object.

The start of any read imported through bamUtils is correct, but the base of the end of the read is off by one base upstream. This is true no matter the strand orientation of the aligned read. Reads seem to be imported from 5' to 3' even if they are mapped to the reverse strand of the genome. The orientation is stored in the flag. So when a read maps on the reverse strand, it takes the end coordinate, which still is off by one base. When a read map on the sense strand, it takes the start which is correct.

We would suggest re-checking the conversion of 0-based .bam coordinates to 1-based GRanges coordinates. In the case of our applications looking transcription initiation sites, this could be a meaningful bug in many applications.

Below is a sample of the code we used for bamUtils as well as a screenshot showing the result of using an aligned .bam file, applying bamUtils:read.bam, and the resulting .bw file, which is off by one base.

reads <- bamUtils::read.bam(bam = opt$input_bamfile, 
                              intervals = chrom_sizes_gr[seqnames(chrom_sizes_gr) == chromosome],
                              bai = paste(opt$input_bamfile, ".bai", sep = ""), 
                              pairs.grl = FALSE,
                              stripstrand = TRUE,
                              what = scanBamWhat(),
                              unpack.flag = FALSE,
                              verbose = FALSE,
                              tag = NULL,
                              isPaired = NA,
                              isProperPair = NA,
                              isUnmappedQuery = NA,
                              hasUnmappedMate = NA,
                              isNotPassingQualityControls = FALSE,
                              isDuplicate = NA,
                              pairs.grl.split = FALSE,
                              ignore.indels = TRUE, all = TRUE)
  reads <- GenomicRanges::resize(x = reads, width = 1, fix = 'start')

#Exported the file to coverage object and .bigwig after this

image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions