Skip to content

Multi-mapping reads filtering support in countNonSplicedReads #91

@mincej

Description

@mincej

In the documentation from FRASER countRNAData, there is the following parameters:

...  Further parameters passed on to Rsubread::featureCounts.

However, when running with the parameters:

reads_onerun <- countRNAData(
    sampleID ="sample1",
    fds = fds, 
    NcpuPerSample = 8,
    recount = TRUE,
    keepNonStandardChromosomes = TRUE,
    genome = c("BSgenome.Hsapiens.UCSC.hg38"),
    countMultiMappingReads = FALSE
)

an error is returned:

Error in getNonSplitReadCountsForAllSamples(fds = fds, splitCountRanges = splitCountRanges,  : 
  unused arguments (sampleID = "sample1", countMultiMappingReads = TRUE)

In the nonspliced read counts function, the countMultiMappingReads parameter is hardcoded to FALSE.

FRASER/R/countRNAseqData.R

Lines 904 to 925 in ee6f279

rsubreadCounts <- featureCounts(files=bamFile, annot.ext=anno,
minOverlap=minAnchor*2,
allowMultiOverlap=TRUE,
checkFragLength=FALSE,
minMQS=bamMapqFilter(scanBamParam(fds)),
strandSpecific=strand,
# activating long read mode
isLongRead=longRead,
# multi-mapping reads
countMultiMappingReads=TRUE,
# unstranded case: for counting only non spliced reads we
# skip this information
isPairedEnd=isPairedEnd,
# sorting only needed for paired-end reads
autosort=doAutosort,
nthreads=NcpuPerSample,
tmpDir=file.path(file_path_as_absolute(workingDir(fds)), "cache")
)

This is different than countSplicedReads, which uses GenomicAlignment functions for counting which can use ScanBamParams and tagFilters for filtering bam files.

FRASER/R/countRNAseqData.R

Lines 534 to 537 in ee6f279

countsList <- bplapply(chromosomes, FUN=countSplitReadsPerChromosome,
bamFile=bamfile, pairedEnd=pairedend, genome=genome,
strandMode=strandmode, scanBamParam=scanbamparam,
BPPARAM=getBPParam(NcpuPerSample, length(chromosomes)))

FRASER/R/countRNAseqData.R

Lines 567 to 569 in ee6f279

galignment <- readGAlignmentPairs(
bamFile, param=param, strandMode=strandMode)
}

FRASER/R/countRNAseqData.R

Lines 613 to 614 in ee6f279

jc <- summarizeJunctions(galignment, genome=genome,
with.revmap=(as.logical(strandMode) && pairedEnd) )

What is the reason that you cannot alter the multimapping count parameters for nonspliced reads, but can for spliced reads? Or am I misunderstanding how these functions work in tandem, and that the filtering for multimapping reads is done by extension when defining the spliced ranges resulting from countSplicedReads and needed as input for countNonSplicedReads?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions