Long Read Abundances and Unbinned #278

@ttubb

I tried using CoverM to calculate the relative abundance of MAGs in a metagenome sample. After reading the documentation and inspecting the results, I am wondering how suitable this is for my long-read data. From the README.md:

Formula: 0.02235294/0.02235294*(2/6)
Explanation: If the contig is considered a genome, then its mean coverage is 0.02235294. There is a total of 0.02235294 mean coverage across all genomes, and 2 out of 6 reads (1 out of 3 pairs) map. This coverage calculation is only available in 'genome' mode.

The MAG abundances are scaled by the fraction of reads mapping to any MAG. Intuitively, for a long-read dataset I expected the scaling to use the number of read bases rather than the read count. I am concerned about read-length biases between the assembled and unassembled fractions. Do you have any experience with whether this makes a meaningful difference? Would you be open to including a flag that changes the scaling approach?
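To illustrate the concern, here is a minimal sketch comparing the two scaling factors. The read lengths and mapping flags are made up, the function names are hypothetical and not part of CoverM, and a mapped read is assumed to contribute its full length to the base count.

```python
def mapped_fraction_by_reads(read_lengths, mapped_flags):
    """Fraction of reads that map to any MAG (read-count scaling, as in the README example)."""
    mapped = sum(1 for is_mapped in mapped_flags if is_mapped)
    return mapped / len(read_lengths)


def mapped_fraction_by_bases(read_lengths, mapped_flags):
    """Fraction of read bases that belong to mapped reads (hypothetical base-count scaling).

    Assumption: every base of a mapped read is counted, not only the aligned part.
    """
    mapped_bases = sum(
        length for length, is_mapped in zip(read_lengths, mapped_flags) if is_mapped
    )
    return mapped_bases / sum(read_lengths)


def relative_abundances(mean_coverages, mapped_fraction):
    """Scale each genome's share of the total mean coverage by the mapped fraction."""
    total = sum(mean_coverages.values())
    return {
        genome: coverage / total * mapped_fraction
        for genome, coverage in mean_coverages.items()
    }


# Invented long-read sample: two long reads map, four short reads do not.
read_lengths = [10000, 8000, 500, 500, 500, 500]
mapped_flags = [True, True, False, False, False, False]

by_reads = mapped_fraction_by_reads(read_lengths, mapped_flags)
by_bases = mapped_fraction_by_bases(read_lengths, mapped_flags)

# Single genome with the mean coverage from the README example.
coverages = {"genome1": 0.02235294}
abundance_by_reads = relative_abundances(coverages, by_reads)
abundance_by_bases = relative_abundances(coverages, by_bases)
```

In this invented case the read-count factor is 2/6, while the base-count factor is 18000/20000, so the same genome would be reported at very different abundances depending on the scaling.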

And while I am suggesting features: when analyzing datasets in our group, the total abundance of the non-binned contigs is a metric we always check and report. This cannot be calculated from coverage, of course; it has to be based on read or base counts. For my current dataset I did this with a custom script after running CoverM, but it would be nice to be able to provide a .fasta with the unbinned contigs and get this number directly from CoverM.
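The kind of calculation meant here could look roughly like the following sketch. It is not the script mentioned above; the contig names, base counts, and the function `fraction_summary` are hypothetical, and it assumes the number of mapped read bases per contig has already been extracted from the alignments.

```python
def fraction_summary(mapped_bases_per_contig, binned_contigs, total_read_bases):
    """Split the total read bases into binned, unbinned, and unmapped fractions.

    mapped_bases_per_contig: dict of contig name -> read bases mapped to that contig
    binned_contigs: set of contig names that belong to any MAG
    total_read_bases: sum of the lengths of all reads in the sample
    """
    binned = sum(
        bases for contig, bases in mapped_bases_per_contig.items()
        if contig in binned_contigs
    )
    unbinned = sum(
        bases for contig, bases in mapped_bases_per_contig.items()
        if contig not in binned_contigs
    )
    unmapped = total_read_bases - binned - unbinned
    return {
        "binned": binned / total_read_bases,
        "unbinned": unbinned / total_read_bases,
        "unmapped": unmapped / total_read_bases,
    }


# Invented numbers: c1 and c2 are in MAGs, c3 is an unbinned contig.
summary = fraction_summary(
    mapped_bases_per_contig={"c1": 6000, "c2": 3000, "c3": 1000},
    binned_contigs={"c1", "c2"},
    total_read_bases=20000,
)
```

The same function works with read counts instead of base counts if the inputs are swapped accordingly.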

Thank you for any help and of course for providing this tool,
~Tom
