incorrect hits with bad alignment still gives "1.0" read assignment probability #74

@Robvh-git

Dear,

We are analyzing full-length 18S nematode data (Nanopore reads) and are running into some peculiar results. We use a curated database that contains only 18S nematode sequences plus a few 'outgroups' (i.e. non-nematodes).

In our soil nematode data, we find many reads assigned to a certain marine worm, which is not plausible. The reference sequence for this marine worm in our database is correct (checked).

When I run Emu and have it output the read assignment distributions, I find reads that have a probability of "1.0" for this marine worm.
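For anyone wanting to repeat this check, here is a minimal sketch of how such reads could be pulled out of the distribution table. It assumes the file is a tab-separated table with one row per read, the read ID in the first column, and one column per taxon ID; that layout is an assumption and should be checked against the actual output. The function name `reads_assigned_to` and the taxon IDs in the usage note are hypothetical.

```python
import csv
import io


def reads_assigned_to(tsv_text, taxon, min_prob=1.0):
    """Return IDs of reads whose probability for `taxon` is >= min_prob.

    Assumed layout (not confirmed): header row of taxon IDs, first column
    holds the read ID, empty cells mean probability 0.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]
    hits = []
    for row in reader:
        value = row.get(taxon) or "0"  # treat missing/empty cells as 0
        if float(value) >= min_prob:
            hits.append(row[id_column])
    return hits
```

With a real file, the text could be read via `open(path).read()` and the call would look like `reads_assigned_to(text, "12345")`, where `"12345"` stands in for the taxon ID of the suspicious reference.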

When I extract one of these reads from the Emu input data and identify it with BLAST, it turns out to be an insect that is not in our curated Emu reference database. We found out later that there could indeed be some insect DNA in this set of samples.

I then aligned this read (which had a probability of 1.0 for the marine worm) against the reference sequence of that marine worm in Geneious Prime. The alignment is quite poor, with a similarity of only 77%.

I'm a bit worried by this result, as I don't understand why Emu (minimap2) does not put this read in "unclassified" when it clearly does not belong to a taxon in our database. I suspect this happens because our database is limited and Emu is trying to force the read into a taxon, even though the assignment is incorrect. Is there a way to stop this forcing?

Do you know why this read is not reported as "unclassified/unassigned", and is there a way to add a threshold parameter to filter out poorly aligned reads?
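To make the kind of threshold meant here concrete, below is a minimal sketch of an identity filter applied to SAM alignment lines. It is not an existing Emu option; it assumes access to the SAM records produced by minimap2 and that each record carries the `NM:i:` edit-distance tag. The function names and the 0.9 default cutoff are hypothetical choices for illustration.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, nm):
    """Approximate identity: (alignment columns - edit distance) / columns.

    Columns are counted over M, =, X, I and D operations; NM counts
    mismatches plus inserted and deleted bases.
    """
    columns = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MID=X")
    if columns == 0:
        return 0.0
    return (columns - nm) / columns


def passes_threshold(sam_line, min_identity=0.9):
    """Return True if a SAM record's alignment identity meets the cutoff."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":  # unmapped record
        return False
    # Look for the NM tag among the optional fields (columns 12 onward).
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None:
        return False
    return alignment_identity(cigar, nm) >= min_identity
```

Under this definition, a read aligned over 100 columns with an edit distance of 23 has an identity of 0.77 and would be dropped at a 0.9 cutoff, which matches the kind of read described above.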
