incorrect hits with poor alignment still give a "1.0" read assignment probability

Dear developers,

We are analyzing full-length nematode 18S data (Nanopore reads) and are running into some peculiar results. We use a curated database that contains only nematode 18S sequences plus a few outgroups (i.e. non-nematodes).

In our soil nematode data, we find many reads assigned to a particular marine worm, which is not plausible. We checked, and the reference sequence for this marine worm in our database is correct.

When I run Emu with the read assignment distributions written to the output, I find reads that have a score of "1.0" for this marine worm.

When I extract one of these reads from the Emu input data and identify it with BLAST, it turns out to be an insect that is not in our curated Emu reference database. We later found out that there could indeed be some insect DNA in this set of samples.

I then aligned this read (which had a 1.0 probability for the marine worm) against the reference sequence of that marine worm in Geneious Prime. The alignment is actually quite poor, with only 77% similarity.

I'm a bit worried by this result, as I don't understand why Emu (minimap2) does not put this read in "unclassified" when it is clearly not a taxon from our database. I think this happens because our database is limited and Emu forces the read into a taxon even though the assignment is wrong. Is there a way to stop this forcing?
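A minimal sketch of one possible explanation, assuming the per-read assignment probability is a posterior normalized only over the reference taxa the read actually aligned to (alignment likelihood times current abundance estimate, divided by the sum over those taxa). Under that assumption, a read with a single hit scores 1.0 however weak the alignment is, because nothing else competes for it. The function `posterior_assignment` and all numbers are hypothetical illustrations, not Emu's code.

```python
def posterior_assignment(likelihoods, abundances):
    """Return P(taxon | read), normalized over the taxa the read aligned to.

    likelihoods: dict taxon -> P(read | taxon)  (hypothetical values)
    abundances:  dict taxon -> current relative abundance estimate
    """
    weights = {t: likelihoods[t] * abundances[t] for t in likelihoods}
    total = sum(weights.values())
    # Dividing by the total over *aligned* taxa only: absolute alignment
    # quality cancels out when there is just one candidate.
    return {t: w / total for t, w in weights.items()}


# A read that aligned to a single reference, with a tiny (poor) likelihood:
poor_single_hit = posterior_assignment({"marine_worm": 1e-40}, {"marine_worm": 0.01})
print(poor_single_hit)  # {'marine_worm': 1.0}
```

If this assumption holds, the probability only says which of the aligned references fits best, not whether the read belongs to any of them, so a separate alignment-quality cutoff would be needed.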

Do you know why this read is not classified as "unclassified/unassigned", and is there a way to add a threshold parameter to filter out poorly aligned reads?
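As a possible workaround while waiting for a built-in option, here is a minimal sketch of an identity filter on the minimap2 SAM alignments, assuming you have the SAM file (for example by running minimap2 yourself against the Emu reference). Identity is computed BLAST-style from the CIGAR string and the `NM` tag; the 0.9 threshold, the function names, and the example records are hypothetical. The passing read names could then be used to subset the FASTQ before re-running Emu.

```python
import re

# CIGAR operations as (length, op) pairs, per the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar: str, nm: int) -> float:
    """BLAST-style identity: matching columns / alignment columns.

    Alignment columns are the M/=/X, I and D lengths. The NM tag counts
    mismatches plus inserted and deleted bases, so matches = columns - NM.
    Clipped bases (S/H) are ignored.
    """
    columns = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=XID")
    if columns == 0:
        return 0.0
    return (columns - nm) / columns


def passes_identity_filter(sam_line: str, min_identity: float = 0.9) -> bool:
    """Return True if a SAM alignment line meets the identity threshold."""
    if sam_line.startswith("@"):  # header lines are not alignments
        return False
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 4 or cigar == "*":  # unmapped read
        return False
    nm = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NM:i:")), None)
    if nm is None:  # cannot judge the alignment without an edit distance
        return False
    return alignment_identity(cigar, nm) >= min_identity


def reads_passing(sam_lines, min_identity: float = 0.9) -> set:
    """Collect names of reads with at least one alignment above the threshold."""
    return {line.split("\t", 1)[0] for line in sam_lines
            if passes_identity_filter(line, min_identity)}


# Made-up example records (not taken from the data described above).
example_sam = [
    "@HD\tVN:1.6",
    "read_insect\t0\tmarine_worm\t1\t60\t100M\t*\t0\t0\t*\t*\tNM:i:23",
    "read_nematode\t0\tnematode_ref\t1\t60\t95M5S\t*\t0\t0\t*\t*\tNM:i:2",
]
print(reads_passing(example_sam, min_identity=0.9))  # {'read_nematode'}
```

Here the made-up `read_insect` record has 77% identity and is dropped. A sensible threshold would depend on the Nanopore error rate and on how divergent the taxa in the database are.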


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

incorrect hits with bad alignment still gives "1.0" read assignment probability #74

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

incorrect hits with bad alignment still gives "1.0" read assignment probability #74

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions