Dear,
We are analyzing full-length 18S nematode data (Nanopore reads) and are running into some peculiar results. We use a curated database that contains only 18S nematode sequences plus a few outgroups (i.e. non-nematodes).
In our soil nematode data, many reads are assigned to a certain marine worm, which is not plausible for soil samples. The reference sequence for this marine worm in our database is correct (we checked).
When I run Emu and have it output the read assignment distributions, I find reads that have a score of 1.0 for this marine worm.
When I extract one of these reads from the Emu input data and identify it with BLAST, it turns out to be an insect that is not in our curated Emu reference database. We found out later that there could indeed be some insect DNA in this set of samples.
I then aligned this read (which had a probability of 1.0 for the marine worm) against the reference sequence of that marine worm in Geneious Prime. The alignment is quite poor, with a similarity of only 77%.
I'm a bit worried by this result, as I don't understand why Emu (minimap2) does not leave this read unclassified when it clearly does not belong to any taxon in our database. I suspect this happens because our database is limited and Emu forces the read into the closest available taxon, even though the assignment is incorrect. Is there a way to prevent this forcing?
Do you know why this read is not reported as "unclassified/unassigned", and is there a way to set a threshold parameter that filters out poorly aligned reads?
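To illustrate the kind of threshold I have in mind, here is a minimal sketch in Python. It is not part of Emu. It assumes the reads are aligned separately with minimap2 to PAF output, where column 10 is the number of matching bases and column 11 is the alignment block length. The function names and the 0.90 cutoff are hypothetical and only serve as an example; reads flagged this way could then be removed from the input before running Emu.

```python
def paf_identity(paf_line):
    """Return (read_id, identity) for one PAF alignment line.

    Identity is matching bases (column 10) divided by the alignment
    block length (column 11), so gaps count against the score.
    """
    fields = paf_line.rstrip("\n").split("\t")
    read_id = fields[0]
    matches = int(fields[9])
    block_len = int(fields[10])
    identity = matches / block_len if block_len else 0.0
    return read_id, identity


def reads_below_identity(paf_lines, min_identity=0.90):
    """Return the IDs of reads whose best alignment is below min_identity.

    min_identity=0.90 is an arbitrary example value, not a recommendation.
    Reads with no alignment at all do not appear in PAF and are not listed.
    """
    best = {}
    for line in paf_lines:
        if not line.strip():
            continue
        read_id, identity = paf_identity(line)
        best[read_id] = max(identity, best.get(read_id, 0.0))
    return {read_id for read_id, identity in best.items() if identity < min_identity}


# Made-up PAF records for illustration only (read and reference names are hypothetical).
example_paf = [
    "read1\t1750\t10\t1710\t+\tmarine_worm_ref\t1800\t20\t1720\t1309\t1700\t60",
    "read2\t1750\t10\t1710\t+\tnematode_ref\t1800\t20\t1720\t1650\t1700\t60",
]
poor_reads = reads_below_identity(example_paf, min_identity=0.90)
print(sorted(poor_reads))
```

In this made-up example, read1 has an identity of 1309/1700 (about 77%, like the read described above) and would be flagged, while read2 at about 97% would be kept.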