Issue
Recently, we came across a sample that Emu reported as containing soil bacteria as well as Orientia tsutsugamushi. We suspected contamination, since neither finding made sense for the sample, but wanted to be sure we weren't missing a genuine call for the latter. I therefore cross-checked the results by clustering the reads and running BLAST on the cluster consensus sequences against NCBI's 16S database and the core_nt database.
One cluster had no hits at all in the 16S database, and matched mitochondrial sequences from Albugo laibachii and other fungi in core_nt. Running this cluster alone through Emu with the standard database results in O. tsutsugamushi and related organisms being called; with the RDP database, the result is Phytophthora infestans instead. I've tested with Emu v3.4.4 and v3.5.4, and the issue appears in both versions.
Steps to reproduce
- Download cluster sequences (I've had to slap a .txt on so GitHub would let me upload): cluster3.txt
- Run emu abundance with default parameters and the default Emu database
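For anyone trying to reproduce this, the two runs described above (default database and RDP database) could be scripted roughly as follows. This is only a sketch under assumptions: the output directory names and the RDP database path are hypothetical placeholders, the input is passed under its uploaded name since the original extension isn't stated, and the default database is assumed to be picked up from the environment when `--db` is omitted. By default the script only prints the commands; set `dry_run=False` to actually invoke Emu.

```python
import shlex
import subprocess
from typing import List, Optional


def build_emu_command(reads: str,
                      db_dir: Optional[str] = None,
                      output_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for one `emu abundance` run.

    With db_dir=None no --db flag is passed, so Emu falls back to its
    default database (assumed to be configured in the environment).
    """
    cmd = ["emu", "abundance", reads]
    if db_dir is not None:
        cmd += ["--db", db_dir]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


def run_emu(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Attachment from this issue, under the name it was uploaded with.
reads = "cluster3.txt"

# Run 1: default parameters, default Emu database.
run_emu(build_emu_command(reads, output_dir="results_default"))

# Run 2: same reads against an RDP database.
# "/path/to/rdp_db" is a placeholder, not a real location.
run_emu(build_emu_command(reads, db_dir="/path/to/rdp_db",
                          output_dir="results_rdp"))
```

Keeping the two runs in separate output directories makes it easier to compare the abundance tables side by side.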