Description
SourceApp is designed to do competitive read mapping: each read should be counted only once, by its best alignment, and only if that alignment meets user-specified criteria (e.g., minimum percent identity). Tied matches are not especially common, but they occur when a read has two or more equally good best-scoring alignments to different subject sequences.
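A minimal sketch of the competitive-counting rule described above, in Python. The `Alignment` record, its fields, and the `min_pident` default of 95.0 are hypothetical stand-ins, not SourceApp's actual data structures or defaults, and the sketch assumes a higher `score` is better. Each read's alignments are ranked by score first, and only then are the best one(s) checked against the identity criterion. Any read left with more than one alignment has a tied best match.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """One read-to-subject alignment (hypothetical record, not SourceApp's format)."""
    read_id: str
    subject: str
    score: float   # ranking score; assumed higher is better
    pident: float  # percent identity of the alignment


def best_hits(
    alignments: Iterable[Alignment], min_pident: float = 95.0
) -> Dict[str, List[Alignment]]:
    """Map each counted read to its best-scoring alignment(s).

    A read is counted only if its best alignment(s) meet min_pident.
    A list with more than one entry means the read has a tied best match.
    """
    by_read: Dict[str, List[Alignment]] = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    best: Dict[str, List[Alignment]] = {}
    for read_id, alns in by_read.items():
        top = max(a.score for a in alns)
        # Keep every alignment that ties for the top score...
        tied = [a for a in alns if a.score == top]
        # ...then apply the user-specified criterion to those best hits only.
        passing = [a for a in tied if a.pident >= min_pident]
        if passing:
            best[read_id] = passing
    return best
```

Because ranking happens before filtering, a read whose best alignment fails the criterion is dropped even if a lower-scoring alignment would have passed; filtering first instead would be a different (and arguably less competitive) rule.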
This isn't a big deal when the tied subjects are different contigs (or different regions of the same contig) from the same genome, or even different genomes that belong to the same source category. It is a problem when the tied matches belong to genomes in different source categories. What should we do in that case? Right now, the --remove-crx step in sourceapp_build.py serves as a stopgap (the idea being that if we remove genomes belonging to the same cluster, such ties become less likely).
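To separate harmless ties from the problematic cross-category ones, a read's tied subjects could be checked against genome and source-category lookups. The sketch below assumes two hypothetical dictionaries, `subject_to_genome` (subject ID to genome) and `genome_to_source` (genome to source category); SourceApp may store this information differently. Only the `cross_source` outcome would change per-source counts.

```python
from typing import Iterable, Mapping


def classify_ties(
    tied_subjects: Iterable[str],
    subject_to_genome: Mapping[str, str],
    genome_to_source: Mapping[str, str],
) -> str:
    """Describe how a read's tied best hits are spread across genomes and sources."""
    subjects = set(tied_subjects)
    if not subjects:
        raise ValueError("a counted read must have at least one best hit")
    if len(subjects) == 1:
        return "unique"        # no tie at all
    genomes = {subject_to_genome[s] for s in subjects}
    if len(genomes) == 1:
        return "same_genome"   # e.g., different contigs of one genome
    sources = {genome_to_source[g] for g in genomes}
    if len(sources) == 1:
        return "same_source"   # different genomes, same source category
    return "cross_source"      # the problematic case: ties span categories
```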
Either way, when there are ties like this, the primary alignment is usually just chosen at random, though the exact behavior depends on the read mapper used. Should we handle this differently, for example by retaining information on tied matches and using it to report error bounds (e.g., 2% +/- 0.2%)?
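One possible way to turn retained tie information into error bounds, sketched in Python under the assumption that each counted read has been reduced to the set of source categories among its tied best hits (`read_sources`, a hypothetical input). For each source, the lower bound counts only reads assigned unambiguously to it, the upper bound counts every read where it appears among the ties, and the middle value splits each tied read evenly across its sources. This is one illustrative scheme, not an existing SourceApp feature.

```python
from typing import Dict, Mapping, Set, Tuple


def source_fraction_bounds(
    read_sources: Mapping[str, Set[str]],
) -> Dict[str, Tuple[float, float, float]]:
    """Return {source: (lower, estimate, upper)} as fractions of counted reads.

    read_sources maps each counted read to the set of source categories
    among its tied best hits (a set of size 1 means no cross-source tie).
    """
    total = len(read_sources)
    if total == 0:
        return {}
    all_sources = set().union(*read_sources.values())
    bounds: Dict[str, Tuple[float, float, float]] = {}
    for src in sorted(all_sources):
        # Lower bound: reads assigned unambiguously to this source.
        lower = sum(1 for s in read_sources.values() if s == {src})
        # Upper bound: reads for which this source is one of the tied hits.
        upper = sum(1 for s in read_sources.values() if src in s)
        # Point estimate: split each tied read evenly among its sources.
        estimate = sum(1 / len(s) for s in read_sources.values() if src in s)
        bounds[src] = (lower / total, estimate / total, upper / total)
    return bounds
```

For example, with four counted reads (two unique to `human`, one unique to `cow`, and one tied between the two), `human` would be reported as 62.5% (range 50% to 75%) and `cow` as 37.5% (range 25% to 50%). Note that these bounds are generally asymmetric around the estimate, unlike a symmetric "+/-" interval.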