How does SourceApp handle tied matches? #2

SourceApp is designed to do competitive read mapping. That is, we should count only each read's best alignment, and only *if* that alignment meets the user-specified criteria (e.g., percent identity). Tied matches, though probably not common, occur when a read has equally best-scoring alignments to more than one subject.
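To make the definition concrete, here is a minimal sketch of best-hit selection that keeps ties instead of discarding them. It is not SourceApp's actual code: the `best_hits` function, the tuple layout of `alignments`, and the default 95% identity cutoff are all assumptions made for illustration.

```python
def best_hits(alignments, min_identity=95.0):
    """Return {read_id: [subject_ids tied for the best score]}.

    `alignments` is an iterable of
    (read_id, subject_id, score, percent_identity) tuples.
    This layout and the default threshold are hypothetical.
    """
    best = {}  # read_id -> (best score so far, [subjects with that score])
    for read_id, subject_id, score, identity in alignments:
        # Alignments below the user-specified criteria never count.
        if identity < min_identity:
            continue
        if read_id not in best or score > best[read_id][0]:
            # New best score: start a fresh list of subjects.
            best[read_id] = (score, [subject_id])
        elif score == best[read_id][0]:
            # Same score as the current best: record the tie.
            best[read_id][1].append(subject_id)
    return {read_id: subjects for read_id, (_, subjects) in best.items()}
```

A read whose returned list has more than one subject is a tied match in the sense described above.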

This isn't a huge deal when the tied subject sequences are simply different contigs (or regions of the same contig) belonging to the same genome, or even to different genomes in the same source category. It is a problem when the tied matches belong to genomes in *different* source categories. What should we do about that? Right now, we have `--remove-crx` as a step in `sourceapp_build.py`, which serves as a sort of stopgap for this issue (the idea being that if we remove genomes belonging to the same cluster, cross-source ties are less likely to occur).

Either way, depending on the read mapper used, the primary alignment is *usually* just selected at random when there are ties like this. Should we handle this differently, perhaps by retaining information on tied matches and deriving some sort of error bounds from it (e.g., 2% +/-0.2%)?
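One possible way to turn retained ties into bounds is sketched below, as a starting point for discussion rather than a proposed implementation. The lower bound for a source counts only reads whose best hits all fall in that source; the upper bound also counts reads tied across sources. The `source_bounds` function, its input shapes, and the source names in the usage example are hypothetical, and fractions are taken over mapped reads only.

```python
from collections import Counter

def source_bounds(hits, source_of):
    """Return {source: (lower, upper)} fractions of mapped reads.

    hits:      {read_id: [subject_ids tied for the best score]}
    source_of: {subject_id: source category}
    Both input shapes are assumptions for illustration.
    """
    total = len(hits)
    if total == 0:
        return {}
    lower = Counter()  # reads assignable to exactly one source
    upper = Counter()  # reads that could belong to the source
    for subjects in hits.values():
        sources = {source_of[s] for s in subjects}
        for src in sources:
            upper[src] += 1
        if len(sources) == 1:
            # Ties within a single source category are unambiguous.
            lower[next(iter(sources))] += 1
    return {src: (lower[src] / total, upper[src] / total) for src in upper}

# Usage with made-up data: r1 is tied across two sources.
hits = {"r1": ["gA", "gB"], "r2": ["gA"], "r3": ["gB"], "r4": ["gA", "gA2"]}
source_of = {"gA": "human", "gA2": "human", "gB": "pig"}
bounds = source_bounds(hits, source_of)
```

Within-source ties (like `r4`) do not widen the interval, which matches the point above that only cross-source ties are a real concern.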
