Hi! I'm new to Asteroid and excited to see what it can do with gappy species representation across gene trees.
I have a pipeline that produces single-copy gene trees by running DISCO in the step before Asteroid. So in each single-copy gene tree, a given species is represented by a single sequence, and the sequences in that gene tree are all orthologs as inferred by genome clustering methods.
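To make the single-copy property concrete, here is a minimal sketch that checks whether any species occurs more than once in a Newick gene tree. It assumes leaf labels of the hypothetical form `SPECIES|SEQID` and unquoted Newick labels; the function names are illustrative only and are not part of DISCO or Asteroid.

```python
import re
from collections import Counter

# Assumption: unquoted leaf labels shaped like "SPECIES|SEQID" (hypothetical scheme).
# A leaf label is the token right after "(" or ","; branch lengths after ":" are skipped.
LEAF_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_labels(newick: str) -> list[str]:
    """Return the leaf labels of a Newick string (internal node labels are ignored)."""
    return LEAF_RE.findall(newick)


def duplicated_species(newick: str, sep: str = "|") -> set[str]:
    """Return species with more than one leaf; an empty set means the tree is single-copy."""
    species = [label.split(sep, 1)[0] for label in leaf_labels(newick)]
    return {sp for sp, n in Counter(species).items() if n > 1}


if __name__ == "__main__":
    single = "((human|s1:0.1,mouse|s7:0.2):0.05,frog|s3:0.3);"
    multi = "((human|s1:0.1,human|s2:0.1):0.05,frog|s3:0.3);"
    print(duplicated_species(single))  # set()
    print(duplicated_species(multi))   # {'human'}
```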
However, I run the genomes through ortholog detection at multiple phylogenetic levels and/or with multiple clustering tools, so a single sequence from a single species can end up in several of the single-copy gene trees that become input for Asteroid.
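For illustration, the overlap described above can be detected by recording which gene trees each leaf label appears in. This is a minimal sketch assuming each sequence keeps the same unquoted leaf label (hypothetically `SPECIES|SEQID`) across all runs; the tree identifiers are made up, and this is not a feature of Asteroid or DISCO.

```python
import re
from collections import defaultdict

# Assumption: the same sequence carries the same unquoted leaf label in every tree.
LEAF_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def shared_sequences(trees: dict[str, str]) -> dict[str, list[str]]:
    """Map each leaf label to the gene trees containing it, keeping labels found in 2+ trees."""
    where = defaultdict(list)
    for tree_id, newick in trees.items():
        # set() so a label is counted once per tree even if it were repeated
        for label in set(LEAF_RE.findall(newick)):
            where[label].append(tree_id)
    return {label: sorted(ids) for label, ids in where.items() if len(ids) > 1}


if __name__ == "__main__":
    trees = {
        "og_level1_0001": "((human|s1,mouse|s7),frog|s3);",   # hypothetical IDs
        "og_level2_0042": "((human|s1,chicken|s9),frog|s4);",
    }
    print(shared_sequences(trees))
    # {'human|s1': ['og_level1_0001', 'og_level2_0042']}
```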
As I understand it, a given species should be represented by only a single sequence in a given gene tree for the underlying methods and statistics in Asteroid (and in supertree methods generally) to be sound. Hence tools like DISCO ensure that single-copy gene trees are produced before the downstream species tree is built with a supertree method.
BUT
I am unsure whether having the same sequence in multiple single-copy gene trees used as input to Asteroid / supertree methods violates the statistical or other assumptions of these methods.
Any guidance on this would be greatly appreciated :)
Thank you! Eric