Is it theoretically sound to have the same sequence in multiple gene trees? #12

@000generic

Hi! I'm new to Asteroid and excited to see what it can do with gappy species representation across gene trees.

I have a pipeline that produces single-copy gene trees by running DISCO as the step just before Asteroid. So in each single-copy gene tree a given species is represented by a single sequence, and the sequences in that tree are all orthologs according to genome clustering methods.

However, I am running the genomes through ortholog detection at multiple phylogenetic levels and with multiple clustering tools, so a single sequence from a single species can end up in several of the single-copy gene trees that would become input for Asteroid.
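To make the overlap concrete, here is a minimal sketch of how the trees each sequence appears in could be listed. It assumes the gene trees are Newick strings whose leaf labels are sequence IDs; the function names, the `og1`/`og2` keys, and the toy trees are hypothetical and are not part of Asteroid or DISCO.

```python
import re
from collections import defaultdict

# A leaf label directly follows "(" or "," and runs up to the next Newick
# delimiter. Internal node labels and support values follow ")" and are skipped.
# Assumption: labels are unquoted and contain no whitespace.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")

def leaf_labels(newick):
    """Return the set of leaf labels found in one Newick string."""
    return set(LEAF.findall(newick))

def shared_sequences(trees):
    """Map each sequence ID found in more than one tree to those tree names."""
    seen = defaultdict(list)
    for name, newick in trees.items():
        for label in leaf_labels(newick):
            seen[label].append(name)
    return {label: sorted(names) for label, names in seen.items() if len(names) > 1}

# Toy input: sequence spA_g1 was placed in two orthogroups by different runs.
trees = {
    "og1": "((spA_g1:0.1,spB_g7:0.2):0.3,spC_g4:0.4);",
    "og2": "((spA_g1:0.1,spB_g9:0.2):0.3,spD_g2:0.4);",
}
print(shared_sequences(trees))  # {'spA_g1': ['og1', 'og2']}
```

In my real data the keys would be the gene tree files and the labels whatever sequence IDs survive the DISCO step.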

As I understand it, each species should be represented by only one sequence in a given gene tree for the underlying methods and statistics in Asteroid / supertree methods to be sound. That is why tools like DISCO are used to produce single-copy gene trees before species tree building by supertree methods.

BUT

I am unsure whether having the same sequence in multiple single-copy gene trees violates the statistical or other assumptions of Asteroid / supertree methods.

Any guidance on this would be greatly appreciated :)

Thank you! Eric
