Topology weighting from unphased genotypes of any ploidy

simonhmartin/twisst2

twisst2

twisst2 is a tool for topology weighting. Topology weighting summarises genealogies in terms of the relative abundance of different sub-tree topologies. It can be used to explore processes like introgression and it can aid the identification of trait-associated loci.

twisst2 has a number of important improvements over the original twisst tool. Most importantly, twisst2 incorporates inference of the ancestral recombination graph (ARG) or tree sequence (the local genealogies and their breakpoints along the chromosome). It does this using sticcs. sticcs is a model-free approach and it does not require phased data, so twisst2 can run on unphased genotypes of any ploidy.

The recommended way to run twisst2 is to start from polarised genotype data. This means you either need to know the ancestral allele at each site, or you need one or more appropriate outgroups to allow inference of the derived allele.

An alternative is to first infer the ancestral recombination graph (ARG) or tree sequence using a different tool such as Relate or tsinfer. However, this typically requires phased genotypes, and my tests suggest that twisst2+sticcs is more accurate than other methods anyway.

Publications

Installation

First install sticcs by following its installation instructions.

If you would like to analyse tree sequence objects from tools like msprime and tsinfer, you will also need to install tskit yourself. To install twisst2:

git clone https://github.com/simonhmartin/twisst2.git

cd twisst2

pip install -e .

Command line tool

Starting from unphased (or phased) genotypes

To perform tree inference and topology weighting, twisst2 takes as input a modified vcf file that contains a DC field, giving the count of derived alleles for each individual at each site.
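To make the idea of a per-individual derived allele count concrete, here is a minimal Python sketch. It is not the code used by sticcs: the function name `derived_count` is hypothetical, and treating every non-ancestral allele as derived (including at multi-allelic sites) is an assumption made for illustration.

```python
from typing import List, Optional


def derived_count(genotype: str, ref: str, alts: List[str],
                  ancestral: str) -> Optional[int]:
    """Count derived alleles in one VCF GT string, e.g. '0/1' or '0|1|1|1'.

    Hypothetical helper, not part of sticcs or twisst2.
    Returns None if any allele call is missing ('.').
    """
    alleles = [ref] + alts
    # Phase does not matter for a count, so treat '|' and '/' alike.
    calls = genotype.replace("|", "/").split("/")
    if "." in calls:
        return None
    # Assumption: any allele that differs from the ancestral one is derived.
    return sum(1 for call in calls if alleles[int(call)] != ancestral)


# Diploid heterozygote, reference allele is ancestral.
print(derived_count("0/1", "A", ["G"], "A"))      # 1
# Tetraploid, alternate allele is ancestral, so only the REF copy is derived.
print(derived_count("1|1|0|1", "A", ["G"], "G"))  # 1
# Missing genotype.
print(derived_count("./.", "A", ["G"], "A"))      # None
```

Because the count ignores phase and works for any number of allele calls, the same logic applies to haploid, diploid and polyploid samples.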

Once you have a vcf file for your genotype data, make the modified version using sticcs (this needs to be installed, see above):

sticcs prep -i <input vcf> -o <output vcf> --outgroup <outgroup sample ID>

If the vcf file already has the ancestral allele (provided in the AA field in the INFO section), then you do not need to specify outgroups for polarising.
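The choice between the AA field and outgroups can be sketched in Python as follows. This is only an illustration of the polarisation logic, not the rule implemented by sticcs: the function name `infer_ancestral` is hypothetical, and requiring the outgroup samples to carry a single allele is an assumption.

```python
from typing import List, Optional


def infer_ancestral(info_aa: Optional[str],
                    outgroup_alleles: List[str]) -> Optional[str]:
    """Return the ancestral allele for one site, or None if it is unknown.

    Hypothetical helper, not part of sticcs or twisst2.
    info_aa: value of the AA INFO field, or None / '.' if absent.
    outgroup_alleles: alleles observed in the outgroup sample(s).
    """
    # Prefer an explicit ancestral allele when the VCF provides one.
    if info_aa not in (None, "", "."):
        return info_aa
    # Assumption: polarise only when the outgroups agree on one allele.
    observed = {a for a in outgroup_alleles if a != "."}
    if len(observed) == 1:
        return observed.pop()
    return None


print(infer_ancestral("G", ["A", "A"]))   # G (AA field takes priority)
print(infer_ancestral(".", ["A", "A"]))   # A (outgroup fixed for A)
print(infer_ancestral(None, ["A", "G"]))  # None (outgroup is polymorphic)
```

Sites for which this returns None cannot be polarised under these assumptions.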

Now you can run twisst2 to count sub-tree topologies:

twisst2 sticcstack -i <input_vcf> -o <output_prefix> --max_subtrees 512 --ploidy 2 --groups <groupname1> <groupname2> <groupname3> <groupname4> --groups_file

Starting from pre-inferred trees or ARG (e.g. Relate, tsinfer, argweaver, Singer)

twisst2 trees -i <input_file> -o <output_prefix> --groups <groupname1> <groupname2> <groupname3> <groupname4> --groups_file

Output

  • <output_prefix>.topocounts.tsv.gz gives the count of each group tree topology for each interval.
  • <output_prefix>.intervals.tsv.gz gives the chromosome, start and end position of each interval.
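Topology counts are usually converted to weights, i.e. the proportion of counted sub-trees that match each topology in an interval. The Python sketch below shows this normalisation and a length-weighted average across intervals. It works on in-memory lists rather than the output files, because the exact column layout is not described here. The function names, the example numbers and the assumption that interval end positions are inclusive are all illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def to_weights(counts: Sequence[float]) -> Optional[List[float]]:
    """Normalise topology counts for one interval so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        return None  # no sub-trees counted in this interval
    return [c / total for c in counts]


def mean_weights(intervals: Sequence[Tuple[int, int]],
                 counts_per_interval: Sequence[Sequence[float]]
                 ) -> Optional[List[float]]:
    """Average weights across intervals, weighting each by its length.

    Assumption: start and end positions are both inclusive.
    """
    n_topos = len(counts_per_interval[0])
    sums = [0.0] * n_topos
    total_length = 0
    for (start, end), counts in zip(intervals, counts_per_interval):
        weights = to_weights(counts)
        if weights is None:
            continue  # skip intervals without any counts
        length = end - start + 1
        total_length += length
        for i in range(n_topos):
            sums[i] += weights[i] * length
    if total_length == 0:
        return None
    return [s / total_length for s in sums]


# Made-up example: two intervals, three topologies.
intervals = [(1, 100), (101, 300)]
counts = [[6, 1, 1], [2, 4, 2]]

print(to_weights(counts[0]))  # [0.75, 0.125, 0.125]
print(mean_weights(intervals, counts))
```

Weighting by interval length keeps long genealogical blocks from being under-represented relative to short ones in a chromosome-wide summary.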

R functions for plotting

Some functions for importing and plotting are provided in the plot_twisst/plot_twisst.R script. For examples of how to use these functions, see the plot_twisst/example_plot.R script.
