twisst2 is a tool for topology weighting. Topology weighting summarises genealogies in terms of the relative abundance of different sub-tree topologies. It can be used to explore processes like introgression and it can aid the identification of trait-associated loci.
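As a rough illustration of what "relative abundance of sub-tree topologies" means, the sketch below counts a hypothetical set of sampled sub-tree topologies for one genealogy and converts the counts to weights that sum to 1. The function name `topology_weights` and the example topologies are made up for illustration; this is not how twisst2 computes weights internally.

```python
from collections import Counter


def topology_weights(subtree_topologies):
    """Return the proportion of sub-trees matching each topology.

    `subtree_topologies` is a list of topology labels, one per sampled sub-tree.
    """
    counts = Counter(subtree_topologies)
    total = sum(counts.values())
    return {topo: n / total for topo, n in counts.items()}


# Hypothetical example: 10 sub-trees sampled from one local genealogy,
# each with one tip from groups A, B, C and D.
subtrees = (
    ["((A,B),(C,D))"] * 6
    + ["((A,C),(B,D))"] * 3
    + ["((A,D),(B,C))"] * 1
)

weights = topology_weights(subtrees)
for topo, weight in weights.items():
    print(topo, weight)
```

A weight of 1 for a single topology would mean that every sampled sub-tree in that genealogy shows the same relationship among the groups.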
twisst2 has a number of important improvements over the original twisst tool. Most importantly, twisst2 incorporates inference of the ancestral recombination graph (ARG) or tree sequence - local genealogies and their breakpoints along the chromosome. It does this using sticcs. sticcs is a model-free approach and it does not require phased data, so twisst2 can run on unphased genotypes of any ploidy.
The recommended way to run twisst2 is to start from polarised genotype data. This means you either need to know the ancestral allele at each site, or you need one or more appropriate outgroups to allow inference of the derived allele.
An alternative way to run it is to first infer the ancestral recombination graph (ARG) or tree sequence using a different tool such as Relate or tsinfer. However, this typically requires phased genotypes, and my tests suggest that twisst2+sticcs is more accurate than other methods anyway.
- The general concept of topology weighting is described by Martin and Van Belleghem 2017.
- Combining genealogy inference with sticcs and topology weighting with twisst2 is described by Martin 2025.
First install sticcs by following its installation instructions.
If you would like to analyse tree sequence objects from tools like msprime and tsinfer, you will also need to install tskit yourself.
To install twisst2:
git clone https://github.com/simonhmartin/twisst2.git
cd twisst2
pip install -e .
To perform tree inference and topology weighting, twisst2 takes as input a modified vcf file that contains a DC field, giving the count of derived alleles for each individual at each site.
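To make the meaning of a per-individual derived allele count concrete, here is a minimal sketch that computes one from a vcf-style genotype string. It assumes a biallelic site, so any allele other than the ancestral one is treated as derived, and the function name `derived_count` is hypothetical. In practice `sticcs prep` produces the DC field for you; this only illustrates the value it holds.

```python
import re


def derived_count(genotype, ancestral_index):
    """Count derived alleles in a genotype such as '0/1' or '1|1'.

    `ancestral_index` is the allele index (0 for REF, 1 for ALT) that is ancestral.
    Returns None if any allele is missing ('.').
    Assumes a biallelic site: every non-ancestral allele counts as derived.
    """
    alleles = re.split(r"[/|]", genotype)
    if any(allele == "." for allele in alleles):
        return None
    return sum(1 for allele in alleles if int(allele) != ancestral_index)


# Works for any ploidy and for phased or unphased genotypes.
print(derived_count("0/1", 0))      # diploid, REF is ancestral
print(derived_count("1|1", 1))      # diploid, ALT is ancestral
print(derived_count("0/0/1/1", 0))  # tetraploid, REF is ancestral
print(derived_count("./.", 0))      # missing genotype
```

Because only the count per individual is needed, the order of alleles within a genotype does not matter, which is consistent with phasing not being required.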
Once you have a vcf file for your genotype data, make the modified version using sticcs (this needs to be installed, see above):
sticcs prep -i <input vcf> -o <output vcf> --outgroup <outgroup sample ID>
If the vcf file already has the ancestral allele (provided in the AA field in the INFO section), then you do not need to specify outgroups for polarising.
Now you can run twisst2 to count sub-tree topologies:
twisst2 sticcstack -i <input_vcf> -o <output_prefix> --max_subtrees 512 --ploidy 2 --groups <groupname1> <groupname2> <groupname3> <groupname4> --groups_file
Alternatively, if you have already inferred trees or a tree sequence with another tool:
twisst2 trees -i <input_file> -o <output_prefix> --groups <groupname1> <groupname2> <groupname3> <groupname4> --groups_file
- <output_prefix>.topocounts.tsv.gz gives the count of each group tree topology for each interval.
- <output_prefix>.intervals.tsv.gz gives the chromosome, start and end position of each interval.
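If you prefer to inspect the two output files in Python rather than R, the sketch below pairs each interval with its topology counts. The file layout is an assumption: it supposes tab-separated files with a single header row, optional `#` comment lines, one row per interval in the same order in both files, and chromosome, start and end as the first three columns of the intervals file. The column names and values in the demo files are invented, so check them against your own output before relying on this.

```python
import csv
import gzip
import os
import tempfile


def read_tsv_gz(path):
    """Read a gzipped TSV into (header, rows), skipping blank and '#' lines."""
    with gzip.open(path, "rt", newline="") as handle:
        lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
        reader = csv.reader(lines, delimiter="\t")
        header = next(reader)
        rows = list(reader)
    return header, rows


def pair_intervals_with_counts(counts_path, intervals_path):
    """Return one record per interval, combining position and topology counts."""
    topo_names, count_rows = read_tsv_gz(counts_path)
    _, interval_rows = read_tsv_gz(intervals_path)
    if len(count_rows) != len(interval_rows):
        raise ValueError("The two files have different numbers of intervals")
    records = []
    for interval, counts in zip(interval_rows, count_rows):
        chrom, start, end = interval[:3]  # assumed column order
        records.append({
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "counts": dict(zip(topo_names, (float(c) for c in counts))),
        })
    return records


def write_tsv_gz(path, header, rows):
    """Helper used only to build the small synthetic demo files below."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


# Demo on synthetic files; replace the paths with your own <output_prefix> files.
with tempfile.TemporaryDirectory() as tmp:
    counts_path = os.path.join(tmp, "demo.topocounts.tsv.gz")
    intervals_path = os.path.join(tmp, "demo.intervals.tsv.gz")
    write_tsv_gz(counts_path, ["topo1", "topo2", "topo3"], [[6, 3, 1], [0, 5, 5]])
    write_tsv_gz(intervals_path, ["chrom", "start", "end"],
                 [["chr1", 1, 1000], ["chr1", 1001, 2500]])
    records = pair_intervals_with_counts(counts_path, intervals_path)

for record in records:
    print(record)
```

Keeping the counts keyed by topology name makes it straightforward to convert them to proportions or to filter intervals by position afterwards.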
Some functions for importing and plotting are provided in the plot_twisst/plot_twisst.R script. For examples of how to use these functions, see the plot_twisst/example_plot.R script.