Model-free inference of tree sequences from unphased genotypes

simonhmartin/sticcs

sticcs

sticcs is a method for inferring the series of genealogies along the genome, also called an Ancestral Recombination Graph (ARG) or tree sequence. Unlike some other methods, sticcs does not require phased haplotypes, and it can work on any ploidy level.

The input for sticcs is polarised genotype data. This means you need to know the ancestral allele at each site, or you need one or more appropriate outgroups from which the derived allele can be inferred.
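To illustrate what polarisation means, the sketch below converts VCF-style genotypes at one site into a count of derived alleles per individual, given the ancestral allele. It is a minimal sketch, not sticcs's own code: `derived_counts` is a hypothetical helper, and it assumes a biallelic site, allele indices as in the VCF GT field (0 for REF, 1 for ALT) and no missing calls. Because each individual is just a list of allele indices, any ploidy is handled the same way.

```python
from typing import List, Optional, Sequence


def derived_counts(ref: str, alts: Sequence[str], ancestral: str,
                   genotypes: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Count derived alleles per individual at one site.

    Hypothetical helper, not part of sticcs. genotypes holds VCF-style
    allele indices for each individual (0 = REF, 1 = first ALT), so an
    individual of any ploidy is simply a longer or shorter list.
    """
    alleles = [ref, *alts]
    if len(alleles) != 2 or ancestral not in alleles:
        # Only biallelic sites with a known ancestral state can be polarised
        return None
    ancestral_index = alleles.index(ancestral)
    # Assumes no missing calls: every index is 0 or 1
    return [sum(1 for a in gt if a != ancestral_index) for gt in genotypes]


# Toy site where the ALT allele "T" is ancestral, so REF "C" is derived
print(derived_counts("C", ["T"], "T", [[0, 0], [0, 1], [1, 1, 1, 0]]))
# [2, 1, 1]
```

The third individual is tetraploid and is counted exactly like the diploids.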

The method is described in this paper.

Installation

sticcs requires cyvcf2 and numpy. If these are not already installed, they will be downloaded and installed when you run the install command.

If you would like to export tree sequence objects in tskit format, you will also need to install tskit yourself before running sticcs. Then install sticcs:

git clone https://github.com/simonhmartin/sticcs.git

cd sticcs

pip install -e .

Command line tool

The command line tool takes as input a modified vcf file that contains a DC field, giving the count of derived alleles for each individual at each site.

You can make this from your standard vcf by running:

sticcs prep -i <input vcf> -o <output vcf> --outgroup <outgroup sample ID>
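The DC values written by `sticcs prep` can be spot-checked in plain Python. This is a minimal sketch that assumes DC is stored as a per-sample FORMAT key holding one integer per sample, with `.` for missing values; `read_dc` is a hypothetical helper, and for whole files a VCF parser such as cyvcf2 is the better choice.

```python
from typing import List, Optional


def read_dc(vcf_line: str) -> List[Optional[int]]:
    """Return the DC value of every sample on one VCF data line.

    Hypothetical helper: assumes DC is a FORMAT key with one integer per
    sample and "." for missing values. Raises ValueError if DC is absent.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")  # column 9 of a VCF is FORMAT
    dc_index = format_keys.index("DC")
    counts: List[Optional[int]] = []
    for sample in fields[9:]:  # sample columns start at column 10
        values = sample.split(":")
        # The VCF spec lets trailing FORMAT values be dropped per sample
        value = values[dc_index] if dc_index < len(values) else "."
        counts.append(None if value == "." else int(value))
    return counts


# Toy record, not real sticcs output
line = "chr1\t1042\t.\tA\tG\t.\tPASS\tAA=A\tGT:DC\t0/1:1\t1/1:2\t0/0:0\t./.:."
print(read_dc(line))  # [1, 2, 0, None]
```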

If your vcf file already has the ancestral allele (provided in the AA field in the INFO section), then you do not need to specify outgroups for polarising.
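To decide whether `--outgroup` is needed, you can check whether records already carry an ancestral allele by looking for AA in the INFO column. A minimal sketch: `ancestral_allele` is a hypothetical helper, and it treats an absent AA, an empty value or `.` as unknown.

```python
from typing import Optional


def ancestral_allele(info: str) -> Optional[str]:
    """Return the AA value from a VCF INFO column, or None if unknown.

    Hypothetical helper for illustration; not part of sticcs.
    """
    if info == ".":  # "." means the INFO column is empty
        return None
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AA" and value not in ("", "."):
            return value
    return None


print(ancestral_allele("DP=31;AA=G"))  # G
print(ancestral_allele("DP=31"))       # None
```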

Now you can run the main command to make the tree sequence:

sticcs ts -i <input vcf> -o <output prefix> --output_format tskit

This will make a tree sequence file that can be loaded and analysed using tskit. The default for --output_format is newick, which makes a file of newick trees and a separate file giving the chromosome coordinates of each tree interval.
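Whichever output format you choose, the result is a series of trees, each covering one interval of the chromosome, so finding the genealogy at a given position is an interval lookup. The sketch below does this with binary search over in-memory lists. `tree_at` is hypothetical, and it assumes sorted, non-overlapping, half-open [start, end) intervals aligned one-to-one with the newick strings; the exact layout and coordinate convention of the sticcs coordinate file are not described here, so check them before parsing it.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def tree_at(position: float,
            intervals: Sequence[Tuple[float, float]],
            trees: Sequence[str]) -> Optional[str]:
    """Return the newick string whose interval contains position.

    Hypothetical helper: assumes intervals are sorted, non-overlapping,
    half-open [start, end) and aligned one-to-one with trees.
    """
    starts: List[float] = [start for start, _ in intervals]
    # Index of the last interval starting at or before position
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < intervals[i][1]:
        return trees[i]
    return None  # position falls outside every interval


# Toy example data, not sticcs output
intervals = [(1, 5000), (5000, 12000), (12000, 20000)]
trees = ["((A,B),(C,D));", "((A,C),(B,D));", "((A,D),(B,C));"]
print(tree_at(5000, intervals, trees))  # ((A,C),(B,D));
```

For tskit output no parsing is needed: `tskit.load()` followed by `TreeSequence.at(position)` performs the same lookup.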

Python API

Classes and functions from sticcs can be used by importing sticcs in your Python script. Full documentation is not yet available, but some examples can be seen in the twisst2 code.
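Until the Python API is documented, one option is to drive the command line tool from a script with the standard `subprocess` module. This is a sketch only: `sticcs_commands` and `run_pipeline` are hypothetical helpers that use just the flags shown in the command line examples, the file names are placeholders, and `--outgroup` is given a single sample ID. Leave `outgroup` as `None` when the input already carries AA in its INFO field.

```python
import subprocess
from typing import List, Optional


def sticcs_commands(raw_vcf: str, dc_vcf: str, prefix: str,
                    outgroup: Optional[str] = None,
                    output_format: str = "tskit") -> List[List[str]]:
    """Build the prep and ts command lines in the order they must run.

    Hypothetical helper; uses only the flags shown in this README.
    """
    prep = ["sticcs", "prep", "-i", raw_vcf, "-o", dc_vcf]
    if outgroup is not None:
        # Not needed if the VCF already has AA in its INFO field
        prep += ["--outgroup", outgroup]
    ts = ["sticcs", "ts", "-i", dc_vcf, "-o", prefix,
          "--output_format", output_format]
    return [prep, ts]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(sticcs_commands("input.vcf", "input.dc.vcf", "run1", outgroup="outgroup_1"))` would then run both steps in order; sticcs must be installed and on the PATH.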
