snakeit

Snakemake recipe for running/interacting with SHAPEIT via a python environment

This workflow is immature and should be considered a rough beta. It was developed for research use; I am placing it online because I thought it might be useful to others. No guarantees! For details of the phasing tool itself, see the SHAPEIT documentation.

I use the scikit-allel module for working with genetic variation data. Also included is a script that converts SHAPEIT output to an HDF5 file, which is a convenient way of handling large-scale genetic data. For more about this, see the link above.
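To give a feel for what that conversion involves, here is a minimal sketch of reading one line of SHAPEIT output. It assumes the SHAPEIT2 text `.haps` layout (five leading columns: chromosome, variant ID, position, first allele, second allele, followed by two 0/1 haplotype calls per sample). The names `HapsRecord` and `parse_haps_line` are hypothetical and are not taken from `shapeit_2_hdf5.py`; the sketch only parses and does not write HDF5.

```python
from typing import List, NamedTuple, Tuple


class HapsRecord(NamedTuple):
    """One variant from a SHAPEIT2-style .haps file (hypothetical container)."""
    chrom: str
    variant_id: str
    position: int
    allele0: str
    allele1: str
    haplotypes: List[Tuple[int, int]]  # one (hap1, hap2) pair per sample


def parse_haps_line(line: str) -> HapsRecord:
    """Parse a single whitespace-delimited .haps line.

    Assumes five metadata columns followed by 2 * n_samples allele codes.
    """
    fields = line.split()
    n_calls = len(fields) - 5
    if n_calls < 2 or n_calls % 2 != 0:
        raise ValueError("expected 5 columns plus two calls per sample")
    chrom, variant_id, position, allele0, allele1 = fields[:5]
    calls = [int(x) for x in fields[5:]]
    # Pair consecutive calls: columns 6-7 belong to sample 1, 8-9 to sample 2, ...
    haplotypes = [(calls[i], calls[i + 1]) for i in range(0, len(calls), 2)]
    return HapsRecord(chrom, variant_id, int(position), allele0, allele1, haplotypes)


if __name__ == "__main__":
    record = parse_haps_line("2L var1 1050 A G 0 1 1 1")
    print(record.position, record.haplotypes)
```

Stacking the `haplotypes` lists from every line gives a variants × samples × 2 array, which is the kind of structure that is natural to store in HDF5.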

Recommended use

Follow the suggestions presented in the Snakemake deployment documentation.

For example:

# clone workflow into working directory
git clone https://github.com/hardingnj/snakeit.git path/to/workdir
cd path/to/workdir

# edit config and workflow as needed
vim config.yaml

# install dependencies into isolated environment
conda env create -n myworkflow --file environment.yaml

# activate environment
source activate myworkflow

# execute workflow (-n is a dry run that only lists the planned jobs)
snakemake -n

Edit files as needed. At the very least you will need to create a new bam_locations.txt and edit the config.yaml file. Please feel free to push extensions of the Snakefile back to master.

NOTE: To run the snakefile you will need numpy and pandas installed.

Overview of files

Snakefile; File that encodes the pipeline.
config.yaml; Configuration, including filepaths etc.
submit.sh; An example of how you may invoke snakemake.
bam_locations.txt; Describes where to find BAM files for each sample in your VCF.
environment.yaml; Describes the conda environment necessary to run the tools.

Overview of pipeline

The pipeline takes VCF files as input, assuming one VCF per chromosome/contig.

Split these large VCFs into manageable chunks, as both the extract PIRs step and the shapeit phasing step are resource intensive.

Run extract PIRs on each chunk

Run shapeit on each chunk

Run ligate haplotypes on chunks to create a single phased output file

Run shapeit_2_hdf5.py to create an HDF5 file that is much easier to work with.
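The chunking in the first step can be sketched as follows. This is an illustration only, not the logic used in the Snakefile: the function name `chunk_windows`, the chunk size, and the optional overlap are all assumptions, and the real chunk boundaries are presumably driven by config.yaml.

```python
from typing import List, Tuple


def chunk_windows(contig_length: int, chunk_size: int, overlap: int = 0) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) windows covering a contig.

    Consecutive windows share `overlap` positions; the last window is
    truncated at the end of the contig.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    windows = []
    start = 1
    step = chunk_size - overlap
    while start <= contig_length:
        end = min(start + chunk_size - 1, contig_length)
        windows.append((start, end))
        if end == contig_length:
            break  # contig fully covered
        start += step
    return windows


if __name__ == "__main__":
    # Hypothetical contig of 10 positions, chunks of 4, overlapping by 1
    for start, end in chunk_windows(10, 4, overlap=1):
        print(f"{start}-{end}")
```

Each window would then be phased independently and the per-chunk results ligated back into a single file per contig.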

~ @hardingnj
