RNA-seq SNP-based Phylogenetic Tree Pipeline, implemented as a Snakemake workflow
09/14 Graeber Lab Presentation
Original Non-Snakemake Pipeline Notes
Original Pipeline Implementation using Shell
Using either the Mamba (recommended) or Conda package manager, install Snakemake & Snakedeploy in an isolated environment:
mamba create -c conda-forge -c bioconda -n snakemake snakemake snakedeploy
Ensure that the newly created environment is activated for all following steps:
mamba activate snakemake
Create a working directory for this project and enter it for all following steps:
mkdir -p path/to/workdir
cd path/to/workdir
To run the pipeline from the main branch of this repository, run:
snakedeploy deploy-workflow https://github.com/liaoyjruby/PhyloTree_SM . --branch main
To keep all files locally instead, clone this repository into the working directory:
git clone https://github.com/liaoyjruby/PhyloTree_SM.git .
There are two main folders:
workflow: contains the Snakemake rules that implement the workflow
config: contains configuration files to be edited as needed
General Settings:
Edit config.yaml as needed, following the comments in the file.
Units & Samples Sheets:
units.tsv: Required columns Sample_ID and ID. If the aligned BAM files are stored elsewhere, add a Mapped_Path column containing their absolute paths.
samples.tsv: Sample annotation sheet with required column Sample_ID. Add columns describing conditions of interest as desired.
The pipeline will include all samples listed in the units.tsv sheet in the final phylogenetic tree output.
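A minimal sketch of both sheets, assuming they live in config/ as in the standard Snakemake layout. Every sample name, ID, condition and BAM path below is a hypothetical placeholder, and running the script overwrites any existing config/units.tsv and config/samples.tsv, so adapt the values first. The require_columns helper is also hypothetical and only checks that the required header columns are present, not that their values are valid.

```shell
#!/usr/bin/env bash
# Sketch: write minimal example sample sheets and check their required columns.
# Assumptions: the sheets live in config/, and every sample name, ID, condition
# and BAM path below is a hypothetical placeholder. Overwrites existing sheets.
set -euo pipefail

mkdir -p config

# units.tsv: Sample_ID and ID are required; Mapped_Path (absolute BAM paths)
# is only needed when the aligned BAMs are stored elsewhere.
# printf reuses its format string until all arguments are consumed (one row per 3 args).
printf '%s\t%s\t%s\n' \
  Sample_ID ID         Mapped_Path \
  sampleA   sampleA_R1 /abs/path/to/sampleA.bam \
  sampleB   sampleB_R1 /abs/path/to/sampleB.bam > config/units.tsv

# samples.tsv: Sample_ID is required; extra columns such as "condition" are optional annotations.
printf '%s\t%s\n' \
  Sample_ID condition \
  sampleA   treated \
  sampleB   control > config/samples.tsv

# require_columns FILE COL...: return 1 if any COL is absent from FILE's tab-separated header.
require_columns() {
  local file=$1
  shift
  local header col
  header=$(head -n 1 "$file" | tr '\t' '\n')
  for col in "$@"; do
    if ! grep -qxF -- "$col" <<< "$header"; then
      echo "ERROR: $file is missing required column '$col'" >&2
      return 1
    fi
  done
}

require_columns config/units.tsv Sample_ID ID
require_columns config/samples.tsv Sample_ID
echo "Sample sheets look OK"
```

Matching header names exactly (grep -x) rather than by position means extra annotation columns can be added in any order without breaking the check.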
If you already have aligned BAM files, place them in the subdirectory mapped/ with names of the form <Sample_ID>.bam. To avoid copying them into the working directory, list their absolute paths in the Mapped_Path column of units.tsv instead.
If you already have a VCF file and its index, place them in the subdirectory hcVCF/ as <Sample_ID>.vcf.gz and <Sample_ID>.vcf.gz.tbi.
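To check that precomputed inputs follow this naming scheme before launching a run, a small script can report, for each Sample_ID in units.tsv, whether mapped/<Sample_ID>.bam exists and whether both hcVCF/<Sample_ID>.vcf.gz and its .tbi index are present. This is a sketch under assumptions: it is run from the working directory, the sheet defaults to config/units.tsv (override with UNITS=...), and check_precomputed_inputs is a hypothetical helper, not part of the workflow.

```shell
#!/usr/bin/env bash
# Sketch: report which precomputed inputs exist for each sample in units.tsv.
# Assumptions: run from the working directory; the sheet path defaults to
# config/units.tsv (override with UNITS=...); check_precomputed_inputs is a
# hypothetical helper, not part of the workflow.
set -euo pipefail

# check_precomputed_inputs UNITS_TSV: print one line per Sample_ID with the status of
# mapped/<Sample_ID>.bam and hcVCF/<Sample_ID>.vcf.gz (+ .tbi index).
check_precomputed_inputs() {
  local units=$1 col sample bam vcf
  [ -f "$units" ] || { echo "ERROR: $units not found" >&2; return 1; }
  # Find the Sample_ID column by name instead of assuming its position.
  col=$(head -n 1 "$units" | tr '\t' '\n' | grep -nxF 'Sample_ID' | cut -d: -f1) || {
    echo "ERROR: no Sample_ID column in $units" >&2
    return 1
  }
  # Skip the header row; a sample may span several rows, so de-duplicate.
  tail -n +2 "$units" | cut -f "$col" | sort -u | while IFS= read -r sample; do
    [ -n "$sample" ] || continue
    if [ -f "mapped/${sample}.bam" ]; then bam=yes; else bam=no; fi
    if [ -f "hcVCF/${sample}.vcf.gz" ] && [ -f "hcVCF/${sample}.vcf.gz.tbi" ]; then
      vcf=yes
    elif [ -f "hcVCF/${sample}.vcf.gz" ]; then
      vcf=missing-index
    else
      vcf=no
    fi
    printf '%s\tBAM=%s\tVCF=%s\n' "$sample" "$bam" "$vcf"
  done
}

units_file=${UNITS:-config/units.tsv}
if [ -f "$units_file" ]; then
  check_precomputed_inputs "$units_file"
else
  echo "No sample sheet at $units_file; set UNITS=/path/to/units.tsv" >&2
fi
```

A line such as "sampleA	BAM=yes	VCF=missing-index" flags a VCF whose .tbi index still needs to be added.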
To view the DAG (directed acyclic graph) of pipeline jobs, run:
snakemake -c 1 dag --use-conda
After configuration, run the Snakemake workflow, deploying any necessary software in the process, with:
snakemake -c all --use-conda
The main Snakefile in the workflow subfolder is detected and executed automatically.
Replace all with the number of cores the pipeline should use.
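Besides the dag command above, Snakemake's standard --dry-run and --dag options can preview a run before any jobs execute. The sketch below wraps them in a hypothetical helper, preview_workflow. Rendering the graph assumes Graphviz's dot command is installed, which may need to be added to the environment separately.

```shell
#!/usr/bin/env bash
# Sketch: preview the workflow with standard Snakemake flags before a full run.
# Assumptions: run from the working directory with the snakemake environment active;
# preview_workflow is a hypothetical helper; dot comes from Graphviz.
set -euo pipefail

# preview_workflow [CORES]: list scheduled jobs, then render the job graph to dag.svg.
preview_workflow() {
  local cores=${1:-1}
  # --dry-run lists the jobs Snakemake would schedule without executing them.
  snakemake --dry-run --cores "$cores" --use-conda
  # --dag prints the job graph in Graphviz dot format; dot renders it as SVG.
  snakemake --dag | dot -Tsvg > dag.svg
}

# Only run when snakemake is on PATH (i.e. the environment is active).
if command -v snakemake >/dev/null 2>&1; then
  preview_workflow "${1:-1}"
else
  echo "snakemake not found on PATH; run 'mamba activate snakemake' first" >&2
fi
```

For example, saving this as preview.sh and running bash preview.sh 4 prints the planned jobs and writes dag.svg; if the dry run looks right, launch the full run with the command above.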