
liaoyjruby/PhyloTree_SM


PhyloTree_SM

An RNA-seq SNP-based phylogenetic tree pipeline, implemented as a Snakemake workflow.

Links

09/14 Graeber Lab Presentation

Original Non-Snakemake Pipeline Notes

Original Pipeline Implementation using Shell

Usage

1. Install Snakemake

Using either the Mamba (recommended) or Conda package manager, install Snakemake & Snakedeploy in an isolated environment:

mamba create -c conda-forge -c bioconda -n snakemake snakemake snakedeploy

Ensure that the newly created environment is activated for all following steps:

mamba activate snakemake

2. Deploy workflow

Create a working directory for this project and enter it for all following steps:

mkdir -p path/to/workdir
cd path/to/workdir

To deploy the workflow as defined on the main branch of this repository, run:

snakedeploy deploy-workflow https://github.com/liaoyjruby/PhyloTree_SM . --branch main

If you want to have all files locally, clone this repository into the working directory:

git clone https://github.com/liaoyjruby/PhyloTree_SM.git .

There are two main folders:

  • workflow: contains the Snakemake rules that implement the workflow
  • config: contains configuration files that should be edited according to needs

3. Configure workflow

General Settings:

Modify config.yaml as needed according to comments in the file.

Units & Samples Sheets:

  • units.tsv: Required columns Sample_ID and ID. Add column Mapped_Path with absolute paths if aligned BAM files are elsewhere.
  • samples.tsv: Sample annotation sheet with required column Sample_ID. Add columns with information about conditions of interest as desired.
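A quick header check can catch a missing required column before the workflow starts. The sketch below is not part of the pipeline: the helper name has_column is hypothetical, and the sample IDs and ID values in the demo sheet are made up. It runs against a scratch file, so point it at wherever your own sheets live.

```shell
#!/usr/bin/env bash
# has_column FILE COLUMN: succeed if the tab-separated header of FILE contains COLUMN.
# (Hypothetical helper, not provided by the workflow.)
has_column() {
    head -n 1 "$1" | tr '\t' '\n' | grep -qxF -- "$2"
}

# Demo on a scratch sheet; the sample IDs and ID values are invented.
demo_dir=$(mktemp -d)
printf 'Sample_ID\tID\nsampleA\t1\nsampleB\t2\n' > "$demo_dir/units.tsv"

# Report each required units.tsv column as found or missing.
for col in Sample_ID ID; do
    if has_column "$demo_dir/units.tsv" "$col"; then
        echo "units.tsv: found column $col"
    else
        echo "units.tsv: missing column $col" >&2
    fi
done
```

The same function works for samples.tsv, where only Sample_ID is required.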

The pipeline will include all samples listed in the units.tsv sheet in the final phylogenetic tree output.

If you already have aligned BAM files, place them in the subdirectory mapped/ with the name <Sample_ID>.bam. If you do not want to copy them into the working directory, list their absolute paths in the Mapped_Path column of units.tsv instead.
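Symlinking is one way to populate mapped/ without duplicating large BAM files. This is a sketch under the assumption that the workflow treats a symlink in mapped/ like a regular file, which the text above does not confirm. The function name link_bams is hypothetical, and the source files are assumed to be named <Sample_ID>.bam already.

```shell
#!/usr/bin/env bash
# link_bams SRC_DIR [DEST_DIR]: symlink every *.bam in SRC_DIR into DEST_DIR
# (default: mapped). Nothing is copied or renamed.
# (Hypothetical helper, not provided by the workflow.)
link_bams() {
    local src dest bam
    src=$(cd "$1" && pwd) || return 1   # absolute path, so the links stay valid
    dest=${2:-mapped}
    mkdir -p "$dest"
    for bam in "$src"/*.bam; do
        [ -e "$bam" ] || continue       # skip the literal pattern when nothing matches
        ln -sf "$bam" "$dest/$(basename "$bam")"
    done
}

# Usage, from inside the working directory:
#   link_bams /path/to/existing/bams
```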

If you already have a VCF file and its index for a sample, place both in the subdirectory hcVCF/ with the names <Sample_ID>.vcf.gz and <Sample_ID>.vcf.gz.tbi.
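Since each VCF needs its index alongside it, a small check for unpaired files can save a failed run. The sketch below only looks at file names; the function name missing_vcf_indexes is hypothetical, and it makes no claim about how the workflow itself validates its inputs.

```shell
#!/usr/bin/env bash
# missing_vcf_indexes [DIR]: print each <Sample_ID>.vcf.gz in DIR (default: hcVCF)
# that has no <Sample_ID>.vcf.gz.tbi next to it.
# (Hypothetical helper, not provided by the workflow.)
missing_vcf_indexes() {
    local dir vcf
    dir=${1:-hcVCF}
    for vcf in "$dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue        # no VCF files at all: nothing to report
        [ -e "$vcf.tbi" ] || echo "$vcf" # report a VCF without its index
    done
}

# Usage, from inside the working directory:
#   missing_vcf_indexes
```

An empty result means every VCF found in the directory has an index file beside it.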

4. Run workflow

To view the DAG of pipeline jobs, run:

snakemake -c 1 dag --use-conda

After configuration, run the Snakemake workflow while deploying any necessary software in the process with:

snakemake -c all --use-conda

The main script Snakefile in the workflow subfolder will automatically be detected and executed.

Replace all with the number of cores you want the pipeline to use.
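On a shared machine it can be useful to leave one core free rather than pass all. The sketch below derives a core count and prints the resulting command instead of running it. The "all but one" policy is an illustrative assumption, not a recommendation from this repository.

```shell
#!/usr/bin/env bash
# Count available cores (nproc on Linux, getconf as a fallback elsewhere).
total=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Keep one core free, but never go below 1. (Illustrative policy.)
cores=$(( total > 1 ? total - 1 : 1 ))

# Print the command so it can be reviewed before it is run.
echo "snakemake -c ${cores} --use-conda"
```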
