Snakemake workflow that runs PyClone-VI clustering and then PhyClone phylogenetic reconstruction.

Roth-Lab/PhyClone-Workflow

PhyClone-Workflow

A Snakemake pipeline that runs PyClone-VI for pre-clustering and then PhyClone for phylogenetic reconstruction of multi-sample bulk-sequencing data.


Overview

  1. Setup
  2. Usage
  3. Workflow Output
  4. Workflow Rulegraph

Dependencies


Setup

This pipeline requires that conda and Snakemake be installed; the Bioconda package channel must also be configured.

Environment Setup

  1. Ensure that you have a working conda installation; installing Miniforge is one way to get one.
  2. Configure the Bioconda channel and set strict channel priority:
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda config --set channel_priority strict
    
  3. Install Snakemake:
    conda create -c conda-forge -c bioconda --name snakemake snakemake'>=9.14.8'
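Before moving on, it can help to confirm that the required tools are actually on your `PATH`. The following is a minimal sketch, not part of the workflow itself: `check_tools` is a hypothetical helper name, and the tool list (conda, snakemake, git) is taken from the setup steps in this README. Run it after `conda activate snakemake` so that the Snakemake executable is visible.

```shell
# Hypothetical helper: report whether each named tool is on PATH.
# Returns 0 if all tools are found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools this README relies on; the message is printed only if one is missing.
check_tools conda snakemake git || echo "Install the missing tools before continuing." >&2
```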
    

Acquire Workflow

  1. Create a working directory for the workflow:
    mkdir -p path/to/project-workdir
    cd path/to/project-workdir
    
  2. Clone the workflow repository through git:
    • To clone the latest code:
      git clone --depth 1 https://github.com/Roth-Lab/PhyClone-Workflow.git
      
    • To clone a specific version of the workflow:
      git clone --branch <version_tag> --depth 1 https://github.com/Roth-Lab/PhyClone-Workflow.git
      

Usage

Configuration

For a full description of all available pipeline options, please refer to the pipeline schema.

  1. Modify the configuration file, config.yaml, to suit your dataset.
  2. The following configuration fields must be set for each experiment:
    • input_file: Path to the pipeline's input file. The expected format is described in both the PyClone-VI and PhyClone repositories.
    • out_directory: Path to the desired output directory.
  3. The default program options listed under pyclone-vi and phyclone in the configuration schema should suit most cases. However, the following values may be of interest to adjust depending on the data being analysed and computing resources available:
    • pyclone-vi options of interest:
      • num_threads: Number of threads (compute cores) to use during inference.
      • seed: Can be used to seed the random number generator for reproducible results.
    • phyclone options of interest:
      • num_chains: Number of independent parallel PhyClone sampling chains to use; each chain uses one CPU core. PhyClone benefits from running multiple chains, and we recommend ≥4 chains if the compute cores can be spared.
      • seed: Can be used to seed the random number generator for reproducible results.
  4. The remaining configuration options have been named to mirror the options of their respective programs. To read more on the available options and their use cases, see the PyClone-VI and PhyClone documentation.

Tip

An example input file can be found in the PyClone-VI repository.
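As a rough illustration of the fields discussed in this section, the sketch below writes a small example configuration file. The key nesting (program options grouped under `pyclone-vi` and `phyclone`) is an assumption inferred from the option names in this README, and every value and path is a placeholder. Check the result against the pipeline schema and the shipped config.yaml before use.

```shell
# Write an illustrative config to a separate file so the shipped
# config.yaml is not overwritten. The layout is assumed, not confirmed;
# all paths and values are placeholders.
cat > example_config.yaml <<'EOF'
input_file: path/to/input.tsv
out_directory: path/to/output
pyclone-vi:
  num_threads: 4
  seed: 1234
phyclone:
  num_chains: 4
  seed: 1234
EOF
```

Setting both `seed` values makes runs reproducible. `num_chains: 4` follows the recommendation above and assumes four cores can be spared.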


Run Workflow

Tip

A basic workflow-profile is included in the repository; adjust it as needed.

  1. Navigate to the project directory and activate the snakemake environment:

    cd path/to/project-workdir/PhyClone-Workflow
    conda activate snakemake
    
  2. Run a dry-run of the pipeline to confirm the ruleset and outputs are as you expect:

    snakemake --cores <number-of-CPU-cores-to-use> --configfile <path/to/config-file> -n 
    
  3. Run the pipeline:

    snakemake --cores <number-of-CPU-cores-to-use> --configfile <path/to/config-file>
    
  4. After the pipeline run, you can also create an interactive HTML report that bundles and summarises the pipeline results.

    (Note: the report file must have the .zip extension)

    To create this report archive, run:

    snakemake --configfile <path/to/config-file> --report <path/to/report.zip>
    

Steps 3 and 4 can also be combined with a command like the following:

snakemake --cores <number-of-CPU-cores-to-use> --configfile <path/to/config-file> --report <path/to/report.zip> --report-after-run
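The dry-run, run, and report steps above could be wrapped in a small shell function, sketched below. `run_workflow` and the `SNAKEMAKE` override variable are hypothetical names introduced for illustration; only the Snakemake flags already shown in this README are used. The function refuses report paths that do not end in `.zip` and proceeds to the real run only if the dry run succeeds.

```shell
# Hypothetical helper: dry-run first, then run the pipeline and write the report.
# Usage: run_workflow <cores> <path/to/config-file> <path/to/report.zip>
# Set SNAKEMAKE=echo to preview the commands without executing Snakemake.
run_workflow() {
    cores="$1"
    config="$2"
    report="$3"

    # The report archive must have the .zip extension.
    case "$report" in
        *.zip) ;;
        *) echo "report path must end in .zip: $report" >&2; return 1 ;;
    esac

    # Dry run: stop here if the ruleset cannot be resolved.
    "${SNAKEMAKE:-snakemake}" --cores "$cores" --configfile "$config" -n || return 1

    # Real run, followed by report creation.
    "${SNAKEMAKE:-snakemake}" --cores "$cores" --configfile "$config" \
        --report "$report" --report-after-run
}
```

For example, `SNAKEMAKE=echo run_workflow 4 config.yaml report.zip` prints the two commands that would be run.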

Workflow Output

The main outputs of the pipeline are the point-estimate PhyClone clonal phylogenies and/or the PhyClone topology report and archive. The contents of these output files are described in the PhyClone repository.

Example workflow output folder structure:

<output-directory>
├── benchmarks
│   ├── phyclone
│   │   ├── run_phyclone.benchmark.txt
│   │   ├── write_Consensus_results_phyclone.benchmark.txt
│   │   ├── write_MAP_results_phyclone.benchmark.txt
│   │   └── write_phyclone_topology_archive_and_report.benchmark.txt
│   └── pyclone-vi
│       ├── run_pyclone_vi.benchmark.txt
│       └── write_results_pyclone_vi.benchmark.txt
├── logs
│   ├── main_snakefile_logs
│   │   ├── correct_input.stderr.log
│   │   └── correct_input.stdout.log
│   ├── phyclone_logs
│   │   ├── get_phyclone_version.log
│   │   ├── plot_Consensus_tree.log
│   │   ├── plot_MAP_tree.log
│   │   ├── run_phyclone.log
│   │   ├── write_Consensus_results_phyclone.log
│   │   ├── write_MAP_results_phyclone.log
│   │   └── write_phyclone_topology_archive_and_report.log
│   └── pyclone-vi_logs
│       ├── get_pyclone_version.log
│       ├── run_pyclone_vi.log
│       └── write_results_pyclone_vi.log
└── pipeline_outputs
    ├── input
    │   ├── cleaned_input.tsv.gz
    │   └── removed_variants.tsv.gz
    ├── phyclone
    │   ├── Consensus
    │   │   ├── Consensus_results_table.tsv.gz
    │   │   ├── Consensus_sample_prevalence_table.tsv.gz
    │   │   ├── Consensus_tree.nwk
    │   │   └── Consensus_tree.svg
    │   ├── MAP
    │   │   ├── MAP_results_table.tsv.gz
    │   │   ├── MAP_sample_prevalence_table.tsv.gz
    │   │   ├── MAP_tree.nwk
    │   │   └── MAP_tree.svg
    │   ├── phyclone.version.txt
    │   ├── Topology_Report
    │   │   ├── sampled_topologies.tar.gz
    │   │   └── topology_report.tsv.gz
    │   └── trace.h5
    └── pyclone-vi
        ├── clusters.tsv.gz
        ├── pyclone-vi.version.txt
        └── trace.h5
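After a run, a quick check that the key result files exist can catch a partial or failed run early. The sketch below is one way to do that; `check_outputs` is a hypothetical helper, and the file list is taken from the example tree above. Which files are produced may depend on your configuration, so adjust the list as needed.

```shell
# Hypothetical helper: verify that selected result files exist under the
# output directory. Returns 0 if all are present, 1 otherwise.
check_outputs() {
    out_dir="$1"
    status=0
    for rel in \
        pipeline_outputs/pyclone-vi/clusters.tsv.gz \
        pipeline_outputs/phyclone/trace.h5 \
        pipeline_outputs/phyclone/MAP/MAP_tree.nwk \
        pipeline_outputs/phyclone/Consensus/Consensus_tree.nwk
    do
        if [ -e "$out_dir/$rel" ]; then
            echo "ok: $rel"
        else
            echo "missing: $rel" >&2
            status=1
        fi
    done
    return "$status"
}
```

Call it with the directory given as `out_directory` in the configuration, for example `check_outputs path/to/output`.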

Workflow Rulegraph

[Figure: PhyClone pipeline rulegraph]
