PhyClone-Workflow

A Snakemake pipeline that runs PyClone-VI for pre-clustering, followed by PhyClone for phylogenetic reconstruction, on multi-sample bulk-sequencing data.

Overview

Setup
- Environment Setup
- Acquire Workflow
Usage
- Configuration
- Run Workflow
Workflow Output
- Workflow output folder structure
Workflow Rulegraph

Dependencies

conda, version >=24.7.1
Snakemake, version >=9.14.8

Setup

This pipeline requires that conda and Snakemake be installed; the Bioconda package channel must also be configured.

Environment Setup

Ensure that you have a working conda installation; one way to do this is to install Miniforge.

Configure the Bioconda channel and set strict channel priority:

conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

Install Snakemake:

conda create -c conda-forge -c bioconda --name snakemake snakemake'>=9.14.8'
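
As a quick sanity check after installation, the installed versions can be compared against the minimums listed under Dependencies. The following is a minimal sketch, not part of the workflow: the helper name `version_ge` is hypothetical, it relies on GNU `sort -V` for version ordering, and it treats both bounds as inclusive, which is an assumption.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes a `sort` that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Run each check only when the tool is actually on PATH.
if command -v conda >/dev/null 2>&1; then
    # `conda --version` prints e.g. "conda 24.9.2"; keep the second field.
    conda_version="$(conda --version | awk '{print $2}')"
    if version_ge "$conda_version" "24.7.1"; then
        echo "conda $conda_version: OK"
    else
        echo "conda $conda_version: older than 24.7.1"
    fi
fi

if command -v snakemake >/dev/null 2>&1; then
    # `snakemake --version` prints the bare version number.
    snakemake_version="$(snakemake --version)"
    if version_ge "$snakemake_version" "9.14.8"; then
        echo "snakemake $snakemake_version: OK"
    else
        echo "snakemake $snakemake_version: older than 9.14.8"
    fi
fi
```

Snakemake is only on the PATH once the `snakemake` environment has been activated, so the second check is silently skipped outside that environment.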

Acquire Workflow

Create a working directory for the workflow:

mkdir -p path/to/project-workdir
cd path/to/project-workdir

Clone the workflow repository through git:

To clone the latest code:

git clone --depth 1 https://github.com/Roth-Lab/PhyClone-Workflow.git

To clone a specific version of the workflow:

git clone --branch <version_tag> --depth 1 https://github.com/Roth-Lab/PhyClone-Workflow.git

Usage

Configuration

For a full description of all available pipeline options, please refer to the pipeline schema.

Modify the configuration file, config.yaml, to suit your dataset.
The following configuration fields must be configured per experiment:
- input_file: Path to the pipeline's input file. The input format is described in both the PyClone-VI and PhyClone repositories.
- out_directory: Path to the desired output directory
The default program options listed under pyclone-vi and phyclone in the configuration schema should suit most cases. However, the following values may be of interest to adjust depending on the data being analysed and computing resources available:
- pyclone-vi options of interest:
  - num_threads: Number of threads (compute cores) to use during inference.
  - seed: Can be used to seed the random number generator for reproducible results.
- phyclone options of interest:
  - num_chains: Number of independent parallel PhyClone sampling chains to use; each chain uses one CPU core. PhyClone benefits from running multiple chains; we recommend ≥4 chains if the compute cores can be spared.
  - seed: Can be used to seed the random number generator for reproducible results.
The remaining configuration options are named to mirror the options of their respective programs. To read more about the available options and their use cases, see:
- PhyClone documentation
- PyClone-VI documentation
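
To make the fields above concrete, the sketch below writes a small per-experiment configuration file. It is illustrative only: the file name `my_experiment.yaml`, the paths, and all values are placeholders, and the nesting of options under `pyclone-vi` and `phyclone` is assumed from the description above rather than copied from the schema. Check the pipeline schema for the authoritative layout and any additional required fields.

```shell
# Write a hypothetical per-experiment config; every value is a placeholder.
config_file="my_experiment.yaml"

cat > "$config_file" <<'EOF'
# Required per experiment (field names from the list above).
input_file: path/to/input.tsv
out_directory: path/to/output-directory

# Assumed nesting: program options grouped under their program name.
pyclone-vi:
  num_threads: 4   # threads used during PyClone-VI inference
  seed: 12345      # fixed seed for reproducible results

phyclone:
  num_chains: 4    # one CPU core per chain; >=4 recommended if cores allow
  seed: 12345      # fixed seed for reproducible results
EOF

echo "Wrote $config_file"
```

A file written this way can then be passed to Snakemake through the `--configfile` option.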

Tip

An example input file can be found in the PyClone-VI repository, here.

Run Workflow

Tip

A basic workflow-profile has been set up here; adjust as needed.

Navigate to the project directory and activate the snakemake environment:
```
cd path/to/project-workdir/PhyClone-Workflow
conda activate snakemake
```

Run a dry-run of the pipeline to confirm the ruleset and outputs are as you expect:

snakemake --cores <number-of-CPU-cores-to-use> --configfile <path/to/config-file> -n

Run the pipeline:

snakemake --cores <number-of-CPU-cores-to-use> --configfile <path/to/config-file>
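
The dry-run and run steps can be chained so that the real run starts only if the dry-run succeeds. The function below is a sketch under that assumption; `run_workflow` and the `SNAKEMAKE` override variable are hypothetical names introduced here, and only the `--cores`, `--configfile`, and `-n` options shown above are used.

```shell
# Hypothetical wrapper: dry-run first, then the real run if the dry-run passes.
# SNAKEMAKE can be overridden (e.g. for testing); it defaults to `snakemake`.
run_workflow() {
    local cores="$1"
    local config_file="$2"
    local snakemake_cmd="${SNAKEMAKE:-snakemake}"

    if [ -z "$cores" ] || [ -z "$config_file" ]; then
        echo "usage: run_workflow <cores> <path/to/config-file>" >&2
        return 2
    fi

    # Dry-run: confirm the ruleset resolves before using any compute.
    "$snakemake_cmd" --cores "$cores" --configfile "$config_file" -n || return 1

    # Real run, reached only when the dry-run exited successfully.
    "$snakemake_cmd" --cores "$cores" --configfile "$config_file"
}

# Example (inside the activated snakemake environment):
# run_workflow 8 path/to/config.yaml
```

Reviewing the dry-run output by eye is still worthwhile the first time a new configuration is used, since a successful exit status only shows that the ruleset resolved, not that the planned outputs are the ones you expect.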

Following the pipeline run, you can additionally create an interactive visual HTML report that bundles together and reports on the pipeline results.

(Note: the report file must have the .zip extension)

To create this report archive, run:
```
snakemake --configfile <path/to/config-file> --report <path/to/report.zip>
```

The run and report steps can also be combined with a command like the following:

snakemake --cores <number-of-CPU-cores-to-use> --configfile <path/to/config-file> --report <path/to/report.zip> --report-after-run

Workflow Output

The main outputs of the pipeline are point estimate PhyClone clonal phylogenies and/or the PhyClone topology report/archive. More on the contents of these output files can be found in the PhyClone repository.

Example workflow output folder structure:

<output-directory>
├── benchmarks
│   ├── phyclone
│   │   ├── run_phyclone.benchmark.txt
│   │   ├── write_Consensus_results_phyclone.benchmark.txt
│   │   ├── write_MAP_results_phyclone.benchmark.txt
│   │   └── write_phyclone_topology_archive_and_report.benchmark.txt
│   └── pyclone-vi
│       ├── run_pyclone_vi.benchmark.txt
│       └── write_results_pyclone_vi.benchmark.txt
├── logs
│   ├── main_snakefile_logs
│   │   ├── correct_input.stderr.log
│   │   └── correct_input.stdout.log
│   ├── phyclone_logs
│   │   ├── get_phyclone_version.log
│   │   ├── plot_Consensus_tree.log
│   │   ├── plot_MAP_tree.log
│   │   ├── run_phyclone.log
│   │   ├── write_Consensus_results_phyclone.log
│   │   ├── write_MAP_results_phyclone.log
│   │   └── write_phyclone_topology_archive_and_report.log
│   └── pyclone-vi_logs
│       ├── get_pyclone_version.log
│       ├── run_pyclone_vi.log
│       └── write_results_pyclone_vi.log
└── pipeline_outputs
    ├── input
    │   ├── cleaned_input.tsv.gz
    │   └── removed_variants.tsv.gz
    ├── phyclone
    │   ├── Consensus
    │   │   ├── Consensus_results_table.tsv.gz
    │   │   ├── Consensus_sample_prevalence_table.tsv.gz
    │   │   ├── Consensus_tree.nwk
    │   │   └── Consensus_tree.svg
    │   ├── MAP
    │   │   ├── MAP_results_table.tsv.gz
    │   │   ├── MAP_sample_prevalence_table.tsv.gz
    │   │   ├── MAP_tree.nwk
    │   │   └── MAP_tree.svg
    │   ├── phyclone.version.txt
    │   ├── Topology_Report
    │   │   ├── sampled_topologies.tar.gz
    │   │   └── topology_report.tsv.gz
    │   └── trace.h5
    └── pyclone-vi
        ├── clusters.tsv.gz
        ├── pyclone-vi.version.txt
        └── trace.h5
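
The tabular outputs are gzip-compressed TSV files, so they can be inspected without unpacking them to disk. A minimal sketch follows; `preview_tsv_gz` is a hypothetical helper, and the example path simply follows the folder structure shown above.

```shell
# Hypothetical helper: print the first N lines (default 5) of a gzipped TSV.
preview_tsv_gz() {
    local file="$1"
    local lines="${2:-5}"

    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi

    # gzip -dc decompresses to stdout and leaves the file untouched.
    gzip -dc "$file" | head -n "$lines"
}

# Example, using a path from the tree above:
# preview_tsv_gz path/to/output-directory/pipeline_outputs/phyclone/MAP/MAP_results_table.tsv.gz
```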
