GitHub - JasonAHendry/savanna: Comprehensive analysis of nanopore sequencing data

Overview

Savanna is a tool for analysing targeted nanopore sequencing data. It supports a workflow of basecalling, demultiplexing, and downstream analysis, either locally or using high-performance computing (HPC). A variety of analysis pipelines are available.

Please note that Savanna is still in early-stage development; your feedback is welcome.

Features

Basecalling with Dorado
Sample demultiplexing with Dorado or Guppy
Read mapping with Minimap2
Sample quality control and amplicon coverage evaluation
Variant calling with bcftools
Support for multiple species

Installation

Docker

Requires

Docker

Steps

docker pull jasonahendry/savanna:0.0

This will download an image that already has dorado, savanna, and all dependencies pre-installed. Running it from the command line is slightly more cumbersome, because the current working directory has to be mounted into the container:

docker run -w `pwd` -v `pwd`:`pwd` jasonahendry/dorado:0.0 savanna

From source

Requires

The version control software Git
The package manager Conda or Mamba
- Mamba is faster and is recommended
Dorado must be installed and on your $PATH for savanna basecall
Dorado or Guppy must be installed for savanna demultiplex
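
Before installing, it can help to confirm that the required tools are discoverable on your $PATH. The following is a minimal sketch and is not part of Savanna. The helper `find_missing_tools` is hypothetical, and the executable names passed to it (`git`, `conda`, `dorado`) are assumptions about how the tools are named on your system.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Executable names are assumptions; swap "conda" for "mamba" if you use Mamba.
    missing = find_missing_tools(["git", "conda", "dorado"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

`shutil.which` only reports whether an executable can be found; it does not check versions.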

Steps

1. Clone the repository:

git clone https://github.com/JasonAHendry/savanna.git
cd savanna

2. Install other dependencies with conda:

conda env create -f environments/run.yml

or equivalently, with mamba:

mamba env create -f environments/run.yml

3. Install savanna and remaining dependencies:

pip install -e .

4. Test your installation. In the terminal, you should see available commands by typing:

savanna --help

Basic usage

Savanna has four main subcommands which can be viewed by typing savanna --help:

Usage: savanna [OPTIONS] COMMAND [ARGS]...

  Analyse targeted nanopore sequencing data for genomic surveillance

Options:
  --help  Show this message and exit.

Commands:
  download     Download reference genomes.
  basecall     POD5 to FASTQ.
  demultiplex  FASTQ to per-sample FASTQ.
  analyse      Per-sample FASTQ to results.

A. Download your reference genome

In most cases you will want to start by downloading your reference genome(s) of interest. For example, to download the P. falciparum reference genome, you would run:

savanna download -r Pf3D7

Both the FASTA files and GFF files for the reference will be downloaded.

Note: You will need a stable internet connection for this step.

B. Analyse results

If you have already basecalled and demultiplexed your data (e.g. using MinKNOW), the next step is to analyse it using savanna analyse. As an example, the following command analyses the example data for NOMADS8 provided in the GitHub repository:

savanna analyse \
-e 0000-00-00_expt1 \
-f example_data/expt1/fastq_pass \
-m example_data/expt1/metadata/sample_info.csv \
-r example_data/expt1/metadata/nomads8.amplicons.bed \
--pipeline plasmo

Here is a breakdown of key flags:

Flag	Description	Required / Optional
`-e`	Name of the experiment, used as output directory name. E.g. '2023-05-12_exptA'.	Required
`-f`	Path to directory containing demultiplexed FASTQ files (e.g. '//fastq_pass'). Typically produced by MinKNOW, dorado, guppy or with savanna demultiplex.	Required
`-m`	Path to metadata CSV file containing barcode and sample information. Required to contain `barcode` and `sample_id` columns; can optionally contain other columns of relevance. See here for an example.	Required
`-r`	Path to BED file specifying genomic regions of interest. See here for an example.	Required
`-p`	Name of the pipeline to be run. Default is `plasmo` for P. falciparum.	Optional
`-b`	Analyse only a single barcode from the experiment, indicated by an integer. E.g. to analyse `barcode03` you would include `-b 3`.	Optional
`-s`	Only run experiment-wide summary. Mainly useful for running Savanna in HPC environments.	Optional
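
Because `-m` requires a CSV with `barcode` and `sample_id` columns, a quick check of the header before launching an analysis can save a failed run. The sketch below is independent of Savanna's own validation, which is not shown here. The helper names are hypothetical, and the two-digit zero padding in `barcode_name` is an assumption generalised from the single `-b 3` / `barcode03` example above.

```python
import csv
import io

# Column names taken from the flag table above.
REQUIRED_COLUMNS = ("barcode", "sample_id")


def missing_metadata_columns(csv_file):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [column for column in REQUIRED_COLUMNS if column not in header]


def barcode_name(number):
    """Format an integer as a barcode name (assumed two-digit zero padding)."""
    return f"barcode{number:02d}"


if __name__ == "__main__":
    # An in-memory CSV stands in for a real metadata file.
    example = io.StringIO("barcode,sample_id,country\nbarcode01,sampleA,ZM\n")
    print(missing_metadata_columns(example))
    print(barcode_name(3))
```

For a real file, pass an open file handle instead of the `io.StringIO` object, e.g. `open(path, newline="")`.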

Testing

Example data is present in example_data and example scripts are present in scripts.

Development

Creating a new analysis module

Create a new directory inside of src/savanna/analyse
Implement a BarcodeAnalysis subclass
- This specifies what the analysis will do for a single barcode
- Pass parameters of interest to the initialisation method
Implement an ExperimentAnalysis subclass
- This will automatically run a BarcodeAnalysis across an entire experiment
- Optionally allows for outputs to be summarised across barcodes, and plots created
Use your ExperimentAnalysis in an existing or new Pipeline subclass
Invoke the pipeline using the --pipeline flag of savanna analyse
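
The relationship between the two classes in the steps above can be illustrated with a toy example. This is only a sketch of the pattern: the `BarcodeAnalysis` and `ExperimentAnalysis` classes defined here are simplified stand-ins written for illustration, not Savanna's actual base classes, and their constructors and the `run` / `summarise` method names are assumptions. Consult the existing modules in src/savanna/analyse for the real interfaces.

```python
from abc import ABC, abstractmethod


class BarcodeAnalysis(ABC):
    """Stand-in base class (hypothetical): analysis of a single barcode."""

    def __init__(self, barcode_name):
        self.barcode_name = barcode_name

    @abstractmethod
    def run(self):
        """Run the analysis for this barcode and return its result."""


class ExperimentAnalysis:
    """Stand-in base class (hypothetical): runs one analysis per barcode."""

    barcode_analysis_cls = None  # set by subclasses

    def __init__(self, barcode_names, **params):
        # The same parameters are passed to every per-barcode analysis.
        self.analyses = [
            self.barcode_analysis_cls(name, **params) for name in barcode_names
        ]

    def run(self):
        return {analysis.barcode_name: analysis.run() for analysis in self.analyses}

    def summarise(self, results):
        """Optional experiment-wide summary; the default does nothing."""
        return None


class CountLongReads(BarcodeAnalysis):
    """Toy analysis: count reads at or above a minimum length."""

    def __init__(self, barcode_name, read_lengths, min_length):
        super().__init__(barcode_name)
        self.read_lengths = read_lengths[barcode_name]
        self.min_length = min_length

    def run(self):
        return sum(1 for length in self.read_lengths if length >= self.min_length)


class CountLongReadsExperiment(ExperimentAnalysis):
    barcode_analysis_cls = CountLongReads

    def summarise(self, results):
        # Total long reads across all barcodes.
        return sum(results.values())


if __name__ == "__main__":
    # Made-up read lengths, used purely for illustration.
    lengths = {"barcode01": [500, 1500, 2000], "barcode02": [100]}
    experiment = CountLongReadsExperiment(
        ["barcode01", "barcode02"], read_lengths=lengths, min_length=1000
    )
    results = experiment.run()
    print(results, experiment.summarise(results))
```

The per-barcode class holds the logic and its parameters, while the experiment-level class only iterates over barcodes and aggregates, which mirrors the division of responsibilities described in the list above.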

Acknowledgements

This work was funded by the Bill and Melinda Gates Foundation (INV-003660, INV-048316).

