A Nextflow pipeline for the functional annotation of prokaryotic genomes and metagenomes.
- Input: prokaryotic contig/genomes in FASTA format;
- Functional annotation using Prokka;
- Functional annotation through orthology assignment by eggNOG-mapper v2 and the eggNOG v5.0 database (optional);
- KEGG ortholog assignment by KofamScan and the KOfam database (https://www.genome.jp/tools/kofamkoala/) (optional);
- Estimates KEGG pathway completeness using Anvi'o (https://merenlab.org/software/anvio/) (optional);
- Automatic database download.
-
Install Docker (or Singulariry) and Nextflow (see Dependences);
-
Start running the analysis (Prokka only, bacteria mode):
nextflow run metashot/prok-annotate \ --genomes "data/*.fa" \ --outdir results
eggNOG, KofamScan and Anvi'o Kofam annotations can be performed by setting
--run_eggnog, --run_kofamscan and --run_anvio_kofam options:
nextflow run metashot/prok-annotate \
--genomes "data/*.fa" \
--run_eggnog \
--run_kofamscan \
--run_anvio_kofam \
--outdir resultsEggNOG, KofamScan and Anvi'o Kofam databases will be automatically downloaded
from the Internet and and they will be put in the dbs directory in the
results folder. If you want to download the databases manually, run the
following lines:
-
EggNOG:
docker run --rm -v /path/to/eggnog_db:/eggnog_db metashot/eggnog-mapper:2.0.1-3 \ python /eggnog_mapper/download_eggnog_data.py -y --data_dir /eggnog_db
Add
--eggnog_db /path/to/eggnog_dboption to thenextflow runcommand. -
KofamScan
Download KOfam database from ftp://ftp.genome.jp/pub/db/kofam/ and decompress it (e.g. into the
/path/to/kofamscan_dbfolder:mkdir /path/to/kofamscan_db curl -s -L ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz | \ tar -zxf - -C /path/to/kofamscan_db curl -s -L ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz -o ko_list.gz && \ gunzip -c ko_list.gz > /path/to/kofamscan_db/ko_list
Add
--kofamscan_db /path/to/kofamscan_dboption to thenextflow runcommand. -
Anvi'o Kofam
docker run --rm -v /path/to/anvio_kofam_db:/anvio_kofam_db metashot/metashot/anvio:7-2 \ anvi-setup-kegg-kofams --kegg-data-dir /anvio_kofam_db
Add
--anvio_kofam_db /path/to/anvio_kofam_dboption to thenextflow runcommand.
For metagenomes, we suggest to run the pipeline using the --metagenome option using
EggNOG mapper only:
nextflow run metashot/prok-annotate \
--genomes "contigs/*.fa" \
--metagenome \
--skip_prokka \
--run_eggnog \
--eggnog_db_mem \
--outdir resultsSee the file nextflow.config for the complete list of
parameters.
The files and directories listed below will be created in the output directory after the pipeline has finished.
prokka: Prokka outputs (documentation);eggnog_*.tsv: count matrix for each transferred annotation;kofamscan_hits.tsv: count matrix for KofamScan hits;anvio_kofam_hits.tsv: count matrix for Anvio Kofam hits;anvio_modules.tsv: module completeness matrix.
eggnog(if--run_eggnog): eggNOG outputs (documentation);kofamscan(if--run_kofamscan): KofamScan output (detailformat);anvio(if--run_anvio_kofam):anvi-estimate-metabolism(https://merenlab.org/software/anvio/help/main/programs/anvi-estimate-metabolism/) outputs for each input genome (kofam_hitsandmodulesformats). Moreover gene calls in FASTA and text format are reported (anvi-get-sequences-for-gene-callsandanvi-export-gene-calls).
Please refer to System requirements for the complete list of system requirements options.