grp-bork/ENA2eggNOG

ENA2eggNOG workflow

Developed by the Bork Group in collaboration with nf-core.
Raise an issue or contact us

See our other Software & Services
The development of this workflow was supported by NFDI4Microbiota.

Description

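ENA2eggNOG is a Nextflow workflow that downloads sequencing data from ENA/SRA, assembles it, predicts genes, and functionally annotates them with eggNOG-mapper. eggNOG-mapper is a tool for fast functional annotation of novel sequences: it uses precomputed orthologous groups and phylogenies from the eggNOG database to transfer functional information from fine-grained orthologs only.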

Common uses of eggNOG-mapper include the annotation of novel genomes, transcriptomes, or even metagenomic gene catalogs.

Using orthology predictions for functional annotation gives higher precision than traditional homology searches (e.g. BLAST), because it avoids transferring annotations from close paralogs (duplicated genes, which are more likely to have diverged in function).

Benchmarks comparing different eggNOG-mapper options against BLAST and InterProScan can be found here.

Citation

This workflow: DOI

Also cite:

Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol. 2021;38(12):5825-5829. doi:10.1093/molbev/msab293
Ewels PA, Peltzer A, Fillinger S, et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020;38(3):276-278. doi:10.1038/s41587-020-0439-x
Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674-1676. doi:10.1093/bioinformatics/btv033
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. doi:10.1186/1471-2105-11-119

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.


Overview

  1. Download data from ENA/SRA (fetchngs)
  2. Run assembly (MEGAHIT)
  3. Predict genes (Prodigal)
  4. Annotate genes (eggNOG-mapper)
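To make the data flow between steps 2 to 4 concrete, here is a rough shell sketch of running the same tools by hand on one paired-end sample whose reads have already been downloaded. It is not the workflow's own code: the annotate_sample and run helpers, the file layout, the thread count and the DRY_RUN switch are hypothetical, and the workflow may pass different options to each tool. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical manual equivalent of steps 2-4 for one paired-end sample.
# Not the workflow's own code; helper names, file layout and the DRY_RUN
# switch are illustrative assumptions. Step 1 (download) is assumed done.
set -euo pipefail

# run CMD...: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  fi
}

# annotate_sample R1 R2 OUTDIR [THREADS]
annotate_sample() {
  local r1="$1" r2="$2" out="$3" threads="${4:-4}"
  run mkdir -p "$out/eggnog"
  # 2. Assemble reads into contigs; MEGAHIT writes final.contigs.fa
  #    and expects its own output directory not to exist yet.
  run megahit -1 "$r1" -2 "$r2" -o "$out/megahit" -t "$threads"
  # 3. Predict genes in metagenome mode (-p meta); -a writes the
  #    translated protein sequences, -o the gene coordinates as GFF.
  run prodigal -p meta -i "$out/megahit/final.contigs.fa" \
    -a "$out/proteins.faa" -f gff -o "$out/genes.gff"
  # 4. Annotate the predicted proteins with eggNOG-mapper
  #    (its database must be downloaded beforehand).
  run emapper.py -i "$out/proteins.faa" --itype proteins \
    -o sample --output_dir "$out/eggnog" --cpu "$threads"
}
```

For example, annotate_sample sample_1.fastq.gz sample_2.fastq.gz results 8 prints the MEGAHIT, Prodigal and eggNOG-mapper command lines in order, showing how each step consumes the previous step's output.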

Usage

Cloud-based Workflow Manager (CloWM)

This workflow will be available on the CloWM platform (coming soon).

Command-Line Interface (CLI)

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

You can run the pipeline using:

nextflow run grp-bork/ENA2eggNOG \
   -profile <docker/singularity/.../institute> \
   --input ids.csv \
   --outdir <OUTDIR>

Input files

The input is a CSV file listing ENA accession IDs, one per line, for example:

ids.csv:

PRJEB6102
SRR9984183
SRR13191702

Each line can be either a project accession (e.g. PRJEB6102) or a run accession (e.g. SRR9984183).
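Before launching a run, a quick local check can catch typos in ids.csv. The shell sketch below is a hypothetical helper, not part of ENA2eggNOG. It assumes the common INSDC accession formats (PRJ followed by E, D or N and then A or B for projects, e.g. PRJEB6102; ERR, SRR or DRR for runs, e.g. SRR9984183) and flags anything else.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for ids.csv; not part of ENA2eggNOG.
# Assumed accession formats:
#   projects: PRJ + E/D/N + A/B + digits (e.g. PRJEB6102)
#   runs:     ERR/SRR/DRR + digits       (e.g. SRR9984183)

# classify_accession ID: print "project", "run" or "invalid".
classify_accession() {
  local id="$1"
  if [[ "$id" =~ ^PRJ[EDN][AB][0-9]+$ ]]; then
    echo "project"
  elif [[ "$id" =~ ^[EDS]RR[0-9]+$ ]]; then
    echo "run"
  else
    echo "invalid"
  fi
}

# check_ids FILE: print "<id><TAB><type>" for every non-empty line and
# return 1 if any line is not a recognised accession.
check_ids() {
  local file="$1" status=0 line type
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line//[[:space:]]/}"   # strip spaces, tabs and Windows \r
    [[ -z "$line" ]] && continue
    type="$(classify_accession "$line")"
    printf '%s\t%s\n' "$line" "$type"
    if [[ "$type" == "invalid" ]]; then
      status=1
    fi
  done < "$file"
  return "$status"
}
```

Running check_ids ids.csv on the example above labels PRJEB6102 as a project and the two SRR accessions as runs; it returns a non-zero exit status if any line is unrecognised, so it can guard a launch script.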
