Benchmarking Shallow Metagenomics - UNDER CONSTRUCTION

This GitHub repository describes the workflow used for benchmarking shallow metagenomic sequencing of Mock communities (DNA mixtures), as described in Treichel et al. (bioRxiv).

Background

With this study, we aimed to systematically assess the sequencing depth required for three read-outs: taxonomic analysis, functional genes and pathways, and metagenome-assembled genome (MAG) construction. We used three complex mixtures of DNA from cultured gut bacteria: an evenly distributed Mock community containing DNA of 70 strains, and two Mock communities with staggered distributions, containing DNA of either 24 or 70 strains. Analysis was done at up to 11 sequencing depths (0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 5.0, 10.0, 20.0 and 50.0 Gb). Additionally, library preparation was performed in two facilities and the effect of background DNA was tested. Furthermore, long-read sequencing at a third facility was included for the analysis of MAG construction. To analyse the effect of multi-coverage binning on MAG construction, 10 Mock communities containing DNA of 24 strains in different distributions were generated and sequenced at 10 Gb each.

Description

Pre-processing short-read sequencing

Sub-sampling of shotgun metagenomic data to exact number of reads (seqtk)
Quality filtering and phiX removal (trimmomatic, bbmap, bbduk)
Assembly into contigs (MEGAHIT and metaSPAdes)
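Sub-sampling to an exact number of reads requires translating each target depth in Gb into a read count. The following is a minimal sketch of that arithmetic, not the repository's own script. It assumes paired-end reads of equal length and 1 Gb = 1e9 bases; the default read length of 150 bp and the function name `read_pairs_for_depth` are illustrative assumptions.

```python
# Target depths from the Background section, in gigabases (Gb).
DEPTHS_GB = [0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 5.0, 10.0, 20.0, 50.0]


def read_pairs_for_depth(depth_gb: float, read_length: int = 150) -> int:
    """Return the number of read pairs that yields `depth_gb` gigabases.

    Assumes paired-end reads of equal length and 1 Gb = 1e9 bases.
    The 150 bp default is an illustrative assumption, not a value
    taken from the study.
    """
    bases = depth_gb * 1e9
    # Each pair contributes two reads of `read_length` bases.
    return round(bases / (2 * read_length))


if __name__ == "__main__":
    for depth in DEPTHS_GB:
        pairs = read_pairs_for_depth(depth)
        print(f"{depth:>5} Gb -> {pairs:>11,} read pairs")
```

The resulting count could then be passed to `seqtk sample`, using the same seed for both mate files so that the pairs stay in sync.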

Pre-processing long-read sequencing

Length and quality filtering of raw reads (chopper)
Random sub-sampling to 11 datasets corresponding to the required sequencing depths (rasusa)
Assembly into contigs (flye, medaka)
Quality check of assemblies (QUAST)
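One of the contiguity statistics commonly used in assembly quality checks is the N50. As an illustration of what that number means, here is a small sketch that computes it from a list of contig lengths. It is not a replacement for QUAST, and the function name `n50` is an assumption made for this example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    Returns 0 for an empty input.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```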

Taxonomic Analysis

Coverage of reads to reference genomes (coverM)
Read count per reference genome / relative abundance (coverM)
Non-supervised taxonomic profiling (MetaPhlAn)
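To make the read-count step above concrete, the sketch below converts per-genome read counts into relative abundances in percent. It is a simplification written for illustration: it does not normalise for genome length and is not a reimplementation of coverM's own calculation. The strain names and counts are hypothetical.

```python
def relative_abundance(read_counts):
    """Convert per-genome read counts to percentages of all mapped reads.

    `read_counts` maps a genome name to its number of mapped reads.
    Genome length is deliberately ignored in this simplified sketch.
    """
    total = sum(read_counts.values())
    if total == 0:
        # Avoid division by zero when nothing mapped.
        return {name: 0.0 for name in read_counts}
    return {name: 100.0 * n / total for name, n in read_counts.items()}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = {"strain_A": 1500, "strain_B": 400, "strain_C": 100}
    for name, pct in relative_abundance(counts).items():
        print(f"{name}\t{pct:.1f}%")
```

Comparing such values against the known input proportions of a Mock community is one way to judge how well a given sequencing depth recovers the expected composition.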

Functional Analysis

Protein coding gene prediction (prodigal)
Alignment to predicted protein sequences of reference genomes (Diamond)
Completeness of functional pathways (kofamscan, KEGGdecoder)
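Pathway completeness can be thought of as the share of a pathway's required functions that were detected in a sample. The sketch below shows this idea in its simplest form, as the fraction of required KEGG orthologs (KOs) found among the annotated ones. This is an assumption-laden simplification: KEGGdecoder applies its own pathway-specific rules, which are not reproduced here, and the identifiers below are placeholders rather than real pathway definitions.

```python
def pathway_completeness(required_kos, detected_kos):
    """Return the fraction (0.0 to 1.0) of required KOs that were detected.

    Simplified model: every required KO counts equally and there are
    no alternative enzymes or optional steps.
    """
    required = set(required_kos)
    if not required:
        return 0.0
    return len(required & set(detected_kos)) / len(required)


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    pathway = ["KO_step1", "KO_step2", "KO_step3", "KO_step4"]
    detected = ["KO_step1", "KO_step3", "KO_unrelated"]
    print(f"{pathway_completeness(pathway, detected):.2f}")
```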

Metagenome-assembled genomes (MAGs)

Removal of contigs < 1000 bp
MAG construction (bowtie2, metabat2)
Evaluation of completeness and contamination (checkM)
Taxonomic assignment (GTDB-tk)
MAG composition with respect to reference genomes (blastn)
Comparison of sample-specific and multi-coverage binning for the ability to reconstruct high-quality, non-chimeric MAGs (Workflow by Mattock and Watson)
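The first step in the list above, removing contigs shorter than 1000 bp, can be sketched in plain Python as follows. This is a minimal illustration under the assumption of a standard FASTA input; it is not the repository's own implementation, and the function name `filter_contigs` is hypothetical.

```python
import io


def filter_contigs(fasta_handle, min_length=1000):
    """Yield (header, sequence) for FASTA records of at least `min_length` bp.

    `fasta_handle` is any iterable of text lines, such as an open file.
    Multi-line sequences are joined before their length is checked.
    """
    header, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one if long enough.
            if header is not None:
                seq = "".join(chunks)
                if len(seq) >= min_length:
                    yield header, seq
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record, which has no following header line.
    if header is not None:
        seq = "".join(chunks)
        if len(seq) >= min_length:
            yield header, seq


if __name__ == "__main__":
    # Tiny in-memory example: one 1200 bp contig and one 4 bp contig.
    demo = io.StringIO(">c1\n" + "A" * 600 + "\n" + "C" * 600 + "\n>c2\nACGT\n")
    for name, seq in filter_contigs(demo):
        print(name, len(seq))
```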

Graphical overview

Installation / Requirements

For installation of the required tools, please visit their original websites linked above.

Data availability

Metagenomic data have been deposited at the European Nucleotide Archive/NCBI and are accessible under Project no. PRJEB83573.

Publication

Treichel et al. bioRxiv

