ClavelLab/Benchmarking-shallow-Metagenomics

 
 

Benchmarking Shallow Metagenomics - UNDER CONSTRUCTION

This GitHub repository describes the workflow used for benchmarking shallow metagenomic sequencing of Mock communities (DNA mixtures), as described in Treichel et al. (bioRxiv).

Background

With this study, we aimed to systematically assess the sequencing depth required for three read-outs: taxonomic analysis, functional genes and pathways, and MAG construction. We used three complex mixtures of DNA from cultured gut bacteria: an evenly distributed Mock community containing DNA of 70 strains, and two Mock communities with staggered distributions, containing DNA of either 24 or 70 strains. Analysis was done at up to 11 sequencing depths (0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 5.0, 10.0, 20.0 and 50.0 Gb). Additionally, library preparation was performed in two facilities and the effect of background DNA was tested. Furthermore, long-read sequencing at a third facility was included for the analysis of MAG construction. To analyse the effect of multi-coverage binning on MAG construction, 10 Mock communities containing DNA of 24 strains in different distributions were generated and sequenced at 10 Gb each.

Description

Pre-processing short-read sequencing

  1. Sub-sampling of shotgun metagenomic data to an exact number of reads (seqtk)
  2. Quality filtering and phiX removal (trimmomatic, bbmap, bbduk)
  3. Assembly into contigs (MEGAHIT and metaSPAdes)
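As a rough illustration of step 1, the sketch below converts each target sequencing depth (in Gb) into the number of read pairs to sub-sample. The read length (2 x 150 bp, paired-end), the function name `pairs_for_depth`, the seed and the file names are assumptions made for this example and are not taken from the study.

```shell
# Sequencing depths in Gb, as listed in the Background section.
DEPTHS_GB="0.1 0.25 0.5 0.75 1.0 1.5 2.0 5.0 10.0 20.0 50.0"

# ASSUMPTION: 2 x 150 bp paired-end reads; adjust to the actual sequencing run.
READ_LEN=150

# Print the number of read pairs (rounded down) needed to reach a given
# depth in Gb at a given read length in bp: pairs = depth * 1e9 / (2 * length).
pairs_for_depth() {
  awk -v gb="$1" -v len="$2" 'BEGIN { printf "%d\n", (gb * 1e9) / (2 * len) }'
}

# Print one line per target depth.
for gb in $DEPTHS_GB; do
  printf '%s Gb\t%s read pairs\n' "$gb" "$(pairs_for_depth "$gb" "$READ_LEN")"
done

# Hypothetical seqtk calls (file names and seed are placeholders). Using the
# same seed for both mate files keeps the read pairs in sync.
# n=$(pairs_for_depth 1.0 "$READ_LEN")
# seqtk sample -s 42 sample_R1.fastq.gz "$n" | gzip > sub_1.0Gb_R1.fastq.gz
# seqtk sample -s 42 sample_R2.fastq.gz "$n" | gzip > sub_1.0Gb_R2.fastq.gz
```

Computing the read count first and passing an integer to the sub-sampler is one way to obtain an exact number of reads per depth, rather than a sampled fraction.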

Pre-processing long-read sequencing

  1. Length and quality filtering of raw reads (chopper)
  2. Random sub-sampling to 11 datasets corresponding to the required sequencing depths (rasusa)
  3. Assembly into contigs (flye, medaka)
  4. Quality check of assemblies (QUAST)

Taxonomic Analysis

  1. Read coverage of reference genomes (coverM)
  2. Read count per reference genome / relative abundance (coverM)
  3. Non-supervised taxonomic profiling (MetaPhlAn)
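The relation between read counts and relative abundance in step 2 can be sketched as follows. This is a simplified illustration, not coverM's own calculation: it assumes a hypothetical two-column, tab-separated table (genome name, mapped read count) without a header, and it does not normalise by genome length. The function name `rel_abundance` and the strain names are invented for the example.

```shell
# Convert "genome <TAB> mapped read count" lines into relative abundance (%).
# ASSUMPTION: hypothetical input layout with no header line.
rel_abundance() {
  awk -F'\t' '
    # First pass over the input: remember every row and sum the counts.
    { name[NR] = $1; count[NR] = $2; total += $2 }
    END {
      # Print the rows in input order; guard against an all-zero table.
      for (i = 1; i <= NR; i++) {
        pct = (total > 0) ? 100 * count[i] / total : 0
        printf "%s\t%.2f\n", name[i], pct
      }
    }
  '
}

# Toy example with made-up counts.
printf 'strain_A\t30\nstrain_B\t70\n' | rel_abundance
```

For a Mock community with known input proportions, such percentages can be compared directly against the expected distribution at each sequencing depth.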

Functional Analysis

  1. Protein coding gene prediction (prodigal)
  2. Alignment to predicted protein sequences of reference genomes (Diamond)
  3. Completeness of functional pathways (kofamscan, KEGGdecoder)

Metagenome-assembled genomes (MAGs)

  1. Removal of contigs < 1000 bp
  2. MAG construction (bowtie2, metabat2)
  3. Evaluation of completeness and contamination (checkM)
  4. Taxonomic assignment (GTDB-tk)
  5. MAG composition with respect to reference genomes (blastn)
  6. Comparison of sample-specific and multi-coverage binning for the ability to reconstruct high-quality, non-chimeric MAGs (Workflow by Mattock and Watson)
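Step 1 of this list (removal of contigs < 1000 bp) can be illustrated with a small awk filter. This is a minimal sketch rather than the command used in the study; the function name `filter_contigs` and the file names in the comment are hypothetical. Multi-line FASTA records are supported, and kept sequences are written on a single line.

```shell
# Keep only FASTA records whose sequence length is >= the given minimum.
# Reads FASTA from stdin and writes the filtered FASTA to stdout.
filter_contigs() {
  awk -v min="$1" '
    # Print the buffered record if it is long enough.
    function emit() {
      if (name != "" && length(seq) >= min) print name "\n" seq
    }
    # Header line: flush the previous record and start a new one.
    /^>/ { emit(); name = $0; seq = ""; next }
    # Sequence line: append to the current record.
    { seq = seq $0 }
    END { emit() }
  '
}

# Toy example: with a minimum of 5 bp, only contig_1 (10 bp) is kept.
printf '>contig_1\nACGTACGTAC\n>contig_2\nACG\n' | filter_contigs 5

# Hypothetical use on an assembly (file names are placeholders):
# filter_contigs 1000 < final.contigs.fa > contigs.min1000.fa
```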

Graphical overview

[Figure: Workflow overview]

Installation / Requirements

To install the required tools, please visit the original website of each tool listed above.

Data availability

Metagenomic data have been deposited at the European Nucleotide Archive/NCBI and are accessible under project no. PRJEB83573.

Publication

Treichel et al. bioRxiv
