This repository is for a project with the REU Jetstream program and the National Center for Genome Analysis Support (NCGAS). We use Jetstream, a cloud-based infrastructure that aids the analysis of metagenomes. To learn more about Jetstream and how to get started, follow this blog post published by NCGAS.
Metagenomes consist of the total genome content collected from an environmental sample, including the bacterial, archaeal, and viral sequences present. These datasets are complex and can be overwhelming to visualize. Using multiple visualization methods lets researchers perform exploratory analyses that can guide downstream analysis of the data. This paper focuses on using different visualization methods, including a rarefaction curve, ordination plots, an alluvial plot, and a heatmap, to represent a metagenomic dataset using Jetstream. Applying the visualization methods to a hydrocarbon seepage metagenomic dataset, we found that the samples clustered by location, that one sample was similar to both reference and seep samples, and that the datasets contained human contamination. These findings raise questions for downstream analysis to further assess the data. The scripts and input files used to create the different visualizations are available on GitHub.
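As a rough illustration of what a rarefaction curve shows, the sketch below computes the expected number of taxa observed when a sample is subsampled to a given read depth, using the standard hypergeometric rarefaction formula. It uses base R only. The function name `rarefy_expected` and the read counts are hypothetical and chosen for illustration; this is not necessarily how the scripts in this repository draw their curve.

```r
# Expected number of taxa seen in a random subsample of n reads,
# given a vector of per-taxon read counts (hypergeometric rarefaction).
# rarefy_expected is a hypothetical helper, not part of this repository.
rarefy_expected <- function(counts, n) {
  counts <- counts[counts > 0]      # ignore taxa with no reads
  N <- sum(counts)                  # total reads in the sample
  stopifnot(n >= 0, n <= N)
  # For each taxon: probability that none of its reads are drawn,
  # computed on the log scale to avoid overflow for large counts.
  p_absent <- exp(lchoose(N - counts, n) - lchoose(N, n))
  sum(1 - p_absent)
}

# Made-up read counts for five taxa in one sample.
counts <- c(50, 30, 15, 4, 1)

# Evaluate the curve at every depth from 1 read to all reads.
depths <- seq_len(sum(counts))
richness <- vapply(depths, function(n) rarefy_expected(counts, n), numeric(1))

plot(depths, richness, type = "l",
     xlab = "Reads subsampled", ylab = "Expected number of taxa")
```

A curve that flattens before reaching the full read depth suggests that sequencing more reads would reveal few additional taxa.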
The visualization code is written in R and can be run in either RStudio or Jupyter Notebook.
Jupyter Notebook is an open-source project that allows code, text, images, and equations to be combined in a single document. To install Jupyter Notebook, follow this blog post.
The code was written for metagenomic datasets but is transferable to other datasets as well.
Kraken2 is a taxonomic classification system that uses exact matches of short genomic substrings (k-mers). Each k-mer in a query sequence is mapped to the lowest common ancestor of all genomes containing that exact k-mer. Kraken2 is fast and provides a taxa report for each sample. This ste
Install Kraken2. Documentation is available here.
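A Kraken2 taxa report is a tab-separated file without a header, with six columns: percentage of reads in the clade, number of reads in the clade, number of reads assigned directly to the taxon, rank code, NCBI taxonomy ID, and an indented scientific name. The following is a minimal sketch of loading such a report into R with base functions. The column names and the three-line example report are assumptions made for illustration (the numbers are invented), and the scripts in this repository may read their input differently.

```r
# Hypothetical three-line excerpt in the standard Kraken2 report layout.
# The values are made up for illustration only.
report_text <- paste(
  "10.00\t100\t100\tU\t0\tunclassified",
  "90.00\t900\t5\tR\t1\troot",
  "60.00\t600\t20\tD\t2\t    Bacteria",
  sep = "\n"
)

# Read the report. For a real file, replace `text = report_text`
# with `file = "path/to/sample.report"`.
report <- read.delim(
  text = report_text,
  header = FALSE,
  quote = "",
  stringsAsFactors = FALSE,
  col.names = c("percent", "clade_reads", "taxon_reads",
                "rank", "taxid", "name"),
  colClasses = c("numeric", "integer", "integer",
                 "character", "integer", "character")
)

# Names are indented to show taxonomic depth; strip the padding.
report$name <- trimws(report$name)

# Keep domain-level rows (rank code "D") as a small summary table.
domains <- report[report$rank == "D", c("name", "clade_reads")]
print(domains)
```

Reading one report per sample this way and combining the results by taxon name gives a taxa-by-sample table, which is a convenient starting shape for heatmaps and ordination plots.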
The test datasets are available in the data_input folder.
The Jupyter notebook can be downloaded and run on a local installation of Jupyter Notebook. To download the code, run:
git clone https://github.com/hleffler/Microbial-visualization.git
Further details of this project are explained in the paper available as part of the repository.
Report errors or ask questions under "Issues".