omicscodeathon/kenyavirocat

KenyaViroCat: The Kenyan Human Gut Virome Catalogue

The Kenyan Human Gut Virome Catalogue is the first comprehensive resource of gut virome genomic sequences derived from the metagenomes of Kenyan subjects. This repository contains both separate and integrated infant and adult gut virome signatures, providing a foundation for profiling the virome landscape of the Kenyan population.

Background

The gut viral community is increasingly recognized for its role in human physiology and health, but it remains less studied than the gut bacteriome. Existing gut virome studies are biased towards high-income countries and therefore overlook viral diversity in under-represented populations such as those of sub-Saharan Africa. The Kenyan population is one whose general microbiome and virome have been underexplored. This catalogue seeks to shed light on viral signatures that other human virome catalogues may not have investigated.

General Workflow

[Figure: publication workflow]

Metagenomic data sources

The Kenyan Human Gut Virome Catalogue was constructed from these studies:

Bioinformatics catalogue construction

a) Quality of the metagenomic reads was assessed using FastQC, and low-quality bases and remaining adapter sequences were trimmed using fastp.

b) Genome assembly was performed using MEGAHIT, after which contigs shorter than 1 kb were discarded.

c) Viral contig prediction, clustering and functional annotation were performed using the Modular Viromics Pipeline. This included:

  • Virus and provirus sequence prediction using geNomad
  • Assessment of viral contig completeness and contamination and further viral filtering using CheckV
  • Clustering of viral contigs based on >95% Average Nucleotide Identity (ANI) and >85% Alignment Fraction (AF) using aniclust.py provided by CheckV
  • Viral protein prediction and functional annotation of predicted proteins against PHROGs, Pfam, dbAPIS, RdRP and DRAM-v
  • Viral cluster coverage estimation using CoverM

d) Viral taxonomy, microbial host taxonomy and predicted lifestyle were assigned using geNomad and uhgv-tools.
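
Two of the numeric criteria above, the 1 kb contig length cutoff and the ANI/AF clustering thresholds, can be illustrated with a minimal Python sketch. This is not the code used to build the catalogue, and it does not reimplement MEGAHIT or aniclust.py. The function names are hypothetical, and reading "1 kb" as 1,000 bp is an assumption made for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_CONTIG_LEN = 1000  # assumption: "1 kb" interpreted as 1,000 bp
MIN_ANI = 95.0         # percent average nucleotide identity
MIN_AF = 85.0          # percent alignment fraction


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_long_contigs(
    records: Iterable[Tuple[str, str]], min_len: int = MIN_CONTIG_LEN
) -> Dict[str, str]:
    """Keep only contigs whose length is at least min_len."""
    return {h: s for h, s in records if len(s) >= min_len}


def same_cluster(
    ani: float, af: float, min_ani: float = MIN_ANI, min_af: float = MIN_AF
) -> bool:
    """Apply the thresholds as written above: ANI > 95% and AF > 85%."""
    return ani > min_ani and af > min_af
```

For example, `keep_long_contigs(read_fasta(open("contigs.fa")))` would return the retained contigs from a hypothetical `contigs.fa` file, and `same_cluster(96.2, 90.0)` reports whether a pair of contigs meets both thresholds.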

Reproducible workflow

All the code that was generated and used is organized in the scripts. Once all the tools referenced above are installed, each bash script can be run with:

bash <your_script.sh>

We are currently working on making the entire catalogue construction pipeline reproducible using Nextflow.
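
In the meantime, running several scripts in a fixed order can be sketched in Python as below. This is only an illustration: the `scripts` directory name in the usage note and the assumption that file names sort into the intended execution order are hypothetical, not something this repository guarantees.

```python
import subprocess
from pathlib import Path
from typing import List


def ordered_scripts(script_dir: str) -> List[Path]:
    """Return the .sh files in script_dir, sorted by file name."""
    return sorted(Path(script_dir).glob("*.sh"))


def run_all(script_dir: str) -> None:
    """Run each script with bash, stopping at the first failure."""
    for script in ordered_scripts(script_dir):
        print(f"Running {script.name}")
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(["bash", str(script)], check=True)
```

Calling `run_all("scripts")` would then execute every `.sh` file in that (hypothetical) directory in name order and halt as soon as one of them fails.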

Catalogue description and data availability

The Kenyan Human Gut Virome Catalogue (KHGVC) comprises 116,968 species-level representative viral sequences and 1,693,638 viral proteins. The files containing the sequences can be downloaded from here:

Team

  1. Simeon Hebrew - Team Lead, Bioinformatician - Centre for Immunology and Microbial Infections, France
  2. Abiola Babajide - Writer (Manuscript, GitHub), AI & ML, SQL, PowerBI - University of the Western Cape, South Africa
  3. James Mordecai - Bash scripter, Bio-illustrator, Statistical (R) analyst, Writer - King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia
  4. David Makoko - Editor (Manuscript), Analyst (R) - Jhpiego, A Johns Hopkins University Affiliate, Tanzania
  5. Yacouba SAWADOGO - Member - Nazi Boni University of Bobo, Burkina Faso
  6. Olaitan I. Awe - Supervisor - Institute of Genomic Medical Research (IGMR), United States

Acknowledgement

This project was supported by:

License

This project is open-source and available under the MIT License.

Contact

For questions, contributions, or collaborations, please open an issue or contact the project lead, Simeon Hebrew.
