The Kenyan Human Gut Virome Catalogue is the first comprehensive resource of gut virome genomic sequences derived from the metagenomes of Kenyan subjects. This repository contains both separate and integrated infant and adult gut virome signatures, providing a foundation for profiling the virome landscape of the Kenyan population.
The gut viral community is increasingly recognized for its role in human physiology and health, but it remains less studied than the gut bacteriome. Existing gut virome studies are biased towards high-income countries and therefore overlook viral diversity in under-represented populations such as those in sub-Saharan Africa. One population whose microbiome and virome have been underexplored is the Kenyan population. This catalogue therefore seeks to shed light on viral signatures that may not have been captured by other human virome catalogues.
The Kenyan Human Gut Virome Catalogue was constructed from these studies:
- Maghini et al. 2025, Expanding the human gut microbiome atlas of Africa: https://doi.org/10.1038/s41586-024-08485-8
- Maqsood et al. 2024, Gut virome and microbiome dynamics before and after SARS-CoV-2 infection in women living with HIV and their infants: https://doi.org/10.1080/19490976.2024.2394248
- Derrien et al. 2023, Gut microbiome function and composition in infants from rural Kenya and association with human milk oligosaccharides: https://doi.org/10.1080/19490976.2023.2178793
- Auguet et al. 2021, Population-level faecal metagenomic profiling as a tool to predict antimicrobial resistance in Enterobacterales isolates causing invasive infections: An exploratory study across Cambodia, Kenya, and the UK: https://doi.org/10.1016/j.eclinm.2021.100910
a) The quality of the metagenomic reads was assessed using FastQC, and low-quality bases and remaining adapter sequences were trimmed using fastp.
b) Genome assembly was performed using MEGAHIT, after which contigs shorter than 1 kb were discarded.
c) Viral contig prediction, clustering and functional annotation were performed using the Modular Viromics Pipeline. This included:
- Virus and provirus sequence prediction using geNomad
- Assessment of viral contig completeness and contamination, and further viral filtering, using CheckV
- Clustering of viral contigs based on >95% Average Nucleotide Identity (ANI) and >85% Alignment Fraction (AF) using aniclust.py provided by CheckV
- Viral protein prediction and functional annotation of predicted proteins against PHROGs, Pfam, dbAPIS, RdRP and DRAM-v
- Viral cluster coverage estimation using CoverM
d) Viral taxonomy, microbial host taxonomy and predicted lifestyle were assigned using geNomad and uhgv-tools.
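As an illustration of the length filter in step b), the following is a minimal sketch of discarding contigs shorter than 1 kb from a FASTA file. It is not taken from the project's scripts: the function name `filter_contigs_by_length` and the file names in the usage comment are hypothetical, and the catalogue may have applied this cut-off differently (for example through the assembler's own minimum contig length setting).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the KHGVC scripts.
# Reads FASTA on stdin and writes to stdout only the records whose
# sequence length is at least min_len bases (default: 1000).
# Multi-line sequences are joined onto a single line in the output.
filter_contigs_by_length() {
    local min_len="${1:-1000}"
    awk -v min="$min_len" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min + 0) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }  # new record starts
        { seq = seq $0 }                              # accumulate sequence lines
        END { emit() }                                # last record
    '
}

# Usage (file names are placeholders):
#   filter_contigs_by_length 1000 < contigs.fa > contigs.min1kb.fa
```

Counting bases per record rather than per line matters here because assemblers and downstream tools may wrap sequences across several lines.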
All the code that was generated and used is organized in the scripts. Once all the tools referenced above are installed, the bash scripts can be run with:
bash <your_script.sh>
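Before running a script, it can help to confirm that the required tools are on the `PATH`. The following is a minimal sketch, not part of the repository: the function name `check_tools` is hypothetical, and the executable names in the commented example are assumptions about how each tool is installed on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any tool that is not found on PATH.
# Returns 0 if every tool is available, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (executable names are assumptions; adjust to your installation):
#   check_tools fastqc fastp megahit genomad checkv coverm && bash <your_script.sh>
```

Checking up front avoids a long run failing midway because one dependency was not installed or activated.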
We are currently working on making the entire catalogue construction pipeline reproducible using Nextflow.
The Kenyan Human Gut Virome Catalogue (KHGVC) comprises 116,968 species-level representative viral sequences and 1,693,638 viral proteins. The files containing the sequences can be downloaded from here:
- Simeon Hebrew - Team Lead, Bioinformatician - Centre for Immunology and Microbial Infections, France
- Abiola Babajide - Writer (Manuscript, GitHub), AI & ML, SQL, PowerBI - University of the Western Cape, South Africa
- James Mordecai - Bash scripter, Bio-illustrator, Statistical (R) analyst, Writer - King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia
- David Makoko - Editor (Manuscript), Analyst (R) - Jhpiego, A Johns Hopkins University Affiliate, Tanzania
- Yacouba SAWADOGO - Member - Nazi Boni University of Bobo, Burkina Faso
- Olaitan I. Awe - Supervisor - Institute for Genomic Medicine Research (IGMR), United States
This project was supported by:
- Institute for Genomic Medicine Research (IGMR) https://www.igmr.org
- African Society for Bioinformatics and Computational Biology (ASBCB) https://www.asbcb.org
- National Institutes of Health (NIH) Office of Data Science Strategy (ODSS) https://datascience.nih.gov/about/odss
This project is open-source and available under the MIT License.
For questions, contributions, or collaborations, please open an issue or contact the project lead, Simeon Hebrew.
