Alise Ponsero edited this page Mar 13, 2025 · 2 revisions

AVrC Wiki - Home Page

Welcome to the AVrC Wiki

The Aggregated Gut Viral Catalogue (AVrC) is a comprehensive database of viral sequences from the human gut microbiome. By harmonizing multiple existing viral catalogs and incorporating novel sequences from large-scale studies, AVrC provides researchers with a unified resource for exploring human gut viral diversity.

About AVrC

Despite the growing interest in the role of the gut virome in human health and disease, identifying viral sequences from human gut metagenomes remains computationally challenging due to underrepresentation of viral genomes in reference databases. Several recent large-scale efforts have mined human gut metagenomes to establish viral sequence catalogues, using varied computational tools and quality control criteria.

The AVrC database addresses this gap by systematically surveying nine previously published human gut viral catalogues and expanding the representation of infant gut viruses by mining 7,867 infant gut metagenomes. Although the nine catalogues collectively screened >40,000 human fecal metagenomes, 82% of the 345,613 viral sequences they recovered were unique to a single catalogue, which shows how little the catalogues overlap.

Database Statistics

The AVrC contains:

  • 1,018,941 dereplicated viral sequences
  • 449,859 species-level vOTUs (viral Operational Taxonomic Units)
  • 130,092 representative vOTU sequences classified as "Complete" or "High Quality"
  • 380,747 representative vOTU sequences classified as bacteriophages
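
To put these counts in proportion, the short calculation below derives a few ratios from the numbers listed above. It is only an arithmetic sketch: the variable names are made up for illustration, and it assumes that both subset counts are fractions of the 449,859 vOTU representatives (one representative per vOTU).

```python
# Counts copied from the statistics listed above.
dereplicated_sequences = 1_018_941
species_level_votus = 449_859
high_quality_representatives = 130_092  # "Complete" or "High Quality"
bacteriophage_representatives = 380_747

# Assumption: each vOTU has exactly one representative sequence, so the
# vOTU count is the denominator for both subset fractions.
high_quality_fraction = high_quality_representatives / species_level_votus
bacteriophage_fraction = bacteriophage_representatives / species_level_votus
mean_sequences_per_votu = dereplicated_sequences / species_level_votus

print(f"Complete or high-quality representatives: {high_quality_fraction:.1%}")
print(f"Bacteriophage representatives: {bacteriophage_fraction:.1%}")
print(f"Mean dereplicated sequences per vOTU: {mean_sequences_per_votu:.1f}")
```

Under that assumption, roughly 29% of the vOTU representatives are complete or high quality and about 85% are classified as bacteriophages.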

Data Sources

The AVrC database aggregates viral sequences from major catalogs including:

  • IMG/VR (human gut subset)
  • Gut Virome Database (GVD)
  • Gut Phage Database (GPD)
  • Metagenomic Gut Virus (MGV)
  • Cenote Human Virome Database (CHVD)
  • COPSAC infant phages
  • Gut Phages (KGP)
  • Japanese 4D catalogue
  • Danish Enteric Virome Catalog (DEVoC)
  • 12 large-scale infant studies (>7,000 samples)

Key Features

  • Comprehensive Coverage: Combines multiple existing viral catalogs with newly discovered sequences
  • Quality Assessment: All sequences evaluated with CheckV for completeness and quality
  • Taxonomic Classification: Viral taxonomy assignments using geNomad
  • Host Prediction: Predicted bacterial hosts using iPHoP
  • Lifestyle Annotation: Viral lifestyle predictions (temperate/virulent)
  • Easy Access: Available in both SQLite and CSV formats
  • Curated Subsets: Pre-filtered collections for high-quality sequences and bacteriophages
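
Because the catalogue is distributed as SQLite and CSV, its annotations can be queried with standard tools. The sketch below uses Python's built-in sqlite3 module on a tiny in-memory table. The table name (`votu_metadata`), the column names, and the rows are invented for illustration and do not reflect the actual AVrC schema, so inspect the downloaded database before adapting the query.

```python
import sqlite3

# Hypothetical schema and toy rows: NOT the real AVrC tables or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE votu_metadata (
        votu_id TEXT PRIMARY KEY,
        checkv_quality TEXT,
        viral_class TEXT,
        predicted_host_genus TEXT,
        lifestyle TEXT
    )
    """
)
toy_rows = [
    ("vOTU_1", "Complete", "Caudoviricetes", "Bacteroides", "temperate"),
    ("vOTU_2", "Low-quality", "Caudoviricetes", "Bacteroides", "virulent"),
    ("vOTU_3", "High-quality", "Caudoviricetes", "Faecalibacterium", "virulent"),
    ("vOTU_4", "High-quality", "Malgrandaviricetes", None, None),
]
conn.executemany("INSERT INTO votu_metadata VALUES (?, ?, ?, ?, ?)", toy_rows)

# Select complete or high-quality vOTUs with a given predicted host genus.
query = """
    SELECT votu_id, lifestyle
    FROM votu_metadata
    WHERE checkv_quality IN ('Complete', 'High-quality')
      AND predicted_host_genus = ?
    ORDER BY votu_id
"""
bacteroides_hits = conn.execute(query, ("Bacteroides",)).fetchall()
print(bacteroides_hits)
conn.close()
```

With a real database file, replace `":memory:"` with the file path and list the available tables first, for example with `SELECT name FROM sqlite_master WHERE type = 'table'`.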

AVrC Toolkit

The AVrC Toolkit is a Python package designed to simplify working with the AVrC database. It provides command-line utilities to:

  • Download complete datasets or specific subsets
  • Filter sequences based on quality, taxonomy, host, and other criteria
  • Extract and organize specific viral groups for analysis
  • Generate properly formatted output files for downstream analysis
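
The kind of filtering and extraction listed above can be illustrated in plain Python. The following sketch is not the AVrC Toolkit's code or command-line interface. It is a minimal stand-in, assuming a metadata table with hypothetical columns (`votu_id`, `checkv_quality`, `predicted_host_genus`) and a FASTA file whose headers start with the same IDs, with toy data held in memory.

```python
import csv
import io

# Toy stand-ins for a metadata CSV and a FASTA file. Column names, IDs and
# sequences are hypothetical and do not come from the AVrC release.
metadata_csv = io.StringIO(
    "votu_id,checkv_quality,predicted_host_genus\n"
    "vOTU_1,Complete,Bacteroides\n"
    "vOTU_2,Low-quality,Bacteroides\n"
    "vOTU_3,High-quality,Bifidobacterium\n"
)
fasta_text = ">vOTU_1\nATGCATGC\nGGCC\n>vOTU_2\nTTTTAAAA\n>vOTU_3\nCCGGAATT\n"


def select_ids(handle, qualities, host_genus):
    """Return IDs whose quality is in `qualities` and whose host matches."""
    return {
        row["votu_id"]
        for row in csv.DictReader(handle)
        if row["checkv_quality"] in qualities
        and row["predicted_host_genus"] == host_genus
    }


def filter_fasta(text, keep_ids):
    """Keep only FASTA records whose header ID is in `keep_ids`."""
    kept_lines = []
    keep_record = False
    for line in text.splitlines():
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited header token.
            header = line[1:].split()
            keep_record = bool(header) and header[0] in keep_ids
        if keep_record:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


selected = select_ids(metadata_csv, {"Complete", "High-quality"}, "Bacteroides")
subset_fasta = filter_fasta(fasta_text, selected)
print(subset_fasta, end="")
```

Filtering on the metadata first and then streaming through the FASTA keeps the selection logic separate from the file format, so the same ID set can drive other outputs as well.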

Getting Started

Citation

If you use AVrC in your research, please cite our preprint:

Galperina, A., Lugli, G. A., Milani, C., De Vos, W. M., Ventura, M., Salonen, A., Hurwitz, B., & Ponsero, A. J. (2024). 
The Aggregated Gut Viral Catalogue (AVrC): A Unified Resource for Exploring the Viral Diversity of the Human Gut. 
bioRxiv. https://doi.org/10.1101/2024.06.24.600367

And the dataset:

@dataset{avrc_dataset,
    doi = {10.5281/zenodo.11426065},
    url = {https://doi.org/10.5281/zenodo.11426065},
    title = {Aggregated Gut Viral Catalogue (AVrC)},
    year = {2024}
}

Support

If you encounter any issues or have questions about AVrC or the toolkit, please open an issue on our GitHub repository.

Acknowledgments

This study was supported by grants from the Academy of Finland (339172 to AP), by the BBSRC Institute Strategic Programme Food Microbiome and Health BB/X011054/1, and the BBSRC Core Capability Grant BB/CCG2260/1.
