Skip to content

VirusFriends: discover viral sequences in the NCBI Sequence Read Archive!

License

Notifications You must be signed in to change notification settings

janpb/VirusFriends

 
 

Repository files navigation

VirusFriends: discover viral sequences in the NCBI SRA!

Phage Friends!

Please cite our work!

DOI:DOI 10.17605/OSF.IO/Z4BCN Virus Discovery Project: https://osf.io/4cn3j/

It's always sunny when you have phage friends, too

virus friends

VirusFriends

Is an implementation to discover viruses that are slightly related to already known ones. The idea is to use the genomes of known viruses as anchors to expand the viral sequence space.

Why VirusFriends?

It's being estimated that there are 3x10^31 viral particles in the world, but only ~9,500 viral genomes have being described at a genomic level. This means there is a huge amount of viruses to be discovered!

VirusFriends is a bioinformatics tool that discovers viruses in Whole Shotgun Sequence metagenomic samples in the Sequence Read Archive. In addition to identification and quantification of known viruses in metagenomes, VirusFriends also allows for identification of novel viruses in metagenomic samples. VirusFriends uses Magic-BLAST to screen metagenomes from the SRA for presence of reads that map to the NCBI Viral RefSeq database or a custom user-specified reference database. For a given reference, statistics are output that include read coverage, and numbers of reads mapped at various levels of sequence identity. Novel viruses are identified by de-novo assembly and iteratively searching for reads that map to the 5’ and 3’ ends of a given contig to extend the contig’s length.

What is VirusFriends?

This is an implementation that is inpired on work developed on previous NCBI-hackatons as part of the Virus Discovery Project, the natural histor of this work is: SIDEARM --> Virome Sniff --> ViruSpy --> EndoVir --> VirusFriends

VirusFriends is the latest stage of the [Virus Discovery Project] (https://osf.io/4cn3j/) developed at several NCBI-sponsored hackathons

Pipeline

VirusFriends Pipeline Step 1. Screen a set of SRA datasets for viral reference genomes and keep the "very good" hits and "weak hits"

Input: [a list of SRA ids] [a fasta file with the viral database] Output: sam files, fasta files for weak viral hits, stats about number of hits, identity, etc ... for strong and weak hits

Step 2: Weak hits go into denovo assembly, viral motifs search and extension of the contigs Input: [fasta file of weak hits] Output: enriched contigs, weakly related to known viruses

How does this relate to previous work?

Our hackathon project is build on the shoulders of many other good projects. The closest relative to this project is EndoVir and our relationship to this project can be seen in the figure below. This project also has deeper roots in ViruSpy, Virus Domains, and Virus_Detection_SRA. An overview of this history can be seen here and our relationship to EndoVir can be seen below.

endovir and virusfriends

Test Datasets

HIV-spiked metagenome

Ebola metagenome

Use Cases

crAssphage

Picobirnavirus

VirusFriends with any nucleotides database

Quick Start Guide

To get started, a working directory needs to be created and the appropriate databases need to be put in place. Currently, these steps are best handled by running the setup.sh script in the root directory of VirusFriends. The setup.sh script installs a list of tools required to run VirusFriends. bash setup.sh -I To list the list of tools that will be installed run bash setup -L

To run VirusFriends python3 src/virusfriends.py --inputs SRR5150787 -intype srr

This will create a directory called SRR5150787 in the VirusFriends/analysis directory which will contain the initial results from magicblast in the form of a sam file. SPAdes will assemble any weakly matching reads into contigs.fasta in the asm folder. From there, the rpstblastn and bud pipelines will start.

Dependencies

VirusFriends require a set of tools, most of these tools are installed using the setup.sh script if they are already not in your path.

Installing from Github

  1. git clone https://github.com/NCBI-Hackathons/VirusFriends.git
  2. bash setup.sh -I

The following tools are downloaded and installed by setup.sh file if not already installed and in your path

  • Megahit v1.1.2
  • Blast 2.7
  • SPAdes v3.11.1
  • Sra-toolkit v.2.8.2
  • MagicBlast v1.3.0
  • Python v3.5.5 - pysam 0.13 and biopython (using sudo pip install currently)
  • Samtools v1.7

Authors

About

VirusFriends: discover viral sequences in the NCBI Sequence Read Archive!

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 87.9%
  • Shell 12.1%