virus_pipe is a command-line workflow for virus discovery from sequencing data.
It accepts input files in the following formats:
- *.fastq
- *.fastq.gz
- *.fasta
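If you want to check a file against these formats before launching a run, a small extension test is enough. The sketch below is hypothetical: input_type and open_input are illustrative names, not functions in virus_pipe.sh. It classifies a file by extension and streams its contents, decompressing *.fastq.gz on the fly with gzip -dc.

```shell
# Hypothetical helper (not part of virus_pipe.sh): classify an input file by
# extension, matching the three formats the pipeline accepts.
input_type() {
    case "$1" in
        *.fastq.gz) echo "fastq.gz" ;;
        *.fastq)    echo "fastq" ;;
        *.fasta)    echo "fasta" ;;
        *)          echo "unsupported"; return 1 ;;
    esac
}

# Hypothetical helper: write the file's records to stdout, decompressing
# gzipped FASTQ without creating an uncompressed copy on disk.
open_input() {
    case "$(input_type "$1")" in
        fastq.gz)    gzip -dc -- "$1" ;;
        fastq|fasta) cat -- "$1" ;;
        *)           echo "unsupported input: $1" >&2; return 1 ;;
    esac
}
```

For example, open_input sample.fastq.gz | head -n 4 prints the first FASTQ record.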
The pipeline orchestrates read trimming, quality control (QC), assembly, mapping, extraction of unmapped reads, contig construction, blastn/blastx searches, and virus-focused discovery.
The repository contains:
- virus_pipe.sh – main analysis pipeline.
- build_dictionary.sh – helper script to build a BWA/Picard/SAMtools reference index.
- virus_database.py – adds ICTV/NCBI taxonomy information to candidate virus hits.
- reverse.py – helper used while generating adapter sequences.
- ICTV_corrected.txt, ICTV_and_NCBI_Tax.txt – taxonomy lookup tables used by virus_database.py.
The scripts assume the following tools are on your PATH (or installed at the locations hard-coded in the scripts):
- Java
- Trimmomatic
- FastQC
- Trinity
- BWA
- SAMtools
- Picard
- CAP3
- BLAST+ (blastn, blastx)
- fasta_formatter (FASTX-Toolkit)
- Python (for virus_database.py and reverse.py)
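A quick way to confirm the prerequisites before a long run is to look each executable up with command -v. This is a minimal sketch, not something the repository ships. The command names passed to it are assumptions based on common installs: Trimmomatic and Picard are usually launched as java -jar with a path to their .jar file, so they are better checked by testing those paths than by a PATH lookup.

```shell
# Sketch: report every tool that cannot be found on PATH.
# Returns 0 only when all named tools are present.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example call (command names are assumptions; use python or python3
# depending on which interpreter the helper scripts target):
# check_tools java fastqc Trinity bwa samtools cap3 blastn blastx fasta_formatter python3
```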
Note: virus_pipe.sh currently contains cluster-specific absolute paths (for example, /local/cluster/...). Update those paths for your environment.
Before running the main pipeline, build a reference dictionary from a FASTA reference:

./build_dictionary.sh reference.fasta

This creates a directory named after the reference that contains the BWA index, the Picard sequence dictionary (.dict), and the samtools faidx index (.fai).
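To make the indexing step concrete, the dry-run sketch below prints the kind of commands that produce those three outputs. It is a hypothetical reconstruction, not the contents of build_dictionary.sh. PICARD_JAR is an assumed variable, and the Picard argument style (R=/O= versus -R/-O) may need adjusting for your Picard version.

```shell
# Hypothetical dry run: print (do not execute) commands that build a BWA
# index, a samtools .fai index and a Picard .dict inside a directory named
# after the reference. Not taken from build_dictionary.sh.
index_commands() {
    local ref="$1" name outdir
    name="$(basename "$ref")"
    name="${name%.*}"              # reference.fasta -> reference
    outdir="$name"
    echo "mkdir -p $outdir"
    echo "cp $ref $outdir/$name.fasta"
    echo "bwa index $outdir/$name.fasta"
    echo "samtools faidx $outdir/$name.fasta"
    # PICARD_JAR is an assumption; adjust R=/O= to your Picard's syntax.
    echo "java -jar ${PICARD_JAR:-picard.jar} CreateSequenceDictionary R=$outdir/$name.fasta O=$outdir/$name.dict"
}
```

Reviewing the printed commands before piping them to bash keeps the sketch safe to try on any machine.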
In virus_pipe.sh, update the dictionary selection block so that it points to your reference index. The current logic matches a placeholder key and a placeholder path:

if [ "$2" = "genome_to_map_to" ]; then
    Dictionary="location"
fi

Replace:
- genome_to_map_to with the key you want to pass as the second argument.
- location with the actual reference index prefix path.
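If you map against more than one reference, a case statement scales better than repeated if blocks and can reject unknown keys instead of continuing with an empty Dictionary. In this sketch, select_dictionary is a hypothetical name, and the keys (human, mouse) and index prefixes are placeholders to replace with your own.

```shell
# Sketch: map a reference key (the pipeline's second argument) to a BWA
# index prefix. Keys and paths are placeholders, not real locations.
select_dictionary() {
    case "$1" in
        human) echo "/path/to/indexes/human/human" ;;
        mouse) echo "/path/to/indexes/mouse/mouse" ;;
        *)
            echo "unknown reference key: '$1'" >&2
            return 1
            ;;
    esac
}

# In the pipeline, stop early when the key is not recognised:
# Dictionary="$(select_dictionary "$2")" || exit 1
```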
Run the pipeline with an input file and a dictionary key:

./virus_pipe.sh your_file.fastq genome_to_map_to

Where:
- your_file.fastq can also be a *.fastq.gz or *.fasta file.
- genome_to_map_to is the dictionary key configured in the script.
The script creates an output folder named after the input basename and writes its intermediate files, BLAST reports, and a final *_virus_report.txt summary into it.
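The sketch below shows one way to derive the folder and report names for an input. It strips whichever supported extension is present and assumes the report sits inside the output folder. sample_name and report_path are hypothetical helpers, and the exact naming rules inside virus_pipe.sh may differ.

```shell
# Sketch: derive the sample name from an input path by removing the
# directory and one of the supported extensions.
sample_name() {
    local base
    base="$(basename "$1")"
    case "$base" in
        *.fastq.gz) base="${base%.fastq.gz}" ;;
        *.fastq)    base="${base%.fastq}" ;;
        *.fasta)    base="${base%.fasta}" ;;
    esac
    echo "$base"
}

# Sketch: expected location of the final summary, assuming it is written
# inside the per-sample output folder.
report_path() {
    local name
    name="$(sample_name "$1")"
    echo "$name/${name}_virus_report.txt"
}
```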
- The default resource settings in virus_pipe.sh are CPU=8 and memory=64G.
- Intermediate files are compressed during processing to reduce disk usage.
- The legacy mention of crawler.sh has been removed from this README because that script is not present in this repository.
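Because CPU and memory are set inside the script, changing them currently means editing virus_pipe.sh. A common shell pattern keeps the documented defaults but lets the environment override them. The sketch assumes the variables are named CPU and memory, as listed above. It uses Trinity's real --CPU and --max_memory options, while trinity_command and its other arguments are illustrative placeholders.

```shell
# Sketch: keep the documented defaults (8 CPUs, 64G) unless the caller
# exports different values. Assumes the script's variables are CPU/memory.
CPU="${CPU:-8}"
memory="${memory:-64G}"

# Hypothetical helper: print a Trinity command using those settings.
# Read type, input and output names are placeholders.
trinity_command() {
    local reads="$1" outdir="$2"
    echo "Trinity --seqType fq --single $reads --CPU $CPU --max_memory $memory --output $outdir"
}
```

If the script adopted this pattern, a larger node could be used without editing it, for example CPU=16 memory=128G ./virus_pipe.sh your_file.fastq genome_to_map_to.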