Command Reference

This page documents the commands available in the AVrC Toolkit, including their arguments, options, and example usage.

Overview

The AVrC Toolkit provides two main commands:

download: Retrieves the AVrC database or one of its subsets
filter: Filters sequences by quality, taxonomy, lifestyle, and host criteria

Download Command

The download command retrieves the AVrC database or a specific subset of it.

Usage

avrc download [SUBSET] [OPTIONS]

Arguments

SUBSET: The subset of the database to download. Available values:
- all: Complete dataset with all representative sequences
- hq: High-quality sequences subset
- phage: Bacteriophage sequences subset

Options

--list: List available subsets and their descriptions
-o, --output PATH: Output directory (default: current directory)
--no-metadata: Download sequences only (no metadata files)

Examples

List available subsets:

avrc download --list

Download complete dataset:

avrc download all -o data/

Download high-quality subset:

avrc download hq -o high_quality_data/

Download phage subset without metadata:

avrc download phage -o phage_data/ --no-metadata
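When more than one subset is needed, the download examples above can be wrapped in a small loop. The following is only a sketch: the function name download_subsets and the one-directory-per-subset layout are illustrative assumptions, not part of the toolkit. It relies solely on the documented form avrc download SUBSET -o PATH.

```shell
# Hypothetical helper (not part of the AVrC Toolkit): download several
# subsets, placing each one in its own directory under OUT_ROOT.
download_subsets() {
  local out_root="$1"
  shift
  local subset
  for subset in "$@"; do
    # Documented form: avrc download SUBSET -o PATH
    # Stop at the first failed download.
    avrc download "$subset" -o "$out_root/$subset/" || return 1
  done
}
```

For example, download_subsets data hq phage would run the hq and phage downloads into data/hq/ and data/phage/ respectively.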

Filter Command

The filter command selects sequences by quality, taxonomy, lifestyle, and host criteria.

Usage

avrc filter PATH [OPTIONS]

Arguments

PATH: Path to the directory containing AVrC database files

Options

Quality Filtering

--quality TEXT: Filter by CheckV quality category [Complete|High-quality|Medium-quality|Low-quality]
--min-length INT: Minimum sequence length
--no-plasmids: Exclude sequences classified as potential plasmids

Taxonomy Filtering

--realm TEXT: Filter by viral realm
--kingdom TEXT: Filter by viral kingdom
--phylum TEXT: Filter by viral phylum
--class TEXT: Filter by viral class
--order TEXT: Filter by viral order
--family TEXT: Filter by viral family

Lifestyle Filtering

--lifestyle TEXT: Filter by predicted lifestyle [temperate|virulent|uncertain]

Host Filtering

--host-domain TEXT: Filter by host domain
--host-phylum TEXT: Filter by host phylum
--host-class TEXT: Filter by host class
--host-order TEXT: Filter by host order
--host-family TEXT: Filter by host family
--host-genus TEXT: Filter by host genus

Output Options

--output [fasta|metadata|both]: Output format (default: both)
--output-dir PATH: Output directory (default: filtered/)

Examples

Basic quality filtering:

avrc filter data/ \
  --quality High-quality \
  --no-plasmids \
  --output fasta

Host-specific filtering:

avrc filter data/ \
  --host-phylum Bacillota \
  --output both \
  --output-dir filtered/

Combined filtering:

avrc filter data/ \
  --min-length 10000 \
  --lifestyle temperate \
  --host-genus Campylobacter \
  --output both \
  --output-dir campylobacter_phages/
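To run the same filter for several host genera, the examples above can be repeated in a loop. This is a sketch under assumptions: the function name filter_by_genus and the filtered_GENUS/ directory naming are hypothetical. Each run gets its own output directory because the output file names listed under Output Files look fixed, so runs sharing one directory would presumably collide.

```shell
# Hypothetical helper (not part of the AVrC Toolkit): run one filter pass
# per host genus, writing each result to its own output directory.
filter_by_genus() {
  local db_dir="$1"
  shift
  local genus
  for genus in "$@"; do
    # Uses only documented options: --host-genus, --output, --output-dir
    avrc filter "$db_dir" \
      --host-genus "$genus" \
      --output both \
      --output-dir "filtered_${genus}/" || return 1
  done
}
```

For example, filter_by_genus data/ Campylobacter Escherichia would write results to filtered_Campylobacter/ and filtered_Escherichia/.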

Output Files

When using the filter command with --output both, the following files are generated:

filtered_sequences.fasta.gz: Filtered sequences in compressed FASTA format
filtered_quality.csv: Quality metrics for filtered sequences
filtered_viral_desc.csv: Taxonomic information for filtered sequences
filtered_hosts.csv: Host predictions for filtered sequences
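A quick sanity check after filtering is to count what was written. The helpers below are a sketch using standard tools only (gzip, grep, tail, wc). The function names are hypothetical, and the CSV helper assumes each file has a single header line and no quoted fields containing newlines, which this page does not confirm.

```shell
# Count FASTA records (header lines starting with ">") in a
# gzip-compressed FASTA file.
count_fasta_records() {
  gzip -dc "$1" | grep -c '^>'
}

# Count data rows in a CSV file.
# Assumption: exactly one header line, no embedded newlines in fields.
count_csv_rows() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}
```

For example, count_fasta_records filtered/filtered_sequences.fasta.gz prints the number of sequences kept. If each CSV holds one row per sequence (an assumption), count_csv_rows filtered/filtered_quality.csv should print the same number.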

Resource Requirements

Memory and disk requirements depend on the operation and the dataset:

Memory usage:
- Basic filtering: 2-4 GB
- Complex filtering with large datasets: 4-8 GB

Disk space:
- Complete dataset: ~10 GB
- High-quality subset: ~5 GB
- Phage subset: ~3 GB
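Given the disk figures above, it can be worth checking free space before starting a download. The function below is a sketch built on the POSIX df -Pk output format. The name has_free_gb is hypothetical, and the threshold should be chosen with some headroom, since the figures above are approximate.

```shell
# Return success if the filesystem holding DIR has at least NEED_GB
# gigabytes available. DIR must already exist.
has_free_gb() {
  local dir="$1" need_gb="$2"
  local avail_kb
  # In POSIX (-P) format, column 4 of the second line is the available
  # space, reported in 1024-byte blocks because of -k.
  avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example, has_free_gb . 10 && avrc download all -o data/ would start the complete download only if roughly 10 GB is free on the current filesystem.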
