Command Reference
Alise Ponsero edited this page Mar 16, 2025 · 2 revisions
This page provides detailed information about the commands available in the AVrC Toolkit, including their options, arguments, and example usage.
The AVrC Toolkit provides two main commands:
- download: For retrieving the AVrC database or its subsets
- filter: For filtering sequences based on various criteria
The download command allows you to retrieve the AVrC database or specific subsets.
avrc download [SUBSET] [OPTIONS]
Arguments:
- SUBSET: The subset of the database to download. Available options:
  - all: Complete dataset with all representative sequences
  - hq: High-quality sequences subset
  - phage: Bacteriophage sequences subset

Options:
- --list: List available subsets and their descriptions
- -o, --output PATH: Output directory (default: current directory)
- --no-metadata: Download sequences only (no metadata files)
List available subsets:
avrc download --list
Download complete dataset:
avrc download all -o data/
Download high-quality subset:
avrc download hq -o high_quality_data/
Download phage subset without metadata:
avrc download phage -o phage_data/ --no-metadata
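For scripted downloads, the invocations above can be assembled programmatically. The following is a minimal Python sketch and not part of the AVrC Toolkit: `build_download_command` and `run_download` are hypothetical helpers that only assemble the documented arguments (`SUBSET`, `-o`, `--no-metadata`) into an argument list. It assumes the `avrc` executable is on the `PATH` when the command is eventually run.

```python
import subprocess

# Subsets documented for `avrc download`.
SUBSETS = ("all", "hq", "phage")


def build_download_command(subset, output=None, metadata=True):
    """Return the argument list for an `avrc download` call (hypothetical helper)."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    cmd = ["avrc", "download", subset]
    if output is not None:
        # When omitted, the CLI writes to the current directory.
        cmd += ["-o", str(output)]
    if not metadata:
        # Sequences only, no metadata files.
        cmd.append("--no-metadata")
    return cmd


def run_download(subset, output=None, metadata=True):
    """Run the download; assumes `avrc` is installed and on the PATH."""
    subprocess.run(build_download_command(subset, output, metadata), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_download_command("phage", "phage_data/", metadata=False)))
```

Building the command as a list rather than a single string avoids shell quoting problems when an output path contains spaces.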
The filter command allows you to filter sequences based on various criteria.
avrc filter PATH [OPTIONS]
Arguments:
- PATH: Path to the directory containing AVrC database files

Options:
- --quality TEXT: Filter by CheckV quality category [Complete|High-quality|Medium-quality|Low-quality]
- --min-length INT: Minimum sequence length
- --no-plasmids: Exclude sequences classified as potential plasmids

- --realm TEXT: Filter by viral realm
- --kingdom TEXT: Filter by viral kingdom
- --phylum TEXT: Filter by viral phylum
- --class TEXT: Filter by viral class
- --order TEXT: Filter by viral order
- --family TEXT: Filter by viral family

- --lifestyle TEXT: Filter by predicted lifestyle [temperate|virulent|uncertain]

- --host-domain TEXT: Filter by host domain
- --host-phylum TEXT: Filter by host phylum
- --host-class TEXT: Filter by host class
- --host-order TEXT: Filter by host order
- --host-family TEXT: Filter by host family
- --host-genus TEXT: Filter by host genus

- --output [fasta|metadata|both]: Output format (default: both)
- --output-dir PATH: Output directory (default: filtered/)
Basic quality filtering:
avrc filter data/ \
    --quality High-quality \
    --no-plasmids \
    --output fasta
Host-specific filtering:
avrc filter data/ \
    --host-phylum Bacillota \
    --output both \
    --output-dir filtered/
Combined filtering:
avrc filter data/ \
    --min-length 10000 \
    --lifestyle temperate \
    --host-genus Campylobacter \
    --output both \
    --output-dir campylobacter_phages/
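When many filter runs are generated from a script, it can help to validate option names and fixed choices before calling the CLI. The sketch below is one possible way to do that in Python; `build_filter_command` is a hypothetical helper, not a Toolkit function, and it accepts only the options listed in this reference. Whether the CLI itself rejects other values in the same way is an assumption.

```python
# Options that take a value, as listed in this reference (names without leading dashes).
VALUE_OPTIONS = (
    "quality", "min-length",
    "realm", "kingdom", "phylum", "class", "order", "family",
    "lifestyle",
    "host-domain", "host-phylum", "host-class",
    "host-order", "host-family", "host-genus",
    "output", "output-dir",
)

# Options whose accepted values are spelled out in this reference.
CHOICES = {
    "quality": ("Complete", "High-quality", "Medium-quality", "Low-quality"),
    "lifestyle": ("temperate", "virulent", "uncertain"),
    "output": ("fasta", "metadata", "both"),
}


def build_filter_command(path, options=None, no_plasmids=False):
    """Assemble an `avrc filter` argument list (hypothetical helper)."""
    cmd = ["avrc", "filter", str(path)]
    for name, value in (options or {}).items():
        if name not in VALUE_OPTIONS:
            raise ValueError(f"option --{name} is not listed in the reference")
        if name in CHOICES and value not in CHOICES[name]:
            raise ValueError(f"--{name} must be one of {CHOICES[name]}, got {value!r}")
        cmd += [f"--{name}", str(value)]
    if no_plasmids:
        cmd.append("--no-plasmids")
    return cmd


if __name__ == "__main__":
    # Reproduces the combined filtering example; prints rather than runs it.
    combined = build_filter_command("data/", {
        "min-length": 10000,
        "lifestyle": "temperate",
        "host-genus": "Campylobacter",
        "output": "both",
        "output-dir": "campylobacter_phages/",
    })
    print(" ".join(combined))
```

Options are passed as a dictionary keyed by flag name because `class` is a reserved word in Python and could not be used as a keyword argument. The resulting list can be handed to `subprocess.run(cmd, check=True)` once `avrc` is installed.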
When using the filter command with --output both, the following files are generated:
- filtered_sequences.fasta.gz: Filtered sequences in compressed FASTA format
- filtered_quality.csv: Quality metrics for filtered sequences
- filtered_viral_desc.csv: Taxonomic information for filtered sequences
- filtered_hosts.csv: Host predictions for filtered sequences
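The output files listed above can be inspected with the Python standard library alone. The sketch below counts the records in the compressed FASTA file and reads the header row of a CSV file. The helper names are hypothetical, and because the column names of the CSV files are not documented here, the code reads the header instead of assuming any. It also assumes the compressed file is standard gzip, as the `.gz` extension suggests.

```python
import csv
import gzip


def count_fasta_records(path):
    """Count sequences in a gzip-compressed FASTA file by counting '>' header lines."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def csv_columns(path):
    """Return the header row of a CSV file, or an empty list if the file is empty."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])
```

For example, after a run with `--output-dir filtered/`, calling `count_fasta_records("filtered/filtered_sequences.fasta.gz")` gives the number of sequences that passed the filters, and `csv_columns("filtered/filtered_quality.csv")` shows which quality metrics are available.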
Different operations require different amounts of computational resources:
Memory Usage:
- Basic filtering: 2-4 GB
- Complex filtering with large datasets: 4-8 GB

Disk Space:
- Complete dataset: ~10 GB
- High-quality subset: ~5 GB
- Phage subset: ~3 GB
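Before starting a large download, the approximate disk figures above can be checked against the free space on the target volume. This is a rough Python sketch under stated assumptions: the sizes are the approximate values from this page, they are treated as GiB, the 20% safety margin is an arbitrary choice, and `has_room` and `free_bytes_at` are hypothetical helper names.

```python
import shutil

GIB = 1024 ** 3

# Approximate sizes from this page, keyed by download subset name.
APPROX_SIZE_GB = {"all": 10, "hq": 5, "phage": 3}


def has_room(subset, free_bytes, margin=1.2):
    """Return True if free_bytes covers the approximate subset size times a safety margin."""
    return free_bytes >= APPROX_SIZE_GB[subset] * GIB * margin


def free_bytes_at(path="."):
    """Free space in bytes on the volume that holds `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    for subset in APPROX_SIZE_GB:
        status = "ok" if has_room(subset, free_bytes_at(".")) else "not enough space"
        print(f"{subset}: {status}")
```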