VICON is a Python package for processing and analyzing viral sequence data, with specialized tools for viral genome coverage analysis and sequence alignment.
- Viral sequence alignment and coverage analysis
- K-mer analysis and sliding window coverage calculations
- Visualization tools for coverage plots
- Wrapper scripts for vsearch and ViralMSA
The easiest way to install VICON with all dependencies:
# Create and activate a new environment
conda create -n vicon python=3.11
conda activate vicon
# Install VICON and all dependencies
conda install -c conda-forge -c bioconda -c eka97 vicon
# Set required permissions
chmod +x "$CONDA_PREFIX/bin/vicon-run"
chmod +x "$CONDA_PREFIX/bin/viralmsa"
chmod +x "$CONDA_PREFIX/bin/minimap2"

Install from PyPI:
pip install vicon

Note: When installing via pip, you must manually install these external dependencies:
- minimap2 (≥2.30)
- vsearch
- ViralMSA
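
After a pip install, it can help to confirm that the external tools are actually reachable before running the pipeline. The following is a minimal sketch, not part of VICON: the helper `missing_tools` is a hypothetical name, and the executable names (`minimap2`, `vsearch`, `viralmsa`) are assumed from the install commands in this README. It only checks that each tool is on `PATH`; it does not verify the minimap2 version.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executable names assumed from this README's install instructions.
REQUIRED_TOOLS = ("minimap2", "vsearch", "viralmsa")


def missing_tools(
    names: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.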
Ubuntu / Debian:
sudo apt-get update
sudo apt-get install -y minimap2 vsearch

macOS (Homebrew):
brew install minimap2 vsearch

ViralMSA:
# Create both directories so the symlink target below exists
mkdir -p ~/bin ~/.local/bin && cd ~/bin
wget "https://raw.githubusercontent.com/niemasd/ViralMSA/master/ViralMSA.py"
chmod +x ViralMSA.py
ln -sf "$PWD/ViralMSA.py" ~/.local/bin/viralmsa

Run the VICON pipeline with:
vicon-run --config path/to/your/config.yaml

Note:
VICON automatically preprocesses your input FASTA files (both sample and reference) before analysis:
- Converts all sequences to uppercase
- Cleans and standardizes FASTA headers
- Replaces any non-ATCG characters in sequences with 'N'
You do not need to manually edit or check your FASTA files for these issues.
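
To make the three preprocessing steps concrete, here is a rough sketch of what such a cleanup could look like. It is not VICON's implementation: `preprocess_fasta` and `clean_header` are hypothetical names, and the header rule (trim, then replace whitespace runs with `_`) is an assumption, since the exact standardization VICON applies is not described here.

```python
import re


def clean_header(header: str) -> str:
    """Assumed header rule: trim, then replace whitespace runs with '_'."""
    return re.sub(r"\s+", "_", header.strip())


def preprocess_fasta(text: str) -> str:
    """Uppercase sequences, clean headers, and mask non-ATCG characters."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            cleaned.append(">" + clean_header(line[1:]))
        else:
            # Uppercase first, then replace anything other than A, T, C, G with N.
            cleaned.append(re.sub(r"[^ATCG]", "N", line.upper()))
    return "\n".join(cleaned) + "\n"


print(preprocess_fasta(">seq 1  demo\nacgtRYn-\n"), end="")
# >seq_1_demo
# ACGTNNNN
```

Note that under this sketch ambiguity codes (`R`, `Y`) and gap characters (`-`) are all masked as `N`, which matches the "any non-ATCG characters" wording above.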
Create a configuration file (config.yaml):
project_path: "project_path"
virus_name: "orov"
input_sample: "data/orov/samples/samples.fasta"
input_reference: "data/orov/reference/reference.fasta"
email: "email@address.com"
kmer_size: 150
threshold: 147 # i.e., a tolerance of 150 - 147 = 3 degenerate positions per k-mer
l_gene_start: 8000
l_gene_end: 16000
coverage_ratio: 0.5
min_year: 2020
threshold_ratio: 0.01
drop_old_samples: false
drop_mischar_samples: true

The pipeline automatically extracts years from FASTA headers using a two-step approach:
- Priority extraction: Years following separators (`|`, `_`, `/`, `-`)
- Fallback extraction: Any standalone 4-digit number between 1850 and 2030

| Header Example | Year Extracted? | Extracted Year | Reason |
|---|---|---|---|
| `>sample\|2021` | ✅ Yes | 2021 | After pipe separator |
| `>sample_2020` | ✅ Yes | 2020 | After underscore separator |
| `>sample/2019/data` | ✅ Yes | 2019 | After slash separator |
| `>sample-2022-final` | ✅ Yes | 2022 | After dash separator |
| `>data 2021 sequence` | ✅ Yes | 2021 | Standalone 4-digit number |
| `>sample.2020.version` | ✅ Yes | 2020 | Standalone 4-digit number |
| `>test2021extra` | ✅ Yes | 2021 | Standalone 4-digit number |
| `>sample\|202` | ❌ No | - | Only 3 digits |
| `>sample_1800_old` | ❌ No | - | Outside valid range (1850-2030) |
| `>sample20213long` | ❌ No | - | 5 consecutive digits |

Best Practice: Use `|YYYY`, `_YYYY`, `/YYYY`, or `-YYYY` patterns for reliable year extraction.
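
The two-step extraction described above can be approximated with two regular expressions. This is an illustrative sketch, not VICON's actual code: `extract_year` and both patterns are assumptions, as is treating the 1850-2030 range as inclusive. It does reproduce every row of the table above.

```python
import re
from typing import Optional

MIN_YEAR, MAX_YEAR = 1850, 2030  # assumed inclusive bounds

# Step 1: exactly four digits directly after a separator (| _ / -).
PRIORITY = re.compile(r"[|_/\-](\d{4})(?!\d)")
# Step 2: any run of exactly four digits, not adjacent to other digits.
FALLBACK = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def extract_year(header: str) -> Optional[int]:
    """Return the first in-range year found, trying separators first."""
    for pattern in (PRIORITY, FALLBACK):
        for match in pattern.finditer(header):
            year = int(match.group(1))
            if MIN_YEAR <= year <= MAX_YEAR:
                return year
    return None


print(extract_year(">sample|2021"))      # 2021
print(extract_year(">test2021extra"))    # 2021
print(extract_year(">sample_1800_old"))  # None
print(extract_year(">sample20213long"))  # None
```

The negative lookarounds are what reject `20213`: neither `2021` nor `0213` is a run of exactly four digits, so no candidate is produced.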
This project is licensed under the terms of the MIT license.