This document provides detailed installation options for the Virulence Factors Classifier (VF_classifier). For most users, the recommended approach using the environment.yml file is the simplest.
A functioning installation is composed of three main steps:
- Clone the repository (or download the zip file)
- Install the dependencies (using mamba or conda)
- Download the VFDB fasta file and create the BLAST database
For most users, follow these simple steps:
# Clone the repository
git clone https://github.com/brunogoncalves/VF_classifier.git
cd VF_classifier
# Make script executable
chmod +x *.py *.R
# Install mamba (if not already installed)
conda install -c conda-forge mamba
# Create environment from provided environment.yml file
mamba env create -f environment.yml
# Activate the environment
conda activate VF_classifierThat's it! You're ready to use the tool. The environment.yml file includes all necessary dependencies.
Mamba is a faster alternative to conda and can install all dependencies, including system tools:
# Create a new conda environment with mamba
conda create -n VF_classifier -c conda-forge mamba
conda activate VF_classifier
# Install all dependencies, including BLAST and Krona
mamba install -c bioconda -c conda-forge blast krona pandas biopython matplotlib seabornIf you prefer to use conda instead of mamba, be warned that it will be much slower:
# Create a new conda environment
conda create -n VF_classifier -c conda-forge
conda activate VF_classifier
# Install all dependencies, including BLAST and Krona
conda install -c bioconda -c conda-forge blast krona pandas biopython matplotlib seabornIf you prefer to install BLAST+ system-wide and use pip for Python packages:
# macOS (using Homebrew):
brew install blast
# Ubuntu/Debian:
sudo apt-get update
sudo apt-get install ncbi-blast+
# CentOS/RHEL:
sudo yum install ncbi-blast+
# From Source:
# Download from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/# Create virtual environment (recommended)
python -m venv vf-env
source vf-env/bin/activate # On Windows: vf-env\Scripts\activate
# Install required dependencies
pip install pandas biopython matplotlib seaborn# Install Krona via system package manager
sudo apt update && sudo apt install radiant# Using conda (if not installed with system packages)
conda install -c bioconda krona# Manual installation
wget https://github.com/marbl/Krona/releases/download/v2.7/KronaTools-2.7.tar
tar -xvf KronaTools-2.7.tar
cd KronaTools-2.7
./install.plNote: All three options provide the same ktImportText command used by the script. The radiant package is the Ubuntu/Debian system package for Krona tools.
For basic functionality without visualizations:
# Install BLAST+ (see Option 3 for system-specific commands)
# Install only required Python packages
pip install pandas biopythonTo generate advanced comparative heatmaps using plot_vf_heatmap.R, you must have R installed. The script will automatically attempt to install required packages (ComplexHeatmap, dplyr, etc.) on its first run.
# macOS (using Homebrew):
brew install r
# Ubuntu/Debian:
sudo apt-get update
sudo apt-get install r-base
# CentOS/RHEL:
sudo yum install RIf you are using the provided environment, R and all dependencies are already included. Otherwise, you can add them manually:
conda install -c conda-forge r-base r-dplyr r-tidyr r-pheatmap r-rcolorbrewer
conda install -c bioconda bioconductor-complexheatmapNote: The script is optimized for R 4.0 or later.
- NCBI BLAST+ (version 2.12.0 or later)
- Python 3.7+
- Pandas
- Biopython
- Git (optional, for cloning/downloading the repository)
- Mamba (optional but highly recommended, for faster conda package installation)
- Matplotlib (optional, for visualization)
- Seaborn (optional, for visualization)
- Krona Tools (optional, for interactive charts) - Install from Krona GitHub
- VFDB Database - Download from VFDB Official Site
- R (version 4.0+, optional, for advanced heatmap generation)
- ComplexHeatmap (Bioconductor package, handles automated install if R is present)
- Krona is optional - the script will work without it, but won't generate interactive HTML charts
- If Krona is not installed, the script will gracefully skip HTML generation with a warning
- Matplotlib/Seaborn are optional - the script will work without them, but won't generate summary plots
Important information:
The VFDB data is freely available under the Creative Commons Attribution-NonCommercial (CC BY-NC) license version 4.0 for personal and public non-commercial, research or academic use by individuals at academic, government or non-profit institutions. Users intending to use VFDB data for commercial purposes should contact the authors directly.
For more information, visit: https://www.mgc.ac.cn/VFs/
Use the enhanced download script with intelligent database management; note that the VFDB webpage may change, so the script may need to be updated.
# enter the directory
cd VF_classifier
# Download and create BLAST database with version checking
./VF_classifier.v0.1.0.py --setup
# To force an update even if up to date:
./VF_classifier.v0.1.0.py --setup --forceFeatures of the enhanced script:
- Version comparison between existing and new databases
- Confirmation prompts to prevent accidental overwrites
- Comprehensive metadata file creation
Download the VFDB dataset from the official repository (Approx. 10 MB). The database is updated regularly, so it's recommended to download the latest version.
# Download core dataset (Set B - nucleotide sequences)
wget https://www.mgc.ac.cn/VFs/Down/VFDB_setB_nt.fas.gz
gunzip VFDB_setB_nt.fas.gzOnce you have the VFDB FASTA file, create a BLAST database:
makeblastdb -in VFDB_setB_nt.fas -dbtype nucl -out VFDB_db# Move database files to a dedicated directory
mkdir -p database
mv VFDB_db* database/
mv VFDB_setB_nt.fas database/Important: Keep the VFDB_setB_nt.fas file in the same directory as your BLAST database files. The script will automatically detect it, or you can specify its location with the -v parameter. You can put the database in any path you prefer, but make sure to update the path in the script accordingly.
After installation, verify everything is working:
# Check BLAST installation
blastn -version
# Check Python packages
python -c "import pandas, biopython; print('Python packages OK')"
# Check optional packages
python -c "import matplotlib, seaborn; print('Visualization packages OK')"
# Check Krona (if installed)
ktImportText --help
# Check script
python VF_classifier.v0.1.0.py -h
# Check R and ComplexHeatmap (if R is installed)
Rscript -e "if (!require('ComplexHeatmap', quietly=TRUE)) { print('ComplexHeatmap not found, will be installed on first run') } else { print('ComplexHeatmap OK') }"If you see an error stating a package was "installed before R 4.0.0", it means R is finding an old, incompatible binary in your system or cache.
Solution: The "Clean Start" If the standard installation fails with library errors, follow these steps to clear the cache and start fresh:
# Deactivate and remove the environment
conda deactivate
conda env remove -n VF_classifier
# CLEAR CACHE (The Nuclear Option)
# This ensures no broken binaries are reused
conda clean --all
# Re-create environment
mamba env create -f environment.ymlIf conda or mamba fails to solve the environment, try updating the solver first:
conda update -n base conda
# or if using mamba
mamba update -n base mambaSometimes R finds packages in your personal folder (~/R/...) instead of the Conda environment. To see which paths R is searching, run:
conda activate VF_classifier
Rscript -e ".libPaths()"If you see paths outside of your conda environment appearing first, you may need to temporarily rename those folders during analysis.