VITAL: A Deep Learning Framework for Affinity Prediction and Interface Mapping of Peptide–Protein Interactions
VITAL is a deep learning framework that co-learns local structural geometries and global sequence contexts to enable quantitative peptide–protein interaction (PepPI) characterization.
Table of contents
- `ckpts/` - Pretrained model checkpoints
- `datasets/` - Example datasets used for training and evaluation
- `data_processing/` - Tools and utilities for feature extraction and preprocessing
- `model/` - Contains the model file for inference
- `feature_dic.py` - Main script for generating feature dictionaries
- `parse_feature_dict.py` - Script for parsing and organizing extracted feature dictionaries
- `prediction.py` - Main script for running inference
- `run_feature.sh` - Shell script for executing the full feature extraction pipeline
- `run_prediction.sh` - Shell script for performing model prediction
All experiments were conducted using PyTorch 1.12.1 and Python 3.9 on a server equipped with an NVIDIA GeForce RTX 3090 GPU (CUDA 11.4).
Follow the steps below to set up the environment and install all dependencies.
git clone https://github.com/BADDxmu/VITAL.git
cd VITAL
We provide an env.yml file for reproducible environment setup.
conda env create -f env.yml
conda activate VITAL
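Once the environment is active, a quick version check (a minimal sketch; it only reads versions and is not part of the pipeline) can confirm it matches the tested configuration of Python 3.9 and PyTorch 1.12.1:

```python
import sys

# Report interpreter and (if installed) PyTorch versions; the tested
# configuration is Python 3.9 + PyTorch 1.12.1 on CUDA 11.4.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
try:
    import torch
    print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch is not installed in this environment")
```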
Important notice on third-party tools: users must install the tools below independently from their official repositories and configure the paths accordingly. This does not affect the reproducibility of our results, as all feature extraction strictly follows the official implementations.
cd data_processing
git clone https://github.com/Superzchen/iFeature.git
cp utils/AAINDEX.py iFeature/codes
cp utils/HQI18.txt iFeature/data
cd ..
cd data_processing # Skip this step if you are already in this directory.
git clone https://github.com/jas-preet/SPOT-1D-Single.git SPOT_1D_Single
cp utils/spot1d_single2.py SPOT_1D_Single
cp utils/__init__.py SPOT_1D_Single
cp utils/dataset_inference.py SPOT_1D_Single/dataset
cp utils/main.py SPOT_1D_Single
cd ..
Then download the pretrained model weights manually from the official SPOT-1D-Single repository.
Note: If you encounter an mkl-service error, please ensure the following environment variables are set:
export MKL_THREADING_LAYER=GNU
export MKL_SERVICE_FORCE_INTEL=1
cd data_processing # Skip this step if you are already in this directory.
git clone https://github.com/facebookresearch/esm.git ESM-2
cp utils/extract.py ESM-2/scripts
To download the pretrained ESM-2 model weights, run the following commands:
# Ensure you are in the `data_processing` directory
mkdir ESM-2/checkpoints
cp utils/download_weights.sh ESM-2/scripts
cd ESM-2
bash scripts/download_weights.sh
Note: If aria2c is not installed, you may download the model weights (esm2_t30_150M_UR50D-contact-regression.pt and esm2_t30_150M_UR50D.pt) manually from the official ESM repository and place them in the checkpoints/ directory.
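After the download, a short check like the following (checkpoint paths taken from the commands above) confirms that both weight files landed in the right place:

```python
from pathlib import Path

# Checkpoint files expected by the pipeline, relative to the ESM-2 directory.
REQUIRED = [
    "checkpoints/esm2_t30_150M_UR50D.pt",
    "checkpoints/esm2_t30_150M_UR50D-contact-regression.pt",
]

def missing_checkpoints(root="ESM-2"):
    """Return the subset of REQUIRED files that do not exist under root."""
    root = Path(root)
    return [p for p in REQUIRED if not (root / p).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All checkpoints present" if not missing else f"Missing: {missing}")
```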
Before using VITAL for inference, you need to generate all required features.
- Prepare your peptide and protein sequences and save them as FASTA files.
- Create a pair list file specifying peptide–protein pairs for prediction. Each line should contain one peptide ID and one protein ID, separated by a tab (\t), as shown below:
peptide1 protein1
peptide2 protein2
...
- Run the full feature extraction pipeline:
bash run_feature.sh
Or run it manually if you wish to modify default arguments via command line:
python feature_dic.py \
--load_list ./datasets/example_data/example_list \
--load_fasta ./datasets/example_data/example_fasta/ \
--save_path ./datasets/example_feature/
- `--load_list` specifies the peptide–protein pair list file. Each line should contain one peptide ID and one protein ID, separated by a tab character (`\t`).
- `--load_fasta` specifies the directory containing FASTA files for both peptides and proteins. The sequence identifiers must exactly match those used in the pair list.
- `--save_path` specifies the output directory where all extracted features and processed feature dictionaries will be stored.
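Since mismatched or malformed lines in the pair list are an easy mistake, a small validator along these lines (file names are illustrative) can catch them before the pipeline runs:

```python
import os
import tempfile

def parse_pair_list(path):
    """Parse a tab-separated pair list into (peptide_id, protein_id) tuples."""
    pairs = []
    with open(path) as fh:
        for line_no, line in enumerate(fh, 1):
            line = line.rstrip("\n")
            if not line:
                continue
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(f"line {line_no}: expected 2 tab-separated IDs, got {len(fields)}")
            pairs.append(tuple(fields))
    return pairs

# Demo with a temporary pair list mimicking the format shown earlier.
with tempfile.TemporaryDirectory() as tmp:
    pair_file = os.path.join(tmp, "example_list")
    with open(pair_file, "w") as fh:
        fh.write("peptide1\tprotein1\npeptide2\tprotein2\n")
    print(parse_pair_list(pair_file))  # [('peptide1', 'protein1'), ('peptide2', 'protein2')]
```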
The script will:
- Generate sequence-based and structure-related features
- Save processed feature dictionaries into
./datasets/example_feature/
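The output directory also contains the `feature_path.csv` consumed later by `prediction.py`. A generic way to inspect it is sketched below; the exact column schema is whatever `feature_dic.py` writes (the mock row in the demo is illustrative), so rows are returned untyped:

```python
import csv
import os
import tempfile

def list_feature_paths(csv_path):
    """Read feature_path.csv rows as plain lists, making no assumption
    about the column schema feature_dic.py wrote."""
    with open(csv_path, newline="") as fh:
        return list(csv.reader(fh))

# Demo with a mock CSV standing in for the generated file.
with tempfile.TemporaryDirectory() as tmp:
    p = os.path.join(tmp, "feature_path.csv")
    with open(p, "w") as fh:
        fh.write("peptide1,protein1,/path/to/feature.pkl\n")
    print(list_feature_paths(p))  # [['peptide1', 'protein1', '/path/to/feature.pkl']]
```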
Run the full inference pipeline:
bash run_prediction.sh
Or run inference manually:
python prediction.py \
--batch_input_csv ./datasets/example_feature/feature_path.csv \
--ckpt_path ./ckpts/ \
--device cuda:0 \
--output ./output/prediction_results/result.json \
--ASM_output_path ./output/ASM \
--verbose
- `--batch_input_csv` specifies the input CSV file listing the paths to all extracted features. This file is automatically generated by `feature_dic.py` and located in the directory defined by `--save_path` during feature extraction.
- `--output` specifies the path to save the final prediction results in JSON format.
- `--ASM_output_path` specifies the directory to save the predicted affinity / interaction strength matrices (ASM) for each peptide–protein pair.
- `--verbose` enables detailed logging during inference.
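After inference, the JSON results can be loaded with the standard library. The key/value layout in the demo below is a mock (a pair-to-score mapping is an assumption), so check an actual `result.json` for the real schema:

```python
import json
import os
import tempfile

def load_results(path):
    """Load the JSON file written via --output; the structure is whatever
    prediction.py emitted, returned unchanged."""
    with open(path) as fh:
        return json.load(fh)

# Demo with a mock result file (the pair -> score layout is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.json")
    with open(path, "w") as fh:
        json.dump({"peptide1_protein1": 0.87}, fh)
    print(load_results(path))  # {'peptide1_protein1': 0.87}
```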
The inference script will:
- Load the precomputed feature dictionary
- Load the TorchScript model
- Produce prediction scores and ASM for each protein–peptide pair
- Save the prediction results to the JSON file specified by `--output` and the ASM files under `--ASM_output_path`
You can access and use VITAL through the VITAL-web-server.
