Official PyTorch-based implementation of the paper:
"An Image-based Protein-Ligand Binding Representation Learning Framework via Multi-Level Flexible Dynamics Trajectory Pre-training"
- News
- Installation & Environment Setup
- Data Preprocessing
- Pre-training ImagePLB
- Fine-tuning on Downstream Tasks
- Reproducing Our Results
- Citation
- [2025/09/19] The paper was accepted by Bioinformatics!
- [2024/06/28] Repository setup completed. Code and instructions released.
- GPU with CUDA 11.6
- Ubuntu 18.04
# create conda env
conda create -n ImagePLB python=3.9
conda activate ImagePLB
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# install environment
pip install rdkit
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install biopython==1.79
pip install easydict
pip install tqdm
pip install timm==0.6.12
pip install tensorboard
pip install scikit-learn
pip install setuptools==59.5.0
pip install pandas
pip install torch-cluster torch-scatter torch-sparse torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.13.1%2Bcu116.html
pip install torch-geometric==1.6.0
pip install dgl-cu116
pip install ogb
pip install seaborn
conda install openbabel -c conda-forge
pip install einops
We use PyMOL to generate multi-view ligand images from molecular conformations. The following PyMOL script renders one view of a ligand; you can run it from the PyMOL command line:
sdf_filepath=demo.sdf # sdf file path of ligand
rotate_direction=x
rotate=0
save_img_path=demo_frame.png
load $sdf_filepath;bg_color white;set stick_ball,on;set stick_ball_ratio,3.5;set stick_radius,0.15;set sphere_scale,0.2;set valence,1;set valence_mode,0;set valence_size, 0.1;rotate $rotate_direction, $rotate;save $save_img_path;quit;
Note that we use four views, obtained with the following parameter settings:
- rotate_direction=x; rotate=0
- rotate_direction=x; rotate=180
- rotate_direction=y; rotate=180
- rotate_direction=z; rotate=180
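The four views can also be generated programmatically. The sketch below only builds the PyMOL command string for each view from the settings shown above. The helper name `build_pymol_command` and the output naming scheme (`x_0.png`, `x_180.png`, ...) are assumptions chosen to match the directory layouts in this README; they are not part of the released code.

```python
# The four (axis, angle) pairs listed above.
VIEWS = [("x", 0), ("x", 180), ("y", 180), ("z", 180)]

# Rendering settings copied from the PyMOL script in this README.
STYLE = (
    "bg_color white;set stick_ball,on;set stick_ball_ratio,3.5;"
    "set stick_radius,0.15;set sphere_scale,0.2;set valence,1;"
    "set valence_mode,0;set valence_size, 0.1"
)


def build_pymol_command(sdf_path, axis, angle, out_dir="."):
    """Return (command, image_path) for one view (hypothetical helper)."""
    # Assumed naming scheme: <axis>_<angle>.png, e.g. x_180.png.
    img_path = f"{out_dir}/{axis}_{angle}.png"
    command = (
        f"load {sdf_path};{STYLE};"
        f"rotate {axis}, {angle};save {img_path};quit;"
    )
    return command, img_path


if __name__ == "__main__":
    for axis, angle in VIEWS:
        cmd, img = build_pymol_command("demo.sdf", axis, angle)
        print(img)
        print(cmd)
```

Each command string can then be run in PyMOL in the same way as the script above.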
To save you time on data preprocessing, we also provide download links for all processed data.
| Name | Download link | Description |
|---|---|---|
| multi_view_trajectory_video.tar.gz | | |
| multi_view_trajectory_video_1%.tar.gz | BaiduCloud | Ligand trajectories with 1% of the multi-view images (only frame #1 is rendered as multi-view images). |
| pocket.tar.gz | OneDrive | pocket trajectory with 3D graphs. |
Please download all data listed above and put it in datasets/pre-training/MISATO/processed/ if you want to pre-train ImagePLB from scratch.
The directory is organized in the following format:
datasets/pre-training/MISATO/processed/
+---pocket
| | train.npz
| |
+---multi_view_trajectory_video
| +---1A0Q
| | +---x_0
| | | mov0001.png
| | | mov0002.png
| | | ...
| | +---x_180
| | | mov0001.png
| | | mov0002.png
| | | ...
| | +---y_180
| | | mov0001.png
| | | mov0002.png
| | | ...
| | +---z_180
| | | mov0001.png
| | | mov0002.png
| | | ...
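After extracting the archives, it can be useful to confirm that every complex has frames for all four views. The following is a minimal sketch that assumes the layout shown above; `check_trajectory_layout` and `incomplete_complexes` are hypothetical helpers, not functions from this repository.

```python
from pathlib import Path

# View folder names taken from the directory tree above.
VIEW_DIRS = ("x_0", "x_180", "y_180", "z_180")


def check_trajectory_layout(root):
    """Map each complex ID to the number of PNG frames found per view.

    A view folder that is missing is reported with a count of 0.
    """
    video_root = Path(root) / "multi_view_trajectory_video"
    report = {}
    for complex_dir in sorted(p for p in video_root.iterdir() if p.is_dir()):
        counts = {}
        for view in VIEW_DIRS:
            view_dir = complex_dir / view
            counts[view] = len(list(view_dir.glob("*.png"))) if view_dir.is_dir() else 0
        report[complex_dir.name] = counts
    return report


def incomplete_complexes(report):
    """Return the complex IDs that have at least one view without frames."""
    return [name for name, counts in report.items() if 0 in counts.values()]
```

For example, `incomplete_complexes(check_trajectory_layout("datasets/pre-training/MISATO/processed"))` lists the complexes that need to be downloaded or rendered again.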
The pre-trained ImagePLB (ImagePLB-P) is available from the following table.
| Name | Download link | Description |
|---|---|---|
| ImagePLB-P.pth | OneDrive | You can download the ImagePLB-P and put it in the directory: resumes/. |
If you want to pre-train your own ImagePLB-P, see the command below.
Usage:
usage: pretrain_ImagePLB.py [-h] [--dataroot DATAROOT] [--workers WORKERS]
[--model_name MODEL_NAME]
[--max_len_pocket MAX_LEN_POCKET] [--center]
[--n_dim_graph N_DIM_GRAPH] [--lr LR]
[--momentum MOMENTUM]
[--weight-decay WEIGHT_DECAY] [--weighted_loss]
[--runseed RUNSEED] [--start_epoch START_EPOCH]
[--epochs EPOCHS] [--batch BATCH]
[--imageSize IMAGESIZE] [--resume RESUME]
[--n_ckpt_save N_CKPT_SAVE]
[--n_batch_step_optim N_BATCH_STEP_OPTIM]
[--lambda_next_mol LAMBDA_NEXT_MOL]
[--lambda_next_pocket LAMBDA_NEXT_POCKET]
[--lambda_next_complex LAMBDA_NEXT_COMPLEX]
[--log_dir LOG_DIR] [--tb_step_num TB_STEP_NUM]
Run the following command in the pretrain folder to pre-train ImagePLB:
CUDA_VISIBLE_DEVICES=0,1,2,3 python pretrain_ImagePLB.py \
--workers 16 \
--batch 128 \
--epochs 30 \
--lr 0.001 \
--dataroot ../datasets/pre-training/MISATO/processed \
--log_dir ./experiments/pretrain_ImagePLB \
--weighted_loss
All downstream task data is publicly available below:
| Datasets | Links | Description |
|---|---|---|
| PDBBind | OneDrive | Including PDBBind-30, PDBBind-60, PDBBind-Scaffold. |
| LEP | OneDrive | Dataset of ligand efficacy prediction. |
datasets/fine-tuning/
+---pdbbind
| +---ligand
| | +---1a4k
| | | x_0.png
| | | x_180.png
| | | y_180.png
| | | z_180.png
| +---30
| | | train.npz
| | | valid.npz
| | | test.npz
| +---60
| | | train.npz
| | | valid.npz
| | | test.npz
| +---scaffold
| | | train.npz
| | | valid.npz
| | | test.npz
+---lep
| +---ligand
| | +---Lig2__6BQG__6BQH
| | | x_0.png
| | | x_180.png
| | | y_180.png
| | | z_180.png
| +---protein
| | | train.npz
| | | val.npz
| | | test.npz
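The `.npz` splits are NumPy archives, which are ordinary zip files holding one `.npy` member per stored array. Because this README does not document the array names, the sketch below lists them using only the standard library, without loading any data. `list_npz_arrays` is a hypothetical helper and is not part of the repository.

```python
import zipfile


def list_npz_arrays(npz_path):
    """Return the array names stored in a .npz archive without loading them."""
    with zipfile.ZipFile(npz_path) as archive:
        # Each array is stored as "<name>.npy" inside the zip container.
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )
```

Calling it on a split such as `datasets/fine-tuning/pdbbind/scaffold/train.npz` shows which arrays the file contains before you load it with NumPy.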
- Run the following command in the finetune folder for PDBBind:
python pdbbind.py \
--batch 32 \
--epochs 20 \
--lr 0.0001 \
--egnn_dropout 0.3 \
--predictor_dropout 0.3 \
--dataroot ../datasets/fine-tuning/pdbbind \
--split_type scaffold \
--resume ../resumes/ImagePLB-P.pth \
--log_dir ./experiments/pdbbind/scaffold/rs0/ \
--runseed 0 \
--dist-url tcp://127.0.0.1:12312
- Run the following command in the finetune folder for LEP:
python lep.py \
--batch 32 \
--epochs 100 \
--lr 0.0001 \
--dataroot ../datasets/fine-tuning/lep \
--split_type protein \
--egnn_dropout 0.5 \
--predictor_dropout 0.5 \
--resume ../resumes/ImagePLB-P.pth \
--log_dir ./experiments/lep/rs0/ \
--runseed 0 \
--dist-url tcp://127.0.0.1:12345
We provide detailed training logs and the corresponding checkpoints. The logs give further training details, and the trained models can be used directly for structure-based virtual screening.
| Name | Download link | Description |
|---|---|---|
| PDBBind-30 | OneDrive | The training details of ImagePLB-P on PDBBind-30 |
| PDBBind-60 | OneDrive | The training details of ImagePLB-P on PDBBind-60 |
| PDBBind-Scaffold | OneDrive | The training details of ImagePLB-P on PDBBind-Scaffold |
| LEP | OneDrive | The training details of ImagePLB-P on LEP |
The files include training logs and checkpoints for training ImagePLB-P with three random seeds (0, 1, 2).
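To repeat a run over the three seeds, the PDBBind command from this README can be generated once per seed. This is a rough sketch: it assumes the hyperparameters shown for the scaffold split with seed 0 are reused unchanged for every seed, and that the other `--split_type` values follow the folder names (`30`, `60`). Neither assumption is confirmed by the README. `pdbbind_command` is a hypothetical helper.

```python
import shlex

SEEDS = (0, 1, 2)


def pdbbind_command(seed, split_type="scaffold"):
    """Build the PDBBind fine-tuning command for one seed as an argv list."""
    return [
        "python", "pdbbind.py",
        "--batch", "32",
        "--epochs", "20",
        "--lr", "0.0001",
        "--egnn_dropout", "0.3",
        "--predictor_dropout", "0.3",
        "--dataroot", "../datasets/fine-tuning/pdbbind",
        "--split_type", split_type,
        "--resume", "../resumes/ImagePLB-P.pth",
        # Log directory pattern copied from the README: .../rs<seed>/
        "--log_dir", f"./experiments/pdbbind/{split_type}/rs{seed}/",
        "--runseed", str(seed),
        "--dist-url", "tcp://127.0.0.1:12312",
    ]


if __name__ == "__main__":
    for seed in SEEDS:
        print(shlex.join(pdbbind_command(seed)))
```

The printed lines can be pasted into a shell in the finetune folder, or each argv list can be passed to `subprocess.run`.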
If you find this repository helpful, please consider citing our work and starring the repository.
@article{10.1093/bioinformatics/btaf535,
author = {Xiang, Hongxin and Liu, Mingquan and Hou, Linlin and Jin, Shuting and Wang, Jianmin and Xia, Jun and Du, Wenjie and Yuan, Sisi and Fu, Xiangzheng and Yang, Xinyu and Zeng, Li and Xu, Lei},
title = {An Image-based Protein-Ligand Binding Representation Learning Framework via Multi-Level Flexible Dynamics Trajectory Pre-Training},
journal = {Bioinformatics},
pages = {btaf535},
year = {2025},
month = {09},
abstract = {Accurate prediction of protein-ligand binding (PLB) relationships plays a crucial role in drug discovery, which helps identify drugs that modulate the activity of specific targets. Traditional biological assays for measuring PLB relationships are time consuming and costly. In addition, models for predicting PLB relationships have been developed and widely used in drug discovery tasks. However, learning more accurate PLB representations is essential to meet the stringent standards required for drug discovery. We propose an image-based protein-ligand binding representation learning framework, called ImagePLB, which equips ligand representation learner (LRL) and protein representation learner (PRL) to accept 3D multi-view ligand images and protein graphs as input respectively and learns rich interaction information between ligand and protein through a binding representation learner (BRL). Considering the scarcity of protein-ligand pairs, we further propose a multi-level next trajectory prediction (MLNTP) task to pre-train ImagePLB on the 4D flexible dynamics trajectory of 16,972 complexes, including ligand-level, protein-level and complex-level, to learn information related to trajectories. Besides, by introducing trajectory regularization (TR), we effectively alleviate the problem of high (even almost identical) feature similarity caused by adjacent trajectories. The proposed pre-training strategies (MLNTP and TR) can further improve the performance of ImagePLB. Compared with the current state-of-the-art methods, ImagePLB has achieved competitive improvements on PLB-related prediction tasks, including protein-ligand affinity and efficacy prediction tasks. This study opens the door to the image-based PLB learning paradigm. All data and implementation details of code can be obtained from https://github.com/HongxinXiang/ImagePLB.},
issn = {1367-4811},
doi = {10.1093/bioinformatics/btaf535},
url = {https://doi.org/10.1093/bioinformatics/btaf535},
eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaf535/64373676/btaf535.pdf},
}