ImagePLB 🐘

Official PyTorch implementation of the paper:

"An Image-based Protein-Ligand Binding Representation Learning Framework via Multi-Level Flexible Dynamics Trajectory Pre-training"

📌 Table of Contents

🚀 News
🛠️ Installation & Environment Setup
🖼️ Data Preprocessing
🧪 Pre-training ImagePLB
🎯 Fine-tuning ImagePLB on Downstream Tasks
📊 Reproducing Our Results
📚 Citation

🚀 News

[2025/09/19] 🎉 The paper was accepted by Bioinformatics!
[2024/06/28] Repository setup completed. Code and instructions released.

🛠️ Installation & Environment Setup

1. Hardware/Software Environment

GPU with CUDA 11.6
Ubuntu 18.04

2. Setup with Conda

# create conda env
conda create -n ImagePLB python=3.9
conda activate ImagePLB
# optional: switch pip to the Tsinghua mirror (this setting persists in your pip configuration)
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

# install environment
pip install rdkit
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install biopython==1.79
pip install easydict
pip install tqdm
pip install timm==0.6.12
pip install tensorboard
pip install scikit-learn
pip install setuptools==59.5.0
pip install pandas
pip install torch-cluster torch-scatter torch-sparse torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.13.1%2Bcu116.html
pip install torch-geometric==1.6.0
pip install dgl-cu116
pip install ogb
pip install seaborn
conda install openbabel -c conda-forge
pip install einops
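After installation, it can help to confirm that the pinned versions actually landed in the environment. The sketch below is a hypothetical helper (not part of the repository) that uses only the standard library's `importlib.metadata`; the pins are copied from the commands above. It compares public versions only (dropping local labels such as `+cu116`), so it will not tell a CUDA build from a CPU build of the same release.

```python
"""Compare installed package versions with the pins used in the setup commands."""
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands above; keys are pip distribution names.
PINNED = {
    "torch": "1.13.1+cu116",
    "torchvision": "0.14.1+cu116",
    "torchaudio": "0.13.1",
    "biopython": "1.79",
    "timm": "0.6.12",
    "setuptools": "59.5.0",
    "torch_geometric": "1.6.0",
}


def public_version(v):
    """Drop the local version label, e.g. '1.13.1+cu116' -> '1.13.1'."""
    return v.split("+", 1)[0]


def check_versions(pins):
    """Return {name: (expected, installed or None, matches)}."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        ok = installed is not None and public_version(installed) == public_version(expected)
        report[name] = (expected, installed, ok)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_versions(PINNED).items():
        found = installed or "not installed"
        print(f"{name:16} expected {expected:14} found {found:14} {'OK' if ok else 'CHECK'}")
```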

🖼️ Data Preprocessing

We use PyMOL to generate multi-view ligand images from molecular conformations. The following script renders one view; run it in the PyMOL command line:


sdf_filepath=demo.sdf  # sdf file path of ligand
rotate_direction=x
rotate=0
save_img_path=demo_frame.png
load $sdf_filepath;bg_color white;set stick_ball,on;set stick_ball_ratio,3.5;set stick_radius,0.15;set sphere_scale,0.2;set valence,1;set valence_mode,0;set valence_size, 0.1;rotate $rotate_direction, $rotate;save $save_img_path;quit;

We use four views, obtained by setting the following parameters:

rotate_direction=x; rotate=0
rotate_direction=x; rotate=180
rotate_direction=y; rotate=180
rotate_direction=z; rotate=180
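To render all four views without editing the script by hand, the command string can be generated once per view. The sketch below is a hypothetical helper, not the authors' pipeline: the rendering settings and the four (axis, angle) pairs are copied from above, while the output names `x_0.png`, `x_180.png`, ... are chosen to match the fine-tuning folder layout shown later. Invoking PyMOL as `pymol -c -d "<commands>"` (no GUI, execute a command string) assumes a `pymol` executable on your PATH.

```python
"""Build and run the PyMOL commands for the four ligand views (hypothetical helper)."""
import subprocess
from pathlib import Path

# (rotate_direction, rotate) pairs copied from the list above.
VIEWS = [("x", 0), ("x", 180), ("y", 180), ("z", 180)]

# Rendering settings copied from the PyMOL script above.
STYLE = (
    "bg_color white;set stick_ball,on;set stick_ball_ratio,3.5;"
    "set stick_radius,0.15;set sphere_scale,0.2;set valence,1;"
    "set valence_mode,0;set valence_size, 0.1"
)


def view_commands(sdf_path, out_dir):
    """Return one PyMOL command string per view, each saving <axis>_<angle>.png."""
    commands = []
    for axis, angle in VIEWS:
        png = Path(out_dir) / f"{axis}_{angle}.png"
        commands.append(f"load {sdf_path};{STYLE};rotate {axis}, {angle};save {png};quit;")
    return commands


def render_views(sdf_path, out_dir):
    """Run PyMOL once per view (assumes `pymol` is on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for command in view_commands(sdf_path, out_dir):
        # -c: run without the GUI; -d: execute the given PyMOL command string.
        subprocess.run(["pymol", "-c", "-d", command], check=True)
```

For example, `render_views("demo.sdf", "demo_views")` would write the four PNGs into `demo_views/`. Each view starts from a fresh PyMOL process, so every rotation is applied to the freshly loaded pose rather than accumulating.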

To save you time on data preprocessing, we also provide freely accessible download links for all of the data.

🧪 Pre-training ImagePLB

1. Pre-training Dataset

Name	Download link	Description
multi_view_trajectory_video.tar.gz / multi_view_trajectory_video_1%.tar.gz	BaiduCloud	Ligand trajectories with 1% multi-view images (only frame #1 has multi-view images).
pocket.tar.gz	OneDrive	Pocket trajectories as 3D graphs.

To pre-train ImagePLB from scratch, download all of the data listed above and place it in datasets/pre-training/MISATO/processed/.

The directory should be organized as follows:

datasets/pre-training/MISATO/processed/
+---pocket
|   |   train.npz
|   |
+---multi_view_trajectory_video
|   +---1A0Q
|   |   +---x_0
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...
|   |   +---x_180
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...
|   |   +---y_180
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...
|   |   +---z_180
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...
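Before a long pre-training run, a quick layout check can catch a misplaced archive. The hypothetical helper below (not part of the repository) follows the tree above: it expects `pocket/train.npz` and, for every complex folder, the four view folders containing at least one `mov*.png` frame. It deliberately requires only one frame per view, since the 1% archive may not contain every frame for every view.

```python
"""Sanity-check the pre-training directory layout shown above (hypothetical helper)."""
from pathlib import Path

# View folder names copied from the tree above.
VIEW_DIRS = ("x_0", "x_180", "y_180", "z_180")


def check_pretrain_layout(root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    if not (root / "pocket" / "train.npz").is_file():
        problems.append("missing pocket/train.npz")
    video_root = root / "multi_view_trajectory_video"
    if not video_root.is_dir():
        problems.append("missing multi_view_trajectory_video/")
        return problems
    for complex_dir in sorted(p for p in video_root.iterdir() if p.is_dir()):
        for view in VIEW_DIRS:
            # Require at least one frame per view folder.
            if not any((complex_dir / view).glob("mov*.png")):
                problems.append(f"{complex_dir.name}/{view}: no mov*.png frames")
    return problems
```

Usage: `print(check_pretrain_layout("datasets/pre-training/MISATO/processed"))` from the repository root.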

2. Download Pretrained Model

The pre-trained ImagePLB (ImagePLB-P) can be downloaded from the following table.

Name	Download link	Description
ImagePLB-P.pth	OneDrive	Download ImagePLB-P and place it in the `resumes/` directory.
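The internal structure of ImagePLB-P.pth is not described here. To see what the checkpoint contains before passing it to `--resume`, the hypothetical helper below walks a (possibly nested) checkpoint dictionary and reports each entry's tensor shape or Python type; it makes no assumption about specific key names.

```python
"""Print the structure of a loaded checkpoint (hypothetical helper)."""


def describe(obj, prefix="", depth=0, max_depth=2):
    """Yield 'key: shape or type' lines, recursing into nested dicts up to max_depth."""
    if isinstance(obj, dict) and depth < max_depth:
        for key, value in obj.items():
            yield from describe(value, f"{prefix}{key}.", depth + 1, max_depth)
        return
    shape = getattr(obj, "shape", None)
    info = f"shape={tuple(shape)}" if shape is not None else type(obj).__name__
    yield f"{prefix.rstrip('.') or '<root>'}: {info}"
```

For example, after `import torch`, run `for line in describe(torch.load("resumes/ImagePLB-P.pth", map_location="cpu")): print(line)`. If the file pickles model objects rather than plain state dicts, loading it may require running from the repository root so that the model classes can be imported.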

3. Pre-train ImagePLB-P from Scratch

If you want to pre-train your own ImagePLB-P, see the command below.

Usage:

usage: pretrain_ImagePLB.py [-h] [--dataroot DATAROOT] [--workers WORKERS]
                            [--model_name MODEL_NAME]
                            [--max_len_pocket MAX_LEN_POCKET] [--center]
                            [--n_dim_graph N_DIM_GRAPH] [--lr LR]
                            [--momentum MOMENTUM]
                            [--weight-decay WEIGHT_DECAY] [--weighted_loss]
                            [--runseed RUNSEED] [--start_epoch START_EPOCH]
                            [--epochs EPOCHS] [--batch BATCH]
                            [--imageSize IMAGESIZE] [--resume RESUME]
                            [--n_ckpt_save N_CKPT_SAVE]
                            [--n_batch_step_optim N_BATCH_STEP_OPTIM]
                            [--lambda_next_mol LAMBDA_NEXT_MOL]
                            [--lambda_next_pocket LAMBDA_NEXT_POCKET]
                            [--lambda_next_complex LAMBDA_NEXT_COMPLEX]
                            [--log_dir LOG_DIR] [--tb_step_num TB_STEP_NUM]

Run the following command in the pretrain folder to pre-train ImagePLB:

CUDA_VISIBLE_DEVICES=0,1,2,3 python pretrain_ImagePLB.py \
	--workers 16 \
	--batch 128 \
	--epochs 30 \
	--lr 0.001 \
	--dataroot ../datasets/pre-training/MISATO/processed \
	--log_dir ./experiments/pretrain_ImagePLB \
	--weighted_loss

🎯 Fine-tuning ImagePLB on Downstream Tasks

1. Datasets

All downstream task data is publicly accessible below:

Datasets	Links	Description
PDBBind	OneDrive	Including PDBBind-30, PDBBind-60, PDBBind-Scaffold.
LEP	OneDrive	Dataset of ligand efficacy prediction.

⚠️ Please download the datasets above and organize the directory as follows:

datasets/fine-tuning/
+---pdbbind
|   +---ligand
|   |   +---1a4k
|   |   |   x_0.png
|   |   |   x_180.png
|   |   |   y_180.png
|   |   |   z_180.png
|   +---30
|   |   |   train.npz
|   |   |   valid.npz
|   |   |   test.npz
|   +---60
|   |   |   train.npz
|   |   |   valid.npz
|   |   |   test.npz
|   +---scaffold
|   |   |   train.npz
|   |   |   valid.npz
|   |   |   test.npz
+---lep
|   +---ligand
|   |   +---Lig2__6BQG__6BQH
|   |   |   x_0.png
|   |   |   x_180.png
|   |   |   y_180.png
|   |   |   z_180.png
|   +---protein
|   |   |   train.npz
|   |   |   val.npz
|   |   |   test.npz
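Note that the validation file is named `valid.npz` for PDBBind but `val.npz` for LEP in the tree above. The hypothetical helper below (not part of the repository) checks the fine-tuning layout as drawn, including the four view images per ligand folder.

```python
"""Sanity-check the fine-tuning directory layout shown above (hypothetical helper)."""
from pathlib import Path

VIEW_PNGS = ("x_0.png", "x_180.png", "y_180.png", "z_180.png")

# Split files per dataset and split folder, copied from the tree above.
EXPECTED_SPLITS = {
    "pdbbind": {s: ("train.npz", "valid.npz", "test.npz") for s in ("30", "60", "scaffold")},
    "lep": {"protein": ("train.npz", "val.npz", "test.npz")},
}


def check_finetune_layout(root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    for dataset, splits in EXPECTED_SPLITS.items():
        for split, files in splits.items():
            for name in files:
                if not (root / dataset / split / name).is_file():
                    problems.append(f"missing {dataset}/{split}/{name}")
        ligand_root = root / dataset / "ligand"
        ligand_dirs = sorted(p for p in ligand_root.iterdir() if p.is_dir()) if ligand_root.is_dir() else []
        if not ligand_dirs:
            problems.append(f"no ligand folders under {dataset}/ligand")
        for lig in ligand_dirs:
            missing = [v for v in VIEW_PNGS if not (lig / v).is_file()]
            if missing:
                problems.append(f"{dataset}/ligand/{lig.name}: missing {', '.join(missing)}")
    return problems
```

Usage: `print(check_finetune_layout("datasets/fine-tuning"))` from the repository root.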

2. Run Fine-tuning

Run the following command in the finetune folder for PDBBind:

python pdbbind.py \
	--batch 32 \
	--epochs 20 \
	--lr 0.0001 \
	--egnn_dropout 0.3 \
	--predictor_dropout 0.3 \
	--dataroot ../datasets/fine-tuning/pdbbind \
	--split_type scaffold \
	--resume ../resumes/ImagePLB-P.pth \
	--log_dir ./experiments/pdbbind/scaffold/rs0/ \
	--runseed 0 \
	--dist-url tcp://127.0.0.1:12312

Run the following command in the finetune folder for LEP:

python lep.py \
	--batch 32 \
	--epochs 100 \
	--lr 0.0001 \
	--dataroot ../datasets/fine-tuning/lep \
	--split_type protein \
	--egnn_dropout 0.5 \
	--predictor_dropout 0.5 \
	--resume ../resumes/ImagePLB-P.pth \
	--log_dir ./experiments/lep/rs0/ \
	--runseed 0 \
	--dist-url tcp://127.0.0.1:12345
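Results are reported over random seeds 0, 1 and 2, so the PDBBind command is typically run once per `--runseed`. The hypothetical helper below prints that command per seed, reusing the flags and values shown above and changing only `--runseed`, `--log_dir` and the `--dist-url` port. Two things are assumptions: that `--split_type` also accepts `30` and `60` (inferred from the folder names), and that the scaffold hyperparameters above are appropriate for the other splits.

```python
"""Print the PDBBind fine-tuning command once per random seed (hypothetical helper)."""
import shlex


def pdbbind_commands(split_type="scaffold", seeds=(0, 1, 2), base_port=12312):
    """Reuse the hyperparameters of the PDBBind command above, varying only the seed."""
    commands = []
    for seed in seeds:
        args = [
            "python", "pdbbind.py",
            "--batch", "32",
            "--epochs", "20",
            "--lr", "0.0001",
            "--egnn_dropout", "0.3",
            "--predictor_dropout", "0.3",
            "--dataroot", "../datasets/fine-tuning/pdbbind",
            "--split_type", split_type,
            "--resume", "../resumes/ImagePLB-P.pth",
            "--log_dir", f"./experiments/pdbbind/{split_type}/rs{seed}/",
            "--runseed", str(seed),
            # Assumption: a distinct port per seed avoids clashes if runs overlap.
            "--dist-url", f"tcp://127.0.0.1:{base_port + seed}",
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in pdbbind_commands():
        print(command)
```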

📊 Reproducing Our Results

We provide detailed training logs and the corresponding checkpoints. The logs record the full training details, and the trained models can be used directly for structure-based virtual screening.

Name	Download link	Description
PDBBind-30	OneDrive	The training details of ImagePLB-P on PDBBind-30
PDBBind-60	OneDrive	The training details of ImagePLB-P on PDBBind-60
PDBBind-Scaffold	OneDrive	The training details of ImagePLB-P on PDBBind-Scaffold
LEP	OneDrive	The training details of ImagePLB-P on LEP

The files include training logs and checkpoints of ImagePLB-P trained with three random seeds (0, 1, 2).
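Results over several seeds are commonly summarized as mean ± standard deviation. The sketch below assumes you have already read the per-seed test metric from each log (the log format is not described here) and uses the sample standard deviation; whether the paper reports sample or population standard deviation is not stated in this README.

```python
"""Summarize per-seed test scores as mean ± std (hypothetical helper)."""
from statistics import mean, stdev


def summarize_seeds(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need scores from at least two seeds")
    return mean(scores), stdev(scores)


def format_result(scores, digits=3):
    """Format per-seed scores as 'mean ± std' with a fixed number of decimals."""
    m, s = summarize_seeds(scores)
    return f"{m:.{digits}f} ± {s:.{digits}f}"
```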

📚 Citation

If you find this repository helpful, please consider citing our work and starring 🌟 the repository.

@article{10.1093/bioinformatics/btaf535,
    author = {Xiang, Hongxin and Liu, Mingquan and Hou, Linlin and Jin, Shuting and Wang, Jianmin and Xia, Jun and Du, Wenjie and Yuan, Sisi and Fu, Xiangzheng and Yang, Xinyu and Zeng, Li and Xu, Lei},
    title = {An Image-based Protein-Ligand Binding Representation Learning Framework via Multi-Level Flexible Dynamics Trajectory Pre-Training},
    journal = {Bioinformatics},
    pages = {btaf535},
    year = {2025},
    month = {09},
    abstract = {Accurate prediction of protein-ligand binding (PLB) relationships plays a crucial role in drug discovery, which helps identify drugs that modulate the activity of specific targets. Traditional biological assays for measuring PLB relationships are time consuming and costly. In addition, models for predicting PLB relationships have been developed and widely used in drug discovery tasks. However, learning more accurate PLB representations is essential to meet the stringent standards required for drug discovery.We propose an image-based protein-ligand binding representation learning framework, called ImagePLB, which equips ligand representation learner (LRL) and protein representation learner (PRL) to accept 3D multi-view ligand images and protein graphs as input respectively and learns rich interaction information between ligand and protein through a binding representation learner (BRL). Considering the scarcity of protein-ligand pairs, we further propose a multi-level next trajectory prediction (MLNTP) task to pre-train ImagePLB on the 4D flexible dynamics trajectory of 16,972 complexes, including ligand-level, protein-level and complex-level, to learn information related to trajectories. Besides, by introducing trajectory regularization (TR), we effectively alleviate the problem of high (even almost identical) feature similarity caused by adjacent trajectories.The proposed pre-training strategies (MLNTP and TR) can further improve the performance of ImagePLB. Compared with the current state-of-the-art methods, ImagePLB has achieved competitive improvements on PLB-related prediction tasks, including protein-ligand affinity and efficacy prediction tasks. This study opens the door to the image-based PLB learning paradigm.All data and implementation details of code can be obtained from https://github.com/HongxinXiang/ImagePLB.},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btaf535},
    url = {https://doi.org/10.1093/bioinformatics/btaf535},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaf535/64373676/btaf535.pdf},
}
