
ImagePLB 🐘

Official PyTorch implementation of the paper:

"An Image-based Protein-Ligand Binding Representation Learning Framework via Multi-Level Flexible Dynamics Trajectory Pre-training"


📌 Table of Contents

  • 🚀 News
  • 🛠️ Installation & Environment Setup
  • 🖼️ Data Preprocessing
  • 🧪 Pre-training ImagePLB
  • 🎯 Fine-tuning ImagePLB on Downstream Tasks
  • 📊 Reproducing Our Results
  • 📚 Citation

🚀 News

  • [2025/09/19] 🎉 Our paper was accepted by Bioinformatics!

  • [2024/06/28] Repository setup completed. Code and instructions released.

🛠️ Installation & Environment Setup

1. Hardware/Software Environment

  • GPU with CUDA 11.6
  • Ubuntu 18.04

2. Setup with Conda

# create conda env
conda create -n ImagePLB python=3.9
conda activate ImagePLB
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

# install environment
pip install rdkit
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install biopython==1.79
pip install easydict
pip install tqdm
pip install timm==0.6.12
pip install tensorboard
pip install scikit-learn
pip install setuptools==59.5.0
pip install pandas
pip install torch-cluster torch-scatter torch-sparse torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.13.1%2Bcu116.html
pip install torch-geometric==1.6.0
pip install dgl-cu116
pip install ogb
pip install seaborn
conda install openbabel -c conda-forge
pip install einops
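
After installation, a quick sanity check like the following (a minimal sketch, not part of the repository) confirms that PyTorch sees the GPU and that the graph-learning packages import cleanly:

# quick environment check -- run inside the ImagePLB conda env
import torch
import torch_geometric
import dgl

print("torch:", torch.__version__)                      # expected 1.13.1+cu116
print("CUDA available:", torch.cuda.is_available())     # should be True on a CUDA 11.6 machine
print("torch_geometric:", torch_geometric.__version__)  # expected 1.6.0 per the pins above
print("dgl:", dgl.__version__)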

🖼️ Data Preprocessing

We use PyMOL to generate multi-view ligand images from molecular conformations. The following PyMOL script renders one view; you can run it from the PyMOL command line:

sdf_filepath=demo.sdf  # sdf file path of ligand
rotate_direction=x
rotate=0
save_img_path=demo_frame.png
load $sdf_filepath;bg_color white;set stick_ball,on;set stick_ball_ratio,3.5;set stick_radius,0.15;set sphere_scale,0.2;set valence,1;set valence_mode,0;set valence_size, 0.1;rotate $rotate_direction, $rotate;save $save_img_path;quit;

Note that we used 4 views, generated with the following parameter settings (a batch-rendering sketch follows the list):

  • rotate_direction=x; rotate=0
  • rotate_direction=x; rotate=180
  • rotate_direction=y; rotate=180
  • rotate_direction=z; rotate=180
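
If you prefer to render all 4 views in one go, the sketch below (a hypothetical helper, not part of the official pipeline) calls headless PyMOL with the same commands as above; demo.sdf and the output file names are placeholders:

# render_views.py -- hypothetical helper: render the 4 ligand views with headless PyMOL
import subprocess

sdf_filepath = "demo.sdf"  # placeholder: sdf file path of the ligand
views = [("x", 0), ("x", 180), ("y", 180), ("z", 180)]  # the 4 views listed above

for axis, angle in views:
    save_img_path = f"{axis}_{angle}.png"
    pymol_cmd = (
        f"load {sdf_filepath};bg_color white;set stick_ball,on;set stick_ball_ratio,3.5;"
        f"set stick_radius,0.15;set sphere_scale,0.2;set valence,1;set valence_mode,0;"
        f"set valence_size,0.1;rotate {axis}, {angle};save {save_img_path};quit;"
    )
    # -c: no GUI, -q: quiet, -d: execute the given command string
    subprocess.run(["pymol", "-cq", "-d", pymol_cmd], check=True)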

To save you time on data preprocessing, we also provide download links for all processed data below.

🧪 Pre-training ImagePLB

1. Pre-training Dataset

Name | Download link | Description
multi_view_trajectory_video.tar.gz
multi_view_trajectory_video_1%.tar.gz | BaiduCloud | Ligand trajectory with 1% multi-view images (only frame #1 has multi-view images).
pocket.tar.gz | OneDrive | Pocket trajectory with 3D graphs.

Please download all data listed above and put it in datasets/pre-training/MISATO/processed/ if you want to pre-train ImagePLB from scratch.

The directory is organized in the following format (an inspection snippet follows the tree):

datasets/pre-training/MISATO/processed/
+---pocket
|   |   train.npz
|   |
+---multi_view_trajectory_video
|   +---1A0Q
|   |   +---x_0
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...
|   |   +---x_180
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...
|   |   +---y_180
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...
|   |   +---z_180
|   |   |   mov0001.png
|   |   |   mov0002.png
|   |   |   ...

2. Download Pretrained Model

The pre-trained ImagePLB model (ImagePLB-P) can be downloaded via the following table; a checkpoint-inspection sketch follows it.

Name | Download link | Description
ImagePLB-P.pth | OneDrive | Download ImagePLB-P and place it in the resumes/ directory.
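
Once ImagePLB-P.pth is in resumes/, you can peek at the checkpoint before passing it to --resume. This is only a sketch and assumes a standard torch.save dictionary; the actual keys depend on how the checkpoint was written:

# hypothetical: inspect the pre-trained checkpoint before resuming from it
import torch

ckpt = torch.load("resumes/ImagePLB-P.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))  # e.g. model weights, optimizer state, epoch
else:
    print("checkpoint object type:", type(ckpt))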

3. Pre-train ImagePLB-P from Scratch

If you want to pre-train your own ImagePLB-P, see the command below.

Usage:

usage: pretrain_ImagePLB.py [-h] [--dataroot DATAROOT] [--workers WORKERS]
                            [--model_name MODEL_NAME]
                            [--max_len_pocket MAX_LEN_POCKET] [--center]
                            [--n_dim_graph N_DIM_GRAPH] [--lr LR]
                            [--momentum MOMENTUM]
                            [--weight-decay WEIGHT_DECAY] [--weighted_loss]
                            [--runseed RUNSEED] [--start_epoch START_EPOCH]
                            [--epochs EPOCHS] [--batch BATCH]
                            [--imageSize IMAGESIZE] [--resume RESUME]
                            [--n_ckpt_save N_CKPT_SAVE]
                            [--n_batch_step_optim N_BATCH_STEP_OPTIM]
                            [--lambda_next_mol LAMBDA_NEXT_MOL]
                            [--lambda_next_pocket LAMBDA_NEXT_POCKET]
                            [--lambda_next_complex LAMBDA_NEXT_COMPLEX]
                            [--log_dir LOG_DIR] [--tb_step_num TB_STEP_NUM]

Run the following command in the pretrain folder to pre-train ImagePLB:

CUDA_VISIBLE_DEVICES=0,1,2,3 python pretrain_ImagePLB.py \
	--workers 16 \
	--batch 128 \
	--epochs 30 \
	--lr 0.001 \
	--dataroot ../datasets/pre-training/MISATO/processed \
	--log_dir ./experiments/pretrain_ImagePLB \
	--weighted_loss

🎯 Fine-tuning ImagePLB on Downstream Tasks

1. Datasets

All downstream task data is publicly accessible below:

Datasets | Links | Description
PDBBind | OneDrive | Includes PDBBind-30, PDBBind-60 and PDBBind-Scaffold.
LEP | OneDrive | Dataset for ligand efficacy prediction.

⚠️ Please download the datasets provided above and organize the directory as follows (a quick layout check follows the tree):

datasets/fine-tuning/
+---pdbbind
|   +---ligand
|   |   +---1a4k
|   |   |   x_0.png
|   |   |   x_180.png
|   |   |   y_180.png
|   |   |   z_180.png
|   +---30
|   |   |   train.npz
|   |   |   valid.npz
|   |   |   test.npz
|   +---60
|   |   |   train.npz
|   |   |   valid.npz
|   |   |   test.npz
|   +---scaffold
|   |   |   train.npz
|   |   |   valid.npz
|   |   |   test.npz
+---lep
|   +---ligand
|   |   +---Lig2__6BQG__6BQH
|   |   |   x_0.png
|   |   |   x_180.png
|   |   |   y_180.png
|   |   |   z_180.png
|   +---protein
|   |   |   train.npz
|   |   |   val.npz
|   |   |   test.npz

2. Run Fine-tuning

  • Run the following command in the finetune folder for PDBBind:
python pdbbind.py \
	--batch 32 \
	--epochs 20 \
	--lr 0.0001 \
	--egnn_dropout 0.3 \
	--predictor_dropout 0.3 \
	--dataroot ../datasets/fine-tuning/pdbbind \
	--split_type scaffold \
	--resume ../resumes/ImagePLB-P.pth \
	--log_dir ./experiments/pdbbind/scaffold/rs0/ \
	--runseed 0 \
	--dist-url tcp://127.0.0.1:12312
  • Run the following command in the finetune folder for LEP:
python lep.py \
	--batch 32 \
	--epochs 100 \
	--lr 0.0001 \
	--dataroot ../datasets/fine-tuning/lep \
	--split_type protein \
	--egnn_dropout 0.5 \
	--predictor_dropout 0.5 \
	--resume ../resumes/ImagePLB-P.pth \
	--log_dir ./experiments/lep/rs0/ \
	--runseed 0 \
	--dist-url tcp://127.0.0.1:12345

📊 Reproducing Our Results

We provide detailed training logs and the corresponding checkpoints. You can inspect training details in the logs and directly use our trained models for structure-based virtual screening.

Name | Download link | Description
PDBBind-30 | OneDrive | Training details of ImagePLB-P on PDBBind-30.
PDBBind-60 | OneDrive | Training details of ImagePLB-P on PDBBind-60.
PDBBind-Scaffold | OneDrive | Training details of ImagePLB-P on PDBBind-Scaffold.
LEP | OneDrive | Training details of ImagePLB-P on LEP.

The files include training logs and checkpoints for training ImagePLB-P with three random seeds (0, 1, 2).
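
To repeat the three-seed PDBBind-Scaffold runs yourself, a simple driver like the sketch below (not an official script; the flags mirror the fine-tuning command above, and it should be launched from the finetune folder) loops over the seeds and writes one log directory per run:

# hypothetical driver: repeat PDBBind scaffold fine-tuning with seeds 0, 1, 2
import subprocess

for seed in (0, 1, 2):
    subprocess.run([
        "python", "pdbbind.py",
        "--batch", "32", "--epochs", "20", "--lr", "0.0001",
        "--egnn_dropout", "0.3", "--predictor_dropout", "0.3",
        "--dataroot", "../datasets/fine-tuning/pdbbind",
        "--split_type", "scaffold",
        "--resume", "../resumes/ImagePLB-P.pth",
        "--log_dir", f"./experiments/pdbbind/scaffold/rs{seed}/",
        "--runseed", str(seed),
        "--dist-url", f"tcp://127.0.0.1:{12312 + seed}",  # distinct port per run
    ], check=True)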

📚 Citation

If you find this repository helpful, please consider citing our work and starring 🌟 the repository.

@article{10.1093/bioinformatics/btaf535,
    author = {Xiang, Hongxin and Liu, Mingquan and Hou, Linlin and Jin, Shuting and Wang, Jianmin and Xia, Jun and Du, Wenjie and Yuan, Sisi and Fu, Xiangzheng and Yang, Xinyu and Zeng, Li and Xu, Lei},
    title = {An Image-based Protein-Ligand Binding Representation Learning Framework via Multi-Level Flexible Dynamics Trajectory Pre-Training},
    journal = {Bioinformatics},
    pages = {btaf535},
    year = {2025},
    month = {09},
    abstract = {Accurate prediction of protein-ligand binding (PLB) relationships plays a crucial role in drug discovery, which helps identify drugs that modulate the activity of specific targets. Traditional biological assays for measuring PLB relationships are time consuming and costly. In addition, models for predicting PLB relationships have been developed and widely used in drug discovery tasks. However, learning more accurate PLB representations is essential to meet the stringent standards required for drug discovery. We propose an image-based protein-ligand binding representation learning framework, called ImagePLB, which equips ligand representation learner (LRL) and protein representation learner (PRL) to accept 3D multi-view ligand images and protein graphs as input respectively and learns rich interaction information between ligand and protein through a binding representation learner (BRL). Considering the scarcity of protein-ligand pairs, we further propose a multi-level next trajectory prediction (MLNTP) task to pre-train ImagePLB on the 4D flexible dynamics trajectory of 16,972 complexes, including ligand-level, protein-level and complex-level, to learn information related to trajectories. Besides, by introducing trajectory regularization (TR), we effectively alleviate the problem of high (even almost identical) feature similarity caused by adjacent trajectories. The proposed pre-training strategies (MLNTP and TR) can further improve the performance of ImagePLB. Compared with the current state-of-the-art methods, ImagePLB has achieved competitive improvements on PLB-related prediction tasks, including protein-ligand affinity and efficacy prediction tasks. This study opens the door to the image-based PLB learning paradigm. All data and implementation details of code can be obtained from https://github.com/HongxinXiang/ImagePLB.},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btaf535},
    url = {https://doi.org/10.1093/bioinformatics/btaf535},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaf535/64373676/btaf535.pdf},
}
