
Jiadong001/ComIN


ComIN: Common Interface Network for Multi-domain Biomolecular Interaction Learning

🖼️ Overview of ComIN

ComIN is a universal framework that learns biomolecular interface representations through contrastive learning on interface atom graphs, trained jointly on protein-protein, protein-peptide, and protein-small molecule interactions. A geometry-aware ViSNet encoder extracts invariant representations from the atomic graphs of receptor-ligand interfaces, and the model is optimized with an InfoNCE loss to discriminate binding patterns.
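The InfoNCE objective mentioned above can be sketched in plain Python. This is a generic formulation with in-batch negatives, assuming each interface embedding is paired with exactly one positive and that similarity is cosine similarity divided by a temperature. The function name info_nce, the pairing scheme, and the default temperature of 0.1 are illustrative assumptions, not ComIN's training code (which lives in src).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    anchors[i] and positives[i] form the positive pair; every positives[j]
    with j != i acts as a negative for anchors[i].
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equally sized, non-empty batches")
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log softmax probability assigned to the true partner.
        total += log_denominator - logits[i]
    return total / len(a)


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(batch, batch))                  # matched pairs: loss near 0
    print(info_nce(batch, list(reversed(batch))))  # mismatched pairs: loss near 10
```

Lowering the temperature sharpens the softmax, so mismatched pairs are penalized more strongly; the right value for ComIN is whatever its configs specify.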


🛠️ Dependency

You can create the environment from env.yaml or requirements.txt, or install the packages manually with the following steps:

```bash
# Create and activate environment
conda create -n graph python=3.8
conda activate graph

# Basic packages
pip install pandas==2.0.3 numpy==1.24.4 scikit-learn tqdm

# PyTorch (CUDA 11.8)
pip install torch==2.1.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

# PyG extensions
pip install torch_scatter torch_cluster -f https://pytorch-geometric.com/whl/torch-2.1.1+cu118.html
pip install torch-geometric

# Plotting and notebooks
pip install matplotlib seaborn jupyter notebook

# Structural tools (quote the version spec so the shell does not glob it)
pip install freesasa
conda install "mmseqs2=17.*" -c conda-forge -c bioconda
```
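After installing, a quick check along these lines can confirm that the exact pins from the commands above were picked up. It only covers the three packages with exact pins (pandas, numpy, torch) and reads installed versions through the standard-library importlib.metadata. The script and its check_pins helper are an illustrative sketch, not a file shipped with the repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Exact pins copied from the install commands above.
PINS: Dict[str, str] = {
    "pandas": "2.0.3",
    "numpy": "1.24.4",
    "torch": "2.1.1+cu118",
}


def check_pins(pins: Dict[str, str],
               get_version: Callable[[str], str] = version) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems: Dict[str, str] = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != expected:
            problems[name] = f"found {found}, expected {expected}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINS)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    print("environment OK" if not issues else f"{len(issues)} pin(s) differ")
```

The get_version parameter exists only so the check can be exercised without a real environment; by default it queries the installed distributions.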

📊 Data

Main Datasets

  • The main datasets are curated from the sequence non-redundant set of Q-BioLiP; see the data_preparation folder.

  • data_preparation/pkls: Data for ComIN-Base (4.5 Å proximity threshold).

  • data_preparation/pkls_large: Data for ComIN-Large (6.0 Å proximity threshold).
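A minimal sketch of how a proximity threshold such as 4.5 Å (ComIN-Base) or 6.0 Å (ComIN-Large) could select interface atoms, and how those atoms could then be connected into a graph. It assumes the threshold is applied to receptor-ligand atom distances and that graph edges follow a simple distance cutoff; both rules, and the function names interface_atoms and radius_edges, are hypothetical and may differ from the scripts in data_preparation.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def interface_atoms(receptor: Sequence[Coord], ligand: Sequence[Coord],
                    cutoff: float = 4.5) -> Tuple[List[int], List[int]]:
    """Indices of receptor and ligand atoms within `cutoff` Å of any partner atom."""
    rec_idx, lig_idx = set(), set()
    for i, r in enumerate(receptor):
        for j, l in enumerate(ligand):
            if math.dist(r, l) <= cutoff:
                rec_idx.add(i)
                lig_idx.add(j)
    return sorted(rec_idx), sorted(lig_idx)


def radius_edges(coords: Sequence[Coord], radius: float) -> List[Tuple[int, int]]:
    """Undirected edges between atoms at most `radius` Å apart (hypothetical rule)."""
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= radius
    ]


if __name__ == "__main__":
    receptor = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
    ligand = [(3.0, 0.0, 0.0)]
    for cutoff in (4.5, 6.0):  # ComIN-Base vs ComIN-Large thresholds
        print(cutoff, interface_atoms(receptor, ligand, cutoff))
    # 4.5 -> ([0], [0]); 6.0 -> ([0, 1], [0])
```

The nested loops are O(N·M) and only suit small examples; for real structures a spatial index, such as radius_graph from the torch_cluster package installed above, is the usual replacement.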

Downstream Datasets

Due to size limits, the downstream datasets are hosted on Google Drive.

🚀 Training

All source code and configurations are located in the src directory.

  • src/configs/: YAML files for model hyperparameters.
  • src/scripts/: shell scripts; start training with bash train.sh.

Trained weights are stored in the ckpts directory.

| Model | Configuration | Checkpoint path |
|---|---|---|
| ComIN-Base | src/configs/train.yaml | ckpts/default |
| ComIN-Large | src/configs/train_large.yaml | ckpts/large |

🧪 Evaluation

General Interaction Prediction

To evaluate the model on the test sets:

```bash
cd src/scripts
bash test.sh
```

Downstream Applications

We provide notebooks in the notebooks directory for the following downstream evaluations:

  • Protein-small molecule: Pocket classification & Virtual screening (LIT-PCBA).

  • Peptide-HLA: Binding prediction (HLA3DB).

  • Protein-cyclic peptide: Target region prediction (CPset).

  • Antibody-antigen: Antibody-specific epitope identification (SAbDab).

To-do list:

  • [ ] Upload downstream_data and notebooks
