ComIN: Common Interface Network for Multi-domain Biomolecular Interaction Learning

🖼️ Overview of ComIN

ComIN is a universal framework that learns biomolecular interface representations through contrastive learning on interface atom graphs. It is trained jointly on protein-protein, protein-peptide, and protein-small molecule interactions. A geometry-aware ViSNet encoder extracts invariant representations from the atomic graphs of receptor-ligand interfaces, and the model is optimized with an InfoNCE loss to discriminate binding patterns.
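The InfoNCE objective mentioned above can be sketched without any dependencies. This is a minimal illustration under stated assumptions, not ComIN's training code. It assumes that row i of the receptor batch binds row i of the ligand batch (the positive pair), that every other ligand in the batch serves as a negative (in-batch negatives), that similarity is cosine similarity, and that the temperature is 0.1. ComIN's actual pairing scheme, similarity function, and temperature are not stated here and may differ. The helpers `cosine` and `info_nce` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(receptors: List[Vector], ligands: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch (assumed in-batch-negatives setup).

    Row i of `receptors` is assumed to bind row i of `ligands`; all other
    ligands in the batch are treated as negatives for receptor i.
    """
    if not receptors or len(receptors) != len(ligands):
        raise ValueError("receptors and ligands must be non-empty and of equal length")
    total = 0.0
    for i, rec in enumerate(receptors):
        # Temperature-scaled similarity of receptor i to every ligand in the batch.
        logits = [cosine(rec, lig) / temperature for lig in ligands]
        # Numerically stable log-sum-exp over all candidate ligands.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching ligand (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(receptors)


if __name__ == "__main__":
    rec = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]   # each receptor aligned with its own ligand
    swapped = [[0.1, 0.9], [0.9, 0.1]]   # each receptor aligned with the wrong ligand
    print(info_nce(rec, matched) < info_nce(rec, swapped))  # True
```

Minimizing this loss pulls each receptor embedding toward its true binding partner and pushes it away from the other ligands in the batch; when all embeddings are identical, the loss equals log(batch size).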

🛠️ Dependency

You can install the environment from env.yml or requirements.txt, or manually by following these steps:

# Create and activate environment
conda create -n graph python=3.8
conda activate graph

# Basic packages
pip install pandas==2.0.3 numpy==1.24.4 scikit-learn tqdm

# PyTorch (CUDA 11.8)
pip install torch==2.1.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

# PyG extensions
pip install torch_scatter torch_cluster -f https://pytorch-geometric.com/whl/torch-2.1.1+cu118.html
pip install torch-geometric

# Plotting and notebooks
pip install matplotlib seaborn jupyter notebook

# Structural tools
pip install freesasa
conda install mmseqs2=17.* -c conda-forge -c bioconda

📊 Data

Main Datasets

The main datasets are curated from the sequence non-redundant set of Q-BioLiP; see the data_preparation folder.
data_preparation/pkls: Data for ComIN-Base (4.5 Å proximity threshold).
data_preparation/pkls_large: Data for ComIN-Large (6.0 Å proximity threshold).
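The proximity threshold decides which atoms belong to an interface. The sketch below assumes one common definition: an atom is kept if it lies within the cutoff of at least one atom on the partner molecule. The exact rule used in data_preparation (for example heavy atoms only, or residue-level selection) is not described here and may differ. The `Atom` record and `interface_atoms` helper are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record: name plus Cartesian coordinates in Å."""
    name: str
    xyz: Tuple[float, float, float]


def interface_atoms(
    receptor: List[Atom], ligand: List[Atom], cutoff: float = 4.5
) -> Tuple[List[Atom], List[Atom]]:
    """Return receptor and ligand atoms within `cutoff` Å of the partner.

    Brute-force O(N*M) distance check. 4.5 Å mirrors the ComIN-Base
    threshold and 6.0 Å the ComIN-Large one.
    """
    rec_hits = [a for a in receptor if any(math.dist(a.xyz, b.xyz) <= cutoff for b in ligand)]
    lig_hits = [b for b in ligand if any(math.dist(a.xyz, b.xyz) <= cutoff for a in receptor)]
    return rec_hits, lig_hits


if __name__ == "__main__":
    receptor = [Atom("CA", (0.0, 0.0, 0.0)), Atom("CB", (10.0, 0.0, 0.0))]
    ligand = [Atom("C1", (0.0, 0.0, 5.0))]  # 5.0 Å from CA
    for cutoff in (4.5, 6.0):
        rec, lig = interface_atoms(receptor, ligand, cutoff)
        print(cutoff, [a.name for a in rec], [b.name for b in lig])
    # 4.5 [] []
    # 6.0 ['CA'] ['C1']
```

The larger 6.0 Å cutoff admits more atoms into each interface graph, which is one reason the ComIN-Large data differ from the ComIN-Base data. For real structures, a spatial index (k-d tree or cell list) would replace the quadratic loop.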

Downstream Datasets

Processed data for the downstream tasks is sourced from LIT-PCBA, HLA3DB, CPset, and SAbDab; see the downstream_data folder.

Due to size limits, these data are hosted on Google Drive.

🚀 Training

All source code and configurations are located in the src directory.

src/configs/: YAML files for model hyperparameters.
src/scripts/: Launch scripts; run bash train.sh to start training.

Trained weights are stored in the ckpts directory.

| Model | Configuration | Checkpoint Path |
|---|---|---|
| ComIN-Base | `src/configs/train.yaml` | `ckpts/default` |
| ComIN-Large | `src/configs/train_large.yaml` | `ckpts/large` |
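When writing your own evaluation scripts, it can help to keep the variant-to-file mapping in one place. The following is a minimal sketch: the configuration and checkpoint paths come from the table above and the cutoffs from the Data section, while the `ModelVariant` dataclass, `VARIANTS` dict, and `resolve` function are hypothetical names, not part of the repository's code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelVariant:
    """Hypothetical record tying a ComIN variant to its files and interface cutoff."""
    config: str
    checkpoint_dir: str
    interface_cutoff_angstrom: float


# Paths (relative to the repository root) from the README table;
# cutoffs from the Data section (pkls vs. pkls_large).
VARIANTS: Dict[str, ModelVariant] = {
    "ComIN-Base": ModelVariant("src/configs/train.yaml", "ckpts/default", 4.5),
    "ComIN-Large": ModelVariant("src/configs/train_large.yaml", "ckpts/large", 6.0),
}


def resolve(name: str) -> ModelVariant:
    """Look up a variant by name, failing loudly on typos."""
    try:
        return VARIANTS[name]
    except KeyError:
        raise ValueError(f"unknown variant {name!r}; expected one of {sorted(VARIANTS)}") from None


if __name__ == "__main__":
    print(resolve("ComIN-Large").config)  # src/configs/train_large.yaml
```

Keeping the cutoff next to the checkpoint path makes it harder to pair a model with interface graphs built at the wrong threshold.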

🧪 Evaluation

General Interaction Prediction

To evaluate the model on the test sets:

cd src/scripts
bash test.sh

Downstream Applications

We provide notebooks in the notebooks folder for specialized downstream evaluations:

Protein-small molecule: Pocket classification & Virtual screening (LIT-PCBA).
Peptide-HLA: Binding prediction (HLA3DB).
Protein-cyclic peptide: Target region prediction (CPset).
Antibody-antigen: Antibody-specific epitope identification (SAbDab).

To-do list:

- [ ] Upload downstream_data and notebooks
