
repository for "GPCRact: a hierarchical framework for predicting ligand-induced GPCR activity via allosteric communication modeling"


hyojin0912/HJ-GPCRact


GPCRact

License: MIT | Python 3.9+ | Dataset

This repository serves as the official implementation and reproducibility package for the paper "GPCRact: a hierarchical framework for predicting ligand-induced GPCR activity via allosteric communication modeling".

We provide the complete source code, preprocessed datasets, training scripts, and analysis notebooks required to reproduce the findings presented in the manuscript.


[Figure 2]


📁 Repository Structure

We have unified all resources into a single structured repository to facilitate full reproducibility.

GPCRact/
├── analysis/           # Jupyter Notebooks for reproducing figures and statistical analyses
├── benchmarks/         # Implementation of baseline models (DeepREAL, AiGPro, 3D-GNN)
├── configs/            # Configuration files (YAML) for training and HPO
├── data/               # Datasets
│   ├── raw/            # Raw data files (GPCRactDB v1)
│   ├── resources/      # Auxiliary bio-info files (PDB info, MSA, etc.)
│   └── splits/         # Exact Train/Val/Test scaffold splits used in the paper
├── preprocessing/      # Scripts to reconstruct the dataset from scratch
├── scripts/            # Executable scripts for Training, Inference, and HPO
├── src/                # Core library code (Model architecture, Layers, Dataloaders)
├── environment.yml     # Conda environment file
└── README.md           # Master documentation
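Before running any step, it can help to confirm that the clone contains the layout shown above. The sketch below is illustrative only. The helper name `missing_paths` is hypothetical and not part of the repository. The expected paths are copied from the tree. The repository root is passed in explicitly because the clone directory is named HJ-GPCRact rather than GPCRact/.

```python
from pathlib import Path

# Paths copied from the repository tree above (relative to the repo root).
EXPECTED_DIRS = ["analysis", "benchmarks", "configs", "data", "preprocessing", "scripts", "src"]
EXPECTED_DATA_SUBDIRS = ["raw", "resources", "splits"]
EXPECTED_FILES = ["environment.yml", "README.md"]


def missing_paths(repo_root="."):
    """Return the expected relative paths that do not exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    expected = EXPECTED_DIRS + [f"data/{d}" for d in EXPECTED_DATA_SUBDIRS] + EXPECTED_FILES
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("Missing: " + ", ".join(missing) if missing else "Repository layout looks complete.")
```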

⚙️ Installation

We recommend using Conda to manage the environment for full reproducibility.

  1. Clone the repository:

    git clone https://github.com/hyojin0912/HJ-GPCRact.git
    cd HJ-GPCRact
  2. Create and activate the Conda environment:

    conda env create -f environment.yml
    conda activate gpcract

    Alternatively, you can install packages using pip:

    pip install -r requirements.txt
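A quick sanity check after activating the environment is sketched below. It tests the Python 3.9+ requirement from the badge and checks whether selected packages can be found. The function name is hypothetical. The default package list (`torch`) is an assumption inferred from the `.pt` checkpoint extension used in Step 3. The authoritative dependency list is environment.yml.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # from the "Python 3.9+" badge

# Assumption: PyTorch is a dependency, inferred from the .pt checkpoint in Step 3.
DEFAULT_MODULES = ("torch",)


def check_environment(required_modules=DEFAULT_MODULES):
    """Return a list of problems found in the active Python environment (hypothetical helper)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = sys.version.split()[0]
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    for name in required_modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not found: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK.")
```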

🔬 Reproducibility Workflow

This section lists the steps required to reproduce the results reported in our study.

Step 1: Data Construction

Users can either reconstruct GPCRactDB from raw public data or use the pre-generated splits provided in data/splits/. To build the dataset from scratch, follow the pipeline in the preprocessing/ directory:

# Example: Running the final dataset creation step
jupyter notebook preprocessing/04_create_final_dataset.ipynb
  • Note: The exact scaffold-based split files (scaffold_train.csv, scaffold_val.csv, scaffold_test.csv) used in our study are already provided in data/splits/ to ensure fair benchmarking.
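A scaffold split is meant to keep structurally related ligands in a single partition. A minimal check is therefore that no scaffold value appears in more than one split file. The sketch assumes each CSV has a column holding a scaffold string. The column name `scaffold` and the helper names are hypothetical, so adjust them to the actual headers. If the files store only SMILES, scaffolds could first be derived, for example with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`.

```python
import csv
from pathlib import Path

SPLITS = ("train", "val", "test")


def load_column(csv_path, column):
    """Collect the set of values in one column of a CSV file."""
    with open(csv_path, newline="") as fh:
        return {row[column] for row in csv.DictReader(fh)}


def check_scaffold_disjoint(split_dir="data/splits", column="scaffold"):
    """Return {(split_a, split_b): n_shared} for every pair of splits that share values.

    Assumption: the column name "scaffold" is hypothetical; pass the real header.
    """
    values = {s: load_column(Path(split_dir) / f"scaffold_{s}.csv", column) for s in SPLITS}
    overlaps = {}
    for i, a in enumerate(SPLITS):
        for b in SPLITS[i + 1:]:
            shared = values[a] & values[b]
            if shared:
                overlaps[(a, b)] = len(shared)
    return overlaps
```

For example, if `check_scaffold_disjoint("data/splits")` returns an empty dict, no scaffold is shared between splits.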

Step 2: Training the Model 🏋️‍♂️

To train the GPCRact model from scratch using the provided splits:

  1. Configure: Modify configs/training_config.yaml if necessary.
  2. Run: Execute the training script.
python scripts/train.py \
    --data_dir data/splits \
    --save_dir checkpoints/ \
    --epochs 100

For detailed arguments, see scripts/README.md.
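You may want to launch training from Python, for example from a job script that repeats runs into separate checkpoint directories. In that case, the command above can be assembled as an argument list. This is a sketch. The helper names are hypothetical, and only the three flags shown above are used. Take any other options from scripts/README.md rather than guessing them.

```python
import subprocess
import sys


def build_train_command(data_dir="data/splits", save_dir="checkpoints/", epochs=100):
    """Assemble the scripts/train.py call shown above as an argument list (hypothetical helper)."""
    return [
        sys.executable, "scripts/train.py",
        "--data_dir", data_dir,
        "--save_dir", save_dir,
        "--epochs", str(epochs),
    ]


def run_training(**kwargs):
    """Run training; raises subprocess.CalledProcessError if train.py exits non-zero."""
    subprocess.run(build_train_command(**kwargs), check=True)
```

For example, `run_training(save_dir="checkpoints/run2/", epochs=50)` must be called from the repository root, because the script path is relative.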

Step 3: Inference 🚀

To predict the activity (Agonist/Antagonist/Non-binder) of novel GPCR-ligand pairs using a trained model:

python scripts/inference.py \
    --data_dir data/splits \
    --model_path checkpoints/best_model.pt \
    --output_dir results/
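When ground-truth labels are available for the pairs being scored, predictions can be compared class by class. This README does not state which metrics the paper reports or the format of the files written to results/. The sketch below therefore takes two lists of label strings and computes a confusion matrix and macro-averaged F1 over the three classes named above. The function names are hypothetical. When all three classes occur, scikit-learn's `f1_score(..., average="macro")` should give the same macro-F1.

```python
from collections import Counter

LABELS = ("Agonist", "Antagonist", "Non-binder")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

Macro-averaging weights each class equally. This matters if one class, such as Non-binder, dominates the evaluation set.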

Step 4: Benchmarking 📊

We provide the full source code and execution scripts for the baseline models compared in the manuscript (DeepREAL, AiGPro, 3D-GNN). All baselines were retrained on the same GPCRact dataset.

  • DeepREAL: See benchmarks/DeepREAL/

  • AiGPro: See benchmarks/AiGPro/ (Docker support included)

  • 3D-GNN Baseline: See benchmarks/3D-GNN/

Step 5: Analysis & Figure Generation 📉

To reproduce the statistical analyses, mechanistic interpretations, and main figures (Figs. 1, 3, 4, and 7), run the notebooks in the analysis/ directory.

  • 01_receptor_dynamics_analysis.ipynb: Structural ground truth analysis (Fig 1).

  • 02_sequence_structure_correlation.ipynb: MSA vs. 3D dynamics (Fig 3).

  • 03_activity_decision_tree.ipynb: Decision tree for activity rules (Fig 4).

  • 04_mechanistic_interpretability.ipynb: Attention weight analysis (Fig 7).

Supplementary validations (PRS analysis, sensitivity analysis, and mutation studies) are also included.
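The notebooks can also be executed non-interactively with `jupyter nbconvert --execute`, for example on a remote server. The sketch below runs the four main-figure notebooks in numeric order. Running them in that order and overwriting each notebook's outputs with `--inplace` are assumptions rather than documented requirements. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Main-figure notebooks listed above, assumed to run in numeric order.
ANALYSIS_NOTEBOOKS = [
    "01_receptor_dynamics_analysis.ipynb",
    "02_sequence_structure_correlation.ipynb",
    "03_activity_decision_tree.ipynb",
    "04_mechanistic_interpretability.ipynb",
]


def nbconvert_command(notebook_path):
    """Build a `jupyter nbconvert` call that executes a notebook and saves outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(notebook_path)]


def run_analysis(analysis_dir="analysis"):
    """Execute each notebook in turn, stopping at the first failure."""
    for name in ANALYSIS_NOTEBOOKS:
        subprocess.run(nbconvert_command(Path(analysis_dir) / name), check=True)
```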

🎓 Citation

Our manuscript is currently under review. If you use GPCRact in your research, we would appreciate it if you could cite our work upon its publication.

📬 Contact

For questions, bug reports, or feedback, please contact Hyojin Son at hyojin0912@kaist.ac.kr.
