This repository serves as the official implementation and reproducibility package for the paper "GPCRact: a hierarchical framework for predicting ligand-induced GPCR activity via allosteric communication modeling".
We provide the complete source code, preprocessed datasets, training scripts, and analysis notebooks required to reproduce the findings presented in the manuscript.
We have unified all resources into a single structured repository to facilitate full reproducibility.
GPCRact/
├── analysis/ # Jupyter Notebooks for reproducing figures and statistical analyses
├── benchmarks/ # Implementation of baseline models (DeepREAL, AiGPro, 3D-GNN)
├── configs/ # Configuration files (YAML) for training and HPO
├── data/ # Datasets
│ ├── raw/ # Raw data files (GPCRactDB v1)
│ ├── resources/ # Auxiliary bio-info files (PDB info, MSA, etc.)
│ └── splits/ # Exact Train/Val/Test scaffold splits used in the paper
├── preprocessing/ # Scripts to reconstruct the dataset from scratch
├── scripts/ # Executable scripts for Training, Inference, and HPO
├── src/ # Core library code (Model architecture, Layers, Dataloaders)
├── environment.yml # Conda environment file
└── README.md # Master documentation
We recommend using Conda to manage the environment for full reproducibility.
- Clone the repository:
    git clone https://github.com/hyojin0912/HJ-GPCRact.git
    cd HJ-GPCRact
- Create and activate the Conda environment:
    conda env create -f environment.yml
    conda activate gpcract
Alternatively, you can install packages using pip:
pip install -r requirements.txt
This section describes the steps needed to reproduce the results reported in our study.
Users can reconstruct the GPCRactDB from raw public data or use the pre-generated splits provided in data/splits/. To build from scratch, follow the pipeline in the preprocessing/ directory:
# Example: Running the final dataset creation step
jupyter notebook preprocessing/04_create_final_dataset.ipynb
- Note: The exact scaffold-based split files (scaffold_train.csv, scaffold_val.csv, scaffold_test.csv) used in our study are already provided in data/splits/ to ensure fair benchmarking.
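Before training, it can help to confirm that the three split files are in place. The following is a minimal sketch and not part of the repository: `summarize_splits` is a hypothetical helper, and it assumes only that each split is a CSV file with a single header row. The column names are not specified here, so the sketch does not rely on them.

```python
import csv
from pathlib import Path

# File names as listed in the note above.
SPLIT_FILES = ("scaffold_train.csv", "scaffold_val.csv", "scaffold_test.csv")


def summarize_splits(split_dir="data/splits"):
    """Return {file name: number of data rows} for the three split files.

    Hypothetical helper (not part of the repository). Assumes each file is a
    CSV with one header row; raises FileNotFoundError if a file is missing.
    """
    counts = {}
    for name in SPLIT_FILES:
        path = Path(split_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"Missing split file: {path}")
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row
            counts[name] = sum(1 for _ in reader)
    return counts
```

Calling `summarize_splits()` from the repository root returns the row count of each split, which can be compared against the dataset sizes reported in the manuscript.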
To train the GPCRact model from scratch using the provided splits:
- Configure: Modify configs/training_config.yaml if necessary.
- Run: Execute the training script.
python scripts/train.py \
--data_dir data/splits \
--save_dir checkpoints/ \
--epochs 100
For detailed arguments, see scripts/README.md.
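The training command above can also be launched from Python, for example when scripting several runs. This is a minimal sketch under stated assumptions: `build_train_command` and `run_training` are hypothetical helpers, and they use only the three flags shown above (`--data_dir`, `--save_dir`, `--epochs`). Any further options should be taken from scripts/README.md.

```python
import subprocess
import sys


def build_train_command(data_dir="data/splits", save_dir="checkpoints/", epochs=100):
    """Return the argument list for scripts/train.py.

    Hypothetical helper: it only uses the flags shown in the command above.
    """
    if epochs < 1:
        raise ValueError("epochs must be a positive integer")
    return [
        sys.executable, "scripts/train.py",
        "--data_dir", str(data_dir),
        "--save_dir", str(save_dir),
        "--epochs", str(epochs),
    ]


def run_training(**kwargs):
    """Run training from the repository root.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    subprocess.run(build_train_command(**kwargs), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.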
To predict the activity (Agonist/Antagonist/Non-binder) of novel GPCR-ligand pairs using a trained model:
python scripts/inference.py \
--data_dir data/splits \
--model_path checkpoints/best_model.pt \
--output_dir results/
We provide the full source code and execution scripts for the baseline models compared in the manuscript (DeepREAL, AiGPro, 3D-GNN). All baselines were retrained on the identical GPCRact dataset.
- DeepREAL: See benchmarks/DeepREAL/
- AiGPro: See benchmarks/AiGPro/ (Docker support included)
- 3D-GNN Baseline: See benchmarks/3D-GNN/
To reproduce the statistical analyses, mechanistic interpretations, and main figures (Fig 1, 3, 4, 7), run the notebooks in the analysis/ directory.
- 01_receptor_dynamics_analysis.ipynb: Structural ground truth analysis (Fig 1).
- 02_sequence_structure_correlation.ipynb: MSA vs. 3D dynamics (Fig 3).
- 03_activity_decision_tree.ipynb: Decision tree for activity rules (Fig 4).
- 04_mechanistic_interpretability.ipynb: Attention weight analysis (Fig 7).
Supplementary Validations: PRS analysis, Sensitivity analysis, and Mutation studies are also included.
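The analysis notebooks can also be executed without opening the Jupyter interface. The sketch below is one possible way to do this and is not part of the repository: `nbconvert_commands` and `run_notebooks` are hypothetical helpers, and the sketch assumes that `jupyter nbconvert` is available in the active environment and that the numeric file-name prefixes reflect a sensible execution order. Whether the notebooks depend on each other's outputs is not stated here.

```python
import subprocess
from pathlib import Path


def nbconvert_commands(notebook_dir="analysis"):
    """Return one `jupyter nbconvert` command per notebook, in file-name order.

    Hypothetical helper. Only notebooks directly inside notebook_dir are
    included; subdirectories are not searched.
    """
    notebooks = sorted(Path(notebook_dir).glob("*.ipynb"))
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)]
        for nb in notebooks
    ]


def run_notebooks(notebook_dir="analysis"):
    """Execute each notebook in place, stopping at the first failure."""
    for command in nbconvert_commands(notebook_dir):
        subprocess.run(command, check=True)
```

Because `--inplace` overwrites each notebook with its executed version, it may be worth committing or copying the originals before calling `run_notebooks()`.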
Our manuscript is currently under review. If you use GPCRact in your research, we would appreciate it if you could cite our work upon its publication.
For questions, bug reports, or feedback, please contact Hyojin Son at hyojin0912@kaist.ac.kr.
