The FusionGDA model introduces an attention-based fusion module to enrich the semantic representations of genes and diseases encoded by pre-trained language models, enabling more accurate gene–disease association (GDA) prediction.
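# Download and install Anaconda (if the "latest" filename is unavailable, pick a versioned installer from https://repo.anaconda.com/archive/)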
wget https://repo.anaconda.com/archive/Anaconda3-latest-Linux-x86_64.sh
bash Anaconda3-latest-Linux-x86_64.sh -b
rm Anaconda3-latest-Linux-x86_64.sh
export PATH="$HOME/anaconda3/bin:$PATH"
# Update Anaconda packages
conda update --all
# Install PyTorch with CUDA support (adjust CUDA version if needed)
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# Install dependencies
pip install wandb PyTDC lightgbm pytorch-metric-learning

Ensure you are in the directory:
~/dpa_pretrain/scripts

Then adjust parameters as required.
bash run_pretrain_gda_ml_adapter_infoNCE.sh

TDC Dataset
bash run_finetune_gda_lightgbm_infoNCE_tdc.sh

DisGeNET Dataset
bash run_finetune_gda_lightgbm_infoNCE.sh

Results can be tracked through your Weights & Biases account.
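
The pretraining and fine-tuning commands above can be chained behind a small pre-flight check. The following is a minimal sketch, not part of the repository: the helper names `check_scripts` and `run_pipeline` and the `SCRIPTS_DIR` variable are hypothetical, and it assumes that fine-tuning is meant to run after pretraining has finished successfully.

```shell
#!/usr/bin/env bash
# Sketch only: SCRIPTS_DIR and both helper names are assumptions for illustration.
SCRIPTS_DIR="${SCRIPTS_DIR:-$HOME/dpa_pretrain/scripts}"

# Return 0 if every named script exists in the given directory, 1 otherwise.
check_scripts() {
  local dir="$1"; shift
  local missing=0
  local s
  for s in "$@"; do
    if [ ! -f "$dir/$s" ]; then
      echo "missing: $dir/$s" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run pretraining, then TDC fine-tuning, stopping at the first failure.
# The subshell keeps the caller's working directory unchanged.
run_pipeline() {
  check_scripts "$SCRIPTS_DIR" \
    run_pretrain_gda_ml_adapter_infoNCE.sh \
    run_finetune_gda_lightgbm_infoNCE_tdc.sh || return 1
  (
    cd "$SCRIPTS_DIR" || exit 1
    bash run_pretrain_gda_ml_adapter_infoNCE.sh &&
      bash run_finetune_gda_lightgbm_infoNCE_tdc.sh
  )
}
```

After sourcing the file, call `run_pipeline` to launch both steps. To fine-tune on DisGeNET instead, swap in `run_finetune_gda_lightgbm_infoNCE.sh` in both places.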
All datasets are obtained from the following public biomedical repositories:
- TDC: https://tdcommons.ai/
- DisGeNET: https://www.disgenet.org/
The specific versions used in our experiments are stored in the shared Drive:
👉 Shared Drive Link
If you find FusionGDA useful for your research, please cite:
@article{meng2024heterogeneous,
title={Heterogeneous biomedical entity representation learning for gene-disease association prediction},
author={Meng, Zhaohan and Liu, Siwei and Liang, Shangsong and Jani, Bhautesh and Meng, Zaiqiao},
journal={Briefings in Bioinformatics},
volume={25},
number={5},
pages={bbae380},
year={2024},
publisher={Oxford University Press}
}

🧠 Developed by the AI4BioMed Lab,
School of Computing Science, University of Glasgow, UK 🇬🇧
