[Briefings in Bioinformatics]* We propose a novel FusionGDA model, which utilises a pre-training phase with a fusion module to enrich the gene and disease semantic representations encoded by pre-trained language models.

ZhaohanM/FusionGDA

🧬 Heterogeneous Biomedical Entity Representation Learning for Gene–Disease Association Prediction


The FusionGDA model introduces an attention-based fusion module to enrich the semantic representations of genes and diseases encoded by pre-trained language models, enabling more accurate gene–disease association (GDA) prediction.
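
As a rough illustration of what "attention-based fusion" can mean, the sketch below fuses one entity embedding with a set of embeddings from the other entity type using scaled dot-product attention and a residual sum. It is a generic, dependency-free sketch, not the FusionGDA architecture: the function names `softmax` and `attention_fuse`, the residual sum, and the toy vectors are all assumptions made for illustration. See the paper for the actual module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, context):
    """Fuse `query` with `context` vectors via scaled dot-product attention.

    Hypothetical helper: `query` is one embedding (e.g. a gene) and `context`
    is a list of embeddings of the other entity type (e.g. diseases), all with
    the same dimensionality. Returns the fused vector and attention weights.
    """
    scale = math.sqrt(len(query))
    # Similarity of the query to every context vector.
    scores = [sum(q * c for q, c in zip(query, vec)) / scale for vec in context]
    weights = softmax(scores)
    # Attention-weighted average of the context vectors.
    attended = [
        sum(w * vec[j] for w, vec in zip(weights, context))
        for j in range(len(query))
    ]
    # Residual sum keeps the original representation and adds the context.
    fused = [q + a for q, a in zip(query, attended)]
    return fused, weights


if __name__ == "__main__":
    gene = [1.0, 0.0]
    diseases = [[1.0, 0.0], [0.0, 1.0]]
    fused, weights = attention_fuse(gene, diseases)
    print("weights:", weights)
    print("fused:", fused)
```

In this toy case the first disease vector receives the larger attention weight because it points in the same direction as the gene vector.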


🧩 Framework

FusionGDA Framework

⚙️ Installation

# Download and install Anaconda
wget https://repo.anaconda.com/archive/Anaconda3-latest-Linux-x86_64.sh
bash Anaconda3-latest-Linux-x86_64.sh -b
rm Anaconda3-latest-Linux-x86_64.sh
export PATH="$HOME/anaconda3/bin:$PATH"

# Update Anaconda packages
conda update --all

# Install PyTorch with CUDA support (adjust CUDA version if needed)
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

# Install dependencies
pip install wandb PyTDC lightgbm pytorch-metric-learning
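
After installing, it can help to confirm that the packages are importable before launching a run. The following is a minimal sketch using only the Python standard library. The helper name `check_modules` and the `REQUIRED_MODULES` mapping are illustrative and not part of this repository. The import names are the usual ones for these packages (PyTDC is imported as `tdc`).

```python
from importlib.util import find_spec

# Import name -> package name as installed above.
REQUIRED_MODULES = {
    "torch": "pytorch",
    "wandb": "wandb",
    "tdc": "PyTDC",
    "lightgbm": "lightgbm",
    "pytorch_metric_learning": "pytorch-metric-learning",
}


def check_modules(modules):
    """Return {import_name: bool} telling whether each module can be found."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        status = "ok" if found else "MISSING"
        print(f"{REQUIRED_MODULES[name]:<26} ({name}): {status}")
```

Any line reported as `MISSING` points to a package that should be reinstalled in the active conda environment.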

🚀 Execution

Change to the scripts directory before running any of the commands below:

~/dpa_pretrain/scripts

Then adjust the parameters as required.

🔹 Pre-training Phase

bash run_pretrain_gda_ml_adapter_infoNCE.sh
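
The script names suggest that training uses an InfoNCE-style contrastive objective. For readers unfamiliar with it, here is a generic, dependency-free sketch of InfoNCE over a batch of paired gene and disease embeddings. It is not the code these scripts run: the function names, the cosine similarity, and the default `temperature` of 0.1 are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(gene_vecs, disease_vecs, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    gene_vecs[i] and disease_vecs[i] form a positive pair; every other
    disease in the batch acts as a negative for gene i.
    """
    losses = []
    for i, gene in enumerate(gene_vecs):
        logits = [cosine(gene, d) / temperature for d in disease_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the positive pair.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    genes = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned pairs:", info_nce(genes, aligned))
    print("shuffled pairs:", info_nce(genes, shuffled))
```

The loss is close to zero when each gene is most similar to its own paired disease and grows when the pairs are mismatched, which is the behaviour a contrastive pre-training phase relies on.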

🔹 Fine-tuning Phase

TDC Dataset

bash run_finetune_gda_lightgbm_infoNCE_tdc.sh

DisGeNET Dataset

bash run_finetune_gda_lightgbm_infoNCE.sh

Results can be tracked through your Weights & Biases account.


📊 Datasets

All datasets are obtained from the following public biomedical repositories:

The specific versions used in our experiments are stored in the shared Drive:
👉 Shared Drive Link


📝 Citation

If you find FusionGDA useful for your research, please cite:

@article{meng2024heterogeneous,
  title={Heterogeneous biomedical entity representation learning for gene-disease association prediction},
  author={Meng, Zhaohan and Liu, Siwei and Liang, Shangsong and Jani, Bhautesh and Meng, Zaiqiao},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={5},
  pages={bbae380},
  year={2024},
  publisher={Oxford University Press}
}

🧠 Developed by the AI4BioMed Lab,
School of Computing Science, University of Glasgow, UK 🇬🇧