BALI-BERT

BALI: Enhancing Biomedical Language Representations

Python PyTorch

This is the repository for the paper BALI: Enhancing Biomedical Language Representations through Knowledge Graph and Language Model Alignment, accepted to SIGIR 2025. The paper proposes a joint pre-training method that enriches biomedical language models with information from the UMLS, a large biomedical Knowledge Graph (KG), through text-KG representation alignment.

📄 Paper: BALI: Enhancing Biomedical Language Representations through Knowledge Graph and Language Model Alignment
🗓️ SIGIR 2025

Installation

```bash
git clone https://github.com/Andoree/BALI-BERT.git
cd BALI-BERT

# Create conda environment
conda create -n bali python=3.8
conda activate bali

# Install dependencies
pip install -r requirements.txt
```

Available HuggingFace Checkpoints

| Model | Description |
| --- | --- |
| andorei/BALI-BERT-BioLinkBERT-large-lingraph | BioLinkBERT-large model pre-trained with BALI using linear graph encoder |
| andorei/BALI-BERT-BioLinkBERT-base-GNN | BioLinkBERT-base model pre-trained with BALI using GNN-based alignment |
| andorei/BALI-BERT-PubMedBERT-GNN | PubMedBERT model pre-trained with BALI using GNN-based alignment |
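
The checkpoints listed above can probably be loaded like any other BERT-style encoder. The following is a minimal sketch, not the authors' documented usage: it assumes the checkpoints are compatible with the generic `AutoTokenizer` and `AutoModel` classes of the `transformers` library and that `torch` is installed. The `BALI_CHECKPOINTS` dictionary and the `embed` helper are hypothetical names introduced for illustration, and the [CLS] pooling is an assumption, since this README does not state which pooling to use.

```python
# Checkpoint identifiers copied from the list above; the short keys are
# hypothetical aliases introduced only for this example.
BALI_CHECKPOINTS = {
    "biolinkbert-large-lingraph": "andorei/BALI-BERT-BioLinkBERT-large-lingraph",
    "biolinkbert-base-gnn": "andorei/BALI-BERT-BioLinkBERT-base-GNN",
    "pubmedbert-gnn": "andorei/BALI-BERT-PubMedBERT-GNN",
}


def embed(texts, checkpoint=BALI_CHECKPOINTS["pubmedbert-gnn"]):
    """Return one vector per input string (hypothetical helper)."""
    # Imported lazily so the heavy dependencies are only needed when
    # the function is actually called.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the checkpoints load through the generic Auto* classes.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)

    # Assumption: take the [CLS] token representation as the text embedding.
    return output.last_hidden_state[:, 0]
```

Calling `embed(["myocardial infarction", "heart attack"])` would download the selected checkpoint on first use and return a tensor with one row per input string.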

Evaluation QA data:

📚 Citation

If you use our model in your research, please cite our SIGIR 2025 paper:

@inproceedings{Sakhovskiy2025BALI,
  author = {Sakhovskiy, Andrey and Tutubalina, Elena},
  title = {BALI: Enhancing Biomedical Language Representations through Knowledge Graph and Language Model Alignment},
  booktitle = {Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)},
  year = {2025}
}
