
[TMLR 25'] Official implementation of “Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design.”


JPaulYang/mask-graph-ssl


Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design

This repository accompanies the paper "Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design" (available on OpenReview).


⚠️ Initial camera-ready release: the current codebase focuses on structure and documentation; a fully runnable version will be uploaded later.

Python environment setup with Conda

conda create -n mask-graph python=3.10
conda activate mask-graph

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install rdkit-pypi
pip install pytorch-lightning yacs torchmetrics
pip install performer-pytorch
pip install tensorboardX
pip install ogb
pip install wandb

conda clean --all

Dataset

Download the chem_data dataset. The pre-training data is located under zinc_standard_agent.

Other datasets are downstream benchmarks, mostly from MoleculeNet. We mainly used:

  • Classification: bace, bbbp, tox21, toxcast, muv, hiv, sider
  • Regression: lipophilicity, cep, malaria, esol

Before running experiments:

  • Delete the processed/ directory under each dataset.
  • Place the motif vocabulary used for motif label prediction at ./zinc_standard_agent/vocab/motif_vocab.txt before pretraining.
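The two preparation steps above can be scripted. This is a hypothetical sketch that assumes each dataset sits in its own folder under a common root and that the vocabulary path is resolved relative to that root; `DATA_ROOT`, `clear_processed`, and `vocab_present` are illustrative names, so adjust the root to wherever chem_data was extracted.

```python
# Hypothetical preparation helpers. Assumed layout (adjust to your copy):
#   <DATA_ROOT>/<dataset_name>/processed/            cached tensors to delete
#   <DATA_ROOT>/zinc_standard_agent/vocab/motif_vocab.txt
import shutil
from pathlib import Path

DATA_ROOT = Path("dataset")  # assumption: folder containing one subfolder per dataset
VOCAB_REL = Path("zinc_standard_agent/vocab/motif_vocab.txt")


def clear_processed(root: Path) -> list[Path]:
    """Delete every <root>/<dataset>/processed/ directory and return those removed."""
    removed = []
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        processed = dataset_dir / "processed"
        if processed.is_dir():
            shutil.rmtree(processed)  # raw files next to it are left untouched
            removed.append(processed)
    return removed


def vocab_present(root: Path) -> bool:
    """Check that the motif vocabulary file exists where pretraining expects it."""
    return (root / VOCAB_REL).is_file()
```

Calling `clear_processed(DATA_ROOT)` once before the first run and checking `vocab_present(DATA_ROOT)` before pretraining covers both bullets; deleting `processed/` only forces the cached graphs to be rebuilt, so it is safe to repeat.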

🚀 Running the Code

Run the commands below from the subfolder of the corresponding backbone model (e.g., GIN/ for the GIN backbone).

Pretraining with GIN:

python pretrain_attrmask.py

Pretraining with GraphGPS:

python main.py --cfg ./configs/AttrMask.yaml

Finetuning:

python finetune.py

Motif Extraction

For each dataset, run

/GIN/motif_decompose.py

to extract the motif vocabulary based on the refined BRICS algorithm.

The extracted motif vocabulary will be saved to:

/${backbone}/dataset/${dataset_name}/vocab/
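The vocabulary-building step can be pictured as counting decomposed motifs across a dataset and writing them out one per line. In this hypothetical sketch, `decompose` is a stand-in for the refined BRICS decomposition in `motif_decompose.py` (which is not reproduced here), and the file format, frequency ordering, and `min_count` cut-off are assumptions rather than the repository's documented behavior.

```python
# Hypothetical sketch: assemble a motif vocabulary from per-molecule motifs.
# `decompose` stands in for the refined BRICS decomposition; the one-motif-
# per-line file format and frequency ordering are assumptions.
from collections import Counter
from collections.abc import Callable, Iterable
from pathlib import Path


def build_motif_vocab(
    smiles_list: Iterable[str],
    decompose: Callable[[str], Iterable[str]],
    min_count: int = 1,
) -> list[str]:
    """Count motif occurrences over a dataset; keep those seen >= min_count times.

    Motifs are ordered by descending count, ties broken alphabetically, so the
    line index of a motif can serve as its class label.
    """
    counts = Counter(m for s in smiles_list for m in decompose(s))
    kept = [m for m, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda m: (-counts[m], m))


def save_vocab(vocab: list[str], vocab_dir: Path) -> Path:
    """Write one motif per line to <vocab_dir>/motif_vocab.txt."""
    vocab_dir.mkdir(parents=True, exist_ok=True)
    path = vocab_dir / "motif_vocab.txt"
    path.write_text("\n".join(vocab) + "\n")
    return path
```

For a quick experiment with RDKit installed, standard (unrefined) BRICS can be plugged in via `decompose=lambda s: sorted(BRICS.BRICSDecompose(Chem.MolFromSmiles(s)))` after `from rdkit import Chem` and `from rdkit.Chem import BRICS`. This yields fragments with dummy attachment atoms and is not the refined BRICS variant the repository uses.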

📊 Label Distribution Analysis

Label statistics can be collected via:

./GIN/label_extraction_*.py
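The exact statistics produced by the `label_extraction_*.py` scripts are not described here. As an illustration only, one generic way to summarize a label distribution, for example the atom types or motif ids used as prediction targets, is its empirical frequencies plus Shannon entropy; the function name `label_distribution` is hypothetical.

```python
# Illustrative summary of a label distribution: empirical frequencies
# (most frequent first) and Shannon entropy in bits. Not the repository's script.
import math
from collections import Counter
from collections.abc import Hashable, Iterable


def label_distribution(labels: Iterable[Hashable]) -> tuple[dict, float]:
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    freqs = {lab: c / total for lab, c in counts.most_common()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return freqs, entropy


if __name__ == "__main__":
    freqs, h = label_distribution(["C", "C", "O", "N"])
    print(freqs, h)  # {'C': 0.5, 'O': 0.25, 'N': 0.25} 1.5
```

A heavily skewed distribution (low entropy relative to log2 of the number of classes) means a predictor can score well by favoring the majority label, which is one reason to inspect these statistics before comparing prediction targets.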
