Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design

This repository accompanies the paper "Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design", available on OpenReview.

⚠️ Initial camera-ready release
The current codebase focuses on structure and documentation;
a fully runnable version will be uploaded later.

Python environment setup with Conda

conda create -n mask-graph python=3.10
conda activate mask-graph

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install rdkit-pypi
pip install pytorch-lightning yacs torchmetrics
pip install performer-pytorch
pip install tensorboardX
pip install ogb
pip install wandb

conda clean --all
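
To confirm that the installation succeeded, a quick import check along the following lines may help. This is a minimal sketch using only the standard library; it is not part of the repository. The module names are assumptions inferred from the pip packages above (for example, rdkit-pypi is imported as rdkit and pytorch-lightning as pytorch_lightning), and find_missing is a hypothetical helper name.

```python
import importlib.util

# Import names assumed from the pip packages installed above.
REQUIRED_MODULES = [
    "torch", "torch_geometric", "torch_cluster", "torch_scatter",
    "torch_sparse", "torch_spline_conv", "rdkit", "pytorch_lightning",
    "yacs", "torchmetrics", "performer_pytorch", "tensorboardX",
    "ogb", "wandb",
]


def find_missing(modules):
    """Return the top-level modules that cannot be located in this environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All packages found.")
```

Run it inside the activated mask-graph environment; any module it reports as missing points to the pip command that needs to be repeated.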

Dataset

Download the chem_data. The pre-training dataset is under zinc_standard_agent.

Other datasets are benchmarks from MoleculeNet; we mainly used:

bace, bbbp, tox21, toxcast, muv, hiv, sider (classification); lipophilicity, cep, malaria, esol (regression)

Before running experiments:

Delete the processed/ directory under each dataset.
Place a motif vocabulary for motif label prediction at ./zinc_standard_agent/vocab/motif_vocab.txt before pretraining.
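
The two preparation steps above can be scripted. The following is a minimal sketch, not part of the repository: prepare_datasets is a hypothetical helper, and it assumes that every dataset lives in its own subfolder of a single root directory and that the vocabulary path is relative to that root.

```python
import shutil
from pathlib import Path


def prepare_datasets(root, vocab_rel="zinc_standard_agent/vocab/motif_vocab.txt"):
    """Delete cached processed/ folders and check for the motif vocabulary.

    Assumes the layout <root>/<dataset_name>/processed (an assumption,
    not confirmed by the repository).
    """
    root = Path(root)
    removed = []
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        processed = dataset_dir / "processed"
        if processed.is_dir():
            shutil.rmtree(processed)  # forces the dataset to be re-processed
            removed.append(dataset_dir.name)
    vocab_ok = (root / vocab_rel).is_file()
    return removed, vocab_ok


if __name__ == "__main__":
    # "./dataset" is a placeholder; point it at the extracted chem_data root.
    removed, vocab_ok = prepare_datasets("./dataset")
    print("Removed processed/ for:", removed)
    print("Motif vocabulary present:", vocab_ok)
```

The function returns the names of the datasets whose cache was removed, so it is easy to see which ones will be rebuilt on the next run.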

🚀 Running the Code

Change directory to the corresponding subfolder for each backbone model.

Pretraining with GIN:

python pretrain_attrmask.py

Pretraining with GraphGPS:

python main.py --cfg ./configs/AttrMask.yaml

Finetuning:

python finetune.py

Motif Extraction

For each dataset, run

/GIN/motif_decompose.py

to extract the motif vocabulary based on the refined BRICS algorithm.

The extracted motif vocabulary will be saved to:

/${backbone}/dataset/${dataset_name}/vocab/
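
For orientation, the aggregation step that turns per-molecule fragments into a vocabulary file might look like the sketch below. It does not reproduce motif_decompose.py: the fragmentation itself (the refined BRICS step) is not shown, the input is a hypothetical list of already-extracted fragment SMILES per molecule, the function names are illustrative, and the tab-separated "motif, count" file format is an assumption that may differ from what the repository writes.

```python
from collections import Counter
from pathlib import Path


def build_motif_vocab(fragments_per_molecule, min_count=1):
    """Count in how many molecules each motif occurs and keep the frequent ones."""
    counts = Counter()
    for fragments in fragments_per_molecule:
        counts.update(set(fragments))  # count each motif once per molecule
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(motif, n) for motif, n in ordered if n >= min_count]


def save_motif_vocab(vocab, vocab_dir, filename="motif_vocab.txt"):
    """Write one 'motif<TAB>count' line per entry (assumed file format)."""
    vocab_dir = Path(vocab_dir)
    vocab_dir.mkdir(parents=True, exist_ok=True)
    path = vocab_dir / filename
    path.write_text("".join(f"{motif}\t{n}\n" for motif, n in vocab))
    return path


if __name__ == "__main__":
    # Toy fragments standing in for the output of a BRICS-style decomposition.
    toy = [["c1ccccc1", "C=O"], ["c1ccccc1"], ["CN"]]
    print(build_motif_vocab(toy))
```

Counting each motif once per molecule keeps very large molecules from dominating the vocabulary; whether the repository counts this way is not stated.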

📊 Label Distribution Analysis

Label statistics can be collected via:

./GIN/label_extraction_*.py
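
As an illustration of the kind of statistic such a script could report, the sketch below computes relative label frequencies and their Shannon entropy for a list of discrete labels. It is an assumption about what "label statistics" means here, not a description of the label_extraction_*.py scripts; label_distribution is a hypothetical name.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Return each label's relative frequency and the Shannon entropy in bits."""
    counts = Counter(labels)
    total = sum(counts.values())
    freqs = {label: n / total for label, n in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return freqs, entropy


if __name__ == "__main__":
    # Toy atom-type labels; real labels would come from the pretraining data.
    freqs, entropy = label_distribution(["C", "C", "N", "O"])
    print(freqs, entropy)
```

A low entropy indicates that a few labels dominate, which is one way to compare how balanced different prediction targets are.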
