This repository accompanies the paper "Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design" (OpenReview).
⚠️ Initial camera-ready release
The current codebase focuses on structure and documentation;
a fully runnable version will be uploaded later.
conda create -n mask-graph python=3.10
conda activate mask-graph
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-2.2.2%2Bcu118.html
pip install rdkit-pypi
pip install pytorch-lightning yacs torchmetrics
pip install performer-pytorch
pip install tensorboardX
pip install ogb
pip install wandb
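Before cleaning the conda cache, you can quickly check that every package installed above is importable. The snippet below is a minimal sketch and is not part of this repository. It uses the standard import names of these packages, which differ from some pip names (for example `rdkit-pypi` is imported as `rdkit` and `pytorch-lightning` as `pytorch_lightning`). It only looks the modules up and does not import them.

```python
import importlib.util

# Import names for the packages installed above (pip name -> import name
# differs for several, e.g. rdkit-pypi -> rdkit, performer-pytorch -> performer_pytorch).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "torch_geometric", "torch_cluster", "torch_scatter",
    "torch_sparse", "torch_spline_conv",
    "rdkit", "pytorch_lightning", "yacs", "torchmetrics",
    "performer_pytorch", "tensorboardX", "ogb", "wandb",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be located on the current path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules found.")
```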
conda clean --all
Download the chem_data. The pre-training dataset is under zinc_standard_agent.
Other datasets are benchmarks from MoleculeNet; we mainly used:
bace, bbbp, tox21, toxcast, muv, hiv, sider (classification); lipophilicity, cep, malaria, esol (regression).
Before running experiments:
- Delete the processed/ directory under each dataset.
- Place a vocabulary for motif label prediction at ./zinc_standard_agent/vocab/motif_vocab.txt before pretraining.
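The two preparation steps above can be scripted. The following is a minimal sketch built on one assumption: every dataset is a direct subfolder of one root directory, such as the dataset/ folder of a backbone. The helper names `clear_processed_dirs` and `motif_vocab_ready` are hypothetical and do not exist in the repository.

```python
import shutil
from pathlib import Path


def clear_processed_dirs(dataset_root):
    """Delete `<dataset_root>/<dataset>/processed/` for every dataset subfolder.

    Hypothetical helper; assumes each dataset is a direct subdirectory of
    `dataset_root`. Returns the names of datasets whose processed/ was removed.
    """
    removed = []
    for dataset_dir in sorted(Path(dataset_root).iterdir()):
        processed = dataset_dir / "processed"
        if dataset_dir.is_dir() and processed.is_dir():
            shutil.rmtree(processed)  # raw files next to processed/ are untouched
            removed.append(dataset_dir.name)
    return removed


def motif_vocab_ready(dataset_root, pretrain_name="zinc_standard_agent"):
    """Check that <pretrain_name>/vocab/motif_vocab.txt exists before pretraining."""
    vocab = Path(dataset_root) / pretrain_name / "vocab" / "motif_vocab.txt"
    return vocab.is_file()
```

For example, run `clear_processed_dirs(".")` from the directory that holds the dataset folders. Then call `motif_vocab_ready(".")` to confirm that the vocabulary file is in place.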
Change directory to the corresponding subfolder for each backbone model.
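The pretraining commands use the AttrMask objective through pretrain_attrmask.py or configs/AttrMask.yaml. This objective corrupts a fraction of node attributes and trains the model to recover them. The snippet below sketches only the generic attribute-masking idea, the AttrMask strategy popularized by Hu et al. (2020), and is not this repository's implementation. The mask rate of 0.15 and the mask-token index of 119 are illustrative assumptions, not values read from the configs. The snippet masks a plain list of atom-type integers rather than a PyG graph.

```python
import random


def mask_atom_attributes(atom_types, mask_rate=0.15, mask_token=119, rng=None):
    """Randomly mask a fraction of atoms' type attribute (AttrMask-style sketch).

    Returns (masked_types, masked_indices, labels): the corrupted attribute list,
    the masked positions, and the original types at those positions, which act
    as the prediction targets. mask_rate/mask_token are assumed defaults.
    """
    rng = rng or random.Random()
    num_atoms = len(atom_types)
    # Mask at least one atom in any non-empty molecule.
    num_mask = max(1, int(num_atoms * mask_rate)) if num_atoms else 0
    masked_indices = sorted(rng.sample(range(num_atoms), num_mask))
    labels = [atom_types[i] for i in masked_indices]
    masked_types = list(atom_types)
    for i in masked_indices:
        masked_types[i] = mask_token  # replace the attribute with the mask token
    return masked_types, masked_indices, labels
```

In a real pipeline, the model would predict `labels` from the node embeddings at `masked_indices` using a cross-entropy loss. Masking design choices such as which attributes to mask, how many, and what to replace them with are the subject of the paper.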
python pretrain_attrmask.py
python main.py --cfg ./configs/AttrMask.yaml
python finetune.py
For each dataset, run /GIN/motif_decompose.py to extract the motif vocabulary based on the refined BRICS algorithm.
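The decomposition itself, the refined BRICS algorithm, is implemented in motif_decompose.py and is not reproduced here. The sketch below covers only the later step of turning per-molecule motifs into a vocabulary file. It assumes two things: the decomposition yields a list of motif SMILES per molecule, and the vocabulary file holds one motif per line. Neither assumption is confirmed by this README. `build_motif_vocab` and `write_vocab` are hypothetical names.

```python
from collections import Counter
from pathlib import Path


def build_motif_vocab(motifs_per_molecule, min_count=1):
    """Count motifs across molecules; keep those seen in >= min_count molecules.

    `motifs_per_molecule` is an iterable of motif-SMILES lists, one per molecule
    (e.g. the output of a BRICS-style decomposition, not implemented here).
    A motif is counted at most once per molecule. The result is sorted by
    descending frequency, with ties broken alphabetically.
    """
    counts = Counter()
    for motifs in motifs_per_molecule:
        counts.update(set(motifs))
    kept = [(m, c) for m, c in counts.items() if c >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [m for m, _ in kept]


def write_vocab(vocab, path):
    """Write one motif per line (assumed format of motif_vocab.txt)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(vocab) + "\n")
```

Raising `min_count` drops rare motifs. This keeps the label space of motif prediction smaller, at the cost of leaving some substructures unlabeled.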
The extracted motif vocabulary will be saved to:
/${backbone}/dataset/${dataset_name}/vocab/
Label statistics can be collected via:
./GIN/label_extraction_*.py
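This README does not say which statistics the label_extraction_*.py scripts report. As an illustration, assume the labels for motif label prediction are multi-hot vectors that mark which vocabulary motifs occur in each molecule. Under that assumption, two useful summaries are the per-label positive rate and the average number of active labels per molecule. Both helpers below are hypothetical.

```python
def motif_label_vector(motifs, vocab_index):
    """Multi-hot vector: 1 if a vocabulary motif occurs in the molecule, else 0."""
    vec = [0] * len(vocab_index)
    for m in motifs:
        idx = vocab_index.get(m)
        if idx is not None:  # motifs outside the vocabulary are ignored
            vec[idx] = 1
    return vec


def label_statistics(motifs_per_molecule, vocab):
    """Per-label positive rate and mean number of active labels per molecule."""
    vocab_index = {m: i for i, m in enumerate(vocab)}
    vectors = [motif_label_vector(ms, vocab_index) for ms in motifs_per_molecule]
    n = len(vectors)
    if n == 0:
        return {"positive_rate": {m: 0.0 for m in vocab}, "mean_active": 0.0}
    positive_rate = {m: sum(v[i] for v in vectors) / n for m, i in vocab_index.items()}
    mean_active = sum(sum(v) for v in vectors) / n
    return {"positive_rate": positive_rate, "mean_active": mean_active}
```

Very low positive rates indicate heavily imbalanced labels. In that case a weighted loss, or pruning the vocabulary, may be worth considering.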