Skip to content

PalAvik/analyzing-lrgb

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

172 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Analyzing the Long Range Graph Benchmarks and enhancing model performance on them

In this repo, we provide the source code to train various GNN models on the proposed LRGB datasets. We also provide scripts to run baseline and exploratory experiments.

Python environment setup with Conda

conda create -n lrgb python=3.9
conda activate lrgb

pip install torch torchvision torchaudio
pip install torch_geometric==2.0.2 performer-pytorch torchmetrics==0.7.2
pip install ogb wandb pytorch_lightning yacs 
pip install torch_scatter torch_sparse 
pip install tensorboardX
pip install numba
pip install e3nn

All our important model-checkpoints and results generated are stored in the folder: https://drive.google.com/drive/folders/1-WCfFcvqB2uVr12NtNb7XcX-oUdyx6sF?usp=drive_link.

Training the Graph models

The configuration for training graph model on a particular LRGB dataset is provided in src/configs directory. The training process can be started by running the following script,

For GCN and Transformers+LapPE

python src/main.py --cfg src/configs/GCN/vocsuperpixels-GCN.yaml device cuda:0 wandb.use False

python src/main.py --cfg src/configs/GT/vocsuperpixels-Transformer+LapPE.yaml device cuda:0 wandb.use False

For E(n)-Invariant (ENN) and E(n)-Equivariant (EGNN)

python src/main_egnn.py --cfg src/configs/ENN/vocsuperpixels-ENN.yaml device cuda:0 wandb.use False

python src/main_egnn.py --cfg src/configs/EGNN/vocsuperpixels-EGNN.yaml device cuda:0 wandb.use False

For E(3)-Steerable (SEGNN)

python src/main_steer.py --cfg src/configs/SCGNN/vocsuperpixels-SCGNN.yaml device cuda:0 wandb.use False

W&B logging

To use W&B logging, set wandb.use True and have a gtransformers entity set-up in your W&B account (or change it to whatever else you like by setting wandb.entity).

Calculate Influence Scores

After training, the model checkpoints could be loaded up directly by providing the checkpoint path in the configuration at cfg.train.finetune and set cfg.train.freeze_pretrained to True. To compute influence scores, run src/model_inference.py and provide the relevant configuration file,

python src/model_inference.py --cfg src/configs/EGNN/vocsuperpixels-EGNN.yaml device cuda:0 wandb.use False train.finetune /path/to/ckpt_dir train.freeze_pretrained True

Accuracy and F1 experiments

To run the experiments that relate accuracy and f1 score to experiments:

  • navigate to the directory containing noising_experiment.py.
  • Set up the config files, as was required for model training.
  • Run the following terminal command to generate the raw data

python noising_experiment.py --cfg <path_to_config_file> --output_file <path_to_file_to_save_results_to> --device <cuda_device> --num_graphs <number_of_graphs_to_generate_results_for>

  • Upon completion, the program will dump a pickled dataframe containing the results.
  • Load that dataframe in the notebook 'noiser_results.ipynb' using
df = pre_process_df(output_file_path)
ra = get_relative_accuracy(df)
f1 = get_relative_f1s(df)  

There are two pre-prepared pickle files available for use. They are found here in the Drive folder.

Influence Score experiments

To run the influence score experiments:

  • navigate to the directory containing model_inference.py.
  • Set up the config files, as was required for model training.
  • Run the following terminal command to generate the raw gradient information:
python model_inference.py --cfg <path_to_config_file> device 'cuda_device'
  • This will output a pickle file containing the gradients to "inf_scores_{model_type}.pkl".
  • Now run the following code to plot the influence score for a given model:
from src.influence import process_all_graphs, plot_mean_influence_by_distance
import matplotlib.pyplot as plt

file_name = './path/to/pickle/
influence_df_gcn = process_all_graphs('inf_scores_gcn_with_adj.pkl', normalise=True)

fig, ax = plt.subplots()
plot_mean_influence_by_distance(influence_df_gcn, ax, 'GCN')

ax.set_xlabel('Shortest path distance from target node')
ax.set_ylabel('Proportion of total gradient')
ax.legend()

Alternatively, the notebook 'influence.ipynb' contains an example of this pipeline.

pickle files containing the influence scores are avaiable here in the Drive folder.

assets/influence_experiments/<model_name>.pkl

Calculate cheeger constant, diameter and normalized shortest path

python src/metrices_main.py --cfg src/configs/GCN/vocsuperpixels-GCN.yaml device cuda:0 wandb.use False

The pkl file containing the metrics generated by running this command can be found here.

The code notebooks of the visualizations generated from these pkls can be found here.

The results from the model and influence scores can be found in the following Colab Notebook.

About

We replicate results of the original LRGB paper. We also explore alternative approaches to mitigate the problem of Long Range Interactions (LRI). Furthermore, we also conduct qualitative and quantitative experiments to assess whether the LRGB datasets are representative of LRI.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Jupyter Notebook 54.0%
  • Python 42.3%
  • Shell 3.7%