Hello! Thanks for your implementation. I noticed that there are several Graphormer attention layers in your code (the ones with edge encoding). However, I checked the official Graphormer code and found that the network has only one bias_layer (edge encoding + spatial encoding), while the rest are regular multi-head attention layers.
Besides, I think an important reason for the slow speed is that the EdgeEncoding function is called many times inside the multi-head attention, which is unnecessary: the encoding depends only on the graph structure, not on the layer activations, so it can be computed once per forward pass and shared across all layers, as in the sketch below.
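Roughly, I mean something like this minimal PyTorch sketch. All names here (GraphAttnBias, PlainAttentionLayer, num_spatial, num_edge_types, etc.) are made up for illustration, not taken from either repo; the edge encoding is simplified to a per-edge-type lookup (the official code averages edge-feature embeddings along the shortest path), and residuals, LayerNorm, and the FFN are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttnBias(nn.Module):
    """Builds the [B, H, N, N] additive attention bias ONCE per forward pass:
    spatial encoding (a learned scalar per head for each shortest-path
    distance) plus a simplified per-edge-type edge encoding."""

    def __init__(self, num_heads: int, num_spatial: int, num_edge_types: int):
        super().__init__()
        self.spatial_enc = nn.Embedding(num_spatial, num_heads)
        self.edge_enc = nn.Embedding(num_edge_types, num_heads)

    def forward(self, spatial_pos, edge_type):
        # spatial_pos, edge_type: [B, N, N] integer tensors
        bias = self.spatial_enc(spatial_pos) + self.edge_enc(edge_type)
        return bias.permute(0, 3, 1, 2)  # [B, N, N, H] -> [B, H, N, N]


class PlainAttentionLayer(nn.Module):
    """A regular multi-head self-attention layer: no encoding module inside,
    it only adds the precomputed bias to the attention logits."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, attn_bias):
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each [B, H, N, hd]
        logits = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = F.softmax(logits + attn_bias, dim=-1)   # shared bias added here
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)


class Encoder(nn.Module):
    def __init__(self, dim=64, num_heads=8, num_layers=6,
                 num_spatial=16, num_edge_types=4):
        super().__init__()
        self.bias = GraphAttnBias(num_heads, num_spatial, num_edge_types)
        self.layers = nn.ModuleList(
            PlainAttentionLayer(dim, num_heads) for _ in range(num_layers))

    def forward(self, x, spatial_pos, edge_type):
        attn_bias = self.bias(spatial_pos, edge_type)  # computed once ...
        for layer in self.layers:
            x = layer(x, attn_bias)                    # ... reused every layer
        return x
```

Since the bias is a pure function of the input graph, hoisting it out of the attention layers like this should remove the repeated EdgeEncoding calls without changing the model's output.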
Thanks again for sharing!