Allow for Transformer layers operating on densely connected graph #25
Conversation
D ~ vertex representation _d_imension
H ~ attention _h_ead size
N ~ _n_um attention heads
P ~ _n_number of node pairs in total, where P = sum(graph_size**2 for graph_size in batch)
P ~ number of node _p_airs in total
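As a minimal sketch of the P definition quoted above (plain Python; the batch of graph sizes is an assumed example, not data from the PR):

```python
# P counts all (source, target) node pairs across a batch of densely
# connected graphs: one dense score matrix per graph.
batch = [3, 5, 2]                               # assumed graph sizes in a batch
P = sum(graph_size ** 2 for graph_size in batch)
print(P)                                        # 3*3 + 5*5 + 2*2 = 38
```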
# Compute attention scores between queries and keys by doing an inner product,
# normalised by the square root of the representation size.
scores = tf.einsum(
When an einsum is involved, I find it useful to keep a comment stating which kind of attention is being implemented, for example: Attention(Q, K) = softmax(Q K^T / sqrt(D))?
Eh, I included the comment above the einsum to clarify the attention kind, but I think what you mean is that you would prefer to have it as a math-y formula as well?
Yes, I meant the formula, but it's not a biggie. Sorry I forgot about this one - it's easy to forget these PRs exist in amongst all the VSTS noise.
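For reference, a minimal sketch of the attention the comment and formula describe, Attention(Q, K) = softmax(Q K^T / sqrt(head size)), written for a single densely connected graph. The shapes and names (num_nodes, num_heads, head_size, queries, keys) and the TF2-style API are assumptions for illustration, not the PR's actual implementation:

```python
import tensorflow as tf

# Assumed dimensions: V nodes, N attention heads, H per-head size.
num_nodes, num_heads, head_size = 5, 4, 8
queries = tf.random.normal([num_nodes, num_heads, head_size])
keys = tf.random.normal([num_nodes, num_heads, head_size])

# Inner product between every (query node v, key node w) pair per head,
# normalised by the square root of the per-head representation size.
scores = tf.einsum("vnh,wnh->nvw", queries, keys)
scores /= tf.sqrt(tf.cast(head_size, tf.float32))

# Softmax over key nodes gives one dense [num_nodes, num_nodes] attention
# matrix per head: shape [num_heads, num_nodes, num_nodes].
weights = tf.nn.softmax(scores, axis=-1)
```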