Conversation

@mmjb mmjb commented Jul 9, 2020

No description provided.

@mmjb mmjb requested a review from kmaziarz July 9, 2020 17:10
D ~ vertex representation _d_imension
H ~ attention _h_ead size
N ~ _n_um attention heads
P ~ _n_umber of node pairs in total, where P = sum(graph_size**2 for graph_size in batch)
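As a quick sanity check on the legend above, P can be computed directly from the per-graph sizes in a batch. A minimal sketch (the `batch` values below are illustrative, not taken from the PR):

```python
# P = total number of node pairs across a batch of graphs,
# where each graph with n nodes contributes n**2 ordered pairs.
batch = [2, 3, 5]  # illustrative graph sizes, not from the PR

P = sum(graph_size**2 for graph_size in batch)
print(P)  # 2*2 + 3*3 + 5*5 = 38
```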
@kmaziarz (reviewer) commented:
P ~ number of node _p_airs in total


# Compute attention scores between queries and keys by doing an inner product,
# normalised by the square root of the representation size.
scores = tf.einsum(
@kmaziarz (reviewer) commented:

I find it useful to keep information about which kind of attention is being implemented when there is an einsum involved in a comment, for example: Attention(Q, K) = softmax(Q K^T/sqrt(D)) ?
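For reference, the formula the reviewer is asking to document, Attention(Q, K) = softmax(Q K^T / sqrt(D)), can be sketched in plain Python without TensorFlow. This is an illustrative standalone version, not the PR's actual `tf.einsum`-based implementation, and the shapes and values below are made up:

```python
import math

def attention_scores(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(D)).

    Q: list of query vectors; K: list of key vectors; all of dimension D.
    Returns one row of attention weights per query, each row summing to 1.
    """
    D = len(Q[0])
    weights = []
    for q in Q:
        # Inner product of the query with every key, scaled by sqrt(D).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D) for k in K]
        # Row-wise softmax (subtract the max for numerical stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Illustrative 2x2 example (not from the PR): each query attends
# most strongly to its matching key.
W = attention_scores([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```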

@mmjb (author) replied:

Eh, I included the comment above the einsum to clarify the attention kind, but I think what you mean is you would prefer to have it as a math-y formula as well?

@kmaziarz (reviewer) replied:

Yes, I meant the formula, but it's not a biggie. Sorry I forgot about this one - it's easy to forget these PRs exist in amongst all the VSTS noise.

@mmjb mmjb changed the base branch from dev/mabrocks/misc_improvements to master July 17, 2020 09:55