The linear projection after the self-attention:
bs = self_attention.size(0)
# flatten the (num_neighbors, 2d) neighbor encodings into one vector per sample
self_attention = self_attention.view(bs, -1)
linear_proj = F.relu(self.linear_projection(self_attention))
The paper says "We project the self-attended neighbor encodings to a LARGER 4x2d dimensional space". But if you flatten the last two dimensions of self_attention before the projection, the input size becomes num_neighbors * 2d, so the 4x2d output is only "larger" when the number of neighbors is less than 4. How can you guarantee that?
In my opinion, we should not flatten the last two dimensions before the projection. Instead, the projection should be applied to the last dimension, whose size is 2d; since 2d < 4x2d, each neighbor encoding is then projected into a larger space regardless of the number of neighbors. A sketch of the two variants is below.
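For illustration, here is a minimal sketch of the two variants, assuming a hypothetical input of shape (bs, num_neighbors, 2*d); the layer names and the concrete sizes are my own assumptions, not the repository's actual code:

import torch
import torch.nn as nn
import torch.nn.functional as F

bs, num_neighbors, d = 8, 6, 32          # hypothetical sizes, not from the repo
self_attention = torch.randn(bs, num_neighbors, 2 * d)

# Variant 1 (current code): flatten neighbors, then project.
# Input size is num_neighbors * 2d, so the 4*2d output is only
# "larger" when num_neighbors < 4.
flatten_proj = nn.Linear(num_neighbors * 2 * d, 4 * 2 * d)
out_flat = F.relu(flatten_proj(self_attention.view(bs, -1)))
print(out_flat.shape)                    # torch.Size([8, 256])

# Variant 2 (proposed): project only the last dimension.
# Each neighbor encoding goes from 2d to 4*2d, which is always larger.
per_neighbor_proj = nn.Linear(2 * d, 4 * 2 * d)
out_per_neighbor = F.relu(per_neighbor_proj(self_attention))
print(out_per_neighbor.shape)            # torch.Size([8, 6, 256])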
Please point it out if I have misunderstood something, or if this was done on purpose for some reason.