
About the dimension projection #33

@shaonanqinghuaizongshishi

Description

The linear projection after the self-attention:
bs = self_attention.size(0)                                   # self_attention: (bs, num_neighbors, 2d)
self_attention = self_attention.view(bs, -1)                  # flatten to (bs, num_neighbors * 2d)
linear_proj = F.relu(self.linear_projection(self_attention))  # project to 4 x 2d

The paper says: "We project the self-attended neighbor encodings to a LARGER 4x2d dimensional space". If you flatten out the last two dimensions of "self_attention" before the projection, the input dimension becomes num_neighbors x 2d, so 4x2d is only larger when the number of neighbors is less than 4. How can you make sure that neighbor < 4?

In my opinion, we should not flatten the last two dimensions before the projection; instead, the projection should be applied to the last dimension, whose size is 2d. Since 2d < 4x2d, that is always a projection into a larger space (see the sketch below).
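For illustration, here is a minimal sketch of the two variants being compared. The shapes (bs, num_neighbors, d) and the layer names flat_proj / per_neighbor_proj are hypothetical, not the repository's actual code:

import torch
import torch.nn as nn
import torch.nn.functional as F

bs, num_neighbors, d = 32, 6, 64                 # hypothetical sizes
self_attention = torch.randn(bs, num_neighbors, 2 * d)

# Current variant: flatten, then project.
# Input dim is num_neighbors * 2d, so 4 x 2d is only "larger" when num_neighbors < 4.
flat_proj = nn.Linear(num_neighbors * 2 * d, 4 * 2 * d)
out_flat = F.relu(flat_proj(self_attention.view(bs, -1)))      # (bs, 4*2d)

# Suggested variant: project the last dimension only.
# 2d -> 4 x 2d is always a projection into a larger space.
per_neighbor_proj = nn.Linear(2 * d, 4 * 2 * d)
out_per_neighbor = F.relu(per_neighbor_proj(self_attention))   # (bs, num_neighbors, 4*2d)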

Please point it out if I have misunderstood something, or if this was done on purpose for some reason.
