Hello:)
Thank you so much for sharing your code. It has been very useful in understanding the paper.
There is still something I don't quite get from the paper and the code. From my understanding, the paper says that attention is calculated separately on multiple heads before the results are combined. However, I could not find where the number of heads is set in the code (I may be confused about the multi-head part). Could you point me to where you define the number of heads? I understand that in the paper, an experiment analysed the number of attention heads for MHSA from 1 to 8.
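To clarify what I mean by "calculated separately on multiple heads", here is a minimal NumPy sketch of multi-head self-attention; all names here (`mhsa`, `num_heads`, the weight matrices) are my own illustration, not taken from your repository:

```python
import numpy as np

def mhsa(x, w_q, w_k, w_v, w_o, num_heads):
    """Minimal multi-head self-attention sketch (illustrative only;
    parameter names are assumptions, not from the repo's code)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # (seq_len, d_model) each
    # Split each projection into heads: (num_heads, seq_len, d_head)
    split = lambda t: t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # attention per head
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)            # softmax over keys
    out = weights @ v                                    # (num_heads, seq_len, d_head)
    # Concatenate the heads back together, then apply the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 4, 2
ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
y = mhsa(rng.standard_normal((seq_len, d_model)), *ws, num_heads)
print(y.shape)  # (4, 8)
```

So somewhere in the code I would expect a `num_heads`-style hyperparameter that controls this split, which is what I am failing to locate.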
Thank you.
Charlene