Description
I think the d_model parameter (the embedding dimension) should take a significantly larger value than the one currently used. In multi-head attention it has to be a multiple of num_heads (each head works on d_model / num_heads dimensions), and num_heads is usually set to 8, so maybe an initial value of 32 would make sense here? Or is there a specific reason for using a smaller value?
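A minimal NumPy sketch of why the divisibility matters: standard multi-head attention splits the d_model features evenly across the heads. The function name `split_heads` is hypothetical and not taken from this repository, and the sketch assumes an input of shape (seq_len, d_model) with no batch dimension.

```python
import numpy as np


def split_heads(x: np.ndarray, num_heads: int) -> np.ndarray:
    """Hypothetical helper: reshape (seq_len, d_model) into (num_heads, seq_len, head_dim)."""
    seq_len, d_model = x.shape
    # Each head must receive the same number of features.
    if d_model % num_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by num_heads={num_heads}"
        )
    head_dim = d_model // num_heads
    # (seq_len, d_model) -> (seq_len, num_heads, head_dim) -> (num_heads, seq_len, head_dim)
    return x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)


num_heads = 8
for d_model in (32, 64):
    x = np.zeros((10, d_model))
    print(d_model, split_heads(x, num_heads).shape)
# 32 (8, 10, 4)
# 64 (8, 10, 8)
```

With num_heads = 8, d_model = 32 leaves each head 4 dimensions and d_model = 64 leaves 8; a value such as 20 would raise the `ValueError` because it cannot be split evenly.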