Just checking that I understand the paper correctly: are you computing global self-attention directly over the pixels, without any kind of patch embedding as described in ViT?
That could explain why the model is training so slowly for me...
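For context on why this matters: in ViT, the patch embedding step splits the image into non-overlapping patches and treats each patch as one token, so a 224×224 image becomes 196 tokens instead of 50,176 pixels. Since self-attention cost grows quadratically with sequence length, skipping that step would make training dramatically slower. A minimal sketch of the patchify step (numpy only; the `patchify` helper and patch size are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an (N, patch*patch*C) array, where N = (H//patch) * (W//patch).
    In ViT each row is then linearly projected to the model dimension.
    """
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    # (H//p, p, W//p, p, C) -> (H//p, W//p, p, p, C) -> (N, p*p*C)
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * C)

tokens = patchify(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768)
# Attention over 196 tokens vs. 224*224 = 50176 pixels: since attention is
# O(n^2) in sequence length, that's roughly a 256^2 ≈ 65,000x cost difference.
```

If the model really is attending over raw pixels, the slowdown you're seeing would be expected rather than a bug.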