Open
Description
Hi, thanks for the awesome work.
While going through the codebase, I started wondering how positional information is encoded.
It seems the TimeSformer encoder/decoder uses rotary embedding by default, but the attention mechanism in the code is based on timm, which does not use rotary embedding.
How is the positional information encoded? Does it use rotary embedding? If so, the current code doesn't seem to apply it properly. Is this simply a bug, or am I missing something?
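For reference, here is a minimal sketch of what I understand rotary embedding to do, in case it helps pin down the question. It is plain Python, not code from this repo or from timm; the helper names `rotate` and `dot`, the example vectors, and the base of 10000 are my own illustrative assumptions. The point is that the rotation has to be applied to the queries and keys before the attention dot product, which is the step I could not find.

```python
import math

def rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of an even-length vector by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Pair i // 2 uses frequency base^(-i / dim), the usual rotary schedule.
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors for a single attention head.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (3) at different absolute positions.
score_a = dot(rotate(q, 5), rotate(k, 2))
score_b = dot(rotate(q, 9), rotate(k, 6))

# Score without any rotation, i.e. what a plain attention block would compute.
plain = dot(q, k)

print(score_a, score_b, plain)
```

If the rotation is applied, `score_a` and `score_b` should agree because they share the same relative offset, while `plain` carries no positional information at all. That difference is what I would expect to see reflected somewhere in the attention forward pass.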