
On Positional Encoding #3

@JHLew


Hi, thanks for the awesome work.

I was going through the codebase, and came to wonder how the positional information was encoded.

It seems the TimeSformer Encoder/Decoder uses rotary embedding by default, but the attention mechanism in the code is based on timm, which does not use rotary embedding.

How is the positional information encoded, and does it use rotary embedding? If it does, the current code doesn't seem to apply it properly. Is this simply a bug, or is there something I'm missing?
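For context, rotary embedding has to be applied *inside* the attention block, to the query and key tensors before the dot product, which is why a stock timm `Attention` module would silently skip it. Below is a minimal, dependency-free sketch (in numpy, using the half-split convention; names like `apply_rotary` are my own, not from this repo) of the rotation that would need to happen per position:

```python
import numpy as np

def apply_rotary(x, base=10000.0):
    """Rotary position embedding (half-split convention), a sketch.

    x: (seq_len, dim) query or key tensor, dim must be even.
    Each channel pair (x[i], x[i + dim/2]) at position m is rotated
    by an angle m * theta_i, so that dot products between rotated
    queries and keys depend only on relative position.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies theta_i = base^(-i / half)
    freqs = base ** (-np.arange(half) / half)              # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied to each channel pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

In an attention forward pass this would be called on `q` and `k` (per head) right before `q @ k.T`; applying it anywhere else, or not at all, means the model falls back to whatever other positional signal exists, which is what the question above is probing.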
