
On Positional Encoding #3

@JHLew


Hi, thanks for the awesome work.

I was going through the codebase, and came to wonder how the positional information was encoded.

It seems the TimeSformer Encoder/Decoder uses rotary embedding by default, but the attention mechanism in the code is based on timm, which does not use rotary embedding.

How is the positional information encoded, and does it use rotary embedding? If it does, the current code doesn't seem to apply it properly. Is this simply a bug, or is there something I'm missing?
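For context, rotary embedding has to be applied *inside* the attention block, to the query and key tensors before the dot product, which is why a stock timm `Attention` module would silently skip it. Below is a minimal, dependency-free sketch (in numpy, using the half-split convention; names like `apply_rotary` are my own, not from this repo) of the rotation that would need to happen per position:

```python
import numpy as np

def apply_rotary(x, base=10000.0):
    """Rotary position embedding (half-split convention), a sketch.

    x: (seq_len, dim) query or key tensor, dim must be even.
    Each channel pair (x[i], x[i + dim/2]) at position m is rotated
    by an angle m * theta_i, so that dot products between rotated
    queries and keys depend only on relative position.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies theta_i = base^(-i / half)
    freqs = base ** (-np.arange(half) / half)              # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied to each channel pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

In an attention forward pass this would be called on `q` and `k` (per head) right before `q @ k.T`; applying it anywhere else, or not at all, means the model falls back to whatever other positional signal exists, which is what the question above is probing.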
