Thanks for your paper. Is there an error in the figure? Why are the keys scaled by `(s_k^i)^{-1}` instead of `s_k^i` in the attention layer?
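
To make sure we're talking about the same thing, here is a minimal sketch of the two key-scaling variants I mean. All names (`attention`, `s_k`, `invert_scale`) and the per-dimension scale are my own assumptions for illustration, not taken from the paper:

```python
# Minimal sketch contrasting the two scalings in question.
# `s_k` is a hypothetical learned per-dimension key scale; nothing
# here is claimed to match the paper's actual implementation.
import numpy as np

def attention(q, k, v, s_k, invert_scale=True):
    """Scaled dot-product attention with an extra key scale.

    If invert_scale is True, keys are multiplied by 1 / s_k
    (the `(s_k^i)^{-1}` variant shown in the figure);
    otherwise they are multiplied by s_k directly.
    """
    scale = 1.0 / s_k if invert_scale else s_k
    k = k * scale                                  # element-wise key scaling
    logits = q @ k.T / np.sqrt(q.shape[-1])        # standard dot-product logits
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # softmax over keys
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
s_k = rng.uniform(0.5, 2.0, size=8)                # hypothetical per-dim scale
out_inverse = attention(q, k, v, s_k, invert_scale=True)   # figure's version
out_direct = attention(q, k, v, s_k, invert_scale=False)   # what I expected
```

Is the inverse in the figure intentional (e.g. so the scale cancels against a matching factor on the queries), or is it a typo?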