
Make attention multi work with attn pattern > 1 #4

@neelnanda-io

Description


Related to: #3

It would be useful to be able to plot values greater than 1, e.g. value-weighted attention, even when those values are always positive.

The obvious approach is to scale the values down so that the maximum across all heads is 1. It would also be good to let the user pass in a `max_range` value explicitly, so that colours have a consistent meaning when visualizing the same head on different text.
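A minimal sketch of the proposed scaling, assuming the pattern is a nested list shaped `[n_heads][dest_tokens][src_tokens]` with non-negative values. `normalize_pattern` is a hypothetical name, not part of any existing API. Clipping values that exceed an explicit `max_range` to 1 is also an assumption, since the issue doesn't say how such values should be handled.

```python
from typing import List, Optional

# Assumed shape: [n_heads][dest_tokens][src_tokens], values >= 0.
Pattern = List[List[List[float]]]


def normalize_pattern(pattern: Pattern, max_range: Optional[float] = None) -> Pattern:
    """Hypothetical helper: rescale a pattern into [0, 1] for colouring.

    If max_range is None, divide by the maximum over ALL heads, so the
    largest value becomes 1. If max_range is given, divide by it instead,
    so the same raw value maps to the same colour across different inputs.
    Results are clipped to [0, 1] (an assumption for out-of-range values).
    """
    if max_range is None:
        # Global max across every head, destination and source position.
        max_range = max(
            (v for head in pattern for row in head for v in row), default=0.0
        )
    if max_range <= 0:
        # Nothing to scale (e.g. an all-zero pattern): return zeros.
        return [[[0.0 for _ in row] for row in head] for head in pattern]
    return [
        [[min(max(v / max_range, 0.0), 1.0) for v in row] for row in head]
        for head in pattern
    ]


pattern = [[[0.5, 2.0], [1.0, 0.0]], [[4.0, 1.0], [0.0, 0.0]]]
print(normalize_pattern(pattern))                 # scaled by the global max, 4.0
print(normalize_pattern(pattern, max_range=2.0))  # fixed scale; 4.0 clips to 1.0
```

Scaling by the global maximum (rather than per head) keeps colours comparable between heads within one plot, while a fixed `max_range` keeps them comparable between plots.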
