Related to: #3
It would be nice to be able to plot values that are greater than 1, e.g. value-weighted attention, as long as they're always positive.
The obvious thing to do is to scale the values down so that the max across all heads is one. It would also be good to let the user pass in a `max_range` value explicitly, so that colours keep a consistent meaning when running the same head on different text.
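
A rough sketch of the scaling I have in mind, not a proposal for the actual API. The function name `scale_for_plotting` is made up, the input is assumed to be a non-negative NumPy array with heads on the first axis, and clipping values above `max_range` to 1 is my own assumption about what should happen in that case:

```python
import numpy as np


def scale_for_plotting(values, max_range=None):
    """Scale non-negative values into [0, 1] for colouring.

    Hypothetical helper: `values` is assumed to be shaped
    [heads, ...]; `max_range` is the value that should map to 1.
    """
    values = np.asarray(values, dtype=float)
    if max_range is None:
        # Default: one shared scale, the max across all heads.
        max_range = values.max()
    if max_range <= 0:
        # Nothing positive to show; avoid dividing by zero.
        return np.zeros_like(values)
    # Assumption: anything above max_range saturates at 1.
    return np.clip(values / max_range, 0.0, 1.0)
```

Passing the same `max_range` for two different prompts would then give the same colour for the same underlying value, which is the point of exposing it.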