It would be nice to be able to plot a function on pairs of tokens, like attention, but where values can be positive or negative (e.g., attention head logit contribution or attention head neuron contribution). Currently AttnMulti is bad at this. I think the easiest way would be to adjust the colour map function to give each head a pair of colours, one for + and one for -.
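To make the idea concrete, here is a rough sketch of what a signed, per-head colour map could look like. This is not AttnMulti's actual colour map function; `head_colour_pair` and `signed_colour` are hypothetical names, and the choices of complementary hues and fading to white at zero are assumptions made for illustration only.

```python
import colorsys


def head_colour_pair(head, n_heads):
    """Return (positive_rgb, negative_rgb) for one head (hypothetical helper).

    Assumption: the positive hue is drawn from [0, 0.5) and the negative
    colour is its complement, so no two heads share a colour.
    """
    hue = head / (2 * n_heads)
    positive = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    negative = colorsys.hsv_to_rgb(hue + 0.5, 1.0, 1.0)
    return positive, negative


def signed_colour(value, max_abs, head, n_heads):
    """Map a signed value to an RGB triple with components in [0, 1].

    The sign picks which colour of the head's pair to use; the magnitude
    (relative to max_abs) sets how far the result is from white.
    """
    positive, negative = head_colour_pair(head, n_heads)
    base = positive if value >= 0 else negative
    strength = min(abs(value) / max_abs, 1.0) if max_abs > 0 else 0.0
    # Blend linearly from white (strength 0) to the base colour (strength 1).
    return tuple(1.0 - strength * (1.0 - c) for c in base)
```

With this scheme a value of zero renders as white for every head, and the largest positive and negative contributions render as that head's two fully saturated colours.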