Example use case: we have 3 tokens at the end of a prompt, and we want to see the attention probabilities from those tokens back to all other tokens in the sequence. This could be done via something like:
    cv.attention.attention_patterns(
        attention=attention,
        src_tokens=tokens,
        dest_tokens=tokens[-3:],
    )
Not sure how difficult this would be to implement.
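For what it's worth, the data-side part of this looks like a plain slice. Here is a minimal sketch, assuming the attention array is laid out as [n_heads, dest_pos, src_pos]. The helper name `select_dest_rows` is made up for illustration, and the `src_tokens` / `dest_tokens` arguments above are the proposed interface, not something the library is known to accept today. How the frontend would render a non-square pattern is a separate question that this does not address.

```python
import numpy as np


def select_dest_rows(attention, tokens, n_dest):
    """Keep only the attention rows for the last `n_dest` destination tokens.

    Assumption: `attention` has shape [n_heads, dest_pos, src_pos] and is
    square over the same token list. This is a hypothetical helper, not part
    of any existing library.
    """
    attention = np.asarray(attention)
    if attention.ndim != 3:
        raise ValueError("expected attention of shape [n_heads, dest, src]")
    if attention.shape[1] != len(tokens) or attention.shape[2] != len(tokens):
        raise ValueError("attention dest/src dims must match len(tokens)")
    if not 1 <= n_dest <= len(tokens):
        raise ValueError("n_dest must be between 1 and len(tokens)")

    dest_tokens = tokens[-n_dest:]
    # Rows are destinations, columns are sources: keep the last n_dest rows
    # and every source column.
    sliced = attention[:, -n_dest:, :]
    return sliced, tokens, dest_tokens


# Toy data: 2 heads, 5 tokens, uniform causal attention.
tokens = ["The", " cat", " sat", " on", " mat"]
n = len(tokens)
mask = np.tril(np.ones((n, n)))
pattern = mask / mask.sum(axis=-1, keepdims=True)
attention = np.stack([pattern, pattern])

sliced, src_tokens, dest_tokens = select_dest_rows(attention, tokens, 3)
print(sliced.shape)  # (2, 3, 5)
```

Each row of the sliced array is still a full distribution over source tokens, so nothing needs renormalising. The open part is only whether the visualisation can accept different-length source and destination token lists.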