Hello,
While reviewing the code, I encountered a potential discrepancy between the implementation and the description in the article. The article states:
"We simply compute the average attention-score one token received from all other tokens as the criteria ϕattn in our experiment."
However, upon examining the code, the implementation appears to use only the attention row of the last token, i.e., the attention each token receives from the last query token, rather than the average over all query tokens:
```python
last_layer_attention_avg_last_tok = last_layer_attention_avg[-1]
```
https://github.com/pkunlp-icler/FastV/blob/main/src/transformers/src/transformers/models/llama/modeling_llama.py#L730
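To make the question concrete, here is a minimal sketch of the two computations as I understand them. The variable names and the random attention map are purely illustrative (not from the repo), and I'm assuming a head-averaged attention map of shape `(seq_len, seq_len)`:

```python
import torch

seq_len = 8

# Hypothetical head-averaged attention map for the last layer:
# attn_avg[i, j] = attention that query token i pays to key token j.
attn_avg = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)

# What the article describes: the average attention each token j
# *received* from all query tokens i -> mean over the query axis.
score_paper = attn_avg.mean(dim=0)  # shape: (seq_len,)

# What the code computes: the attention each token j receives from
# the *last* query token only -> the last row of the map.
score_code = attn_avg[-1]  # shape: (seq_len,)
```

If this reading is correct, the linked line computes `score_code` rather than `score_paper`.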
I would appreciate it if you could explain the reasoning behind the current implementation. Thank you for your assistance.
Best regards!