Hello,
While reviewing the code, I encountered a potential discrepancy between the implementation and the description in the article. The article states:
"We simply compute the average attention-score one token received from all other tokens as the criteria ϕattn in our experiment."
However, upon examining the code, the implementation appears to use only the attention row of the last token, i.e., the attention each token receives from the last query token, rather than the average over all query tokens:
```python
last_layer_attention_avg_last_tok = last_layer_attention_avg[-1]
```
https://github.com/pkunlp-icler/FastV/blob/main/src/transformers/src/transformers/models/llama/modeling_llama.py#L730
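To make the question concrete, here is a minimal sketch of the two computations as I understand them. The variable names and the random attention map are purely illustrative (not from the repo), and I'm assuming a head-averaged attention map of shape `(seq_len, seq_len)`:

```python
import torch

seq_len = 8

# Hypothetical head-averaged attention map for the last layer:
# attn_avg[i, j] = attention that query token i pays to key token j.
attn_avg = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)

# What the article describes: the average attention each token j
# *received* from all query tokens i -> mean over the query axis.
score_paper = attn_avg.mean(dim=0)  # shape: (seq_len,)

# What the code computes: the attention each token j receives from
# the *last* query token only -> the last row of the map.
score_code = attn_avg[-1]  # shape: (seq_len,)
```

If this reading is correct, the linked line computes `score_code` rather than `score_paper`.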
I would appreciate it if you could explain the reasoning behind the current implementation. Thank you for your assistance.
Best regards!