Criteria for selecting tokens #48

@EvelynZhang-epiclab

Description

Hello,
I am currently reviewing the code and have encountered a potential discrepancy between it and the description in the article. The article states:
"We simply compute the average attention-score one token received from all other tokens as the criteria ϕattn in our experiment."

However, upon examining the code, it appears that the implementation scores each token by the attention it receives from the last token only, rather than by the average attention it receives from all tokens:

```python
last_layer_attention_avg_last_tok = last_layer_attention_avg[-1]
```
https://github.com/pkunlp-icler/FastV/blob/main/src/transformers/src/transformers/models/llama/modeling_llama.py#L730
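
For illustration, here is a minimal sketch of the two readings. All names and shapes below are hypothetical, chosen only to make the contrast concrete; they are not the repository's actual variables:

```python
import torch

# Hypothetical setup: `attn` is the last layer's attention map after
# averaging over heads, shape (seq_len, seq_len), where attn[i, j] is
# the attention that query token i pays to key token j.
seq_len = 8
attn = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)

# Reading 1 (the paper's wording as I understand it): the average
# attention-score each token receives from all other tokens, i.e. a
# mean over the query axis.
phi_attn_avg = attn.mean(dim=0)   # shape (seq_len,)

# Reading 2 (what the linked line appears to do): score each token by
# the attention it receives from the last query token only.
phi_attn_last = attn[-1]          # shape (seq_len,)
```

These two criteria can produce different token rankings, so I wanted to confirm which one is intended.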

I would appreciate it if you could explain the reasoning behind the current implementation. Thank you for your assistance.
Best regards!
