
Potential mistake in the KV cache implementation -- why is the same layer's KV cache always reused? #49

@lxt160980

Description


https://github.com/pkunlp-icler/FastV/blob/d1659729b5bf1be225e99ee15783deeea80f63b1/src/transformers/src/transformers/models/llama/modeling_llama_fastv_kvcache.py#L784C1-L797C25
In this code, it looks like once the AGG layer computes the current layer's pruned KV cache (new_past_key_value), that same value is then used for all subsequent layers. Shouldn't keep_indices be retained instead, and the selection re-applied to each layer's own KV cache? Or am I misunderstanding something?
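
For illustration, here is a minimal sketch of the per-layer re-indexing I have in mind; the function name, the keep_indices argument, and the per-layer (key, value) tuple layout (each tensor shaped (batch, num_heads, seq_len, head_dim), as in standard Hugging Face caches) are my assumptions, not the repository's actual code:

```python
import torch

def prune_kv_per_layer(past_key_values, keep_indices):
    """Hypothetical sketch: apply the same token selection to every
    layer's own KV cache, instead of reusing one layer's pruned
    (key, value) tensors for all subsequent layers."""
    pruned = []
    for key, value in past_key_values:
        # Gather along the sequence dimension (dim=2) of this layer's
        # own cached tensors, so layer-specific K/V content is kept.
        pruned.append((
            key[:, :, keep_indices, :],
            value[:, :, keep_indices, :],
        ))
    return tuple(pruned)
```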
A follow-up question: given `last_layer_attention_avg = torch.mean(last_layer_attention, dim=1)[0]`, does the current code only support batch size = 1?
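
To illustrate the concern, a small sketch with made-up shapes: averaging over the head dimension keeps the batch dimension, and the trailing `[0]` then discards every example except the first. A batched variant would keep that dimension and select top-k indices per example; the shapes and the last-query scoring below are assumptions for illustration, not the repository's actual logic:

```python
import torch

# Assumed shape: (batch, num_heads, q_len, k_len).
batch, num_heads, q_len, k_len = 2, 8, 16, 16
last_layer_attention = torch.rand(batch, num_heads, q_len, k_len)

# Averaging over heads (dim=1) keeps the batch dimension:
attn_avg = torch.mean(last_layer_attention, dim=1)  # (batch, q_len, k_len)

# The original code then indexes [0], keeping only the first example,
# which is why it appears to assume batch size = 1:
attn_avg_first = attn_avg[0]                        # (q_len, k_len)

# A batched variant could rank tokens per example instead, e.g. by the
# attention each key position receives from the last query position:
scores = attn_avg[:, -1, :]                         # (batch, k_len)
keep_indices = scores.topk(k=8, dim=-1).indices     # (batch, 8), per example
```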
Looking forward to your reply.
