The core code in fastv_kvcache.py is shown below. To apply FastV at layer K, we need the attention matrix from layer K-1. The code obtains it by running an extra forward pass through the (K-1)-th LlamaDecoderLayer with output_attentions=True. However, this extra forward also updates the KV cache, so layer K-1 ends up with a duplicated KV cache entry, which I verified with single-step debugging. Could you explain this behavior? Is it a bug?
```python
K = 3
ratio = 0.5

# At layer K (during prefill, seq_length > 1), prune image tokens using the
# attention map recorded at layer K-1. Image tokens occupy positions 35..610
# (576 tokens); keep the top `ratio` fraction ranked by the last token's attention.
if decoder_layer.self_attn.layer_idx == K and seq_length > 1:
    device = hidden_states.device
    image_attention_score = self.last_attention.mean(dim=1)[0][-1][35:611]
    top_attention_rank_index = image_attention_score.topk(int(576 * ratio)).indices + 35
    keep_indexs = torch.cat((
        torch.arange(35, device=device),
        top_attention_rank_index,
        torch.arange(611, seq_length, device=device),
    ))
    keep_indexs = keep_indexs.sort().values
    hidden_states = hidden_states[:, keep_indexs, :]
    if attention_mask is not None:
        attention_mask = attention_mask[:, :, :hidden_states.shape[1], :hidden_states.shape[1]]
    position_ids = keep_indexs.unsqueeze(0)

# At layer K-1, run an extra forward pass with output_attentions=True
# to record the attention map used for pruning at layer K.
if decoder_layer.self_attn.layer_idx == K - 1:
    temp_layer_outputs = decoder_layer(
        hidden_states,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_value=past_key_values,
        output_attentions=True,
        use_cache=use_cache,
    )
    self.last_attention = temp_layer_outputs[1]

# Regular forward pass for the current layer.
layer_outputs = decoder_layer(
    hidden_states,
    attention_mask=attention_mask,
    position_ids=position_ids,
    past_key_value=past_key_values,
    output_attentions=output_attentions,
    use_cache=use_cache,
)
```