Is there a bug in fastv_kvcache.py? #46

@Fanziyang-v

The core code in fastv_kvcache.py is shown below. To apply FastV at layer K, we need the attention matrix from layer K-1. The code obtains it by running an extra forward pass through the (K-1)-th LlamaDecoderLayer with output_attentions=True, before the regular forward pass through the same layer. However, the extra forward pass also updates the KV cache, so layer K-1 ends up with a duplicated set of key/value entries in its cache. I confirmed this by single-stepping through the code in a debugger. Could you explain this behavior? Is it a bug?

K = 3
ratio = 0.5

if decoder_layer.self_attn.layer_idx == K and seq_length > 1:
    device = hidden_states.device
    image_attention_score = self.last_attention.mean(dim=1)[0][-1][35:611]  
    top_attention_rank_index = image_attention_score.topk(int(576 * ratio)).indices + 35
    keep_indexs = torch.cat((torch.arange(35,device=device), top_attention_rank_index, torch.arange(611,seq_length,device=device)))
    keep_indexs = keep_indexs.sort().values
    hidden_states = hidden_states[:,keep_indexs,:]
    if attention_mask is not None:
        attention_mask = attention_mask[:,:,:hidden_states.shape[1],:hidden_states.shape[1]]
    position_ids = keep_indexs.unsqueeze(0)

if decoder_layer.self_attn.layer_idx == K - 1:
    temp_layer_outputs = decoder_layer(
        hidden_states,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_value=past_key_values,
        output_attentions=True,
        use_cache=use_cache,
    )
    self.last_attention = temp_layer_outputs[1]

layer_outputs = decoder_layer(
    hidden_states,
    attention_mask=attention_mask,
    position_ids=position_ids,
    past_key_value=past_key_values,
    output_attentions=output_attentions,
    use_cache=use_cache,
)
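
To make the concern concrete, here is a minimal toy sketch of the effect. It does not use the real model: `ToyCache`, `toy_layer_forward`, and `run_prefill` are hypothetical stand-ins, and the sketch assumes the cache object is mutated in place on every forward call with `use_cache=True`. Under that assumption, calling the same layer twice on the same tokens stores them twice, while a probing call that does not receive the cache leaves it untouched.

```python
class ToyCache:
    """Hypothetical stand-in for an in-place KV cache (not the transformers API)."""

    def __init__(self):
        self.keys = {}  # layer_idx -> list of cached key entries

    def update(self, layer_idx, new_keys):
        # Append the new entries to whatever this layer has cached so far.
        self.keys.setdefault(layer_idx, []).extend(new_keys)
        return self.keys[layer_idx]

    def seq_len(self, layer_idx):
        return len(self.keys.get(layer_idx, []))


def toy_layer_forward(layer_idx, tokens, cache=None, use_cache=True):
    """Hypothetical decoder layer: returns how many key positions it attends over."""
    if cache is not None and use_cache:
        visible = cache.update(layer_idx, tokens)  # side effect: cache grows
    else:
        visible = list(tokens)  # no cache involved, nothing is stored
    return len(visible)


def run_prefill(probe_writes_cache):
    """Mimic the pattern above: a probing forward, then the regular forward."""
    K = 3
    cache = ToyCache()
    tokens = ["t0", "t1", "t2", "t3"]

    # Extra forward in layer K-1, used only to read the attention weights.
    if probe_writes_cache:
        toy_layer_forward(K - 1, tokens, cache=cache, use_cache=True)
    else:
        toy_layer_forward(K - 1, tokens, cache=None, use_cache=False)

    # Regular forward in layer K-1.
    toy_layer_forward(K - 1, tokens, cache=cache, use_cache=True)
    return cache.seq_len(K - 1)


duplicated = run_prefill(probe_writes_cache=True)
clean = run_prefill(probe_writes_cache=False)
print(duplicated, clean)  # 8 4
```

In the toy, 4 prompt tokens leave 8 entries in layer K-1's cache when the probing call writes to it, and 4 when it does not. Whether passing `past_key_value=None` and `use_cache=False` to the probing call is the right fix for the real code is an assumption I have not verified.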
