Skip to content

Attention 层参数量计算疑问 #376

@kloop3

Description

@kloop3

AI 计算集群概述中code1中,Attention 层参数量计算时,公式是否有问题?标准多头注意力(不考虑GQA等技术),参数量是否应该是P_{attn_per_layer} = (d_{model} \times d_{model})Q8 + (d*{model} \times d_{model})K8 + (d*{model} \times d_{model})V8 + (d*{model} \times d_{model})O,也就是需要QKV的参数量应该是d_modeld_model*n_heads

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions