Question about Attention Sharing #3

@jmahajan117

In your code, you seem to manipulate only part of the keys and values:

            all_store_kv['key'].append(self.scale * key[:, :, 512:, :][:, :, dim_neg2_indices, :])
            all_store_kv['value'].append(value[:, :, 512:, :][:, :, dim_neg2_indices, :])

Is there a reason you do this? Why don't you use all of the keys and values?
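
To make the question concrete, here is a minimal sketch of how I read those two lines. It uses NumPy instead of the repository's tensors, and the shapes, the (batch, heads, tokens, head_dim) layout, and the index values are all made up for illustration, so it may not match what the code actually stores.

```python
import numpy as np

# Hypothetical layout (batch, heads, tokens, head_dim); the real shapes are not given here.
batch, heads, tokens, head_dim = 1, 2, 520, 4
key = np.arange(batch * heads * tokens * head_dim, dtype=np.float32)
key = key.reshape(batch, heads, tokens, head_dim)

# Hypothetical index values; in the repository dim_neg2_indices is computed elsewhere.
dim_neg2_indices = np.array([0, 3, 5])

# Step 1: drop the first 512 positions along the second-to-last axis.
tail = key[:, :, 512:, :]                     # shape (1, 2, 8, 4)

# Step 2: keep only the selected positions of what remains.
selected = tail[:, :, dim_neg2_indices, :]    # shape (1, 2, 3, 4)

# The indices are relative to the tail, so they correspond to
# positions 512 + index in the original array.
same = key[:, :, 512 + dim_neg2_indices, :]

print(tail.shape, selected.shape, np.array_equal(selected, same))
```

If this reading is right, only a subset of the positions after the first 512 ends up in `all_store_kv`, and everything else along that axis is discarded.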
