Allow for GroupedQueryAttention #12

Open
PMMon wants to merge 1 commit into UFO-101:main from PMMon:feature/GroupedQueryAttention-Compatibility

@PMMon commented Sep 14, 2024

Minor modifications to factorized_dest_nodes so that models with GroupedQueryAttention (e.g. google/gemma-2-2b) can be loaded. Importantly, in these models the number of query heads does not correspond to the number of key and value heads.
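
To illustrate the head-count mismatch, here is a minimal sketch. It is not the code in this PR: the helper names `kv_head_for_query_head` and `dest_head_counts` are hypothetical, and the head counts (8 query heads, 4 key/value heads) are illustrative values chosen for the example. It assumes the usual grouped-query layout in which consecutive query heads share one key/value head.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head shared by a given query head (hypothetical helper)."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    if not 0 <= q_head < n_q_heads:
        raise IndexError("q_head out of range")
    group_size = n_q_heads // n_kv_heads  # query heads per key/value head
    return q_head // group_size


def dest_head_counts(n_q_heads: int, n_kv_heads=None) -> dict:
    """Number of per-head destinations for each attention input (hypothetical helper).

    With standard multi-head attention, n_kv_heads is None and all three counts match.
    With grouped-query attention, K and V have fewer heads than Q.
    """
    n_kv = n_q_heads if n_kv_heads is None else n_kv_heads
    return {"Q": n_q_heads, "K": n_kv, "V": n_kv}


# Illustrative values only: 8 query heads sharing 4 key/value heads.
mapping = [kv_head_for_query_head(h, 8, 4) for h in range(8)]
counts = dest_head_counts(8, 4)
print(mapping)
print(counts)
```

Code that sizes the K and V destinations by the query-head count would over-count in this setting, which is the kind of assumption a grouped-query model breaks.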

@UFO-101 (Owner) left a comment

Hi. Thanks for this PR!

It looks great and correct as far as I can tell. It's a simple enough change that I don't think tests are required. But have you used it enough locally to be fairly confident it hasn't introduced any bugs?
