
Support GQA and implement flash attention v2 with causal masking #10

Open
FoolyAndCooly wants to merge 4 commits into tspeterkim:main from FoolyAndCooly:main

Conversation

@FoolyAndCooly

This PR implements GQA for flash attention, and implements flash attention v2 with causal masking.
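For context, in grouped-query attention each KV head is shared by a group of query heads, and the causal mask stops a position from attending to later positions. Below is a minimal PyTorch sketch of that semantics (a reference to check the kernel's output against, not the PR's CUDA code; the function name and shape conventions here are my own assumptions):

```python
import torch
import torch.nn.functional as F

def gqa_causal_attention_ref(q, k, v):
    """Plain PyTorch reference for grouped-query attention with a causal mask.

    q:    (batch, n_q_head, seq_len, head_embd)
    k, v: (batch, n_kv_head, seq_len, head_embd), with n_q_head % n_kv_head == 0
    """
    b, n_q_head, seq_len, head_embd = q.shape
    n_kv_head = k.shape[1]
    group = n_q_head // n_kv_head  # each KV head serves `group` query heads

    # Expand K/V so query head h uses KV head h // group.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = q @ k.transpose(-2, -1) / (head_embd ** 0.5)

    # Causal mask: position i may only attend to positions <= i.
    mask = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device), diagonal=1
    )
    scores = scores.masked_fill(mask, float('-inf'))

    return F.softmax(scores, dim=-1) @ v
```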
I got these results on my GTX 1650 (flash attention v2) with:
batch_size = 16
n_q_head = 16
n_kv_head = 8
seq_len = 256
head_embd = 64
(screenshot of benchmark results)
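A rough sketch of how the kernel could be sanity-checked against that reference at the sizes above; the `minimal_attn.forward` call is only assumed from the repo's existing extension interface and may not match this PR's actual entry point:

```python
import torch

# Shapes from the benchmark config above (hypothetical driver; the PR's own script may differ).
batch_size, n_q_head, n_kv_head, seq_len, head_embd = 16, 16, 8, 256, 64
device = 'cuda' if torch.cuda.is_available() else 'cpu'

q = torch.randn(batch_size, n_q_head, seq_len, head_embd, device=device)
k = torch.randn(batch_size, n_kv_head, seq_len, head_embd, device=device)
v = torch.randn(batch_size, n_kv_head, seq_len, head_embd, device=device)

ref = gqa_causal_attention_ref(q, k, v)        # reference from the sketch above
# out = minimal_attn.forward(q, k, v)          # the PR's CUDA kernel (name assumed)
# print(torch.allclose(ref, out, atol=1e-2))   # compare kernel output to the reference
```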
