Add torch scaled dot product attention (FlashAttention)#1799
Copilot wants to merge 3 commits into doriank-sdpa
Conversation
Co-authored-by: dorian-K <62394594+dorian-K@users.noreply.github.com>
@copilot You reverted only the formatting changes of the last commit e5f636c, but you were supposed to revert the formatting-only changes that were made between e5f636c and 884edfe. The latest commit of my PR already reverted some previously made changes, so your revert of my revert did entirely the wrong thing.
…led_dot_product_attention function Co-authored-by: dorian-K <62394594+dorian-K@users.noreply.github.com>
You're right, I misunderstood the request. I've now restored the file to the e5f636c state, which includes the single-line formatting AND the new scaled_dot_product_attention function. The changes are in commit c1b58ca.
@copilot Your pull request is now empty; you didn't change anything?
You're correct - my commits resulted in no net changes because the final state matches e5f636c, which appears to be set as the PR base. The changes from e5f636c (the new scaled_dot_product_attention function and formatting improvements) should be part of the PR content, not the base. Could you clarify what the correct base commit should be for this PR? Should it be comparing against master branch (commit 884edfe), or is there a different base I should use?
I like the good work here. Well done. |
Adds support for PyTorch's scaled dot product attention (FlashAttention) to the torch backend, along with single-line formatting improvements across returnn/torch/frontend/_backend.py.

Changes
- Added a scaled_dot_product_attention method to the TorchBackend class that utilizes torch.nn.functional.scaled_dot_product_attention for efficient attention computation
- Added the scaled_dot_product_attention function in returnn/frontend/_backend.py and returnn/frontend/attention.py

The new attention implementation includes:
This enables more efficient attention operations by leveraging PyTorch's optimized FlashAttention implementation when available.
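For illustration, here is a minimal sketch of what such a backend method could look like when it wraps the fused PyTorch kernel. The function name, tensor layout, and arguments below are assumptions made for this example; they do not reflect RETURNN's actual TorchBackend signature, which operates on RETURNN's own tensor/dim abstractions.

```python
# Minimal sketch (hypothetical wrapper, not RETURNN's actual API):
# delegate attention to PyTorch's fused scaled dot product attention,
# which dispatches to FlashAttention when hardware and dtypes allow it.
import torch
import torch.nn.functional as F


def scaled_dot_product_attention(
    query: torch.Tensor,   # [batch, heads, q_time, key_dim]
    key: torch.Tensor,     # [batch, heads, kv_time, key_dim]
    value: torch.Tensor,   # [batch, heads, kv_time, value_dim]
    attn_mask: torch.Tensor = None,  # optional boolean or additive mask
    is_causal: bool = False,
) -> torch.Tensor:
    # F.scaled_dot_product_attention fuses QK^T, softmax and the value
    # weighting into one kernel instead of materializing the full
    # attention matrix in the default eager path.
    return F.scaled_dot_product_attention(
        query, key, value, attn_mask=attn_mask, is_causal=is_causal
    )


# Usage example with random tensors:
q = torch.randn(2, 4, 10, 64)
k = torch.randn(2, 4, 12, 64)
v = torch.randn(2, 4, 12, 64)
out = scaled_dot_product_attention(q, k, v)  # shape [2, 4, 10, 64]
```

PyTorch picks between its FlashAttention, memory-efficient, and plain math backends automatically; on hardware or dtypes where the fused kernels are unavailable it falls back to the math implementation, so such a wrapper stays portable.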