Add torch scaled dot product attention (FlashAttention)#1799
Copilot wants to merge 3 commits into doriank-sdpa
Conversation
Co-authored-by: dorian-K <62394594+dorian-K@users.noreply.github.com>
@copilot You reverted only the formatting changes of the last commit e5f636c, but you were supposed to revert the formatting-only changes that were made between e5f636c and 884edfe. The latest commit of my PR already reverted some previously made changes, so your revert of my revert did entirely the wrong thing.
…led_dot_product_attention function Co-authored-by: dorian-K <62394594+dorian-K@users.noreply.github.com>
You're right, I misunderstood the request. I've now restored the file to the e5f636c state, which includes the single-line formatting AND the new scaled_dot_product_attention function. The changes are in commit c1b58ca.
@copilot Your pull request is now empty; you didn't change anything?
You're correct - my commits resulted in no net changes because the final state matches e5f636c, which appears to be set as the PR base. The changes from e5f636c (the new scaled_dot_product_attention function and formatting improvements) should be part of the PR content, not the base. Could you clarify what the correct base commit should be for this PR? Should it be comparing against master branch (commit 884edfe), or is there a different base I should use?
I like the good work here. Well done. |
Adds support for PyTorch's scaled dot product attention (FlashAttention) to the torch backend, along with single-line formatting improvements across returnn/torch/frontend/_backend.py.

Changes
- Added a scaled_dot_product_attention method to the TorchBackend class that utilizes torch.nn.functional.scaled_dot_product_attention for efficient attention computation
- Added the scaled_dot_product_attention function in returnn/frontend/_backend.py and returnn/frontend/attention.py

The new attention implementation includes:
This enables more efficient attention operations by leveraging PyTorch's optimized FlashAttention implementation when available.
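For illustration, here is a minimal sketch of what such a backend method could look like when it wraps the fused PyTorch kernel. The function name, tensor layout, and arguments below are assumptions made for this example; they do not reflect RETURNN's actual TorchBackend signature, which operates on RETURNN's own tensor/dim abstractions.

```python
# Minimal sketch (hypothetical wrapper, not RETURNN's actual API):
# delegate attention to PyTorch's fused scaled dot product attention,
# which dispatches to FlashAttention when hardware and dtypes allow it.
import torch
import torch.nn.functional as F


def scaled_dot_product_attention(
    query: torch.Tensor,   # [batch, heads, q_time, key_dim]
    key: torch.Tensor,     # [batch, heads, kv_time, key_dim]
    value: torch.Tensor,   # [batch, heads, kv_time, value_dim]
    attn_mask: torch.Tensor = None,  # optional boolean or additive mask
    is_causal: bool = False,
) -> torch.Tensor:
    # F.scaled_dot_product_attention fuses QK^T, softmax and the value
    # weighting into one kernel instead of materializing the full
    # attention matrix in the default eager path.
    return F.scaled_dot_product_attention(
        query, key, value, attn_mask=attn_mask, is_causal=is_causal
    )


# Usage example with random tensors:
q = torch.randn(2, 4, 10, 64)
k = torch.randn(2, 4, 12, 64)
v = torch.randn(2, 4, 12, 64)
out = scaled_dot_product_attention(q, k, v)  # shape [2, 4, 10, 64]
```

PyTorch picks between its FlashAttention, memory-efficient, and plain math backends automatically; on hardware or dtypes where the fused kernels are unavailable it falls back to the math implementation, so such a wrapper stays portable.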