fix: pre-download sage_attention kernel before applying backend, remove pinned fa3 kernel version #578
Open
Marius-Graml wants to merge 2 commits into main from
Conversation
The submodule-level set_attention_backend in diffusers does not trigger the kernel download, leaving kernel_fn as None and causing a TypeError. This adds an explicit _maybe_download_kernel_for_backend call.
This PR has been inactive for 10 days and is now marked as stale.
johannaSommer approved these changes Mar 31, 2026
johannaSommer (Member) left a comment
LGTM but please wait for Begüm's review regarding the version
        Dict[str, Any]
            The algorithm packages.
        """
        flash_attention_3 = get_kernel("kernels-community/flash-attn3", version="<0.1.0")
Member
I know that this was an important fix at some point, so not sure about removing it. Please wait for @begumcig's review on this; she tackled this back then.
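For reference, a minimal before/after sketch of the pin removal under discussion, assuming the kernels library's get_kernel API exactly as it appears in the diff above:

```python
from kernels import get_kernel

# Before this PR: pinned to pre-0.1.0 builds of the fa3 kernel.
# flash_attention_3 = get_kernel("kernels-community/flash-attn3", version="<0.1.0")

# After this PR: the pin is dropped, so the resolver can pick builds
# compatible with newer torch releases such as torch 2.10.
flash_attention_3 = get_kernel("kernels-community/flash-attn3")
```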
                enable_gqa=enable_gqa,
            )
        else:
            out, _, *_ = torch.ops.flash_attn_pruna._flash_attn_forward(
Member
we might need to keep this flexible depending on the fa3 version we encounter - can we return and check whether the output is a tuple or a tensor and handle it accordingly?
Member
nevermind i get what you did now, this is great!
Description
Currently, there is a bug in the sageattn algorithm. Diffusers has two set_attention_backend methods: one for the whole model and one for submodules. The submodule-level set_attention_backend does not trigger the kernel download, leaving kernel_fn as None and causing a TypeError. This PR adds an explicit _maybe_download_kernel_for_backend call before the backend is applied, as sketched below.
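A minimal sketch of the fix (only _maybe_download_kernel_for_backend and the submodule-level set_attention_backend are named in this PR; the import path, helper signature, and backend string are assumptions for illustration):

```python
# Import path and signature are assumptions; see lead-in above.
from diffusers.models.attention_dispatch import _maybe_download_kernel_for_backend

def set_submodule_backend(submodule, backend: str = "sage") -> None:
    # Trigger the kernel download explicitly: the submodule-level
    # set_attention_backend skips this step, which leaves kernel_fn as
    # None and raises a TypeError on the first attention call.
    _maybe_download_kernel_for_backend(backend)
    submodule.set_attention_backend(backend)
```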
Further, the pinned version of the fa3 kernel is removed so that fa3 now works with torch 2.10. Note that some kernel builds return (out, lse) while others return just out, depending on the torch and CUDA versions, so this must be handled in the registered torch-op function.
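A sketch of that return-value normalization (the wrapper name is hypothetical; the PR handles this inside the registered torch-op function, around the torch.ops.flash_attn_pruna._flash_attn_forward call shown in the diff):

```python
from typing import Tuple, Union

import torch

def _unpack_fa3_output(result: Union[torch.Tensor, Tuple]) -> torch.Tensor:
    # Some fa3 kernel builds return (out, lse, ...), others return just
    # the output tensor, depending on the torch and CUDA versions.
    # Normalize both conventions to the attention output.
    if isinstance(result, tuple):
        return result[0]
    return result
```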
Related Issue
/
Type of Change
How Has This Been Tested?
Run in a notebook.
Checklist
Additional Notes
/