
Conversation

@Ximingwang-09
Contributor

Motivation

This PR fixes an issue in the DFlash flex_attention implementation where the BlockMask was created with H=num_heads.
Since the mask function dflash_mask_fn(b, h, q_idx, kv_idx) ignores the h parameter, creating a separate mask per head is redundant. Using H=1 lets PyTorch automatically broadcast the single-head mask across all heads.
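A minimal sketch of the broadcast behavior this relies on. The real dflash_mask_fn lives in the DFlash code; here it is only assumed to be a mask that ignores b and h, and the shapes are illustrative:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

def dflash_mask_fn(b, h, q_idx, kv_idx):
    # h is unused, so the mask is identical for every head.
    # Placeholder causal predicate for illustration only.
    return q_idx >= kv_idx

B, num_heads, seq_len, head_dim = 1, 8, 1024, 64

# H=1 is sufficient: flex_attention broadcasts the single-head
# BlockMask across all num_heads heads.
block_mask = create_block_mask(
    dflash_mask_fn, B=B, H=1, Q_LEN=seq_len, KV_LEN=seq_len, device="cuda"
)

q = torch.randn(B, num_heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)
out = flex_attention(q, k, v, block_mask=block_mask)
```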

Modifications

Changes

  1. Simplified BlockMask creation: changed H=num_heads to H=1 in the create_block_mask() call.
  2. Removed unnecessary cache key: removed _cached_num_heads from the cache invalidation logic.
  3. Cleaned up function signature: removed the num_heads parameter from _get_or_create_block_mask() (see the sketch after this list).
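A hypothetical sketch of what the simplified caching helper could look like after this change; the names mirror the PR description, but the actual DFlash implementation may differ in its cache key and arguments:

```python
from torch.nn.attention.flex_attention import create_block_mask

_cached_block_mask = None
_cached_seq_len = None  # num_heads is no longer part of the cache key

def _get_or_create_block_mask(mask_fn, seq_len, device):
    global _cached_block_mask, _cached_seq_len
    if _cached_block_mask is None or _cached_seq_len != seq_len:
        # H=1: the mask ignores the head index, so a single-head mask
        # is broadcast to all heads by flex_attention.
        _cached_block_mask = create_block_mask(
            mask_fn, B=1, H=1, Q_LEN=seq_len, KV_LEN=seq_len, device=device
        )
        _cached_seq_len = seq_len
    return _cached_block_mask
```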

Related Issues

#452

Accuracy Test

Benchmark & Profiling

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@xiaomin-D
Contributor

cc @FrankLeeeee @sleepcoo

