Conversation

@guoqingbao
Contributor

No description provided.

@DrJesseGlass
Contributor

I'll review this PR to make sure I built along the same lines. I had integrated GPU and CPU flash attention into Qwen3 (but not the MoE variant), but while doing so I found substantial reason to upgrade the CPU flash attention, so I made an early, simple submission (#3254) to get an improved CPU flash structure in place before starting serious optimizations.

However, someone else was simultaneously working on #3250 to integrate varlen, and the agreement at the time was that once varlen was integrated I would tweak mine to incorporate it. That work has been sitting unfinished.

@guoqingbao
Contributor Author

> I'll review this PR to make sure I built along the same lines. I had integrated GPU and CPU flash attention into Qwen3 (but not the MoE variant), but while doing so I found substantial reason to upgrade the CPU flash attention, so I made an early, simple submission (#3254) to get an improved CPU flash structure in place before starting serious optimizations.

This uses the previously defined flash attention interface and remains compatible with existing implementations.

@DrJesseGlass
Contributor

Yes. It makes sense to get this integrated sooner if possible. I just meant to call out the redundancy.

@guoqingbao
Contributor Author

> Yes. It makes sense to get this integrated sooner if possible. I just meant to call out the redundancy.

It would be better to have a unified entry point for varlen attention on both CPU and GPU. I think I can help with GPU varlen attention, but it seems we are missing corresponding use cases in candle: varlen attention needs parallel requests (though it can run on a single request) to meaningfully demonstrate its performance.
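To illustrate why parallel requests matter: varlen attention typically packs a batch of variable-length sequences into one flat buffer with a cumulative-lengths array (often called `cu_seqlens`), instead of padding every request to the longest length. The sketch below is a minimal, hypothetical demo of that packing step only; it is not candle's API and `pack_varlen` is a name invented for illustration.

```python
# Hypothetical sketch of varlen-style batch packing (not candle's API).
# Tokens from all requests are concatenated into one flat buffer, and
# cu_seqlens records the boundary offsets between requests.

def pack_varlen(requests):
    """requests: list of token lists, each with a different length."""
    packed = []
    cu_seqlens = [0]
    for tokens in requests:
        packed.extend(tokens)
        cu_seqlens.append(cu_seqlens[-1] + len(tokens))
    return packed, cu_seqlens

# Three parallel requests of lengths 3, 1, and 2.
packed, cu = pack_varlen([[10, 11, 12], [20], [30, 31]])
print(packed)  # [10, 11, 12, 20, 30, 31]
print(cu)      # [0, 3, 4, 6]
```

The attention kernel then uses `cu_seqlens` to restrict each query to the tokens of its own request, so no compute is spent on padding; with only a single request, `cu_seqlens` degenerates to `[0, n]` and the varlen path shows no advantage over the dense one, which is why a batched use case is needed to demonstrate the benefit.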
