Fix per-layer quant dtype in DeepSeek R1 attention init by thpereir · Pull Request #1 · thpereir/ATOM

thpereir · 2026-03-04T22:57:13Z

For mixed-precision models (e.g. MXFP4 MoE + FP8 attention), the attention block must resolve its own per-layer quant spec rather than using the global quant_config['quant_dtype'].

- Add _attn_spec / _attn_quant_dtype via quant_config.resolve(prefix)
- Use resolved dtype for FP4/FP8 decision in attention init
- Pass prefix to MergedReplicatedLinear for fused_qkv_a_proj
- Use resolved dtype for fuse_qknorm_quant decision
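
To illustrate the idea behind these changes, here is a minimal, self-contained sketch of per-layer resolution. It is not the ATOM code: the `QuantConfig` and `LayerQuantSpec` classes, the glob-style `overrides` table, the dtype strings, and the `pick_attention_path` helper are all hypothetical stand-ins. Only the `resolve(prefix)` call shape and the `attn_spec` / `attn_quant_dtype` names follow the description above.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class LayerQuantSpec:
    """Hypothetical per-layer spec; only the dtype matters for this sketch."""
    quant_dtype: str


@dataclass
class QuantConfig:
    """Hypothetical config: one global dtype plus per-layer overrides."""
    global_dtype: str
    # Maps a glob pattern over layer prefixes to a dtype (assumed layout).
    overrides: dict = field(default_factory=dict)

    def resolve(self, prefix: str) -> LayerQuantSpec:
        # First matching override wins; otherwise fall back to the global dtype.
        for pattern, dtype in self.overrides.items():
            if fnmatchcase(prefix, pattern):
                return LayerQuantSpec(dtype)
        return LayerQuantSpec(self.global_dtype)


def pick_attention_path(quant_config: QuantConfig, prefix: str) -> str:
    """Choose the FP4 or FP8 path from the resolved dtype, not the global one."""
    attn_spec = quant_config.resolve(prefix)
    attn_quant_dtype = attn_spec.quant_dtype
    if attn_quant_dtype == "mxfp4":
        return "fp4"
    if attn_quant_dtype == "fp8":
        return "fp8"
    return "unquantized"


# Mixed-precision example: MXFP4 globally (MoE), FP8 for attention layers.
cfg = QuantConfig(global_dtype="mxfp4", overrides={"*.self_attn*": "fp8"})
attn_path = pick_attention_path(cfg, "model.layers.0.self_attn")
moe_dtype = cfg.resolve("model.layers.0.mlp.experts").quant_dtype
```

In this sketch, reading only `cfg.global_dtype` would send the attention block down the FP4 path, which is the kind of mismatch this PR addresses for mixed-precision checkpoints.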

Tested with DeepSeek-R1-0528-moe-mxfp4-other-ptpc on TP=4.

Depends on ROCm#236

thpereir closed this Mar 4, 2026

thpereir wants to merge 1 commit into thpereir/quark_quant_layer from thpereir/deepseek_r1_mxfp4_ptpc
