
Fix per-layer quant dtype in DeepSeek R1 attention init #1

Closed
thpereir wants to merge 1 commit into thpereir/quark_quant_layer from
thpereir/deepseek_r1_mxfp4_ptpc

Conversation


thpereir commented Mar 4, 2026

For mixed-precision models (e.g. MXFP4 MoE + FP8 attention), the attention block must resolve its own per-layer quant spec rather than using the global quant_config['quant_dtype'].

  • Add _attn_spec / _attn_quant_dtype via quant_config.resolve(prefix)
  • Use resolved dtype for FP4/FP8 decision in attention init
  • Pass prefix to MergedReplicatedLinear for fused_qkv_a_proj
  • Use resolved dtype for fuse_qknorm_quant decision

Tested with DeepSeek-R1-0528-moe-mxfp4-other-ptpc on TP=4.

Depends on ROCm#236
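The per-layer resolution described above can be sketched as follows. This is a minimal illustration only: `QuantConfig`, `resolve`, `QuantSpec`, and the override table are hypothetical stand-ins, not the actual vLLM/Quark API.

```python
# Sketch of per-layer quant-spec resolution by module prefix.
# All names here (QuantConfig, QuantSpec, resolve) are illustrative
# assumptions; the real implementation lives in the quant_config object.
from dataclasses import dataclass


@dataclass
class QuantSpec:
    quant_dtype: str  # e.g. "mxfp4" or "fp8"


class QuantConfig:
    def __init__(self, global_dtype, per_layer_overrides=None):
        self.global_dtype = global_dtype
        self.overrides = per_layer_overrides or {}

    def resolve(self, prefix: str) -> QuantSpec:
        # Longest matching prefix override wins; otherwise fall back
        # to the global quant dtype.
        best = ""
        for key in self.overrides:
            if prefix.startswith(key) and len(key) > len(best):
                best = key
        dtype = self.overrides.get(best, self.global_dtype)
        return QuantSpec(quant_dtype=dtype)


# Mixed-precision setup: MXFP4 globally, FP8 override for attention.
cfg = QuantConfig("mxfp4", {"model.layers.0.self_attn": "fp8"})

attn_spec = cfg.resolve("model.layers.0.self_attn")
print(attn_spec.quant_dtype)                          # fp8
print(cfg.resolve("model.layers.0.mlp").quant_dtype)  # mxfp4
```

The point of the fix is exactly this distinction: the attention init must branch on `attn_spec.quant_dtype` (the resolved per-layer value), not on the global `quant_config['quant_dtype']`, which would report `mxfp4` here.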

thpereir closed this Mar 4, 2026