Open
Description
Skill
ckfree-validate
Priority: P1 — Critical for CK-free deployment
Motivation
CK-free mode removes the Composable Kernel dependency from AITER, dramatically reducing build time. However, this mode has been a persistent source of correctness bugs. Known issues: ASM GEMM on gfx950 produces garbage output (cosine similarity ~ 0.006), the Triton fallback for `dynamic_per_token_scaled_quant` ignores `shuffle_scale=True` (it writes row-major scales when column-major is expected), and `_fallback_partial_transpose` is a no-op. Each bug was silent: the model generated text, but the text was incoherent. A validation skill would catch these issues systematically before deployment.
What This Skill Should Do
- Verify GEMM backend selection: Confirm that when `ATOM_CK_FREE=1`, `use_triton_gemm()` returns True in all code paths: `linear.py`, `attention_mla.py`, and `moe.py`. Ensure ASM GEMM is never invoked on gfx950 (known to produce garbage).
- Validate quantization scale layouts: For FP8 per-1x128 quantization with `shuffle_scale=True`, verify scales are in column-major (transposed) layout after Triton fallback. Compare scale tensor shapes and strides against the CK path reference.
- Per-layer cosine similarity: Run a forward pass on a short prompt and compute cosine similarity between CK-free output and FP16 reference at each layer's output. Flag any layer with cosine < 0.999.
- End-to-end generation test: Generate 50 tokens and compare against FP16 reference generation. Check both token match rate and output embedding cosine similarity.
- MoE path validation: For MoE models (DeepSeek), verify that Triton MoE kernels produce correct expert routing and expert GEMM output (cosine > 0.9999 with properly quantized data).
- Regression checklist: Check all known bug patterns: JIT `SystemExit` vs `RuntimeError`, `_fallback_partial_transpose` actually transposes, `normalize_e4m3fn_to_e4m3fnuz` applied on gfx942.
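The per-layer cosine check in the list above could look roughly like the sketch below. It is only an illustration: the function names `cosine_similarity` and `divergent_layers` are hypothetical, activations are assumed to be already flattened to plain lists of floats (a real skill would more likely operate on tensors captured with forward hooks), and the layer names are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

COSINE_THRESHOLD = 0.999  # per-layer threshold named in the list above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, flattened activations."""
    if len(a) != len(b):
        raise ValueError("activations must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero output counts as divergent
    return dot / (norm_a * norm_b)


def divergent_layers(
    test_outputs: Dict[str, Sequence[float]],
    ref_outputs: Dict[str, Sequence[float]],
    threshold: float = COSINE_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Return (layer_name, cosine) for every layer below the threshold."""
    flagged = []
    for name, ref in ref_outputs.items():
        cos = cosine_similarity(test_outputs[name], ref)
        if cos < threshold:
            flagged.append((name, cos))
    return flagged


# Hypothetical data: layer0 matches the reference, layer1 is orthogonal to it.
ref = {"layer0": [1.0, 2.0, 3.0], "layer1": [1.0, 0.0, -1.0]}
ckfree = {"layer0": [1.0, 2.0, 3.0], "layer1": [0.0, 1.0, 0.0]}
print(divergent_layers(ckfree, ref))
```

Reporting the first flagged layer in execution order is usually the most useful signal, since every layer after a broken GEMM or quantization step will also diverge.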
Acceptance Criteria
- Detects ASM GEMM usage in CK-free mode and flags it as an error
- Validates scale layout (row-major vs column-major) matches GEMM backend expectation
- Per-layer cosine similarity report identifies divergent layers
- End-to-end generation produces coherent output (not garbled text)
- Covers all three code paths: linear, attention_mla, moe
- Catches the known `shuffle_scale` bug if reintroduced
- Works on both gfx942 and gfx950
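One way the scale-layout criterion might be checked is by classifying a 2-D scale tensor from its shape and element strides, for example the values of `tensor.shape` and `tensor.stride()` in PyTorch. The sketch below is a hedged illustration: `scale_layout` and `check_scale_layout` are hypothetical helpers, the layout is judged purely from strides of a logical (rows, cols) tensor, and the mapping "`shuffle_scale=False` expects row-major" is an assumption (the issue only states that `shuffle_scale=True` expects column-major).

```python
from typing import Sequence, Tuple


def scale_layout(shape: Sequence[int], strides: Sequence[int]) -> str:
    """Classify a dense 2-D tensor by its element strides.

    Returns 'row-major', 'column-major', 'ambiguous' (both patterns match,
    e.g. a 1x1 tensor), or 'other' (non-dense or padded layout).
    """
    if len(shape) != 2 or len(strides) != 2:
        raise ValueError("expected a 2-D tensor")
    rows, cols = shape
    s = tuple(strides)
    row_major = s == (cols, 1)   # consecutive elements of a row are adjacent
    col_major = s == (1, rows)   # consecutive elements of a column are adjacent
    if row_major and col_major:
        return "ambiguous"
    if row_major:
        return "row-major"
    if col_major:
        return "column-major"
    return "other"


def check_scale_layout(
    shape: Sequence[int], strides: Sequence[int], shuffle_scale: bool
) -> Tuple[bool, str, str]:
    """Return (ok, expected, actual) for a scale tensor.

    Assumption: shuffle_scale=True expects column-major scales (per the issue);
    shuffle_scale=False is assumed here to expect row-major.
    """
    expected = "column-major" if shuffle_scale else "row-major"
    actual = scale_layout(shape, strides)
    return actual in (expected, "ambiguous"), expected, actual


# A row-major scale tensor with shuffle_scale=True reproduces the known bug pattern.
print(check_scale_layout((4, 8), (8, 1), shuffle_scale=True))
```

A stride check alone does not prove the values are correct, so it would complement, not replace, the comparison against the CK path reference.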