[Skill] ckfree-validate: Validate CK-free mode correctness #30

@sunway513

Skill

ckfree-validate

Priority: P1 — Critical for CK-free deployment

Motivation

CK-free mode removes the Composable Kernel dependency from AITER, dramatically reducing build time. However, this mode has been a persistent source of correctness bugs. Known issues:

  • ASM GEMM on gfx950 produces garbage output (cosine similarity ≈ 0.006).
  • The Triton fallback for dynamic_per_token_scaled_quant ignores shuffle_scale=True and writes row-major scales where column-major is expected.
  • _fallback_partial_transpose is a no-op.

Each bug was silent: the model still generated text, but the text was incoherent. A validation skill would catch these issues systematically before deployment.

What This Skill Should Do

  1. Verify GEMM backend selection — Confirm that when ATOM_CK_FREE=1, use_triton_gemm() returns True in all code paths: linear.py, attention_mla.py, and moe.py. Ensure ASM GEMM is never invoked on gfx950 (known to produce garbage).
  2. Validate quantization scale layouts — For FP8 per-1x128 quantization with shuffle_scale=True: verify scales are in column-major (transposed) layout after Triton fallback. Compare scale tensor shapes and strides against the CK path reference.
  3. Per-layer cosine similarity — Run a forward pass on a short prompt and compute cosine similarity between CK-free output and FP16 reference at each layer's output. Flag any layer with cosine < 0.999.
  4. End-to-end generation test — Generate 50 tokens and compare against FP16 reference generation. Check both token match rate and output embedding cosine similarity.
  5. MoE path validation — For MoE models (DeepSeek), verify that Triton MoE kernels produce correct expert routing and expert GEMM output (cosine > 0.9999 with properly quantized data).
  6. Regression checklist — Check all known bug patterns: JIT SystemExit vs RuntimeError, _fallback_partial_transpose actually transposes, normalize_e4m3fn_to_e4m3fnuz applied on gfx942.
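The per-layer comparison in step 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the per-layer outputs are assumed to have already been captured and flattened into plain lists of floats keyed by layer name, and `cosine_similarity` and `divergent_layers` are hypothetical helper names, not existing AITER or ATOM functions.

```python
import math
from typing import Dict, List, Sequence, Tuple

COSINE_THRESHOLD = 0.999  # per-layer threshold taken from step 3


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two flattened activations of equal length."""
    if len(a) != len(b):
        raise ValueError("activation sizes differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero activation is suspicious on its own; treat it as a mismatch.
        return 0.0
    return dot / (norm_a * norm_b)


def divergent_layers(
    test: Dict[str, Sequence[float]],
    reference: Dict[str, Sequence[float]],
    threshold: float = COSINE_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Return (layer_name, cosine) for every layer that fails the threshold."""
    report = []
    for name, ref_out in reference.items():
        if name not in test:
            raise KeyError(f"layer {name!r} missing from test outputs")
        cos = cosine_similarity(test[name], ref_out)
        # 'not (cos >= threshold)' also flags NaN, which 'cos < threshold' would miss.
        if not (cos >= threshold):
            report.append((name, cos))
    return report
```

In a real run the two dictionaries would be filled by forward hooks on the CK-free model and on the FP16 reference. Flagging NaN explicitly matters here because a silently broken kernel can emit NaN as easily as garbage values.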

Acceptance Criteria

  • Detects ASM GEMM usage in CK-free mode and flags it as an error
  • Validates scale layout (row-major vs column-major) matches GEMM backend expectation
  • Per-layer cosine similarity report identifies divergent layers
  • End-to-end generation produces coherent output (not garbled text)
  • Covers all three code paths: linear, attention_mla, moe
  • Catches the known shuffle_scale bug if reintroduced
  • Works on both gfx942 and gfx950
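The scale-layout criterion could be checked from a tensor's shape and strides alone. The sketch below is a minimal illustration, assuming the layout shows up in the element strides of a 2-D scale tensor (rather than as a contiguous buffer with swapped dimensions); `scale_layout` and `layout_matches` are hypothetical helper names, not part of AITER.

```python
from typing import Sequence


def scale_layout(shape: Sequence[int], strides: Sequence[int]) -> str:
    """Classify a 2-D tensor layout from its shape and element strides.

    Returns 'row-major', 'column-major', 'ambiguous', or 'other'.
    """
    if len(shape) != 2 or len(strides) != 2:
        raise ValueError("expected a 2-D shape and matching strides")
    rows, cols = shape
    if rows <= 1 or cols <= 1:
        # With a size-1 dimension the two layouts cannot be told apart.
        return "ambiguous"
    if tuple(strides) == (cols, 1):
        return "row-major"
    if tuple(strides) == (1, rows):
        return "column-major"
    return "other"


def layout_matches(shape: Sequence[int], strides: Sequence[int], expected: str) -> bool:
    """True when the observed layout equals the layout the GEMM backend expects."""
    return scale_layout(shape, strides) == expected
```

With PyTorch, the inputs would be `tuple(scales.shape)` and `scales.stride()`. For the shuffle_scale=True case described above, the expected value would be `"column-major"`, so a reintroduced row-major write would fail the check.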
