
[Skill] quant-validate: End-to-End Quantization Correctness Validation #23

@sunway513

Description


Skill: quant-validate

Priority: P0 (the most frequently encountered and hardest-to-debug class of issues)

Motivation

Quantization bugs (FP8, FP4, INT4, MXFP4) are the #1 source of inference accuracy issues in ATOM. Past incidents include:

  • Scale layout mismatches between quantization and GEMM kernels (row-major vs column-major)
  • ASM GEMM producing garbage output on gfx950 (cosine_sim ≈ 0.006)
  • Parameters silently ignored in fallback paths (e.g., shuffle_scale=True ignored by the Triton fallback)
  • Weight normalization issues (e4m3fn → e4m3fnuz conversion)

These bugs are extremely time-consuming to diagnose because they often produce plausible-looking but incorrect outputs.
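The e4m3fn → e4m3fnuz incident listed above can be reproduced by hand. The two FP8 encodings share the same 1-4-3 bit layout, but e4m3fn uses exponent bias 7 while e4m3fnuz uses bias 8, so identical bits decode to exactly half the value under e4m3fnuz; in addition, 0x80 is -0.0 in e4m3fn but NaN in e4m3fnuz. The pure-Python sketch below decodes both encodings and applies the commonly used fix (rewrite 0x80 to 0x00 and double the scale). `decode_e4m3` and `fn_to_fnuz` are hypothetical helpers written for illustration, not ATOM or AITER APIs, and the sketch assumes the target kernel consumes e4m3fnuz, as hardware FP8 does on gfx942 (MI300X).

```python
import math


def decode_e4m3(byte, fnuz):
    """Decode one FP8 E4M3 bit pattern (hypothetical helper).

    e4m3fn:   exponent bias 7, NaN = S.1111.111, 0x80 is -0.0, no inf.
    e4m3fnuz: exponent bias 8, NaN = 0x80 only, no -0.0, no inf.
    """
    bias = 8 if fnuz else 7
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if fnuz and byte == 0x80:
        return math.nan  # the only NaN pattern in e4m3fnuz
    if not fnuz and exp == 0xF and man == 0x7:
        return math.nan  # e4m3fn NaN (0x7F / 0xFF)
    if exp == 0:
        # Subnormal: no implicit leading 1.
        return sign * 2.0 ** (1 - bias) * (man / 8.0)
    return sign * 2.0 ** (exp - bias) * (1.0 + man / 8.0)


def fn_to_fnuz(weight_bytes, scale):
    """Reinterpret e4m3fn weight bytes as e4m3fnuz (hypothetical helper).

    The same bits decode to half the value under bias 8, so the scale is
    doubled to keep dequantized weights unchanged. 0x80 (-0.0 in e4m3fn)
    would become NaN in e4m3fnuz, so it is rewritten to 0x00 (+0.0).
    """
    fixed = [0x00 if b == 0x80 else b for b in weight_bytes]
    return fixed, scale * 2.0
```

Reinterpreting the bytes without doubling the scale halves every dequantized weight while leaving cosine similarity at exactly 1.0, which is one reason the validation should report max absolute error alongside cosine_sim.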

What This Skill Should Do

Given a model + quantization config, the skill should:

  1. Validate the full quantization → GEMM chain (not components in isolation)

    • Run a reference computation in FP32/FP16
    • Run the quantized computation through the target path (ASM/CK/Triton)
    • Compute cosine similarity and max absolute error between the two outputs
    • Flag any layer with cosine_sim < 0.999
  2. Check scale layout consistency

    • Verify scale tensor shapes match what the GEMM kernel expects
    • Detect row-major vs column-major mismatches
    • Verify shuffle_scale and transpose_scale flags are respected
  3. Test all backend paths

    • ASM GEMM, CK GEMM, Triton GEMM, hipBLASLt
    • CK-free mode fallbacks
    • Report which backends produce correct results
  4. Generate a diagnostic report

    • Per-layer cosine similarity
    • Per-layer scale layout analysis
    • Backend comparison table
    • Actionable recommendations
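The core of steps 1, 3, and 4 can be prototyped before touching real kernels. The pure-Python sketch below rests on several assumptions: symmetric per-output-channel rounding to a 127-level grid stands in for FP8/FP4 quantization, and the two "backends" are hypothetical callables (one correct, one that reads its scale vector in the wrong order to mimic a layout mismatch) rather than ATOM's ASM/CK/Triton/hipBLASLt paths. It runs the full quant → GEMM chain against a floating-point reference and reports cosine similarity, max absolute error, and PASS/FAIL at the 0.999 threshold.

```python
import math
import random


def matmul_nt(x, w):
    """y = x @ w.T, with x: [M][K] and w: [N][K] (nn.Linear weight layout)."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in w] for row in x]


def quantize_per_channel(w, qmax=127):
    """Symmetric per-output-channel quantization: one scale per row of w."""
    scales = [max(abs(v) for v in row) / qmax or 1.0 for row in w]
    q = [[round(v / s) for v in row] for row, s in zip(w, scales)]
    return q, scales


def backend_correct(x, q, scales):
    """Quantized GEMM that applies scales[n] to output column n."""
    acc = matmul_nt(x, q)
    return [[v * scales[n] for n, v in enumerate(row)] for row in acc]


def backend_wrong_layout(x, q, scales):
    """Simulated bug: the kernel reads the scale vector in the wrong order."""
    return backend_correct(x, q, scales[::-1])


def cosine_sim(a, b):
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(p * r for p, r in zip(fa, fb))
    na = math.sqrt(sum(p * p for p in fa))
    nb = math.sqrt(sum(r * r for r in fb))
    return dot / (na * nb) if na and nb else 0.0


def max_abs_err(a, b):
    return max(abs(p - r) for ra, rb in zip(a, b) for p, r in zip(ra, rb))


def validate_layer(x, w, backends, threshold=0.999):
    """Run quant -> GEMM per backend and compare against the FP reference."""
    ref = matmul_nt(x, w)
    q, scales = quantize_per_channel(w)
    report = {}
    for name, gemm in backends.items():
        out = gemm(x, q, scales)
        cs = cosine_sim(ref, out)
        report[name] = (cs, max_abs_err(ref, out), cs >= threshold)
    return report


if __name__ == "__main__":
    rng = random.Random(0)
    M, N, K = 8, 4, 64
    x = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(M)]
    # Output channels with very different magnitudes make scale-order bugs visible.
    w = [[rng.gauss(0, 1) * 2.0 ** n for _ in range(K)] for n in range(N)]
    backends = {"correct": backend_correct, "wrong_scale_layout": backend_wrong_layout}
    for name, (cs, err, ok) in validate_layer(x, w, backends).items():
        print(f"{name:<20} cos_sim={cs:.6f} max_abs_err={err:.4f} {'PASS' if ok else 'FAIL'}")
```

Giving the output channels very different magnitudes is deliberate: if every channel has a similar scale, a permuted scale vector changes the output very little and can still score a high cosine, so the test data should be chosen to make layout bugs visible.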

Key Lessons to Encode

  • Always check directions, not magnitudes: cosine_sim is the gold standard
  • Test the full quant→GEMM chain, not components in isolation
  • Silently ignored parameters are the worst bug class: fallbacks must implement ALL parameters
  • Use /v1/completions, not /v1/chat/completions, for debugging
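One way to encode the silent-parameter lesson is to make fallbacks refuse what they cannot honor. The decorator below is a hypothetical pattern (`fail_on_unhonored` and `fallback_gemm` are illustrative names, not ATOM's real Triton fallback signature): parameters listed in the decorator are accepted for API compatibility, but any non-default value raises `NotImplementedError` instead of being dropped.

```python
import functools


def fail_on_unhonored(**defaults):
    """Decorator for fallback kernels.

    Each keyword in `defaults` is accepted for API compatibility but is NOT
    implemented by the wrapped function, so any non-default value raises
    instead of being silently ignored.
    """
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for name, default in defaults.items():
                # Remove the flag so the wrapped function never sees it.
                value = kwargs.pop(name, default)
                if value != default:
                    raise NotImplementedError(
                        f"{fn.__name__}: {name}={value!r} is not implemented"
                    )
            return fn(*args, **kwargs)
        return wrapper
    return decorate


@fail_on_unhonored(shuffle_scale=False)
def fallback_gemm(x, w, scale):
    """Hypothetical fallback: y = (x @ w.T) * scale with a per-tensor scale."""
    return [[sum(a * b for a, b in zip(row, w_row)) * scale for w_row in w] for row in x]
```

With this sketch, `fallback_gemm([[1.0, 2.0]], [[3.0, 4.0]], 0.5)` returns `[[5.5]]`, while passing `shuffle_scale=True` raises. A dispatcher could catch the exception and route to a backend that does implement the flag; the design choice is that a fallback either implements a parameter or fails loudly, and never accepts it while doing nothing.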

Acceptance Criteria

  • Skill can validate FP8, FP4, INT4, MXFP4 quantization paths
  • Detects scale layout mismatches automatically
  • Tests multiple GEMM backends and compares results
  • Generates clear diagnostic report with pass/fail per layer
  • Includes instructions for both gfx942 (MI300X) and gfx950 (MI355X)
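The scale-shape part of automatic layout-mismatch detection can start from a small table of expected shapes. The sketch below assumes an nn.Linear-style weight of shape (N, K). The per-tensor, per-channel, and 128×128 block shapes are chosen for illustration (real ATOM/AITER kernels may expect, e.g., (N,) rather than (N, 1)), while the MX case follows the OCP MX convention of one shared scale per 32 elements along K. `expected_scale_shape` and `check_scale_layout` are hypothetical helpers.

```python
import math


def expected_scale_shape(n, k, granularity, block=(128, 128)):
    """Expected scale shape for a weight of shape (N, K), nn.Linear layout.

    granularity: "per_tensor" | "per_channel" | "block" | "mx"
    """
    if granularity == "per_tensor":
        return (1,)
    if granularity == "per_channel":
        return (n, 1)
    if granularity == "block":
        bn, bk = block
        return (math.ceil(n / bn), math.ceil(k / bk))
    if granularity == "mx":
        # OCP MX formats (e.g. MXFP4) share one scale per 32 elements along K.
        return (n, math.ceil(k / 32))
    raise ValueError(f"unknown granularity: {granularity}")


def check_scale_layout(scale_shape, n, k, granularity, **kwargs):
    """Compare an actual scale shape with the expected one and explain mismatches."""
    expected = expected_scale_shape(n, k, granularity, **kwargs)
    got = tuple(scale_shape)
    if got == expected:
        return "ok"
    if len(expected) == 2 and got == expected[::-1]:
        return (f"mismatch: got {got}, expected {expected} "
                "(looks transposed: row-major vs column-major)")
    return f"mismatch: got {got}, expected {expected}"
```

A shape check is necessary but not sufficient: a square scale grid, or a shuffled layout that keeps the expected shape, passes it unchanged, so it should complement the numerical quant → GEMM comparison rather than replace it.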

References

  • CK-free debug conclusion: past debugging sessions on ASM GEMM garbage output
  • ATOM linear.py, moe.py, attention_mla.py — backend selection logic
  • AITER ops/quant.py — quantization implementations

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
