[Skill] quant-validate: End-to-End Quantization Correctness Validation #23

## Skill: `quant-validate`

**Priority**: P0. This is the most frequently encountered and hardest-to-debug class of issues.

### Motivation

Quantization bugs (FP8, FP4, INT4, MXFP4) are the #1 source of inference accuracy issues in ATOM. Past incidents include:
- Scale layout mismatches between quantization and GEMM kernels (row-major vs column-major)
- ASM GEMM producing garbage output on gfx950 (cosine_sim ≈ 0.006)
- Parameters silently ignored in fallback paths (`shuffle_scale=True` ignored by the Triton fallback)
- Weight normalization issues (e4m3fn → e4m3fnuz conversion)

These bugs are extremely time-consuming to diagnose because they often produce plausible-looking but incorrect outputs.

### What This Skill Should Do

Given a model and a quantization config, the skill should:

1. **Validate the full quantization → GEMM chain** (not components in isolation)
   - Run reference computation in FP32/FP16
   - Run quantized computation through the target path (ASM/CK/Triton)
   - Compute cosine similarity and max absolute error
   - Flag any layer with cosine_sim < 0.999

2. **Check scale layout consistency**
   - Verify scale tensor shapes match what the GEMM kernel expects
   - Detect row-major vs column-major mismatches
   - Verify `shuffle_scale` and `transpose_scale` flags are respected

3. **Test all backend paths**
   - ASM GEMM, CK GEMM, Triton GEMM, hipBLASLt
   - CK-free mode fallbacks
   - Report which backends produce correct results

4. **Generate a diagnostic report**
   - Per-layer cosine similarity
   - Per-layer scale layout analysis
   - Backend comparison table
   - Actionable recommendations
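
The comparison and reporting steps above can be sketched as follows. This is a minimal illustration in plain Python, not ATOM or AITER code: the function names, the layer name, and the sample output vectors are all hypothetical, and a real implementation would operate on tensors captured from each backend. Only the 0.999 threshold and the two metrics come from the list above.

```python
import math

COSINE_THRESHOLD = 0.999  # flag threshold from step 1 of the list above


def cosine_similarity(ref, out):
    """Cosine similarity between two flat sequences of floats."""
    dot = sum(r * o for r, o in zip(ref, out))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_out = math.sqrt(sum(o * o for o in out))
    if norm_ref == 0.0 or norm_out == 0.0:
        # Degenerate case: only two all-zero vectors count as matching.
        return 1.0 if norm_ref == norm_out else 0.0
    return dot / (norm_ref * norm_out)


def max_abs_error(ref, out):
    """Largest element-wise absolute difference."""
    return max(abs(r - o) for r, o in zip(ref, out))


def validate_layers(reference, backend_outputs, threshold=COSINE_THRESHOLD):
    """Compare every backend against the reference, layer by layer.

    reference:       {layer_name: [floats]}
    backend_outputs: {backend_name: {layer_name: [floats]}}
    """
    rows = []
    for backend, outputs in backend_outputs.items():
        for layer, ref in reference.items():
            cos = cosine_similarity(ref, outputs[layer])
            rows.append({
                "layer": layer,
                "backend": backend,
                "cosine_sim": cos,
                "max_abs_err": max_abs_error(ref, outputs[layer]),
                "passed": cos >= threshold,
            })
    return rows


def format_report(rows):
    """Render the rows as a Markdown backend comparison table."""
    lines = [
        "| Layer | Backend | cosine_sim | max_abs_err | Status |",
        "|---|---|---|---|---|",
    ]
    for row in rows:
        status = "PASS" if row["passed"] else "FAIL"
        lines.append(
            f"| {row['layer']} | {row['backend']} | "
            f"{row['cosine_sim']:.4f} | {row['max_abs_err']:.4f} | {status} |"
        )
    return "\n".join(lines)


# Hypothetical sample data: one layer, two backends.
reference = {"layers.0.qkv_proj": [1.0, -2.0, 3.0, 0.5]}
backend_outputs = {
    "triton": {"layers.0.qkv_proj": [1.01, -1.99, 2.98, 0.5]},
    "asm": {"layers.0.qkv_proj": [-0.7, 0.1, 0.2, 2.9]},
}
rows = validate_layers(reference, backend_outputs)
report = format_report(rows)
print(report)
```

With the sample data, the `triton` row passes and the `asm` row fails, which is the per-layer, per-backend pass/fail view the report is meant to give.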

### Key Lessons to Encode

- Always check **directions, not magnitudes**: `cosine_sim` is the gold standard
- Test the **full quant→GEMM chain**, not components in isolation
- Silently ignored parameters are the worst bug class: fallbacks must implement ALL parameters
- Use `/v1/completions`, not `/v1/chat/completions`, for debugging
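
To illustrate why direction matters more than magnitude, the sketch below runs a toy quant→GEMV chain twice: once with per-row scales applied correctly, and once with the same scale buffer indexed by column, as a kernel expecting a different layout might do. Everything here is an assumption for illustration: a symmetric integer quantizer stands in for FP8/FP4, and the helper names and sample matrix are hypothetical, not AITER APIs.

```python
import math

QMAX = 127  # integer grid used as a stand-in for a low-precision format


def quantize_per_row(weight):
    """Symmetric per-row quantization: returns (integer codes, one scale per row)."""
    # "or 1.0" avoids a zero scale for an all-zero row.
    scales = [(max(abs(v) for v in row) / QMAX) or 1.0 for row in weight]
    codes = [[round(v / s) for v in row] for row, s in zip(weight, scales)]
    return codes, scales


def gemv_dequant(codes, scales, x, scale_per_row=True):
    """Dequantize and multiply by x.

    scale_per_row=False simulates a layout mismatch: the scale buffer
    is indexed by column instead of by row.
    """
    out = []
    for i, row in enumerate(codes):
        acc = 0.0
        for j, q in enumerate(row):
            s = scales[i] if scale_per_row else scales[j]
            acc += q * s * x[j]
        out.append(acc)
    return out


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def cosine_similarity(a, b):
    return sum(p * q for p, q in zip(a, b)) / (l2_norm(a) * l2_norm(b))


# Rows with very different dynamic ranges make the per-row scales differ a lot.
weight = [
    [0.01, -0.02, 0.03],
    [1.0, 2.0, -1.5],
    [100.0, -50.0, 25.0],
]
x = [1.0, 1.0, 1.0]

reference = [sum(w * xj for w, xj in zip(row, x)) for row in weight]
codes, scales = quantize_per_row(weight)
good = gemv_dequant(codes, scales, x, scale_per_row=True)
bad = gemv_dequant(codes, scales, x, scale_per_row=False)

cos_good = cosine_similarity(reference, good)
cos_bad = cosine_similarity(reference, bad)
norm_ratio_bad = l2_norm(bad) / l2_norm(reference)

print(f"correct layout:    cosine_sim={cos_good:.6f}")
print(f"mismatched layout: cosine_sim={cos_bad:.6f}, norm ratio={norm_ratio_bad:.2f}")
```

In this toy case the mismatched output has a norm within a factor of two of the reference, so a loose magnitude check would accept it, while its cosine similarity falls far below the 0.999 threshold. The mismatch is also only visible because quantization and the matrix-vector product are tested together; the codes and scales are each correct in isolation.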

### Acceptance Criteria

- [ ] Skill can validate FP8, FP4, INT4, MXFP4 quantization paths
- [ ] Detects scale layout mismatches automatically
- [ ] Tests multiple GEMM backends and compares results
- [ ] Generates clear diagnostic report with pass/fail per layer
- [ ] Includes instructions for both gfx942 (MI300X) and gfx950 (MI355X)

### References

- CK-free debug conclusion: Past debugging sessions on ASM GEMM garbage output
- ATOM `linear.py`, `moe.py`, `attention_mla.py` — backend selection logic
- AITER `ops/quant.py` — quantization implementations
