Implement quantized inference kernels for linear and attention layers. This includes int8 or mixed-precision kernels with efficient dequantization and accumulation paths; in the int8 case, products are accumulated in a wider int32 accumulator to avoid overflow and rescaled back to floating point once per output (a minimal sketch follows).
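As a rough, CPU-only illustration of that dequantize-and-accumulate structure, here is a NumPy sketch of a symmetric per-tensor int8 linear layer. The `quantize` and `int8_linear` helpers, their signatures, and the per-tensor scaling choice are assumptions made for this example, not an existing kernel API.

```python
import numpy as np

def quantize(x):
    """Symmetric per-tensor int8 quantization: x ~= scale * q."""
    max_abs = max(float(np.abs(x).max()), 1e-8)
    scale = max_abs / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x_q, x_scale, w_q, w_scale, bias=None):
    """Quantized linear layer: int8 inputs/weights, int32 accumulation,
    one dequantization (rescale) per output element."""
    # Widen to int32 before the matmul so dot products cannot overflow.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    y = acc.astype(np.float32) * (x_scale * w_scale)  # combined dequant scale
    if bias is not None:
        y += bias
    return y

# Usage: quantize activations and weights, run the integer kernel,
# and compare against the fp32 reference.
rng = np.random.default_rng(0)
x, w = rng.standard_normal((4, 64)), rng.standard_normal((16, 64))
(x_q, sx), (w_q, sw) = quantize(x), quantize(w)
y_int8, y_fp32 = int8_linear(x_q, sx, w_q, sw), x @ w.T
print("max abs error:", np.abs(y_int8 - y_fp32).max())
```

Per-tensor scales keep the sketch short; real kernels often use per-channel weight scales for better accuracy at the same bit width.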
Planned Benchmarks
- Latency vs. precision (fp32 baseline against the int8 and mixed-precision kernels)
- Accuracy relative to the full-precision baseline
- Memory footprint reduction from int8 weight storage (see the harness sketch below)
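The latency and memory measurements could be driven by a minimal harness like the one below, which reuses the hypothetical `quantize` and `int8_linear` helpers from the earlier sketch; the shapes and repeat count are arbitrary assumptions. Note that a pure-NumPy int8 path will not necessarily beat a BLAS-backed fp32 matmul; the sketch only shows the measurement structure.

```python
import time
import numpy as np
# Assumes quantize() and int8_linear() from the sketch above are in scope.

def measure_latency(fn, repeats=100):
    """Median wall-clock time of fn() after a few warm-up calls."""
    for _ in range(3):
        fn()  # warm-up to exclude one-time setup costs
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

rng = np.random.default_rng(0)
x, w = rng.standard_normal((32, 1024)), rng.standard_normal((1024, 1024))
(x_q, sx), (w_q, sw) = quantize(x), quantize(w.T)  # weights stored (out, in)

t_fp32 = measure_latency(lambda: x @ w)
t_int8 = measure_latency(lambda: int8_linear(x_q, sx, w_q, sw))
print(f"fp32: {t_fp32 * 1e3:.3f} ms, int8 path: {t_int8 * 1e3:.3f} ms")

# Memory footprint: int8 weights take 1 byte/element vs 4 for fp32.
print("fp32 weight bytes:", w.astype(np.float32).nbytes,
      "| int8 weight bytes:", w_q.nbytes)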
Learning Objectives
- Quantization schemes (e.g., symmetric vs. asymmetric, per-tensor vs. per-channel; contrasted in the sketch after this list)
- Dequantization strategies and where the rescale sits in the kernel
- Performance vs. accuracy tradeoffs across precisions
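To make the first objective concrete, here is a small sketch contrasting symmetric and asymmetric (zero-point) int8 quantization on one-sided data; the helper names and the ReLU-activation test data are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(x):
    """Symmetric: x ~= scale * q; zero maps exactly to q = 0."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_asymmetric(x):
    """Asymmetric: x ~= scale * (q - zero_point); uses the full
    [0, 255] range, which suits one-sided data like ReLU outputs."""
    lo, hi = float(x.min()), float(x.max())
    scale = max(hi - lo, 1e-8) / 255.0
    zero_point = int(np.round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

rng = np.random.default_rng(0)
relu_acts = np.maximum(rng.standard_normal(10_000), 0)  # one-sided data

q_s, s_s = quantize_symmetric(relu_acts)
q_a, s_a, zp = quantize_asymmetric(relu_acts)
err_s = np.abs(relu_acts - s_s * q_s.astype(np.float32)).mean()
err_a = np.abs(relu_acts - s_a * (q_a.astype(np.float32) - zp)).mean()
print(f"mean abs error symmetric: {err_s:.5f}, asymmetric: {err_a:.5f}")
```

On one-sided data the asymmetric scheme uses the full 8-bit range and should show roughly half the rounding error of the symmetric one, at the cost of carrying a zero point through the kernel's accumulation path.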