quants support #4

@0dragosh

Description

As I understand it, there's currently no quantization support. I think it would be really cool to eventually support some form of quantization so we can run larger models, say Qwen3.5 35B on a 4090. The most practical route seems to be transformers + bitsandbytes, since autokernel is built around PyTorch profiling.

As I see it there are effectively 3 levels:

1. Load quantized models

This is easy.

  • Add optional deps like bitsandbytes in pyproject.toml (line 12).
  • Extend profile.py (line 267) and verify.py (line 147) to accept quantization args and pass a BitsAndBytesConfig into from_pretrained().
  • Add CLI flags like --quantization bnb4|bnb8|none, --compute-dtype bf16|fp16, maybe --device-map.
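A minimal sketch of how those flags could map to `from_pretrained()` kwargs. The flag names come from this issue; the function name `build_quant_kwargs` and the defaults are placeholders, not existing code. The `BitsAndBytesConfig` import is deferred so the dense `--quantization none` path keeps working without bitsandbytes installed:

```python
import argparse

def build_quant_kwargs(args: argparse.Namespace) -> dict:
    """Map CLI flags to extra from_pretrained() kwargs.

    Returns an empty dict for --quantization none, so the existing
    dense path is untouched.
    """
    if args.quantization == "none":
        return {}
    # Lazy imports: only needed when a quantized load is requested.
    import torch
    from transformers import BitsAndBytesConfig

    compute_dtype = torch.bfloat16 if args.compute_dtype == "bf16" else torch.float16
    config = BitsAndBytesConfig(
        load_in_4bit=(args.quantization == "bnb4"),
        load_in_8bit=(args.quantization == "bnb8"),
        bnb_4bit_compute_dtype=compute_dtype,
    )
    return {"quantization_config": config, "device_map": args.device_map}

parser = argparse.ArgumentParser()
parser.add_argument("--quantization", choices=["bnb4", "bnb8", "none"], default="none")
parser.add_argument("--compute-dtype", choices=["bf16", "fp16"], default="bf16")
parser.add_argument("--device-map", default="cuda:0")
```

The returned dict would then be splatted into the existing `from_pretrained()` calls in profile.py and verify.py, keeping the call sites unchanged for dense runs.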

2. Profile quantized models

This is medium difficulty.

  • The profiler will still run, but kernel names and module types will change.
  • Today the repo classifies kernels by CUDA name fragments in profile.py (line 449), which is tuned for dense PyTorch/cuBLAS-style kernels, not quant-specific kernels.
  • You’d need to inspect what 4-bit/8-bit runs actually emit and extend the classifier.
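One way the fragment-based classifier could be extended, as a sketch: the quant-side fragments below are guesses at the kind of names bitsandbytes kernels emit (e.g. blockwise dequantize and 4-bit GEMM kernels) and would need to be confirmed against a real profiler trace; the dense fragments just stand in for whatever profile.py already matches:

```python
# Quant fragments are checked first: a 4-bit GEMM kernel name will
# typically also contain "gemm", and must not fall through to the
# dense matmul bucket.
QUANT_KERNEL_FRAGMENTS = {
    "dequant": "quant_dequantize",  # e.g. kDequantizeBlockwise-style names
    "4bit": "quant_matmul",         # e.g. gemm_4bit_inference-style names
    "int8": "quant_matmul",
}
DENSE_KERNEL_FRAGMENTS = {
    "gemm": "matmul",
    "layer_norm": "norm",
}

def classify_kernel(name: str) -> str:
    """Classify a CUDA kernel name by substring fragments."""
    lowered = name.lower()
    for fragment, category in QUANT_KERNEL_FRAGMENTS.items():
        if fragment in lowered:
            return category
    for fragment, category in DENSE_KERNEL_FRAGMENTS.items():
        if fragment in lowered:
            return category
    return "other"
```

The ordering matters more than the exact fragments: quant-specific patterns have to win over the generic dense ones, otherwise quantized matmuls get misattributed to the dense bucket and skew the profile.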

3. Optimize and reintegrate quantized kernels

This is the real work.

  • End-to-end reintegration currently only replaces plain nn.Linear, nn.LayerNorm, and RMSNorm-like modules in verify.py (line 563).
  • Quantized models often replace nn.Linear with custom classes, so verify.py (line 575) would miss them.
  • More importantly, the existing kernel library assumes dense fp16/bf16 kernels. Quantized inference needs different kernels and references: dequantize+matmul fusion, packed weights, scales/zeros, possibly group-wise quant metadata. That means new starter kernels, new reference paths, and likely new benchmark inputs in bench.py.
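For the module-matching gap specifically, a small sketch of how verify.py's walk could recognize quantized replacements. Matching by class name avoids a hard dependency on bitsandbytes; `Linear4bit` and `Linear8bitLt` are the class names bitsandbytes uses, but treat the set as an assumption to verify against an actual loaded model:

```python
# Hypothetical predicate for the module walk in verify.py: accept
# plain nn.Linear plus known quantized linear replacements, matched
# by class name so this file imports without bitsandbytes present.
QUANTIZED_LINEAR_CLASS_NAMES = {"Linear4bit", "Linear8bitLt"}

def is_linear_like(module: object) -> bool:
    name = type(module).__name__
    if name == "Linear":
        return True
    return name in QUANTIZED_LINEAR_CLASS_NAMES
```

Note that matching the module is only the first half: a quantized linear stores packed weights plus scales/zeros rather than a dense `.weight`, so the replacement kernel also needs access to that quantization metadata, which is the new-kernel work described above.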

I can unpack this work at a high level, but I don't think I'm there yet on the implementation. Any opinions/guidance?
