This repository was archived by the owner on Dec 23, 2025. It is now read-only.
leafspark edited this page Aug 4, 2024
Q: What is the difference between quantization types?
A: Quantization types trade off model size against inference quality. IQ1_S produces the smallest files but has the worst quality and the slowest quantization, while Q8_0 retains close to the original quality at a larger file size and quantizes more quickly.
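To get a feel for the size side of the tradeoff, you can estimate file size from bits per weight. The figures below are approximate llama.cpp values and the helper function is a rough sketch, not part of AutoGGUF:

```python
# Rough GGUF size estimate for a model at different quantization levels.
# Bits-per-weight values are approximate llama.cpp figures (assumptions),
# and real files also include non-quantized tensors and metadata.
BITS_PER_WEIGHT = {"IQ1_S": 1.56, "Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def estimated_size_gb(n_params_billion: float, quant: str) -> float:
    """Return an approximate file size in GB for the given quant type."""
    total_bits = BITS_PER_WEIGHT[quant] * n_params_billion * 1e9
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

for quant in BITS_PER_WEIGHT:
    print(f"7B model at {quant}: ~{estimated_size_gb(7, quant):.1f} GB")
```

For a 7B-parameter model this puts IQ1_S around 1.4 GB and Q8_0 around 7.4 GB, which is why the smallest quants are attractive despite the quality loss.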
Q: Can I quantize any HuggingFace model?
A: Most HuggingFace models compatible with the GGUF format can be quantized with AutoGGUF, but you must first convert them with `python convert_hf_to_gguf.py --outtype auto path_to_your_hf_model` and then move the resulting GGUF file into the models folder.
Q: How long does quantization take?
A: Quantization time depends on the model size, quantization type, and your hardware. It can range from minutes to several hours.