This repository was archived by the owner on Dec 23, 2025. It is now read-only.
leafspark edited this page Aug 4, 2024
Q: What is the difference between quantization types?
A: Quantization types trade off model size against inference quality. IQ1_S produces the smallest files but has the worst quality and the slowest quantization, while Q8_0 retains close to the original quality at a larger file size and quantizes more quickly.
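To get a feel for the size side of the tradeoff, you can estimate file size from bits per weight. The figures below are approximate llama.cpp values and the helper function is a rough sketch, not part of AutoGGUF:

```python
# Rough GGUF size estimate for a model at different quantization levels.
# Bits-per-weight values are approximate llama.cpp figures (assumptions),
# and real files also include non-quantized tensors and metadata.
BITS_PER_WEIGHT = {"IQ1_S": 1.56, "Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def estimated_size_gb(n_params_billion: float, quant: str) -> float:
    """Return an approximate file size in GB for the given quant type."""
    total_bits = BITS_PER_WEIGHT[quant] * n_params_billion * 1e9
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

for quant in BITS_PER_WEIGHT:
    print(f"7B model at {quant}: ~{estimated_size_gb(7, quant):.1f} GB")
```

For a 7B-parameter model this puts IQ1_S around 1.4 GB and Q8_0 around 7.4 GB, which is why the smallest quants are attractive despite the quality loss.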
Q: Can I quantize any HuggingFace model?
A: Most HuggingFace models compatible with the GGUF format can be quantized with AutoGGUF, but you must first convert them with `python convert_hf_to_gguf.py --outtype auto path_to_your_hf_model` and then move the resulting GGUF file into the models folder.
Q: How long does quantization take?
A: Quantization time depends on the model size, quantization type, and your hardware. It can range from minutes to several hours.