This repository contains comprehensive documentation for the Model Quantizer tool, which allows you to quantize Hugging Face models to reduce their memory footprint while maintaining most of their performance.
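As a rough illustration of why lower bit-widths shrink the footprint: weight storage scales linearly with bits per parameter. This back-of-the-envelope sketch ignores quantization metadata (scales, zero points) and activation memory, and the 3.8B parameter count is only an assumed example, roughly the size of Phi-4-mini:

```python
def approx_footprint_gb(n_params: float, bits: int) -> float:
    """Rough weight-storage estimate: parameters x bits per parameter,
    ignoring quantization metadata and runtime activation memory."""
    return n_params * bits / 8 / 1024**3

# Hypothetical 3.8B-parameter model (roughly Phi-4-mini scale):
n = 3.8e9
print(f"fp16:  {approx_footprint_gb(n, 16):.1f} GiB")  # ~7.1 GiB
print(f"4-bit: {approx_footprint_gb(n, 4):.1f} GiB")   # ~1.8 GiB
```

Going from fp16 to 4-bit cuts weight storage by 4x, which is why 4-bit quantized models often fit on consumer hardware that cannot load the original.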
- General Guide: Comprehensive guide to quantizing Hugging Face models
- Benchmarking Guide: How to benchmark and compare quantized models
- Publishing Guide: How to publish quantized models to Hugging Face Hub, including automatic model card generation
- Chat Guide: How to interactively test your quantized models
- Troubleshooting Guide: Solutions to common issues encountered during quantization
- Phi-4-mini Quantization Guide: Detailed guide for quantizing the Microsoft Phi-4-mini model
The Model Quantizer provides a complete workflow for working with quantized models:
First, quantize your model to reduce its memory footprint:
```bash
model-quantizer MODEL_NAME --bits 4 --method gptq --output-dir quantized-model
```

See the General Model Quantization Guide for detailed instructions.
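Under the hood, quantization maps floating-point weights onto a small set of integer levels. The snippet below is a toy symmetric 4-bit round-trip in NumPy — an illustration of the basic idea only, not the GPTQ algorithm the tool actually applies:

```python
import numpy as np

# Toy symmetric 4-bit quantization of a weight tensor.
weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)

scale = np.abs(weights).max() / 7  # int4 symmetric range is [-8, 7]
q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
dequant = q.astype(np.float32) * scale

# 4-bit storage is 1/8 the size of float32 (ignoring the scale overhead),
# and the round-trip error is bounded by half a quantization step.
max_err = np.abs(weights - dequant).max()
print(f"max abs error: {max_err:.4f}, scale: {scale:.4f}")
```

Real methods like GPTQ improve on this by quantizing weights group-wise and compensating for the error each rounding decision introduces, which is why they preserve more of the original model's accuracy.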
Next, benchmark your quantized model to evaluate its performance:
```bash
run-benchmark --original MODEL_NAME --quantized ./quantized-model --device cpu
```

See the Benchmarking Guide for detailed instructions.
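For a quick sanity check outside the benchmark harness, timing wall-clock latency over several runs and keeping the best is the usual approach. The two lambdas below are hypothetical stand-ins for original and quantized inference calls:

```python
import time

def time_fn(fn, repeats: int = 5) -> float:
    """Return the best wall-clock time over several runs — a crude
    stand-in for what a real benchmark harness measures."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical workloads standing in for the two inference calls:
baseline = time_fn(lambda: sum(i * i for i in range(200_000)))
quantized = time_fn(lambda: sum(i * i for i in range(100_000)))
print(f"baseline: {baseline:.4f}s, quantized: {quantized:.4f}s")
```

Taking the best of several runs reduces noise from caches and background load; the real `run-benchmark` tool additionally reports memory usage and output quality, which timing alone cannot capture.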
Test your quantized model interactively to verify its quality:
```bash
chat-with-model --model_path ./quantized-model
```

Finally, publish your quantized model to the Hugging Face Hub:
```bash
model-quantizer MODEL_NAME --bits 4 --method gptq --output-dir quantized-model --publish --repo-id YOUR_USERNAME/MODEL_NAME-gptq-4bit
```

See the Publishing Guide for detailed instructions.
If you're new to model quantization, we recommend starting with the General Model Quantization Guide, which provides an overview of the quantization process and available methods.
For specific models, check if there's a dedicated guide (like the Phi-4-mini Quantization Guide) that provides optimized settings and recommendations.
For practical examples, see the examples directory, which contains scripts for:
- Quantizing models
- Using quantized models
- Benchmarking performance
- Visualizing results
- Comparing memory usage
If you'd like to contribute to the documentation:
- Fork the repository
- Create a new branch for your changes
- Add or update documentation files
- Submit a pull request
We welcome improvements to existing guides and new model-specific guides.
If you encounter issues not covered in the Troubleshooting Guide, please open an issue on the GitHub repository.