This repository contains comprehensive documentation for the Model Quantizer tool, which allows you to quantize Hugging Face models to reduce their memory footprint while maintaining most of their performance.
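As a rough illustration of why lower bit-widths shrink the footprint: weight storage scales linearly with bits per parameter. This back-of-the-envelope sketch ignores quantization metadata (scales, zero points) and activation memory, and the 3.8B parameter count is only an assumed example, roughly the size of Phi-4-mini:

```python
def approx_footprint_gb(n_params: float, bits: int) -> float:
    """Rough weight-storage estimate: parameters x bits per parameter,
    ignoring quantization metadata and runtime activation memory."""
    return n_params * bits / 8 / 1024**3

# Hypothetical 3.8B-parameter model (roughly Phi-4-mini scale):
n = 3.8e9
print(f"fp16:  {approx_footprint_gb(n, 16):.1f} GiB")  # ~7.1 GiB
print(f"4-bit: {approx_footprint_gb(n, 4):.1f} GiB")   # ~1.8 GiB
```

Going from fp16 to 4-bit cuts weight storage by 4x, which is why 4-bit quantized models often fit on consumer hardware that cannot load the original.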
- General Guide: Comprehensive guide to quantizing Hugging Face models
- Benchmarking Guide: How to benchmark and compare quantized models
- Publishing Guide: How to publish quantized models to Hugging Face Hub, including automatic model card generation
- Chat Guide: How to interactively test your quantized models
- Troubleshooting Guide: Solutions to common issues encountered during quantization
- Phi-4-mini Quantization Guide: Detailed guide for quantizing the Microsoft Phi-4-mini model
The Model Quantizer provides a complete workflow for working with quantized models:
First, quantize your model to reduce its memory footprint:
```bash
model-quantizer MODEL_NAME --bits 4 --method gptq --output-dir quantized-model
```

See the General Model Quantization Guide for detailed instructions.
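Under the hood, quantization maps floating-point weights onto a small set of integer levels. The snippet below is a toy symmetric 4-bit round-trip in NumPy — an illustration of the basic idea only, not the GPTQ algorithm the tool actually applies:

```python
import numpy as np

# Toy symmetric 4-bit quantization of a weight tensor.
weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)

scale = np.abs(weights).max() / 7  # int4 symmetric range is [-8, 7]
q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
dequant = q.astype(np.float32) * scale

# 4-bit storage is 1/8 the size of float32 (ignoring the scale overhead),
# and the round-trip error is bounded by half a quantization step.
max_err = np.abs(weights - dequant).max()
print(f"max abs error: {max_err:.4f}, scale: {scale:.4f}")
```

Real methods like GPTQ improve on this by quantizing weights group-wise and compensating for the error each rounding decision introduces, which is why they preserve more of the original model's accuracy.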
Next, benchmark your quantized model to evaluate its performance:
```bash
run-benchmark --original MODEL_NAME --quantized ./quantized-model --device cpu
```

See the Benchmarking Guide for detailed instructions.
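For a quick sanity check outside the benchmark harness, timing wall-clock latency over several runs and keeping the best is the usual approach. The two lambdas below are hypothetical stand-ins for original and quantized inference calls:

```python
import time

def time_fn(fn, repeats: int = 5) -> float:
    """Return the best wall-clock time over several runs — a crude
    stand-in for what a real benchmark harness measures."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical workloads standing in for the two inference calls:
baseline = time_fn(lambda: sum(i * i for i in range(200_000)))
quantized = time_fn(lambda: sum(i * i for i in range(100_000)))
print(f"baseline: {baseline:.4f}s, quantized: {quantized:.4f}s")
```

Taking the best of several runs reduces noise from caches and background load; the real `run-benchmark` tool additionally reports memory usage and output quality, which timing alone cannot capture.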
Test your quantized model interactively to verify its quality:
```bash
chat-with-model --model_path ./quantized-model
```

Finally, publish your quantized model to the Hugging Face Hub:
```bash
model-quantizer MODEL_NAME --bits 4 --method gptq --output-dir quantized-model --publish --repo-id YOUR_USERNAME/MODEL_NAME-gptq-4bit
```

See the Publishing Guide for detailed instructions.
If you're new to model quantization, we recommend starting with the General Model Quantization Guide, which provides an overview of the quantization process and available methods.
For specific models, check if there's a dedicated guide (like the Phi-4-mini Quantization Guide) that provides optimized settings and recommendations.
For practical examples, see the examples directory, which contains scripts for:
- Quantizing models
- Using quantized models
- Benchmarking performance
- Visualizing results
- Comparing memory usage
If you'd like to contribute to the documentation:
- Fork the repository
- Create a new branch for your changes
- Add or update documentation files
- Submit a pull request
We welcome improvements to existing guides and new model-specific guides.
If you encounter issues not covered in the Troubleshooting Guide, please open an issue on the GitHub repository.