Model Quantizer Examples

This directory contains examples demonstrating how to use the Model Quantizer tool.

Available Examples

quantize_example.py: A script for quantizing any Hugging Face model
custom_prompts.json: Example prompts for benchmarking and testing

Basic Usage

# Quantize a model using GPTQ 4-bit
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --output-dir ./phi4-mini-gptq-4bit

# Quantize a model using BitsAndBytes 8-bit
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method bnb --bits 8 --output-dir ./phi4-mini-bnb-8bit

# Quantize a model and publish to Hugging Face Hub
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --publish --repo-id YOUR_USERNAME/phi4-mini-gptq-4bit
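To get a feel for what the `--bits` setting buys, the weight footprint can be estimated from the parameter count. The sketch below is a back-of-the-envelope estimate, not part of the tool: `estimate_weight_bytes` and `to_gb` are hypothetical helpers, the 3.8 billion parameter count is an assumed figure used for illustration, and the per-group scale and zero-point metadata that quantized formats store alongside the weights is ignored.

```python
def estimate_weight_bytes(num_params: int, bits: int) -> float:
    """Approximate storage for the weights alone at a given bit width."""
    return num_params * bits / 8


def to_gb(num_bytes: float) -> float:
    """Convert bytes to gigabytes (10**9 bytes)."""
    return num_bytes / 1e9


# Assumed parameter count, used only for illustration.
NUM_PARAMS = 3_800_000_000

for bits in (16, 8, 4):
    size_gb = to_gb(estimate_weight_bytes(NUM_PARAMS, bits))
    print(f"{bits:>2}-bit weights: ~{size_gb:.1f} GB")
```

Real checkpoints will be somewhat larger than this estimate, since some layers are typically left unquantized.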

Complete Workflow

1. Quantize a Model

# Quantize Gemma 2B to 4-bit using GPTQ
python quantize_example.py --model google/gemma-2b --method gptq --bits 4 --output-dir ./gemma-2b-quantized

2. Benchmark the Quantized Model

# Run the automated benchmark, comparing the original and quantized models
run-benchmark --original google/gemma-2b --quantized ./gemma-2b-quantized --device cpu --max_tokens 50 --output_dir benchmark_results

3. Test Interactively

# Chat with the model
chat-with-model --model_path ./gemma-2b-quantized --device cpu

4. Publish to Hugging Face Hub

# Publish the quantized model (re-runs the quantization command with --publish)
python quantize_example.py --model google/gemma-2b --method gptq --bits 4 --publish --repo-id YOUR_USERNAME/gemma-2b-gptq-4bit
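The four steps can also be assembled from a single set of variables, so the model name and output directory stay consistent across them. The following is a minimal dry-run sketch that only prints the commands. `build_workflow` is a hypothetical helper, and the flags are copied from the commands above rather than checked against the tools themselves.

```python
import shlex
from typing import List

MODEL = "google/gemma-2b"
OUTPUT_DIR = "./gemma-2b-quantized"
REPO_ID = "YOUR_USERNAME/gemma-2b-gptq-4bit"  # placeholder, as in the commands above


def build_workflow(model: str, output_dir: str, repo_id: str) -> List[List[str]]:
    """Return the four workflow commands as argument lists, in order."""
    quantize = ["python", "quantize_example.py", "--model", model,
                "--method", "gptq", "--bits", "4"]
    return [
        quantize + ["--output-dir", output_dir],
        ["run-benchmark", "--original", model, "--quantized", output_dir,
         "--device", "cpu", "--max_tokens", "50",
         "--output_dir", "benchmark_results"],
        ["chat-with-model", "--model_path", output_dir, "--device", "cpu"],
        quantize + ["--publish", "--repo-id", repo_id],
    ]


commands = build_workflow(MODEL, OUTPUT_DIR, REPO_ID)
for step, argv in enumerate(commands, start=1):
    # Dry run: print each command instead of executing it.
    print(f"{step}. {shlex.join(argv)}")
```

To execute a step rather than print it, each argument list can be passed to `subprocess.run(argv, check=True)`. Step 3 is interactive, so it is best run directly in a terminal.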

Advanced Options

Custom Calibration Dataset

# Use a custom calibration dataset (text samples passed as one comma-separated string)
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --calibration-dataset "This is a sample text,This is another sample"
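The quoted value appears to pack several calibration samples into one comma-separated string. How `quantize_example.py` actually parses it is not shown here, so the following is only a sketch of one plausible interpretation. `parse_calibration_samples` is a hypothetical helper, not a function from the tool.

```python
from typing import List


def parse_calibration_samples(raw: str) -> List[str]:
    """Split a comma-separated string into non-empty, stripped text samples."""
    return [part.strip() for part in raw.split(",") if part.strip()]


samples = parse_calibration_samples("This is a sample text,This is another sample")
print(samples)
```

Under this interpretation a sample cannot itself contain a comma, so longer or punctuated calibration text would need a different input format.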

Device Selection

# Use CPU for quantization
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --device cpu

# Use CUDA for quantization
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --device cuda

# Use MPS for quantization (Apple Silicon)
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --device mps
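When the same command has to run on different machines, the `--device` value can be chosen programmatically. This is a minimal sketch under an assumed preference order (CUDA, then MPS, then CPU). `pick_device` is a hypothetical helper, and the availability flags are passed in explicitly so the example runs without any extra packages.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS, and fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Example: a machine with Apple Silicon and no NVIDIA GPU.
print(pick_device(cuda_available=False, mps_available=True))
```

With PyTorch installed, the two flags can be taken from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.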

Quantization Parameters

# Set group size to 64
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --group-size 64

# Enable desc_act: quantize columns in order of decreasing activation size
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --desc-act

# Use asymmetric quantization
python quantize_example.py --model microsoft/Phi-4-mini-instruct --method gptq --bits 4 --no-sym
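To illustrate what group size and symmetry control, here is a toy sketch of group-wise quantization in plain Python. It uses simple round-to-nearest on a uniform grid, not GPTQ itself, which additionally adjusts the remaining weights to compensate for rounding error. The function names are hypothetical, and mapping `--group-size` to `group_size` and `--no-sym` to `sym=False` is an assumption about the script.

```python
from typing import List


def quantize_group(values: List[float], bits: int, sym: bool) -> List[float]:
    """Quantize one group to `bits` bits and return the dequantized values."""
    levels = 2 ** bits - 1  # highest integer code, e.g. 15 for 4-bit
    if sym:
        # Symmetric: the grid is centred on zero and spans [-max_abs, +max_abs].
        max_abs = max(abs(v) for v in values)
        lo, hi = -max_abs, max_abs
    else:
        # Asymmetric: the grid spans exactly [min, max] of the group.
        lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant group, nothing to quantize
    scale = (hi - lo) / levels
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return [lo + c * scale for c in codes]


def quantize_weights(weights: List[float], bits: int, group_size: int,
                     sym: bool = True) -> List[float]:
    """Quantize each consecutive group of `group_size` weights independently."""
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size], bits, sym))
    return out


def max_error(a: List[float], b: List[float]) -> float:
    """Largest absolute difference between two equally long lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy weights, all positive, so a zero-centred grid wastes half its levels.
weights = [0.1 * i for i in range(1, 9)]

sym_out = quantize_weights(weights, bits=4, group_size=4, sym=True)
asym_out = quantize_weights(weights, bits=4, group_size=4, sym=False)
print("symmetric max error: ", max_error(weights, sym_out))
print("asymmetric max error:", max_error(weights, asym_out))
```

Each group gets its own scale, so smaller groups track local weight ranges more closely at the cost of storing more scale metadata. On skewed data such as this toy example, the asymmetric grid fits the range more tightly than the symmetric one.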
