
Commit 3f330d0

Merge pull request #1 from FujitsuResearch/feature/v1-0-0
Feature/v1 0 0
2 parents 0319d41 + ed397d9 commit 3f330d0

22 files changed (+197, -17 lines)

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -1,5 +1,23 @@
 # Change log
 
+## [v1.0.0] 2026-03-31
+
+### Default Parameter Changes
+
+- Changed `Runner.__init__` default values for calibration parameters:
+  - `max_length`: `512` → `2048`
+  - `num_calibration_samples`: `128` → `512`
+- Pinned old default values explicitly in all `example/` and `tests/` files that previously relied on the defaults
+
+### Documentation
+
+- Updated `docs/user-guide/configuration.md` to reflect the new default values for `max_length` and `num_calibration_samples`
+- Added quantizer feature support table to `docs/user-guide/basic-usage.md` and `docs/api/quantizers/base.md`
+  - Documents which quantizers support `save_quantized_model()` / `create_quantized_model()` and quantized-model PPL/ACC evaluation
+  - Currently supported: **GPTQ**, **DBF**, **AutoBitQuantizer** (requires `get_quant_config()` and `create_inference_layer()`)
+  - Unsupported quantizers (RTN, JointQ, QUIP, CQ, ARB, QBB, Onebit): PPL/ACC evaluation automatically falls back to the dequantized (FP16) model
+- Updated the perplexity/accuracy evaluation note in `basic-usage.md` to reflect AutoBitQuantizer support and fallback behavior
+
 ## [v0.5.0] 2026-03-30
 
 ### New Feature: Post-quantization Workflow

README.md

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,14 +6,26 @@ This package is currently under development (version 0) and may behave unstably.
66

77
## 📦 Features
88

9-
- **Quantization Error Propagation (QEP)**: A post-training quantization method that corrects quantization errors by propagating them to subsequent layers, improving the accuracy of quantized LLMs. See [Arai & Ichikawa, NeurIPS 2025](https://openreview.net/forum?id=a3l3K9khbL) for details.
9+
- **Quantization Error Propagation (QEP)**: A post-training quantization method that corrects quantization errors by propagating them to subsequent layers, improving the accuracy of quantized LLMs. See [Arai & Ichikawa, NeurIPS 2025](https://openreview.net/forum?id=a3l3K9khbL) for details. The original reference implementation is available at [FujitsuResearch/qep](https://github.com/FujitsuResearch/qep).
1010
- **vLLM Plugin Integration**: Serve OneComp-quantized models with [vLLM](https://docs.vllm.ai/) via built-in plugins for DBF and Mixed-GPTQ quantization methods.
1111
- **AutoBit**: Mixed-precision quantization with ILP-based bitwidth assignment. Automatically estimates the target bitwidth from available VRAM and assigns per-layer bitwidths to minimize quantization error under the memory budget.
1212
- **JointQ**: Joint quantization method that optimizes weight assignments and scale parameters simultaneously for improved quantization accuracy. Supports group-wise quantization (e.g., 4-bit, groupsize=128).
1313
- **LoRA SFT Post-Process**: Fine-tune quantized models with LoRA adapters for accuracy recovery or domain-specific knowledge injection. Supports SFT loss, teacher distillation, and intermediate block alignment.
1414
- **Rotation Preprocessing**: SpinQuant/OstQuant-based rotation preprocessing that reduces quantization error by learning optimal rotation matrices before quantization. Rotation/scaling matrices are absorbed into model weights, with online Hadamard hooks automatically registered at load time. Supports Llama and Qwen3 architectures.
1515
- (TBD)
1616

17+
## 🤖 Supported Models
18+
19+
OneComp has been verified with the following model architectures.
20+
Other Hugging Face-compatible models may work but are currently untested.
21+
22+
| # | Architecture | Verified Models | Status |
23+
|---|-------------|-----------------|--------|
24+
| 1 | Llama | TinyLlama, Llama-2, Llama-3 | ✅ Verified |
25+
| 2 | Qwen3 | Qwen3-0.6B ~ 32B | ✅ Verified |
26+
27+
> **Note:** Support for additional architectures is planned. Contributions and test reports are welcome.
28+
1729
## 🔧 Installation
1830

1931
### for users (pip)
@@ -181,6 +193,19 @@ See [LICENSE](./LICENSE) for more details.
181193

182194
## Citation
183195

196+
OneComp technical report (coming soon on ArXiv):
197+
198+
```
199+
@misc{onecomp2026,
200+
title={TBD},
201+
author={TBD},
202+
year={2026},
203+
note={arXiv preprint coming soon}
204+
}
205+
```
206+
207+
QEP (Quantization Error Propagation):
208+
184209
```
185210
@inproceedings{
186211
arai2025quantization,
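The AutoBit feature above assigns per-layer bitwidths under a memory budget using an ILP solver. As a rough illustration of the underlying trade-off only (not the OneComp implementation), a greedy variant can be sketched: start every layer at the cheapest bitwidth, then repeatedly upgrade whichever layer buys the largest error reduction per extra bit of memory. All names below are hypothetical.

```python
def assign_bitwidths(num_weights, err, budget_bits, choices=(2, 4, 8)):
    """Greedy sketch of budgeted mixed-precision assignment.

    num_weights[i]: parameter count of layer i.
    err[i][b]: estimated quantization error of layer i at bitwidth b.
    budget_bits: total weight-memory budget in bits.
    """
    bits = [min(choices)] * len(num_weights)          # start at the cheapest width
    used = sum(n * b for n, b in zip(num_weights, bits))
    while True:
        best = None
        best_gain = 0.0
        for i, n in enumerate(num_weights):
            higher = [b for b in choices if b > bits[i]]
            if not higher:
                continue                               # already at max precision
            nb = min(higher)                           # next step up
            extra = n * (nb - bits[i])
            if used + extra > budget_bits:
                continue                               # upgrade would bust the budget
            gain = (err[i][bits[i]] - err[i][nb]) / extra  # error drop per extra bit
            if gain > best_gain:
                best, best_gain, best_nb, best_extra = i, gain, nb, extra
        if best is None:
            break                                      # no affordable improvement left
        bits[best] = best_nb
        used += best_extra
    return bits
```

An ILP solver, as AutoBit uses, finds the globally optimal assignment; the greedy loop above only conveys the shape of the problem.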

docs/algorithms/qep.md

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,8 @@ the error that propagates from previously quantized layers to subsequent ones.
 !!! abstract "Reference"
     Yamato Arai and Yuma Ichikawa, "Quantization Error Propagation: Revisiting Layer-Wise
     Post-Training Quantization," NeurIPS 2025.
-    [OpenReview](https://openreview.net/forum?id=a3l3K9khbL)
+    [OpenReview](https://openreview.net/forum?id=a3l3K9khbL) |
+    [Original implementation](https://github.com/FujitsuResearch/qep)
 
 ## Motivation
 
docs/api/quantizers/base.md

Lines changed: 28 additions & 0 deletions
@@ -4,6 +4,34 @@
 
 Abstract base class for all quantizers. Defines the common interface and shared functionality.
 
+### Quantizer Feature Support
+
+`Runner.save_quantized_model()`, `Runner.create_quantized_model()`, and quantized-model
+PPL/ACC evaluation internally call `get_quant_config()` and `create_inference_layer()` on
+the quantizer. These methods raise `NotImplementedError` by default and must be overridden
+by each quantizer to enable these features.
+
+| Quantizer          | `get_quant_config` | `create_inference_layer` | Save | Quantized PPL/ACC |
+|--------------------|:------------------:|:------------------------:|:----:|:-----------------:|
+| `GPTQ`             | Yes                | Yes                      | Yes  | Yes               |
+| `DBF`              | Yes                | Yes                      | Yes  | Yes               |
+| `AutoBitQuantizer` | Yes                | Yes                      | Yes  | Yes               |
+| `RTN`              | No                 | No                       | No   | No (fallback)     |
+| `JointQ`           | No                 | No                       | No   | No (fallback)     |
+| `QUIP`             | No                 | No                       | No   | No (fallback)     |
+| `CQ`               | No                 | No                       | No   | No (fallback)     |
+| `ARB`              | No                 | No                       | No   | No (fallback)     |
+| `QBB`              | No                 | No                       | No   | No (fallback)     |
+| `Onebit`           | No                 | No                       | No   | No (fallback)     |
+
+For quantizers without support:
+
+- **PPL/ACC evaluation**: `calculate_perplexity()` / `calculate_accuracy()` with
+  `quantized_model=True` automatically falls back to the dequantized (FP16) model.
+  No error is raised.
+- **Saving**: use `save_dequantized_model()` (FP16) or `save_quantization_results()`
+  to persist results.
+
 ::: onecomp.quantizer._quantizer.Quantizer
     options:
       show_source: false
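The override-or-fallback contract described in this section can be illustrated with a toy sketch. Class and method names mirror the docs above, but this is only an illustration of the dispatch pattern, not the `onecomp` source:

```python
class Quantizer:
    """Base class: feature hooks raise NotImplementedError by default."""

    def get_quant_config(self):
        raise NotImplementedError

    def create_inference_layer(self):
        raise NotImplementedError


class GPTQ(Quantizer):
    """Supported quantizer: overrides both hooks."""

    def get_quant_config(self):
        return {"bits": 4, "group_size": 128}   # illustrative values

    def create_inference_layer(self):
        return "int4-layer"                     # stand-in for a real layer object


class RTN(Quantizer):
    """Unsupported quantizer: overrides nothing."""


def eval_path(quantizer):
    """Which model would PPL/ACC evaluation run on for this quantizer?"""
    try:
        quantizer.get_quant_config()
        quantizer.create_inference_layer()
        return "quantized"
    except NotImplementedError:
        return "dequantized-fp16"               # silent fallback, no error raised
```

Probing for `NotImplementedError` rather than checking a capability flag matches the docs' statement that the base-class methods "must be overridden by each quantizer to enable these features."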

docs/index.md

Lines changed: 26 additions & 0 deletions
@@ -17,6 +17,19 @@ It implements state-of-the-art quantization algorithms including GPTQ, DBF, RTN,
 - **LoRA SFT Post-Process** -- Fine-tune quantized models with LoRA adapters for accuracy recovery or domain-specific knowledge injection. Supports SFT loss, teacher distillation, and intermediate block alignment.
 - **Rotation Preprocessing** -- SpinQuant/OstQuant-based rotation preprocessing that reduces quantization error by learning optimal rotation matrices before quantization. Rotation/scaling matrices are absorbed into model weights, with online Hadamard hooks automatically registered at load time. Supports Llama and Qwen3 architectures.
 
+## Supported Models
+
+OneComp has been verified with the following model architectures.
+Other Hugging Face-compatible models may work but are currently untested.
+
+| # | Architecture | Verified Models             | Status                      |
+|---|--------------|-----------------------------|-----------------------------|
+| 1 | Llama        | TinyLlama, Llama-2, Llama-3 | :white_check_mark: Verified |
+| 2 | Qwen3        | Qwen3-0.6B ~ 32B            | :white_check_mark: Verified |
+
+!!! note
+    Support for additional architectures is planned. Contributions and test reports are welcome.
+
 ## Quick Example
 
 Quantize any Hugging Face model in a single line -- with QEP, GPTQ 4-bit quantization,
@@ -72,6 +85,19 @@ For full control over each step, see the [step-by-step workflow](user-guide/basi
 
 If you use OneComp in your research, please cite our paper:
 
+OneComp technical report (coming soon on arXiv):
+
+```bibtex
+@misc{onecomp2026,
+  title={TBD},
+  author={TBD},
+  year={2026},
+  note={arXiv preprint coming soon}
+}
+```
+
+QEP (Quantization Error Propagation):
+
 ```bibtex
 @inproceedings{
 arai2025quantization,

docs/user-guide/basic-usage.md

Lines changed: 26 additions & 1 deletion
@@ -121,7 +121,9 @@ print(f"Quantized: {quantized_ppl:.2f}")
 
 !!! note
     Evaluating the original or dequantized model requires loading the full model on GPU.
-    Quantized-model evaluation is currently supported only for **GPTQ** and **DBF** quantizers. Support for other methods is planned.
+    Quantized-model evaluation (`quantized_model=True`) is supported only for quantizers
+    that implement `create_quantized_model()` (**GPTQ**, **DBF**, **AutoBitQuantizer**).
+    For other quantizers, evaluation automatically falls back to the dequantized (FP16) model.
 
 ### Zero-shot Accuracy
 
@@ -146,6 +148,29 @@ runner.save_dequantized_model("./output/dequantized")
 runner.save_quantized_model("./output/quantized")
 ```
 
+!!! note "Quantizer feature support"
+    `save_quantized_model()`, `create_quantized_model()`, and quantized-model PPL/ACC evaluation
+    require the quantizer to implement `get_quant_config()` and `create_inference_layer()`.
+    Currently only **GPTQ**, **DBF**, and **AutoBitQuantizer** support these features.
+
+    | Quantizer          | Save | Quantized PPL/ACC | Fallback                 |
+    |--------------------|:----:|:-----------------:|--------------------------|
+    | `GPTQ`             | Yes  | Yes               | —                        |
+    | `DBF`              | Yes  | Yes               | —                        |
+    | `AutoBitQuantizer` | Yes  | Yes               | —                        |
+    | `RTN`              | —    | —                 | Dequantized (FP16) model |
+    | `JointQ`           | —    | —                 | Dequantized (FP16) model |
+    | `QUIP`             | —    | —                 | Dequantized (FP16) model |
+    | `CQ`               | —    | —                 | Dequantized (FP16) model |
+    | `ARB`              | —    | —                 | Dequantized (FP16) model |
+    | `QBB`              | —    | —                 | Dequantized (FP16) model |
+    | `Onebit`           | —    | —                 | Dequantized (FP16) model |
+
+    For unsupported quantizers:
+
+    - **PPL/ACC evaluation**: automatically falls back to the dequantized (FP16) model. No error is raised.
+    - **Saving**: use `save_dequantized_model()` (FP16) or `save_quantization_results()` instead.
+
 ## Enabling QEP
 
 QEP adjusts weights before quantization to compensate for error propagation across layers.
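As background for the PPL numbers this guide prints, perplexity is the exponential of the mean per-token negative log-likelihood. A minimal self-contained sketch of the standard definition (not OneComp's evaluation code):

```python
import math


def perplexity(nll_per_token):
    """Corpus perplexity = exp(mean of per-token negative log-likelihoods)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))
```

Lower is better: a model that assigns every token probability 1 has perplexity 1, and a model that assigns every token probability 1/4 has perplexity 4.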

docs/user-guide/configuration.md

Lines changed: 4 additions & 4 deletions
@@ -36,8 +36,8 @@ from onecomp import Runner
 runner = Runner(
     model_config=model_config,
     quantizer=quantizer,
-    max_length=512,
-    num_calibration_samples=128,
+    max_length=2048,
+    num_calibration_samples=512,
     qep=False,
 )
 ```
@@ -57,8 +57,8 @@ runner = Runner(
 | Parameter                 | Type      | Description                               | Default       |
 |---------------------------|-----------|-------------------------------------------|---------------|
 | `calibration_dataset`     | `Dataset` | Custom calibration dataset                | `None`        |
-| `max_length`              | `int`     | Maximum input sequence length             | `512`         |
-| `num_calibration_samples` | `int`     | Number of calibration samples             | `128`         |
+| `max_length`              | `int`     | Maximum input sequence length             | `2048`        |
+| `num_calibration_samples` | `int`     | Number of calibration samples             | `512`         |
 | `calibration_strategy`    | `str`     | Strategy for preparing calibration inputs | `"drop_rand"` |
 | `calibration_seed`        | `int`     | Random seed for calibration               | `0`           |
 | `calibration_batch_size`  | `int`     | Batch size for chunked calibration        | `None`        |
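To make the calibration parameters above concrete, here is a hypothetical sketch of drawing `num_calibration_samples` fixed-length windows of `max_length` tokens from a token stream, seeded for reproducibility. The actual `"drop_rand"` strategy in OneComp may differ; the function name and logic here are illustrative only.

```python
import random


def sample_calibration(tokens, num_samples=512, max_length=2048, seed=0):
    """Draw fixed-length token windows for calibration; deterministic per seed."""
    rng = random.Random(seed)                 # seeded like `calibration_seed`
    windows = []
    for _ in range(num_samples):
        start = rng.randrange(0, len(tokens) - max_length + 1)
        windows.append(tokens[start:start + max_length])
    return windows
```

The v1.0.0 defaults (512 samples of 2048 tokens) consume 4x the samples and 4x the sequence length of the old defaults, which is why the examples below pin `max_length=512, num_calibration_samples=128` to preserve their previous runtime and memory footprint.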

example/example_autobit.py

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@
     model_config=ModelConfig(model_id=MODEL_ID, device="cuda:0"),
     quantizer=quantizer,
     qep=False,
+    max_length=512,
+    num_calibration_samples=128,
 )
 runner.run()
 
example/example_gptq.py

Lines changed: 7 additions & 1 deletion
@@ -22,7 +22,13 @@
 gptq = GPTQ(wbits=3)
 
 # Configure the runner
-runner = Runner(model_config=model_config, quantizer=gptq, qep=False)
+runner = Runner(
+    model_config=model_config,
+    quantizer=gptq,
+    qep=False,
+    max_length=512,
+    num_calibration_samples=128,
+)
 
 # Run quantization
 runner.run()

example/example_jointq.py

Lines changed: 7 additions & 1 deletion
@@ -22,7 +22,13 @@
 jointq = JointQ(bits=4, group_size=128)
 
 # Configure the runner
-runner = Runner(model_config=model_config, quantizer=jointq, qep=False)
+runner = Runner(
+    model_config=model_config,
+    quantizer=jointq,
+    qep=False,
+    max_length=512,
+    num_calibration_samples=128,
+)
 
 # Run quantization
 runner.run()
