Description
This is the ONNX export of my model, a very simple model.
I used the code from the tutorial to quantize it to INT8:
```python
# Assumed imports, following the Quark ONNX quantization tutorial
from quark.onnx import ModelQuantizer
from quark.onnx.quantization.config import Config, get_default_config

quant_config = get_default_config("XINT8")
config = Config(global_quant_config=quant_config)
print(f"The configuration for quantization is {config}")

# Create an ONNX quantizer
quantizer = ModelQuantizer(config)

# Quantize the ONNX model ("dr" is the calibration data reader)
quantizer.quantize_model(input_model_path, output_model_path, dr)
```
But why can't the quantized model run entirely on the NPU? Many ops still fall back to the CPU.
```
--------------------------------------------------- TEST SUMMARY
model             mobilenetv3_int8_BS1
threads           1
APU               STX
--------------------------------------------------- PERFORMANCE
throughput        15.25 [fps]
latency           65.48 [ms]
processed images  200
preloaded images  100
requested images  100
--------------------------------------------------- NODES DISTRIBUTION
total nodes       519
CPU               54
NPU               353
VITIS_EP_CPU      112
```
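To narrow down which operator types are being pushed off the NPU, one option is to count the op types present in the quantized graph. The sketch below keeps the counting logic stdlib-only; the model path and the `onnx`-based loading in the comment are assumptions for illustration:

```python
from collections import Counter

def op_histogram(op_types):
    """Count how many nodes of each operator type a graph contains."""
    return Counter(op_types)

# With the onnx package installed, the quantized model's node types can be
# fed in directly (path is hypothetical):
#   import onnx
#   model = onnx.load("mobilenetv3_int8_BS1.onnx")
#   print(op_histogram(node.op_type for node in model.graph.node))

# Small stand-in list just to show the shape of the result:
print(op_histogram(["Conv", "HardSwish", "Conv", "QuantizeLinear"]))
```

Comparing this histogram against the list of NPU-supported operators would show whether the CPU fallback comes from a few specific op types (e.g. activations that the XINT8 config leaves in higher precision).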
I also tried the example you provided in "advanced_quark_quantize.py": mobilenetv2 can't run entirely on the NPU either. Only resnet18 runs fully on the NPU.