
mobilenetv3 int8 cannot run on NPU #310

@pantingchn

Description

[Image: ONNX graph of the model]

This is the ONNX graph of my model, a very simple model.
I used the code from the tutorial to quantize it to INT8:

```python
# Imports per the Quark ONNX tutorial
from quark.onnx import ModelQuantizer
from quark.onnx.quantization.config import Config, get_default_config

# Use the default XINT8 configuration for NPU deployment
quant_config = get_default_config("XINT8")
config = Config(global_quant_config=quant_config)
print(f"The configuration for quantization is {config}")

# Create an ONNX quantizer
quantizer = ModelQuantizer(config)

# Quantize the ONNX model ('dr' is the calibration data reader)
quantizer.quantize_model(input_model_path, output_model_path, dr)
```
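For completeness, a minimal sketch of the calibration data reader `dr`, assuming `onnxruntime.quantization.CalibrationDataReader` and a hypothetical input name and shape (random data here; real calibration should use representative images):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader

class RandomDataReader(CalibrationDataReader):
    """Feeds a few random batches for calibration (illustrative only)."""
    def __init__(self, input_name="input", shape=(1, 3, 224, 224), num_batches=8):
        self.input_name = input_name  # assumed input tensor name
        self.batches = iter(
            np.random.rand(*shape).astype(np.float32) for _ in range(num_batches)
        )

    def get_next(self):
        batch = next(self.batches, None)
        return None if batch is None else {self.input_name: batch}

dr = RandomDataReader()
```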

But why can't the quantized model run entirely on the NPU? Lots of ops still run on the CPU.

```
--------------------------------------------------- TEST SUMMARY
model             mobilenetv3_int8_BS1
threads           1    APU    STX
--------------------------------------------------- PERFORMANCE
throughput        15.25 [fps]
latency           65.48 [ms]
processed images  200    preloaded images  100    requested images  100
--------------------------------------------------- NODES DISTRIBUTION
total nodes       519
CPU               54
NPU               353
VITIS_EP_CPU      112
```
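To identify which op types are likely falling back to the CPU, counting the ops in the quantized graph is a quick first check; a sketch with the `onnx` package (the model path is illustrative):

```python
import onnx
from collections import Counter

# Load the quantized model and tally node types
model = onnx.load("mobilenetv3_int8_BS1.onnx")  # illustrative path
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in op_counts.most_common():
    print(f"{op_type}: {count}")
```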

I also tried the example you provided in "advanced_quark_quantize.py": mobilenetv2 can't run entirely on the NPU either. Only resnet18 runs fully on the NPU.
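For reference, a minimal sketch of how the quantized model could be loaded with the Vitis AI EP to reproduce the node distribution; the `config_file` option is an assumption based on the Ryzen AI setup, so adjust to your install:

```python
import onnxruntime as ort

# Create a session with the Vitis AI EP first, falling back to CPU
session = ort.InferenceSession(
    "mobilenetv3_int8_BS1.onnx",  # illustrative path
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}, {}],  # assumed config
)
print(session.get_providers())
```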
