Prerequisites
- I have searched the existing issues and confirmed this is not a duplicate.
- I am using the latest version of the MLLM framework.
Bug Description
The environment is similar to #574, and includes:
Ubuntu: 22.04.5
QNN SDK: 2.41.0.251128
Hexagon NPU Runtime: 6.4.0.1
I am also trying to run mllm-qwen-npu with the Qwen1.5-1.8B-Chat model on my OnePlus Ace5 Pro, but it fails.
Steps to Reproduce
- I followed the steps in [v2] Missing demo scripts for Android QNN backend execution (was available in v1) #560; the only difference is that the NPU I used is v79.
- Here is the result:
LD_LIBRARY_PATH=/data/local/tmp/build-android-arm64-v8a-qnn/bin/ ./mllm-qwen-npu <
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:33 Mixed inference mode: NPU prefill + CPU decode
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:58 CPU decode model loaded from: /data/local/tmp/zhanghao/models/qwen1.5-1.8b-chat-rot_q4_0.mllm
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:62 Loading QNN model for prefill...
[INFO] /home/zcm/mllm/mllm/backends/qnn/QNNUtils.cpp:23 QNN Backend Lib: libQnnHtp.so
[INFO] /home/zcm/mllm/mllm/backends/qnn/QNNBackend.cpp:306 Registered Op Package: libQnnLLaMAPackage_CPU.so and interface provider: LLaMAPackageInterfaceProvider
[INFO] /home/zcm/mllm/mllm/backends/qnn/QNNBackend.cpp:306 Registered Op Package: libQnnLLaMAPackage_HTP.so and interface provider: LLaMAPackageInterfaceProvider
[INFO] /home/zcm/mllm/mllm/backends/qnn/QNNBackend.cpp:47 QNN Backend Build Id: v2.41.0.251128145156_191518
[INFO] /home/zcm/mllm/mllm/backends/qnn/QNNBackend.cpp:49 QNN backend supports tensor sparsity
[INFO] /home/zcm/mllm/mllm/backends/qnn/QNNBackend.cpp:52 QNN backend supports dynamic dimensions
[INFO] /home/zcm/mllm/mllm/backends/base/PluginSystem.cpp:89 Register customized op: DequantizeAdd:4097 -> QNN
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:72 Created shared StaticCache with 24 layers
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:77 QNN prefill model loaded from: /data/local/tmp/zhanghao/models/qwen1.5-1.8b-chat-rot-qnn.mllm
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:87 Configured 24 QNN KVCache layers to use shared StaticCache
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:97 Configured 24 CPU KVCache layers to use shared StaticCache
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:110 Input tokens: 194 tokens
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:179 Starting QNN prefill...
enter npu_model->trace(past, {})
leave npu_model->trace(past, {})
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:203 ************************enter graphBuildPM.run()
linalg.CPU.RMSNormOp [name="model.layers.0.input_layernorm"](%1577:tensor<[1, 194, 2048], Float32, QNN>[is_graph_input:true]) -> (%1578:tensor<[1, 194, 2048], Float32, QNN>)[ERROR] /home/zcm/mllm/mllm/backends/qnn/op/QNNRMSNormOp.cpp:48 Failed to cast to QNNRMSNormOp
[ERROR] /home/zcm/mllm/mllm/backends/qnn/passes/QNNGraphBuildPass.cpp:164 Failed to add node for op type: RMSNorm in graph 'model.layers.0_1'
[ERROR] /home/zcm/mllm/mllm/backends/qnn/op/QNNLinearOp.cpp:135 Failed to cast to QNNLinearOp
[ERROR] /home/zcm/mllm/mllm/backends/qnn/passes/QNNGraphBuildPass.cpp:164 Failed to add node for op type: Linear in graph 'model.layers.0_2'
linalg.CPU.RMSNormOp [name="model.layers.1.input_layernorm"](%1644:tensor<[1, 194, 2048], Float32, QNN>[is_graph_input:true, is_graph_output:true]) -> (%1645:tensor<[1, 194, 2048], Float32, QNN>)[ERROR] /home/zcm/mllm/mllm/backends/qnn/op/QNNRMSNormOp.cpp:48 Failed to cast to QNNRMSNormOp
[ERROR] /home/zcm/mllm/mllm/backends/qnn/passes/QNNGraphBuildPass.cpp:164 Failed to add node for op type: RMSNorm in graph 'model.layers.1_1'
[ERROR] /home/zcm/mllm/mllm/backends/qnn/op/QNNLinearOp.cpp:135 Failed to cast to QNNLinearOp
[ERROR] /home/zcm/mllm/mllm/backends/qnn/passes/QNNGraphBuildPass.cpp:164 Failed to add node for op type: Linear in graph 'model.layers.1_2'
...
[INFO] /home/zcm/mllm/examples/qwen_npu/main.cpp:205 ************************out graphBuildPM.run()
[WARN] /home/zcm/mllm/mllm/backends/cpu/kernels/common/ggml/vec_dot_type.hpp:181 Unsupported DataType Int8
[WARN] /home/zcm/mllm/mllm/backends/cpu/kernels/common/ggml/vec_dot_type.hpp:181 Unsupported DataType Int8
Segmentation fault
I read the source code and found that when a Module registers its Layers, it also registers the BaseOp of each Layer:
template<typename T, typename... Args>
auto reg(const std::string& name, Args&&... args) {
  // Register a module
  if constexpr (std::is_base_of_v<Module, T>) {
    auto ret = T((impl_->getAbsoluteName() == "" ? name
                                                 : impl_->getAbsoluteName() + (name == "" ? "" : ".") +  // avoid double dot
                                                       name),
                 std::forward<Args>(args)...);
    impl_->regChildNode(ret.impl());
    ret.impl()->setName(name);
    return ret;
  }
  // Register to thisThread table
  if constexpr (std::is_base_of_v<Layer, T>) {
    auto ret = T(std::forward<Args>(args)...);
    impl_->regChildNode(ret.impl());
    ret.impl()->setAbsoluteName((impl_->getAbsoluteName() == ""
                                     ? name
                                     : impl_->getAbsoluteName() + (name == "" ? "" : ".") +  // avoid double dot
                                           name));
    ret.impl()->setName(name);
    auto& ctx = Context::instance();
    // Create Op
    BaseOp::ptr_t _op = nullptr;
    _op = ctx.getBackend(ret.impl()->getDevice())->createOp(ret.opType(), ret.refOptions());
    _op->setName(ret.impl()->getAbsoluteName());
    // Register Op
    ret.impl()->setInstancedOp(_op);
    return ret;
  }
}
However, the default value of ret.impl()->getDevice() is always kCPU, which means the Op is created by the CPUBackend. I am not sure whether this is correct; maybe I am missing some details. I dumped the subgraph, which looks like this (a sketch of the suspected cast failure follows the dump):
graph.SubGraphOp @model.layers.0_1 <QNN> {
(%1577:tensor<[1, 194, 2048], Float32, QNN>[is_graph_input:true]) -> (%1594:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true], %1595:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true], %1596:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true]) {
linalg.CPU.RMSNormOp [name="model.layers.0.input_layernorm"](%1577:tensor<[1, 194, 2048], Float32, QNN>[is_graph_input:true]) -> (%1578:tensor<[1, 194, 2048], Float32, QNN>)
linalg.QNN.ViewOp [name="model.layers.0_1.View.0"](%1578:tensor<[1, 194, 2048], Float32, QNN>) -> (%1579:tensor<[1, 194, 1, 2048], Float32, QNN>)
linalg.QNN.CastTypeOp [name="model.layers.0_1.CastType.0"](%1579:tensor<[1, 194, 1, 2048], Float32, QNN>) -> (%1580:tensor<[1, 194, 1, 2048], Int16, QNN>)
linalg.CPU.LinearOp [name="model.layers.0.self_attn.q_proj"](%1580:tensor<[1, 194, 1, 2048], Int16, QNN>) -> (%1582:tensor<[1, 194, 1, 2048], Int16, QNN>)
linalg.CPU.LinearOp [name="model.layers.0.self_attn.k_proj"](%1580:tensor<[1, 194, 1, 2048], Int16, QNN>) -> (%1583:tensor<[1, 194, 1, 2048], Int16, QNN>)
linalg.CPU.LinearOp [name="model.layers.0.self_attn.v_proj"](%1580:tensor<[1, 194, 1, 2048], Int16, QNN>) -> (%1584:tensor<[1, 194, 1, 2048], Int16, QNN>)
linalg.QNN.ViewOp [name="model.layers.0_1.View.1"](%1582:tensor<[1, 194, 1, 2048], Int16, QNN>) -> (%1585:tensor<[1, 194, 16, 128], Int16, QNN>)
linalg.QNN.ViewOp [name="model.layers.0_1.View.2"](%1583:tensor<[1, 194, 1, 2048], Int16, QNN>) -> (%1587:tensor<[1, 194, 16, 128], Int16, QNN>)
linalg.QNN.ViewOp [name="model.layers.0_1.View.3"](%1584:tensor<[1, 194, 1, 2048], Int16, QNN>) -> (%1589:tensor<[1, 194, 16, 128], Int16, QNN>)
linalg.QNN.DequantizeAdd [name="model.layers.0.self_attn.q_proj.dequantize"](%1585:tensor<[1, 194, 16, 128], Int16, QNN>) -> (%1591:tensor<[1, 194, 16, 128], Float32, QNN>)
linalg.QNN.DequantizeAdd [name="model.layers.0.self_attn.k_proj.dequantize"](%1587:tensor<[1, 194, 16, 128], Int16, QNN>) -> (%1592:tensor<[1, 194, 16, 128], Float32, QNN>)
linalg.QNN.DequantizeAdd [name="model.layers.0.self_attn.v_proj.dequantize"](%1589:tensor<[1, 194, 16, 128], Int16, QNN>) -> (%1593:tensor<[1, 194, 16, 128], Float32, QNN>)
linalg.QNN.TransposeOp [name="model.layers.0_1.Transpose.0"](%1591:tensor<[1, 194, 16, 128], Float32, QNN>) -> (%1594:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true])
linalg.QNN.TransposeOp [name="model.layers.0_1.Transpose.1"](%1592:tensor<[1, 194, 16, 128], Float32, QNN>) -> (%1595:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true])
linalg.QNN.TransposeOp [name="model.layers.0_1.Transpose.2"](%1593:tensor<[1, 194, 16, 128], Float32, QNN>) -> (%1596:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true])
cf.ReturnOp (%1594:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true], %1595:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true], %1596:tensor<[1, 16, 194, 128], Float32, QNN>[is_graph_output:true]) -> ()
}
}
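To make the failure mode concrete, here is a minimal, self-contained sketch of what I suspect is happening. The class names below (BaseOp, CPURMSNormOp, QNNRMSNormOp as plain structs) are hypothetical simplifications for illustration only, not the actual mllm types:

// Sketch of the suspected failure mode; class names are hypothetical, not the real mllm types.
#include <iostream>
#include <memory>

struct BaseOp { virtual ~BaseOp() = default; };
struct CPURMSNormOp : BaseOp {};  // stands in for what the CPU backend's createOp() returns
struct QNNRMSNormOp : BaseOp {};  // stands in for what the QNN graph build pass expects

int main() {
  // Layer registration: getDevice() defaults to kCPU, so the CPU backend
  // instantiates the op, even though the layer later ends up in a QNN subgraph.
  std::shared_ptr<BaseOp> op = std::make_shared<CPURMSNormOp>();

  // QNN graph build pass: downcasts the instanced op to the QNN subclass
  // (analogous to QNNRMSNormOp.cpp:48). The cast yields nullptr because the
  // concrete type is the CPU op, hence "Failed to cast to QNNRMSNormOp".
  auto qnnOp = std::dynamic_pointer_cast<QNNRMSNormOp>(op);
  std::cout << (qnnOp ? "cast ok" : "Failed to cast to QNNRMSNormOp") << "\n";
  return 0;
}

If this is indeed the mechanism, the CPU.RMSNormOp and CPU.LinearOp entries inside the QNN subgraph above would explain exactly the "Failed to cast" errors in the log.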
Is this an issue with the above QNN/Hexagon versions, or could there be other possible causes?
Thanks for your help.
Expected Behavior
The mllm-qwen-npu example runs correctly instead of segfaulting.
Operating System
Android
Device
OnePlus Ace5 pro
MLLM Framework Version
current version
Model Information
No response
Additional Context
No response