Running a converted Phi3.5-Mini-Instruct model fails #300
jarodxiangliu asked this question in Q&A (Unanswered)
I would like to run a local Phi 3.5 mini instruct model (which I ultimately plan to fine-tune) on PCs with an NPU (Snapdragon Hexagon).
I downloaded the model from Hugging Face and converted it with the AI Toolkit in VS Code. The conversion appears to succeed. (I converted a Qwen 2.5 1.5B model the same way, and it loaded successfully in Foundry Local.)
But when I try to use the converted Phi3.5-Mini-Instruct model, it fails to load. Please help me resolve this issue. Thanks!
Details of the issue are below.
Environment
OS: Win11 25H2 26200.6901
NPU: Snapdragon X Elite - X1E78100 - Qualcomm Hexagon NPU
Driver version: 30.0.143.0
Foundry local: 0.7.120+3b92ed4014
Steps to reproduce
1. Set the Foundry Local cache directory to my local model path.
2. Run "foundry model run Customized-Phi3.5-Mini-3.8B-qnn-npu".
The output is as below:
🕒 Loading model... [15:21:18 ERR] Failed loading model:Customized-Phi3.5-Mini-3.8B-qnn-npu
Exception: Failed: Loading model Customized-Phi3.5-Mini-3.8B-qnn-npu from http://127.0.0.1:57698/openai/load/Customized-Phi3.5-Mini-3.8B-qnn-npu?ttl=600
Internal Server Error
Failed loading model Customized-Phi3.5-Mini-3.8B-qnn-npu
Failed to load from EpContext model. qnn_backend_manager.cc:1138 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary. Error code: 1002
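For reference, the two reproduce steps above as commands. This is a sketch: the `foundry cache cd` subcommand is my understanding of how the Foundry Local CLI sets the cache directory, and the cache path is a placeholder, not the actual path from my session.

```shell
# Point Foundry Local's model cache at the directory holding the converted model
# (C:\models\converted is an example placeholder path)
foundry cache cd C:\models\converted

# Attempt to load and run the converted model; this is the step that fails
foundry model run Customized-Phi3.5-Mini-3.8B-qnn-npu
```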