The paper “Fast On-Device LLM Inference with NPUs” reports experiments not only with Qwen1.5-1.8B but also with Gemma-2B, Phi-2.7B, LLaMA2-Chat-7B, and Mistral-7B.
However, if you look at the model card here:
https://github.com/UbiquitousLearning/mllm?tab=readme-ov-file#supported-models
it lists only a small set of models as supported for Hexagon NPU inference: Qwen1.5 models from 0.5B to 1.5B, plus PhoneLM-1.5 and Qwen2-VL.
In addition, the mllm-based paper
“Accelerating Mobile Language Models via Speculative Decoding and NPU-Coordinated Execution” states that LLaMA-3.2-3B was also used in its experiments.
Is it possible to run models not explicitly listed in the model card (such as LLaMA-3.2-3B or LLaMA2-Chat-7B) on the NPU?
If so, is there any manual or documentation available?
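For reference, before asking I compared the architectures of a listed and an unlisted model. This is only a diagnostic sketch using Hugging Face's `AutoConfig`; it assumes (my assumption, not something the model card says) that the NPU path mainly depends on the transformer shapes, and it does not touch mllm's own conversion tooling at all:

```python
# Hypothetical compatibility check: compare a candidate model's architecture
# against one of the Qwen1.5 checkpoints the model card lists as NPU-supported.
# This only checks structural similarity; it says nothing about what the
# Hexagon/QNN backend actually accepts -- which is exactly what I'm asking about.
from transformers import AutoConfig

FIELDS = [
    "model_type", "hidden_size", "num_hidden_layers",
    "num_attention_heads", "num_key_value_heads",
    "intermediate_size", "vocab_size",
]

def summarize(repo_id: str) -> dict:
    # Note: meta-llama repos are gated on Hugging Face and may require auth.
    cfg = AutoConfig.from_pretrained(repo_id)
    return {f: getattr(cfg, f, None) for f in FIELDS}

supported = summarize("Qwen/Qwen1.5-1.8B-Chat")   # listed as NPU-supported
candidate = summarize("meta-llama/Llama-3.2-3B")  # not listed in the model card

for field in FIELDS:
    print(f"{field:22} supported={supported[field]!r:12} candidate={candidate[field]!r}")
```

If architectural similarity is not the limiting factor (e.g., if the constraint is instead in the QNN op support or the per-model NPU graphs), a pointer to the relevant docs would be much appreciated.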