Support for Non-Qwen Models on Hexagon NPU (e.g., LLaMA, Gemma, Mistral)? #598

@kimminsu38oo

Description

According to the paper “Fast On-Device LLM Inference with NPUs”, experiments were conducted not only with Qwen1.5-1.8B, but also with Gemma-2B, Phi-2.7B, LLaMA2-Chat-7B, and Mistral-7B.

However, the model card at
https://github.com/UbiquitousLearning/mllm?tab=readme-ov-file#supported-models
states that Hexagon NPU inference is supported only for Qwen1.5 models from 0.5B to 1.5B (plus PhoneLM-1.5 and Qwen2-VL).

In addition, the mllm-based paper
“Accelerating Mobile Language Models via Speculative Decoding and NPU-Coordinated Execution” mentions that LLaMA-3.2-3B was also used.

Is it possible to run models not explicitly listed in the model card (such as LLaMA-3.2-3B or LLaMA2-Chat-7B) on the NPU?
If so, is there any manual or documentation describing how to do so?
