
OnnxRuntime GenAI (OGA): flow errors #282

@bconsolvo

Description

Following the OnnxRuntime GenAI (OGA) Flow documentation.

The instructions say to run:

cd C:\Windows\System32\AMD
xrt-smi configure --pmode performance

I would suggest modifying this to .\xrt-smi when working in Windows PowerShell, since PowerShell does not run executables from the current directory without the .\ prefix:

(ryzen-ai-1.6.0) PS C:\Windows\System32\AMD> .\xrt-smi configure --pmode performance

Power mode is set to performance 

When trying to copy the executable, the file is not found:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\LLM\example\model_benchmark.exe" .
File not found - model_benchmark.exe
0 File(s) copied

The cmd-style %VAR% syntax is not expanded in PowerShell, so the commands should use $Env: instead:

xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\model_benchmark.exe .
xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\amd_genai_prompt.txt .
xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\deployment\*.dll .

Error when cloning the model in PowerShell:

git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Cloning into 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid'...
error: RPC failed; curl 56 schannel: server closed abruptly (missing close_notify)
fatal: the remote end hung up unexpectedly
Deletion of directory 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid' failed. Should I try again? (y/n) 

Error in Git Bash:

bconsolv@AUSBCONSOLV MINGW64 /c/Program Files/RyzenAI/1.5.1/LLM/example
$ git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
fatal: could not create work tree dir 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid': Permission denied

To get past the permission error, I first had to turn on "Developer Mode" in Windows Settings -> System -> For developers.

I then tried downloading with this script, but it does not fetch all of the files, such as genai_config.json:

from huggingface_hub import snapshot_download

# Download model files
model_path = snapshot_download(
    repo_id="amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
)
print(f"Model downloaded to: {model_path}")

Tried running the model again with:

python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256 

Got this error:

python '..\..\..\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid" -l 256              
Warning: Invalid or missing prompt input. Using default prompts.
Traceback (most recent call last):
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
    main()
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
    model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 37, in load_model_and_tokenizer
    with open(config_path, 'r') as config_file:
FileNotFoundError: [Errno 2] No such file or directory: 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\\genai_config.json'
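
A quick sanity check confirms whether a download actually produced the file run_model.py needs (a minimal sketch; the directory name is the one used above):

import os

model_dir = "Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
config = os.path.join(model_dir, "genai_config.json")
# load_model_and_tokenizer opens this file first, so it must exist
print(config, "exists" if os.path.exists(config) else "is missing")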

Tried:

git config --global http.postBuffer 1048576000

But still getting this error:

git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Cloning into 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid'...
error: RPC failed; curl 56 schannel: server closed abruptly (missing close_notify)

Had to download this way:

pip install huggingface_hub
hf download amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
hf download amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix
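
To avoid passing the long hub-cache snapshot path to run_model.py, the files can also be placed in a plain local folder (a sketch; the target directory name is arbitrary, and the hf CLI accepts an equivalent --local-dir flag):

from huggingface_hub import snapshot_download

# local_dir writes the files to a normal folder instead of the hub cache,
# so the model can be passed to run_model.py as a simple relative path
model_path = snapshot_download(
    repo_id="amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid",
    local_dir="Phi-3-mini-4k-instruct-hybrid",
)
print(f"Model downloaded to: {model_path}")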

Now, trying to run the model again:

python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256 

Getting this error:

python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256                      
Warning: Invalid or missing prompt input. Using default prompts.
2025-09-10 14:10:06.0917893 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Traceback (most recent call last):
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
    main()
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
    model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 44, in load_model_and_tokenizer
    model = og.Model(model_path)
RuntimeError: Exception during initialization: invalid unordered_map<K, T> key

Same error when running the Mistral 7B model with run_model.py:

(ryzen-ai-1.5.1) PS C:\Users\bconsolv\code\llm_run> python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m .\mistral-7b\ -l 256
Warning: Invalid or missing prompt input. Using default prompts.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250910 16:00:58.569097 11256 vitisai_compile_model.cpp:1157] Vitis AI EP Load ONNX Model Success
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1158] Graph Input Node Name/Shape (66)
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162]          input_ids : [-1x-1]
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162]          attention_mask : [-1x-1]
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162]          past_key_values.0.key : [-1x8x-1x128]

I20250910 16:00:58.576779 11256 vitisai_compile_model.cpp:1172]          present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators :   CPU    41 MATMULNBITS   195  SSMLP    32 
[Vitis AI EP] No. of Subgraphs :MATMULNBITS    65  SSMLP    32
2025-09-10 16:01:04.8676809 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Traceback (most recent call last):
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
    main()
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
    model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 44, in load_model_and_tokenizer
    model = og.Model(model_path)
RuntimeError: Exception during initialization: invalid unordered_map<K, T> key

Same error when running model_benchmark.exe with the Mistral 7B model:

.\model_benchmark.exe -i ".\mistral-7b\" -f .\amd_genai_prompt.txt -l "1024"
I20250910 15:59:46.230623  2448 vitisai_compile_model.cpp:1172]          present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators :   CPU    41 MATMULNBITS   195  SSMLP    32 
[Vitis AI EP] No. of Subgraphs :MATMULNBITS    65  SSMLP    32
2025-09-10 15:59:57.5542646 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Exception: Exception during initialization: invalid unordered_map<K, T> key

Same with the default llama-2-7b model:

(ryzen-ai-1.5.1) PS C:\Users\bconsolv\code\llm_run> .\model_benchmark.exe -i .\llama-2-7b\ -f .\amd_genai_prompt.txt -l "1024"
2025-09-10 16:14:45.2229576 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Exception: Exception during initialization: invalid unordered_map<K, T> key

New error when running the model on a Ryzen AI 300 series machine:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024"
Exception: Load model from Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid\Llama-2-7b-chat-hf_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].

Getting the same error with the Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid model as well:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024"
Exception: Load model from Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\Phi-3-mini-4k_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].

But this model works! Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix -f .\amd_genai_prompt.txt -l "1024"

...
I20251002 11:53:23.381127 20248 vitisai_compile_model.cpp:1281]          present.31.key : [-1x8x-1x128]
I20251002 11:53:23.381127 20248 vitisai_compile_model.cpp:1281]          present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators :   CPU    41    NPU   227 
[Vitis AI EP] No. of Subgraphs :   NPU    65    NPU    32 Actually running on NPU     97
Prompt Number of Tokens: 1024
Batch size: 1, prompt tokens: 1024, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       1.88254e+06
        avg (tokens/s): 543.947
        p50 (us):       1.88269e+06
        stddev (us):    2648.22
        n:              5 * 1024 token(s)
Token generation:
        avg (us):       129720
        avg (tokens/s): 7.7089
        p50 (us):       129866
        stddev (us):    986.988
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       5.84
        avg (tokens/s): 171233
        p50 (us):       6
        stddev (us):    0.403733
        n:              5 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       18357.1
        p50 (ms):       18356.7
        stddev (ms):    8.66523
        n:              5
Peak working set size (bytes): 7160365056

I also tried amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid, but it failed as well:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid -f .\amd_genai_prompt.txt -l "1024" 
Exception: Load model from Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid\Mistral-7B-Instruct-v0.3_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].
