
OnnxRuntime GenAI (OGA): flow errors #282

@bconsolvo

Description

Following the OnnxRuntime GenAI (OGA) Flow documentation.

The instructions say to run:

cd C:\Windows\System32\AMD
xrt-smi configure --pmode performance

I would suggest modifying this to .\xrt-smi when working in Windows PowerShell, since PowerShell does not run executables from the current directory without the .\ prefix:

(ryzen-ai-1.6.0) PS C:\Windows\System32\AMD> .\xrt-smi configure --pmode performance

Power mode is set to performance 

When trying to copy the executable, the file is not found:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\LLM\example\model_benchmark.exe" .
File not found - model_benchmark.exe
0 File(s) copied

The cmd-style %VAR% syntax is not expanded in PowerShell, so the commands should use $Env: instead:

xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\model_benchmark.exe .
xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\amd_genai_prompt.txt .
xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\deployment\*.dll .

Error when cloning the model in PowerShell:

git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Cloning into 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid'...
error: RPC failed; curl 56 schannel: server closed abruptly (missing close_notify)
fatal: the remote end hung up unexpectedly
Deletion of directory 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid' failed. Should I try again? (y/n) 

Error in Git Bash:

bconsolv@AUSBCONSOLV MINGW64 /c/Program Files/RyzenAI/1.5.1/LLM/example
$ git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
fatal: could not create work tree dir 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid': Permission denied

To get past the permission error, I first had to turn on "Developer Mode" in Windows Settings -> System -> For developers.

I then tried downloading with this script, but it does not fetch all of the files, such as genai_config.json:

from huggingface_hub import snapshot_download

# Download model files
model_path = snapshot_download(
    repo_id="amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
)
print(f"Model downloaded to: {model_path}")

Tried running the model again with:

python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256 

Got this error:

python '..\..\..\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid" -l 256              
Warning: Invalid or missing prompt input. Using default prompts.
Traceback (most recent call last):
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
    main()
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
    model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 37, in load_model_and_tokenizer
    with open(config_path, 'r') as config_file:
FileNotFoundError: [Errno 2] No such file or directory: 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\\genai_config.json'
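
A quick sanity check confirms whether a download actually produced the file run_model.py needs (a minimal sketch; the directory name is the one used above):

import os

model_dir = "Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
config = os.path.join(model_dir, "genai_config.json")
# load_model_and_tokenizer opens this file first, so it must exist
print(config, "exists" if os.path.exists(config) else "is missing")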

Tried:

git config --global http.postBuffer 1048576000

But still getting this error:

git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Cloning into 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid'...
error: RPC failed; curl 56 schannel: server closed abruptly (missing close_notify)

Had to download this way:

pip install huggingface_hub
hf download amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
hf download amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix
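
To avoid passing the long hub-cache snapshot path to run_model.py, the files can also be placed in a plain local folder (a sketch; the target directory name is arbitrary, and the hf CLI accepts an equivalent --local-dir flag):

from huggingface_hub import snapshot_download

# local_dir writes the files to a normal folder instead of the hub cache,
# so the model can be passed to run_model.py as a simple relative path
model_path = snapshot_download(
    repo_id="amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid",
    local_dir="Phi-3-mini-4k-instruct-hybrid",
)
print(f"Model downloaded to: {model_path}")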

Now, trying to run the model again:

python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256 

Getting this error:

python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256                      
Warning: Invalid or missing prompt input. Using default prompts.
2025-09-10 14:10:06.0917893 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Traceback (most recent call last):
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
    main()
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
    model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 44, in load_model_and_tokenizer
    model = og.Model(model_path)
RuntimeError: Exception during initialization: invalid unordered_map<K, T> key

Same error when running the Mistral 7B model with run_model.py:

(ryzen-ai-1.5.1) PS C:\Users\bconsolv\code\llm_run> python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m .\mistral-7b\ -l 256
Warning: Invalid or missing prompt input. Using default prompts.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250910 16:00:58.569097 11256 vitisai_compile_model.cpp:1157] Vitis AI EP Load ONNX Model Success
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1158] Graph Input Node Name/Shape (66)
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162]          input_ids : [-1x-1]
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162]          attention_mask : [-1x-1]
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162]          past_key_values.0.key : [-1x8x-1x128]

I20250910 16:00:58.576779 11256 vitisai_compile_model.cpp:1172]          present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators :   CPU    41 MATMULNBITS   195  SSMLP    32 
[Vitis AI EP] No. of Subgraphs :MATMULNBITS    65  SSMLP    32
2025-09-10 16:01:04.8676809 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Traceback (most recent call last):
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
    main()
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
    model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
  File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 44, in load_model_and_tokenizer
    model = og.Model(model_path)
RuntimeError: Exception during initialization: invalid unordered_map<K, T> key

Same error when running model_benchmark.exe with the Mistral 7B model:

.\model_benchmark.exe -i ".\mistral-7b\" -f .\amd_genai_prompt.txt -l "1024"
I20250910 15:59:46.230623  2448 vitisai_compile_model.cpp:1172]          present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators :   CPU    41 MATMULNBITS   195  SSMLP    32 
[Vitis AI EP] No. of Subgraphs :MATMULNBITS    65  SSMLP    32
2025-09-10 15:59:57.5542646 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Exception: Exception during initialization: invalid unordered_map<K, T> key

Same with the default llama-2-7b model:

(ryzen-ai-1.5.1) PS C:\Users\bconsolv\code\llm_run> .\model_benchmark.exe -i .\llama-2-7b\ -f .\amd_genai_prompt.txt -l "1024"
2025-09-10 16:14:45.2229576 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Exception: Exception during initialization: invalid unordered_map<K, T> key

New error when running the model on a Ryzen AI 300 series machine:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024"
Exception: Load model from Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid\Llama-2-7b-chat-hf_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].

Getting the same error with the Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid model as well:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024"
Exception: Load model from Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\Phi-3-mini-4k_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].

But this model works! Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix -f .\amd_genai_prompt.txt -l "1024"

...
I20251002 11:53:23.381127 20248 vitisai_compile_model.cpp:1281]          present.31.key : [-1x8x-1x128]
I20251002 11:53:23.381127 20248 vitisai_compile_model.cpp:1281]          present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators :   CPU    41    NPU   227 
[Vitis AI EP] No. of Subgraphs :   NPU    65    NPU    32 Actually running on NPU     97
Prompt Number of Tokens: 1024
Batch size: 1, prompt tokens: 1024, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       1.88254e+06
        avg (tokens/s): 543.947
        p50 (us):       1.88269e+06
        stddev (us):    2648.22
        n:              5 * 1024 token(s)
Token generation:
        avg (us):       129720
        avg (tokens/s): 7.7089
        p50 (us):       129866
        stddev (us):    986.988
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       5.84
        avg (tokens/s): 171233
        p50 (us):       6
        stddev (us):    0.403733
        n:              5 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       18357.1
        p50 (ms):       18356.7
        stddev (ms):    8.66523
        n:              5
Peak working set size (bytes): 7160365056

I also tried amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid, but it failed as well:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid -f .\amd_genai_prompt.txt -l "1024" 
Exception: Load model from Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid\Mistral-7B-Instruct-v0.3_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].
