Description
Following OnnxRuntime GenAI (OGA) Flow
For these instructions:

cd C:\Windows\System32\AMD
xrt-smi configure --pmode performance

I would suggest modifying this to .\xrt-smi when working in Windows PowerShell:

(ryzen-ai-1.6.0) PS C:\Windows\System32\AMD> .\xrt-smi configure --pmode performance
Power mode is set to performance
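As an aside, xrt-smi also provides an examine subcommand that can be used to inspect the device afterward (I have not confirmed whether it reports the power mode on this build; check .\xrt-smi examine --help):

.\xrt-smi examine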
When trying to copy the executable, the file is not found:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\LLM\example\model_benchmark.exe" .
File not found - model_benchmark.exe
0 File(s) copied

PowerShell does not expand the cmd-style %VAR% syntax, so the commands should be:

xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\model_benchmark.exe .
xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\amd_genai_prompt.txt .
xcopy /Y $Env:RYZEN_AI_INSTALLATION_PATH\deployment\*.dll .
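Note that the default installation path (e.g. C:\Program Files\RyzenAI\1.5.1) contains a space, so it is safer to quote the expanded variable; PowerShell still expands $Env: variables inside double quotes:

xcopy /Y "$Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\model_benchmark.exe" .
xcopy /Y "$Env:RYZEN_AI_INSTALLATION_PATH\LLM\example\amd_genai_prompt.txt" .
xcopy /Y "$Env:RYZEN_AI_INSTALLATION_PATH\deployment\*.dll" .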
Error on PowerShell when cloning the model:

git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Cloning into 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid'...
error: RPC failed; curl 56 schannel: server closed abruptly (missing close_notify)
fatal: the remote end hung up unexpectedly
Deletion of directory 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid' failed. Should I try again? (y/n)

Error on Git Bash:
bconsolv@AUSBCONSOLV MINGW64 /c/Program Files/RyzenAI/1.5.1/LLM/example
$ git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
fatal: could not create work tree dir 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid': Permission denied

Had to first turn "Developer Mode" on in Windows Settings -> System -> For developers.
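An alternative that avoids the Program Files permission issue would be to clone into a user-writable directory instead (the destination path here is just an example):

git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid C:\Users\<user>\llm_run\Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid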
Tried downloading with this script, but it does not fetch all of the files (genai_config.json, for example, was missing):

from huggingface_hub import snapshot_download

# Download model files
model_path = snapshot_download(
    repo_id="amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
)
print(f"Model downloaded to: {model_path}")
Trying to run the model again with:

python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256

Getting this error:
python '..\..\..\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid" -l 256
Warning: Invalid or missing prompt input. Using default prompts.
Traceback (most recent call last):
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
main()
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 37, in load_model_and_tokenizer
with open(config_path, 'r') as config_file:
FileNotFoundError: [Errno 2] No such file or directory: 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\\genai_config.json'
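Since run_model.py fails on the missing genai_config.json, a quick pre-flight check on the model directory makes this failure mode obvious; a small sketch using the paths from above:

import os

model_dir = r"Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
config_path = os.path.join(model_dir, "genai_config.json")
print(config_path, "exists:", os.path.exists(config_path))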
Tried:

git config --global http.postBuffer 1048576000

But still getting this error:
git clone https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Cloning into 'Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid'...
error: RPC failed; curl 56 schannel: server closed abruptly (missing close_notify)
Had to download this way:
pip install huggingface_hub
hf download amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
hf download amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix
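If the hashed cache layout is awkward to point run_model.py at, the CLI can also download straight into a directory with --local-dir (flag per the huggingface_hub CLI docs; the destination here is hypothetical):

hf download amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid --local-dir C:\models\phi-3-mini-hybrid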
Now, trying to run the model again. Getting this error:
python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m "C:\Users\bconsolv\.cache\huggingface\hub\models--amd--Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\snapshots\3846242bb462ffc9994a1423df61d45db473d3a6" -l 256
Warning: Invalid or missing prompt input. Using default prompts.
2025-09-10 14:10:06.0917893 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Traceback (most recent call last):
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
main()
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 44, in load_model_and_tokenizer
model = og.Model(model_path)
RuntimeError: Exception during initialization: invalid unordered_map<K, T> key
Same error when running the Mistral 7B model with run_model.py:
(ryzen-ai-1.5.1) PS C:\Users\bconsolv\code\llm_run> python 'C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py' -m .\mistral-7b\ -l 256
Warning: Invalid or missing prompt input. Using default prompts.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250910 16:00:58.569097 11256 vitisai_compile_model.cpp:1157] Vitis AI EP Load ONNX Model Success
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1158] Graph Input Node Name/Shape (66)
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162] input_ids : [-1x-1]
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162] attention_mask : [-1x-1]
I20250910 16:00:58.570195 11256 vitisai_compile_model.cpp:1162] past_key_values.0.key : [-1x8x-1x128]
I20250910 16:00:58.576779 11256 vitisai_compile_model.cpp:1172] present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators : CPU 41 MATMULNBITS 195 SSMLP 32
[Vitis AI EP] No. of Subgraphs :MATMULNBITS 65 SSMLP 32
2025-09-10 16:01:04.8676809 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Traceback (most recent call last):
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 140, in <module>
main()
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 134, in main
model, tokenizer, model_type = load_model_and_tokenizer(args.model_path, args.verbose)
File "C:\Program Files\RyzenAI\1.5.1\LLM\example\run_model.py", line 44, in load_model_and_tokenizer
model = og.Model(model_path)
RuntimeError: Exception during initialization: invalid unordered_map<K, T> key
Get the same error when running model_benchmark.exe:

.\model_benchmark.exe -i ".\mistral-7b\" -f .\amd_genai_prompt.txt -l "1024"
...
I20250910 15:59:46.230623 2448 vitisai_compile_model.cpp:1172] present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators : CPU 41 MATMULNBITS 195 SSMLP 32
[Vitis AI EP] No. of Subgraphs :MATMULNBITS 65 SSMLP 32
2025-09-10 15:59:57.5542646 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Exception: Exception during initialization: invalid unordered_map<K, T> key
Same with default llama-2-7b model:
(ryzen-ai-1.5.1) PS C:\Users\bconsolv\code\llm_run> .\model_benchmark.exe -i .\llama-2-7b\ -f .\amd_genai_prompt.txt -l "1024"
2025-09-10 16:14:45.2229576 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2281 onnxruntime::InferenceSession::Initialize::<lambda_7ef672f4cfe483829c360ecc666489b0>::operator ()] Exception during initialization: invalid unordered_map<K, T> key
Exception: Exception during initialization: invalid unordered_map<K, T> key

New error when running the model on a Ryzen AI 300 series machine (Ryzen AI 1.6.0):
(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024"
Exception: Load model from Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid\Llama-2-7b-chat-hf_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].

(The runtime's com.ryzenai::GQO:1 schema expects 16-17 inputs, while the exported node only has 11, which looks like a mismatch between the model export and the installed runtime version.)

Getting this error with the Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid model as well:
(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024"
Exception: Load model from Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid\Phi-3-mini-4k_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].
But this model works! Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix
(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix -f .\amd_genai_prompt.txt -l "1024"
...
I20251002 11:53:23.381127 20248 vitisai_compile_model.cpp:1281] present.31.key : [-1x8x-1x128]
I20251002 11:53:23.381127 20248 vitisai_compile_model.cpp:1281] present.31.value : [-1x8x-1x128]
[Vitis AI EP] No. of Operators : CPU 41 NPU 227
[Vitis AI EP] No. of Subgraphs : NPU 65 NPU 32

Actually running on the NPU this time (65 + 32 = 97 NPU subgraphs).
Prompt Number of Tokens: 1024
Batch size: 1, prompt tokens: 1024, tokens to generate: 128
Prompt processing (time to first token):
    avg (us): 1.88254e+06
    avg (tokens/s): 543.947
    p50 (us): 1.88269e+06
    stddev (us): 2648.22
    n: 5 * 1024 token(s)
Token generation:
    avg (us): 129720
    avg (tokens/s): 7.7089
    p50 (us): 129866
    stddev (us): 986.988
    n: 635 * 1 token(s)
Token sampling:
    avg (us): 5.84
    avg (tokens/s): 171233
    p50 (us): 6
    stddev (us): 0.403733
    n: 5 * 1 token(s)
E2E generation (entire generation loop):
    avg (ms): 18357.1
    p50 (ms): 18356.7
    stddev (ms): 8.66523
    n: 5
Peak working set size (bytes): 7160365056
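As a sanity check on the report, the tokens/s figures are just the inverse of the average latencies; a quick sketch with the numbers above:

# Prompt processing: 1024 tokens in ~1.88254e6 us per run
print(1024 / (1.88254e6 / 1e6))   # ~543.9 tokens/s, matches the report

# Token generation: one token per ~129720 us on average
print(1e6 / 129720)               # ~7.709 tokens/s, matches the report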
I also tried amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid, but it failed in the same way:

(ryzen-ai-1.6.0) PS C:\Users\User\bconsolv\llm_run> .\model_benchmark.exe -i Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid -f .\amd_genai_prompt.txt -l "1024"
Exception: Load model from Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid\Mistral-7B-Instruct-v0.3_jit.onnx failed:This is an invalid model. In Node, ("gqo_3_0", GQO, "com.ryzenai", -1) : ("/model/layers.0/attn/qkv_proj/MatMulNBits/output_0": tensor(float16),"past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qweight": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.scales": tensor(float16),"model.layers.0.attn.o_proj.MatMulNBits.qzeros": tensor(uint8),"model.layers.0.attn.o_proj.MatMulNBits.qweight.packed": tensor(uint8),) -> ("present.0.key": tensor(float16),"present.0.value": tensor(float16),"/model/layers.0/attn/o_proj/MatMulNBits/output_0": tensor(float16),) , Error Node(gqo_3_0) with schema(com.ryzenai::GQO:1) has input size 11 not in range [min=16, max=17].