BERT model hangs during VitisAI compilation on Ryzen AI 9 HX 370 #312

@cornerstone-report

Description

Environment

  • Hardware: Lenovo Yoga Slim 7x (Ryzen AI 9 HX 370, 50 TOPS NPU)
  • Software:
    • RyzenAI 1.6.0 (via miniforge conda environment)
    • ONNX Runtime 1.20.1 development build with VitisAI EP
    • Windows 11 24H2 (Build 26100)
    • Python 3.12.8
  • Model: BERT-base-uncased (bert-base-uncased) quantized to INT8 and BF16

Issue Description

VitisAI execution provider hangs indefinitely during model compilation for transformer architectures (BERT), while CNN models (e.g., ResNet50) compile and run successfully on the NPU.

Reproduction Steps

  1. Export BERT to ONNX with opset 17 and dynamic axes

  2. Quantize to INT8 or BF16

  3. Load the model with the VitisAI execution provider

  4. Result: Process hangs indefinitely with no log output, requiring manual termination.
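The inline snippet for step 3 did not survive the page export. A minimal sketch of the session creation that hangs, assuming a hypothetical model filename (substitute the actual INT8 or BF16 export):

```python
import onnxruntime as ort

# Hypothetical filename; the original report's path was elided.
model_path = "bert-base-uncased-int8.onnx"

opts = ort.SessionOptions()
opts.log_severity_level = 0  # most verbose ORT logging; still no output during the hang

# Session creation is the call that never returns for transformer models.
session = ort.InferenceSession(
    model_path,
    sess_options=opts,
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
)
print("Active providers:", session.get_providers())
```

Running this requires an ONNX Runtime build with the VitisAI EP on Ryzen AI hardware; with only the CPU provider in the list, the same model loads and runs normally.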

Expected Behavior

  • Model compiles to NPU (AMD documentation states Ryzen AI v1.6 supports transformer models)
  • OR: Graceful fallback to CPU with warning message
  • OR: Compilation timeout with diagnostic information

Actual Behavior

  • Process hangs during session creation
  • No logs generated, even with verbose logging enabled
  • No timeout; the process must be killed manually
  • CPU execution provider works fine when used alone

Additional Context

Successful Test: ResNet50 compiles and runs on NPU without issues, confirming hardware/driver setup is correct.

BF16 Test Results (08DEC25):

  • Tested BF16 quantization per AMD documentation recommendations
  • Model: BERT-base-uncased, exported to ONNX opset 17, quantized to BF16 (98KB)
  • Environment: Windows 11, RyzenAI 1.6.0, ONNX Runtime with VitisAI EP
  • Test: session creation with the VitisAI provider, same procedure as the INT8 reproduction above
  • Result: process hangs indefinitely at session creation, no output after "Loading model..."
  • VitisAI provider is available and detected, but compilation hangs

Conclusion: BF16 quantization does NOT resolve the hang. This confirms the issue is not quantization-specific but rather a fundamental VitisAI compilation problem with transformer architectures. Both INT8 and BF16 quantization formats exhibit identical hanging behavior.

Timeout Test (08DEC25):

  • Tested with 60-second timeout wrapper
  • Confirmed: This is a HANG, not slow compilation
  • Process hangs indefinitely during compilation with no progress output
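The timeout wrapper itself is not shown in the report; a stdlib-only stand-in is sketched below, with a sleeping child process in place of the hanging session-creation script (the real test used a 60-second deadline):

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s=60):
    """Run cmd; report whether it finished or had to be killed at the deadline."""
    try:
        subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        return "completed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return "hung"

# Stand-in for the session-creation script: sleeps past the deadline.
status = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=1)
print(status)
```

Against the real reproduction script, the wrapper fired at 60 seconds on every run, which is what distinguishes a genuine hang from merely slow compilation.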

Questions for AMD Team

  1. Does RyzenAI 1.6.0 support BF16 quantization for transformers? Recommended workflow?
  2. Is there a configuration option to set compilation timeout?
  3. What transformer operators are currently supported? Compatibility matrix?
  4. Timeline for production-ready transformer support?
  5. Verbose logging flags or diagnostic tools available?

Impact

This blocks adoption of Ryzen AI for CORNERSTONE, a privacy-first news aggregator using local NER and clustering (446+ articles, 3,120+ entities).

Willingness to Contribute

  • Can provide detailed testing across multiple quantization strategies
  • Willing to help debug with AMD engineering team
  • Can contribute documentation/examples for transformer deployment once working
  • Considering contributing patches to ONNX Runtime if root cause identified
