BERT model hangs during VitisAI compilation on Ryzen AI 9 HX 370 #312

@cornerstone-report

Description

Environment

  • Hardware: Lenovo Yoga Slim 7x (Ryzen AI 9 HX 370, 50 TOPS NPU)
  • Software:
    • RyzenAI 1.6.0 (via miniforge conda environment)
    • ONNX Runtime 1.20.1 development build with VitisAI EP
    • Windows 11 24H2 (Build 26100)
    • Python 3.12.8
  • Model: BERT-base-uncased (bert-base-uncased) quantized to INT8 and BF16

Issue Description

VitisAI execution provider hangs indefinitely during model compilation for transformer architectures (BERT), while CNN models (e.g., ResNet50) compile and run successfully on the NPU.

Reproduction Steps

  1. Export BERT to ONNX with opset 17 and dynamic axes

  2. Quantize to INT8 or BF16

  3. Load the model with the VitisAI execution provider

  4. Result: Process hangs indefinitely with no log output, requiring manual termination.
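The inline snippet for step 3 did not survive the page export. A minimal sketch of the session creation that hangs, assuming a hypothetical model filename (substitute the actual INT8 or BF16 export):

```python
import onnxruntime as ort

# Hypothetical filename; the original report's path was elided.
model_path = "bert-base-uncased-int8.onnx"

opts = ort.SessionOptions()
opts.log_severity_level = 0  # most verbose ORT logging; still no output during the hang

# Session creation is the call that never returns for transformer models.
session = ort.InferenceSession(
    model_path,
    sess_options=opts,
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
)
print("Active providers:", session.get_providers())
```

Running this requires an ONNX Runtime build with the VitisAI EP on Ryzen AI hardware; with only the CPU provider in the list, the same model loads and runs normally.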

Expected Behavior

  • Model compiles to NPU (AMD documentation states Ryzen AI v1.6 supports transformer models)
  • OR: Graceful fallback to CPU with warning message
  • OR: Compilation timeout with diagnostic information

Actual Behavior

  • Process hangs during session creation
  • No logs generated, even with verbose logging enabled
  • No timeout; the process must be killed manually
  • CPU execution provider works fine when used alone

Additional Context

Successful Test: ResNet50 compiles and runs on NPU without issues, confirming hardware/driver setup is correct.

BF16 Test Results (08DEC25):

  • Tested BF16 quantization per AMD documentation recommendations
  • Model: BERT-base-uncased, exported to ONNX opset 17, quantized to BF16 (98KB)
  • Environment: Windows 11, RyzenAI 1.6.0, ONNX Runtime with VitisAI EP
  • Test: session creation with the VitisAI provider, same procedure as the INT8 reproduction above
  • Result: process hangs indefinitely at session creation, no output after "Loading model..."
  • VitisAI provider is available and detected, but compilation hangs

Conclusion: BF16 quantization does NOT resolve the hang. This confirms the issue is not quantization-specific but rather a fundamental VitisAI compilation problem with transformer architectures. Both INT8 and BF16 quantization formats exhibit identical hanging behavior.

Timeout Test (08DEC25):

  • Tested with 60-second timeout wrapper
  • Confirmed: This is a HANG, not slow compilation
  • Process hangs indefinitely during compilation with no progress output
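The timeout wrapper itself is not shown in the report; a stdlib-only stand-in is sketched below, with a sleeping child process in place of the hanging session-creation script (the real test used a 60-second deadline):

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s=60):
    """Run cmd; report whether it finished or had to be killed at the deadline."""
    try:
        subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        return "completed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return "hung"

# Stand-in for the session-creation script: sleeps past the deadline.
status = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=1)
print(status)
```

Against the real reproduction script, the wrapper fired at 60 seconds on every run, which is what distinguishes a genuine hang from merely slow compilation.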

Questions for AMD Team

  1. Does RyzenAI 1.6.0 support BF16 quantization for transformers? Recommended workflow?
  2. Is there a configuration option to set compilation timeout?
  3. What transformer operators are currently supported? Compatibility matrix?
  4. Timeline for production-ready transformer support?
  5. Verbose logging flags or diagnostic tools available?

Impact

This blocks adoption of Ryzen AI for CORNERSTONE, a privacy-first news aggregator using local NER and clustering (446+ articles, 3,120+ entities).

Willingness to Contribute

  • Can provide detailed testing across multiple quantization strategies
  • Willing to help debug with AMD engineering team
  • Can contribute documentation/examples for transformer deployment once working
  • Considering contributing patches to ONNX Runtime if root cause identified
