```
    _   __     ____  __  __
   / | / /  __/ __ \/ / / /
  /  |/ / |/_/ /_/ / / / /
 / /|  /> </ ____/ /_/ /
/_/ |_/_/|_/_/    \____/
```
WGSL → NPU transpiler for multi-vendor neural processing units
Write ML kernels once in WGSL. Deploy to any NPU.
NPU hardware is fragmented — every vendor ships a different SDK, model format, and toolchain. NxPU solves this by providing a single compilation pipeline from WGSL compute shaders to native NPU formats.
- One language, many targets — Write WGSL once, emit ONNX, TFLite, CoreML, StableHLO, or vendor-specific formats
- Pattern recognition — Automatically classifies compute kernels into MatMul, Conv2D, Attention, and 10+ other ML operations
- Optimization passes — Constant folding, FMA fusion, dead code elimination, common subexpression elimination, and quantization
- Vendor-aware validation — Operator support matrices for 8 NPU vendors with native/emulated/unsupported classification
- Pluggable backends — Add new NPU targets by implementing a single trait
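To make the "single trait" claim concrete, a backend plugin surface might look like the following sketch. The trait name, its methods, and the `Module` type here are illustrative assumptions, not the actual `nxpu-backend-core` API:

```rust
// Hypothetical sketch of a pluggable backend trait; the real trait in
// nxpu-backend-core may differ in names and signatures.

/// Minimal stand-in for the transpiler's intermediate representation.
struct Module {
    entry_point: String,
}

/// A backend turns an optimized IR module into target-specific bytes.
trait Backend {
    /// Short name matched against `--target`.
    fn name(&self) -> &'static str;
    /// Alternative names (e.g. "litert" for "tflite").
    fn aliases(&self) -> &'static [&'static str] {
        &[]
    }
    /// Emit the module in the backend's native format.
    fn emit(&self, module: &Module) -> Result<Vec<u8>, String>;
}

/// Toy text backend, analogous to `ir-dump`.
struct IrDump;

impl Backend for IrDump {
    fn name(&self) -> &'static str {
        "ir-dump"
    }
    fn aliases(&self) -> &'static [&'static str] {
        &["ir"]
    }
    fn emit(&self, module: &Module) -> Result<Vec<u8>, String> {
        Ok(format!("module entry={}", module.entry_point).into_bytes())
    }
}

fn main() {
    let backend = IrDump;
    let module = Module { entry_point: "main".to_string() };
    let out = backend.emit(&module).unwrap();
    println!("{}", String::from_utf8(out).unwrap());
}
```

Registering one value per target with this shape is what lets the CLI resolve `--target` names and aliases uniformly.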
Install the CLI:

```sh
cargo install --path crates/nxpu-cli
```

```sh
# Transpile WGSL → ONNX
nxpu examples/vecadd.wgsl --target onnx -o vecadd.onnx

# Transpile → TFLite with int8 quantization
nxpu examples/matmul.wgsl --target tflite --precision int8 -o matmul.tflite

# Dump the intermediate representation
nxpu examples/relu.wgsl --target ir-dump

# List all available backends
nxpu --list-targets
```

Input (`examples/vecadd.wgsl`):

```wgsl
@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> c: array<f32>;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let idx = gid.x;
    c[idx] = a[idx] + b[idx];
}
```

Output:

```console
$ nxpu examples/vecadd.wgsl --target onnx -o vecadd.onnx
Info: entry point 'main': classified as Add
```
Supported targets:

| Target | Aliases | Format | Status |
|---|---|---|---|
| onnx | — | .onnx (protobuf) | ✅ |
| tflite | litert | .tflite (FlatBuffers) | ✅ |
| coreml | apple-ane | .mlmodel | ✅ |
| stablehlo | xla | .mlir (text) | ✅ |
| ir-dump | ir | Text (stdout) | ✅ |
| intel-npu | openvino | OpenVINO IR .xml + .onnx | ✅ |
| amd-xdna | amd-npu | ONNX + XDNA metadata | ✅ |
| arm-ethos | ethos-u | TFLite + optional Vela | ✅ |
| samsung | exynos | ONNX + ONE toolchain hints | ✅ |
| qualcomm | hexagon-npu | ONNX + QNN SDK hints | ✅ |
| mediatek | neuropilot | TFLite + NeuroPilot hints | ✅ |
| rockchip | rknn | ONNX + RKNN Toolkit hints | ✅ |
| ceva | neupro | ONNX + CDNN compiler hints | ✅ |
Each vendor backend includes an operator support matrix that validates patterns against the target NPU's capabilities, emitting warnings for emulated or unsupported operations.
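A support matrix and its validation pass could be sketched like this. The operator entries below are purely illustrative and are not the real Ethos-U55 capability list:

```rust
// Sketch of vendor-aware validation: each backend carries a support matrix
// mapping classified operations to native / emulated / unsupported.
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Support {
    Native,
    Emulated,
    Unsupported,
}

/// Illustrative entries only; not the actual Ethos-U55 operator list.
fn ethos_u55_matrix() -> HashMap<&'static str, Support> {
    HashMap::from([
        ("Conv2D", Support::Native),
        ("MatMul", Support::Native),
        ("Tanh", Support::Emulated),
    ])
}

/// Validate a kernel's classified ops, collecting human-readable diagnostics.
fn validate(ops: &[&str], matrix: &HashMap<&'static str, Support>) -> Vec<String> {
    ops.iter()
        .filter_map(|op| {
            match matrix.get(op).copied().unwrap_or(Support::Unsupported) {
                Support::Native => None,
                Support::Emulated => {
                    Some(format!("warning: {op} is emulated on this NPU"))
                }
                Support::Unsupported => {
                    Some(format!("error: {op} is unsupported on this NPU"))
                }
            }
        })
        .collect()
}

fn main() {
    let diagnostics = validate(&["Conv2D", "Tanh", "Attention"], &ethos_u55_matrix());
    for d in &diagnostics {
        println!("{d}");
    }
}
```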
Vendor backend details
| Vendor | NPU Hardware | Native Precision | Output Format | SDK Toolchain |
|---|---|---|---|---|
| Intel | Meteor Lake / Arrow Lake NPU | F16 | OpenVINO IR v11 (.xml + .bin) + ONNX fallback | OpenVINO (ov::Core::read_model) |
| AMD | Ryzen AI XDNA | Int8, F16 | ONNX with XDNA metadata props | Vitis AI EP / ONNX Runtime |
| Arm | Ethos-U55 (128 MAC) / U65 (512 MAC) | Int8 (U55), Int8+Int16 (U65) | TFLite + optional Vela compilation | ethos-u-vela compiler |
| Samsung | Exynos NPU | F16, Int8 | ONNX | ONE toolchain (one-import-onnx, one-codegen) |
| Qualcomm | Hexagon NPU | Int8, F16 | ONNX | QNN SDK (qnn-onnx-converter) |
| MediaTek | Dimensity APU | Int8, F16 | TFLite | NeuroPilot SDK (ncc-tflite) |
| Rockchip | RK3588 NPU (3 TOPS) | Int8, F16 | ONNX | RKNN Toolkit 2 (Python API) |
| CEVA | NeuPro-S | Int8 | ONNX | CDNN compiler (cdnn_cli) |
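Several of these NPUs run fastest at Int8, which is what the `--precision int8` path targets. As a rough illustration, symmetric per-tensor int8 quantization can be sketched as below; this is a common scheme, not necessarily the exact one `nxpu-opt` implements:

```rust
// Symmetric per-tensor int8 quantization sketch: pick a scale so the
// largest magnitude maps to 127, then round each value to the grid.
fn quantize(values: &[f32]) -> (Vec<i8>, f32) {
    let max = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
    let q = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Recover approximate f32 values from the quantized representation.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

fn main() {
    let (q, scale) = quantize(&[0.5, -1.0, 0.25]);
    println!("quantized: {q:?}, scale: {scale}");
    println!("restored: {:?}", dequantize(&q, scale));
}
```

The round trip loses at most half a quantization step per value, which is why the vendor tables above list Int8 as a *native precision* rather than a lossless format.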
NxPU analyzes WGSL compute kernels and classifies them into ML operations:
| Category | Operations |
|---|---|
| Linear Algebra | MatMul, element-wise Add / Sub / Mul |
| Convolution | Conv2D |
| Pooling | MaxPool |
| Activation | ReLU, Tanh, Sigmoid |
| Normalization | BatchNorm |
| Reduction | ReduceSum |
| Tensor Ops | Transpose, Reshape, Concat, Split |
| Attention | Scaled dot-product attention |
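The table above maps naturally onto an enum; the type below is a hypothetical mirror of the classifier's output, not the actual `nxpu-analysis` definition:

```rust
// Hypothetical mirror of the classification table as a Rust enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    MatMul, Add, Sub, Mul,
    Conv2D,
    MaxPool,
    Relu, Tanh, Sigmoid,
    BatchNorm,
    ReduceSum,
    Transpose, Reshape, Concat, Split,
    Attention,
}

/// Map each recognized operation to its category from the table.
fn category(op: Op) -> &'static str {
    use Op::*;
    match op {
        MatMul | Add | Sub | Mul => "Linear Algebra",
        Conv2D => "Convolution",
        MaxPool => "Pooling",
        Relu | Tanh | Sigmoid => "Activation",
        BatchNorm => "Normalization",
        ReduceSum => "Reduction",
        Transpose | Reshape | Concat | Split => "Tensor Ops",
        Attention => "Attention",
    }
}

fn main() {
    println!("{}", category(Op::Conv2D));
}
```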
```
┌─────────────────────────────────────────────┐
│              Optimization Passes            │
│  ┌───────┐ ┌───────┐ ┌───────┐ ┌────────┐   │
│  │ Const │ │  FMA  │ │ DCE / │ │Quantize│   │
│  │ Fold  │ │Fusion │ │  CSE  │ │        │   │
│  └───────┘ └───────┘ └───────┘ └────────┘   │
└──────────────────┬──────────────────────────┘
                   │
┌──────────┐  ┌──────────┐  ┌───────┴──┐  ┌──────────────┐
│   WGSL   │─>│  Parser  │─>│  SSA IR  │─>│   Backend    │
│  Source  │  │  (naga)  │  │          │  │   Emitter    │
└──────────┘  └──────────┘  └──────────┘  └──────┬───────┘
                                                 │
                        ┌────────────────────────┼─────────────────┐
                        │                        │                 │
                   ┌────┴───┐              ┌─────┴────┐       ┌────┴────┐
                   │  ONNX  │              │  TFLite  │       │ CoreML  │
                   │  HLO   │              │          │       │ Vendors │
                   └────────┘              └──────────┘       └─────────┘
```
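To make the optimization stage concrete, here is a toy constant-folding pass over a minimal expression IR. The real pass in `nxpu-opt` works on the arena-based SSA IR, so this is a structural sketch only:

```rust
// Toy constant-folding pass: recursively evaluate subtrees whose
// operands are all compile-time constants.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Const(f32),
    Input(&'static str),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

fn fold(n: Node) -> Node {
    match n {
        Node::Add(a, b) => match (fold(*a), fold(*b)) {
            (Node::Const(x), Node::Const(y)) => Node::Const(x + y),
            (a, b) => Node::Add(Box::new(a), Box::new(b)),
        },
        Node::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Node::Const(x), Node::Const(y)) => Node::Const(x * y),
            (a, b) => Node::Mul(Box::new(a), Box::new(b)),
        },
        leaf => leaf,
    }
}

fn main() {
    // x * (2.0 * 3.0)  folds to  x * 6.0
    let e = Node::Mul(
        Box::new(Node::Input("x")),
        Box::new(Node::Mul(
            Box::new(Node::Const(2.0)),
            Box::new(Node::Const(3.0)),
        )),
    );
    println!("{:?}", fold(e));
}
```

The other passes follow the same shape: each is a function from IR to IR, so they compose into the pipeline shown in the diagram.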
```
crates/
├── nxpu-parser/             WGSL parsing via naga, lowering to NxPU IR
├── nxpu-ir/                 Arena-based SSA intermediate representation
├── nxpu-opt/                Optimization passes (const fold, FMA, DCE, quantize)
├── nxpu-analysis/           Pattern classification and fusion
├── nxpu-backend-core/       Backend trait, plugin registry, IR dump
├── nxpu-backend-onnx/       ONNX protobuf emitter
├── nxpu-backend-tflite/     TFLite FlatBuffers emitter
├── nxpu-backend-coreml/     CoreML emitter
├── nxpu-backend-stablehlo/  StableHLO MLIR emitter
├── nxpu-backend-*/          Vendor-specific backends (8 vendors)
├── nxpu-cli/                Command-line interface
└── nxpu-e2e-tests/          End-to-end numerical correctness tests
examples/                    WGSL sample kernels (14 examples)
docs/                        Architecture and contributor guides
```
```
nxpu [OPTIONS] <INPUT>

Arguments:
  <INPUT>  Input WGSL file

Options:
  -t, --target <TARGET>   Target backend [default: ir-dump]
  -o, --output <OUTPUT>   Output file path (default: stdout)
      --opt-level <N>     Optimization level: 0, 1, or 2 [default: 1]
      --precision <MODE>  Precision: keep, f16, bf16, int8, auto [default: auto]
      --emit-ir           Dump IR to stderr before backend compilation
      --dry-run           Validate and optimize without output
      --list-targets      List available backends and exit
  -h, --help              Print help
  -V, --version           Print version
```
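The defaults above (`ir-dump` target, optimization level 1) can be illustrated with a stdlib-only sketch of the flag handling. This is not how `nxpu-cli` is actually implemented; it only shows the option semantics:

```rust
// Stdlib-only sketch of the CLI option semantics. Names and structure
// are hypothetical; the real nxpu-cli parser is not shown here.
#[derive(Debug, PartialEq)]
struct Options {
    input: Option<String>,
    target: String,
    output: Option<String>,
    opt_level: u8,
}

fn parse(args: &[&str]) -> Result<Options, String> {
    let mut opts = Options {
        input: None,
        target: "ir-dump".into(), // default target
        output: None,             // default: stdout
        opt_level: 1,             // default optimization level
    };
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        match arg {
            "-t" | "--target" => {
                opts.target = it.next().ok_or("missing value for --target")?.to_string();
            }
            "-o" | "--output" => {
                opts.output = Some(it.next().ok_or("missing value for --output")?.to_string());
            }
            "--opt-level" => {
                let n = it.next().ok_or("missing value for --opt-level")?;
                opts.opt_level = n.parse().map_err(|_| format!("bad opt level: {n}"))?;
            }
            _ if !arg.starts_with('-') => opts.input = Some(arg.to_string()),
            _ => return Err(format!("unknown flag: {arg}")),
        }
    }
    Ok(opts)
}

fn main() {
    let opts = parse(&["examples/vecadd.wgsl", "--target", "onnx", "-o", "vecadd.onnx"]).unwrap();
    println!("{opts:?}");
}
```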
Prerequisites: Rust 1.87+ (edition 2024)

```sh
cargo build        # Build all crates
cargo test         # Run all tests
cargo clippy       # Lint
cargo fmt --check  # Check formatting
```

To build with specific backends only:

```sh
cargo build -p nxpu-cli --no-default-features --features backend-onnx,backend-tflite
```

Available feature flags:

backend-onnx · backend-tflite · backend-coreml · backend-stablehlo · backend-samsung · backend-mediatek · backend-intel · backend-amd · backend-qualcomm · backend-arm-ethos · backend-ceva · backend-rockchip
Contributions are welcome! See CONTRIBUTING.md for guidelines and docs/adding-a-backend.md for backend implementation guides.
Licensed under either of

- Apache License, Version 2.0
- MIT License

at your option.