
m96-chan/NxPU

    _   __     ____  __  __
   / | / /  __/ __ \/ / / /
  /  |/ / |/_/ /_/ / / / /
 / /|  />  </ ____/ /_/ /
/_/ |_/_/|_/_/    \____/

WGSL → NPU transpiler for multi-vendor neural processing units

Write ML kernels once in WGSL. Deploy to any NPU.


Architecture · Changelog · Contributing


Why NxPU?

NPU hardware is fragmented — every vendor ships a different SDK, model format, and toolchain. NxPU solves this by providing a single compilation pipeline from WGSL compute shaders to native NPU formats.

  • One language, many targets — Write WGSL once, emit ONNX, TFLite, CoreML, StableHLO, or vendor-specific formats
  • Pattern recognition — Automatically classifies compute kernels into MatMul, Conv2D, Attention, and 10+ other ML operations
  • Optimization passes — Constant folding, FMA fusion, dead code elimination, common subexpression elimination, and quantization
  • Vendor-aware validation — Operator support matrices for 8 NPU vendors with native/emulated/unsupported classification
  • Pluggable backends — Add new NPU targets by implementing a single trait
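The "pluggable backends" point can be sketched as a single Rust trait. All names below (`Backend`, `Module`, `EmitError`, `DumpBackend`) are illustrative stand-ins, not NxPU's actual API:

```rust
/// Stand-in for NxPU's optimized SSA IR module (illustrative only).
struct Module {
    entry_point: String,
}

/// Illustrative error type for failed emission.
#[derive(Debug)]
struct EmitError(String);

/// The single-trait idea: a new NPU target supplies a name
/// and a way to lower the IR module to its byte format.
trait Backend {
    fn name(&self) -> &'static str;
    fn emit(&self, module: &Module) -> Result<Vec<u8>, EmitError>;
}

/// Toy backend that "emits" the entry-point name as text.
struct DumpBackend;

impl Backend for DumpBackend {
    fn name(&self) -> &'static str {
        "ir-dump"
    }
    fn emit(&self, module: &Module) -> Result<Vec<u8>, EmitError> {
        Ok(format!("entry: {}", module.entry_point).into_bytes())
    }
}

fn main() {
    let module = Module { entry_point: "main".to_string() };
    let backend = DumpBackend;
    let bytes = backend.emit(&module).expect("emit failed");
    println!("{} -> {}", backend.name(), String::from_utf8(bytes).unwrap());
}
```

A real backend would serialize to protobuf or FlatBuffers instead of text, but the shape (name plus emit) is the whole integration surface.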

Quick Start

cargo install --path crates/nxpu-cli
# Transpile WGSL → ONNX
nxpu examples/vecadd.wgsl --target onnx -o vecadd.onnx

# Transpile → TFLite with int8 quantization
nxpu examples/matmul.wgsl --target tflite --precision int8 -o matmul.tflite

# Dump the intermediate representation
nxpu examples/relu.wgsl --target ir-dump

# List all available backends
nxpu --list-targets

Example

Input (examples/vecadd.wgsl):
@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> c: array<f32>;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let idx = gid.x;
  c[idx] = a[idx] + b[idx];
}
Output:

$ nxpu examples/vecadd.wgsl --target onnx -o vecadd.onnx
Info: entry point 'main': classified as Add

The output vecadd.onnx can be loaded directly into any ONNX runtime.

Supported Backends

Target      Aliases        Format
onnx        -              .onnx (protobuf)
tflite      litert         .tflite (FlatBuffers)
coreml      apple-ane      .mlmodel
stablehlo   xla            .mlir (text)
ir-dump     ir             Text (stdout)
intel-npu   openvino       OpenVINO IR .xml + .onnx
amd-xdna    amd-npu        ONNX + XDNA metadata
arm-ethos   ethos-u        TFLite + optional Vela
samsung     exynos         ONNX + ONE toolchain hints
qualcomm    hexagon-npu    ONNX + QNN SDK hints
mediatek    neuropilot     TFLite + NeuroPilot hints
rockchip    rknn           ONNX + RKNN Toolkit hints
ceva        neupro         ONNX + CDNN compiler hints

Each vendor backend includes an operator support matrix that validates patterns against the target NPU's capabilities, emitting warnings for emulated or unsupported operations.
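The validation step described above can be sketched as a simple lookup. The enums and the example matrix below are illustrative (the Ethos-U classifications shown are assumptions), not NxPU's real support-matrix types:

```rust
/// Recognized kernel patterns (subset, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pattern {
    MatMul,
    Conv2D,
    Attention,
}

/// How a target NPU handles a given pattern.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Support {
    Native,
    Emulated,
    Unsupported,
}

/// Toy support matrix for a hypothetical Ethos-U-like target:
/// MatMul and Conv2D run natively, attention must be emulated.
fn ethos_u_support(pattern: Pattern) -> Support {
    match pattern {
        Pattern::MatMul | Pattern::Conv2D => Support::Native,
        Pattern::Attention => Support::Emulated,
    }
}

/// Walk the classified patterns and warn on anything non-native.
fn validate(patterns: &[Pattern]) {
    for &p in patterns {
        match ethos_u_support(p) {
            Support::Native => {}
            Support::Emulated => eprintln!("warning: {:?} is emulated on this target", p),
            Support::Unsupported => eprintln!("error: {:?} is unsupported on this target", p),
        }
    }
}

fn main() {
    validate(&[Pattern::MatMul, Pattern::Attention]);
}
```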

Vendor backend details
Vendor     NPU Hardware                          Native Precision              Output Format                                  SDK Toolchain
Intel      Meteor Lake / Arrow Lake NPU          F16                           OpenVINO IR v11 (.xml + .bin) + ONNX fallback  OpenVINO (ov::Core::read_model)
AMD        Ryzen AI XDNA                         Int8, F16                     ONNX with XDNA metadata props                  Vitis AI EP / ONNX Runtime
Arm        Ethos-U55 (128 MAC) / U65 (512 MAC)   Int8 (U55), Int8+Int16 (U65)  TFLite + optional Vela compilation             ethos-u-vela compiler
Samsung    Exynos NPU                            F16, Int8                     ONNX                                           ONE toolchain (one-import-onnx, one-codegen)
Qualcomm   Hexagon NPU                           Int8, F16                     ONNX                                           QNN SDK (qnn-onnx-converter)
MediaTek   Dimensity APU                         Int8, F16                     TFLite                                         NeuroPilot SDK (ncc-tflite)
Rockchip   RK3588 NPU (3 TOPS)                   Int8, F16                     ONNX                                           RKNN Toolkit 2 (Python API)
CEVA       NeuPro-S                              Int8                          ONNX                                           CDNN compiler (cdnn_cli)

Recognized ML Patterns

NxPU analyzes WGSL compute kernels and classifies them into ML operations:

Category          Operations
Linear Algebra    MatMul, element-wise Add / Sub / Mul
Convolution       Conv2D
Pooling           MaxPool
Activation        ReLU, Tanh, Sigmoid
Normalization     BatchNorm
Reduction         ReduceSum
Tensor Ops        Transpose, Reshape, Concat, Split
Attention         Scaled dot-product attention

Architecture

                         ┌─────────────────────────────────────────────┐
                         │             Optimization Passes             │
                         │  ┌───────┐ ┌───────┐ ┌───────┐ ┌────────┐  │
                         │  │ Const │ │  FMA  │ │ DCE / │ │Quantize│  │
                         │  │ Fold  │ │Fusion │ │  CSE  │ │        │  │
                         │  └───────┘ └───────┘ └───────┘ └────────┘  │
                         └──────────────────┬──────────────────────────┘
                                            │
  ┌──────────┐     ┌──────────┐     ┌───────┴──┐     ┌──────────────┐
  │   WGSL   │────>│  Parser  │────>│  SSA IR  │────>│   Backend    │
  │  Source  │     │  (naga)  │     │          │     │   Emitter    │
  └──────────┘     └──────────┘     └──────────┘     └──────┬───────┘
                                                            │
                                          ┌─────────────────┼─────────────────┐
                                          │                 │                 │
                                     ┌────┴───┐       ┌─────┴────┐      ┌────┴────┐
                                     │  ONNX  │       │  TFLite  │      │ CoreML  │
                                     │  HLO   │       │          │      │ Vendors │
                                     └────────┘       └──────────┘      └─────────┘
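The pipeline above can be sketched as plain function composition: parse, optimize, then hand the module to a backend emitter. Every name here is an illustrative stand-in for the real crate APIs, and the "pass" shown is a toy:

```rust
/// Stand-in for the SSA IR module produced by the parser.
struct Module {
    ops: Vec<String>,
}

/// Parse stage: pretend-lower WGSL source to a flat op list.
fn parse(_wgsl: &str) -> Module {
    Module {
        ops: vec!["load".into(), "add".into(), "mul 1.0".into(), "store".into()],
    }
}

/// Optimization stage: a toy pass that drops identity multiplies.
fn optimize(mut m: Module) -> Module {
    m.ops.retain(|op| op != "mul 1.0");
    m
}

/// Backend stage: serialize the module in the target's text format.
fn emit(m: &Module, target: &str) -> String {
    format!("[{}] {}", target, m.ops.join(" -> "))
}

fn main() {
    let module = optimize(parse("c[i] = (a[i] + b[i]) * 1.0;"));
    println!("{}", emit(&module, "ir-dump"));
}
```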

Project Structure

crates/
├── nxpu-parser/              WGSL parsing via naga, lowering to NxPU IR
├── nxpu-ir/                  Arena-based SSA intermediate representation
├── nxpu-opt/                 Optimization passes (const fold, FMA, DCE, quantize)
├── nxpu-analysis/            Pattern classification and fusion
├── nxpu-backend-core/        Backend trait, plugin registry, IR dump
├── nxpu-backend-onnx/        ONNX protobuf emitter
├── nxpu-backend-tflite/      TFLite FlatBuffers emitter
├── nxpu-backend-coreml/      CoreML emitter
├── nxpu-backend-stablehlo/   StableHLO MLIR emitter
├── nxpu-backend-*/           Vendor-specific backends (8 vendors)
├── nxpu-cli/                 Command-line interface
└── nxpu-e2e-tests/           End-to-end numerical correctness tests
examples/                     WGSL sample kernels (14 examples)
docs/                         Architecture and contributor guides

CLI Reference

nxpu [OPTIONS] <INPUT>

Arguments:
  <INPUT>                  Input WGSL file

Options:
  -t, --target <TARGET>    Target backend [default: ir-dump]
  -o, --output <OUTPUT>    Output file path (default: stdout)
      --opt-level <N>      Optimization level: 0, 1, or 2 [default: 1]
      --precision <MODE>   Precision: keep, f16, bf16, int8, auto [default: auto]
      --emit-ir            Dump IR to stderr before backend compilation
      --dry-run            Validate and optimize without output
      --list-targets       List available backends and exit
  -h, --help               Print help
  -V, --version            Print version
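As one illustration of what `--precision int8` implies, a common scheme is symmetric per-tensor quantization, which maps each f32 value to an int8 through a single scale factor. Whether NxPU uses exactly this scheme is an assumption; the sketch below just shows the arithmetic:

```rust
/// Symmetric per-tensor int8 quantization: scale = max|x| / 127.
/// (Assumed scheme for illustration; NxPU's actual pass may differ.)
fn quantize_int8(xs: &[f32]) -> (Vec<i8>, f32) {
    let max = xs.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
    let q = xs
        .iter()
        .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Recover approximate f32 values from the quantized tensor.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let (q, scale) = quantize_int8(&[1.0, -0.25, 0.1]);
    println!("{:?} scale={}", q, scale);
    println!("{:?}", dequantize(&q, scale));
}
```

The round trip is lossy, which is why the vendor table above pairs int8 targets with validation warnings rather than silent conversion.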

Building from Source

Prerequisites: Rust 1.87+ (edition 2024)

cargo build            # Build all crates
cargo test             # Run all tests
cargo clippy           # Lint
cargo fmt --check      # Check formatting

To build with specific backends only:

cargo build -p nxpu-cli --no-default-features --features backend-onnx,backend-tflite
Available feature flags

backend-onnx · backend-tflite · backend-coreml · backend-stablehlo · backend-samsung · backend-mediatek · backend-intel · backend-amd · backend-qualcomm · backend-arm-ethos · backend-ceva · backend-rockchip

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines and docs/adding-a-backend.md for backend implementation guides.

License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.
