KernelPort

KernelPort is a high-performance inference server that lets you run models from TensorFlow, PyTorch, ONNX, and TensorRT without giving up their native GPU kernels.

Bring your kernels with you. Serve them from one runtime.


Why KernelPort?

Most inference servers force you into a single runtime or a lowest-common-denominator format. KernelPort takes a different approach:

  • Native execution: TensorFlow runs with TensorFlow kernels. PyTorch runs with PyTorch kernels.
  • GPU-first design: Built for CUDA streams, batching, and memory reuse.
  • One server, many backends: ONNX Runtime, TensorRT, LibTorch, TensorFlow.
  • Production-grade: Dynamic batching, scheduling, backpressure, observability.

KernelPort is written in Rust for safety, performance, and predictable latency.


What KernelPort Is (and Is Not)

KernelPort is:

  • An inference orchestrator with pluggable execution backends
  • A way to consolidate heterogeneous model stacks into one serving plane
  • A system designed for low-latency, high-throughput GPU inference

KernelPort is not:

  • A training framework
  • A model compiler
  • A replacement for TensorFlow or PyTorch

Core Concepts

Native Backend Execution

KernelPort does not reimplement ML frameworks. Instead, it embeds and orchestrates their production runtimes:

Backend      Execution
PyTorch      LibTorch (TorchScript / compiled artifacts)
TensorFlow   TensorFlow C API (SavedModel)
ONNX         ONNX Runtime (CUDA / TensorRT EP)
TensorRT     Native TensorRT engines

This ensures you get:

  • Vendor-optimized kernels
  • Framework correctness
  • No silent performance regressions
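
To make this concrete, here is a minimal sketch of what a pluggable backend interface could look like. The trait and type names below are illustrative assumptions, not KernelPort's actual API:

    // Illustrative sketch -- trait and type names are assumptions,
    // not the actual kernelport backend API.

    /// A tensor handed to or returned from a backend (simplified).
    pub struct Tensor {
        pub shape: Vec<usize>,
        pub data: Vec<f32>,
    }

    /// Each execution backend (LibTorch, TensorFlow C API, ONNX Runtime,
    /// TensorRT) would implement this trait; the server only sees the trait.
    pub trait ExecutionBackend: Send + Sync {
        /// Human-readable backend name, e.g. "onnxruntime".
        fn name(&self) -> &str;

        /// Load a model artifact (SavedModel directory, TorchScript file,
        /// .onnx file, or serialized TensorRT engine).
        fn load(&mut self, artifact_path: &std::path::Path) -> Result<(), String>;

        /// Run one (possibly batched) inference using the backend's native kernels.
        fn infer(&self, inputs: &[Tensor]) -> Result<Vec<Tensor>, String>;
    }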

Dynamic Batching

Requests are dynamically batched per model and shape:

  • Configurable max batch size
  • Tight batching windows (ms-level)
  • Shape-aware grouping

This improves GPU utilization without sacrificing latency SLAs.
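
As a rough illustration of this policy (a sketch, not KernelPort's actual scheduler code; names and the channel-based plumbing are assumptions), the function below drains a request queue for at most one batching window, groups requests by input shape, and caps each batch at the configured maximum:

    // Illustrative dynamic-batching sketch in Rust.
    use std::collections::HashMap;
    use std::sync::mpsc::Receiver;
    use std::time::{Duration, Instant};

    /// A queued inference request (simplified).
    pub struct Request {
        pub shape: Vec<usize>,
        pub data: Vec<f32>,
    }

    /// Drain requests for up to `window`, then group them by input shape so
    /// each group can be dispatched as one batch (capped at `max_batch`,
    /// assumed to be >= 1).
    pub fn collect_batches(
        rx: &Receiver<Request>,
        max_batch: usize,
        window: Duration,
    ) -> Vec<Vec<Request>> {
        let deadline = Instant::now() + window;
        let mut by_shape: HashMap<Vec<usize>, Vec<Request>> = HashMap::new();

        loop {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match rx.recv_timeout(remaining) {
                Ok(req) => by_shape.entry(req.shape.clone()).or_default().push(req),
                Err(_) => break, // window expired or sender closed
            }
            if by_shape.values().any(|group| group.len() >= max_batch) {
                break; // at least one batch is full: dispatch early
            }
        }

        // Split oversized groups so no batch exceeds max_batch.
        by_shape
            .into_values()
            .flat_map(|mut group| {
                let mut batches = Vec::new();
                while group.len() > max_batch {
                    batches.push(group.split_off(group.len() - max_batch));
                }
                batches.push(group);
                batches
            })
            .collect()
    }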


GPU-Aware Scheduling

KernelPort schedules work across:

  • GPUs (or MIG slices)
  • CUDA streams
  • Backend-specific execution contexts

The scheduler can be tuned for (see the sketch after this list):

  • Throughput
  • Latency
  • Fairness across models or tenants
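
A minimal sketch of how these tuning knobs and placement decisions might be expressed is shown below. The names are assumptions for illustration, not KernelPort's real configuration surface:

    // Illustrative scheduling knobs -- a sketch, not kernelport's actual config.

    /// What the scheduler optimizes for.
    pub enum SchedulingObjective {
        /// Pack batches aggressively to maximize GPU utilization.
        Throughput,
        /// Dispatch as soon as possible, even with small batches.
        Latency,
        /// Round-robin across models or tenants so no one starves.
        Fairness,
    }

    /// Where a unit of work can be placed.
    pub struct Placement {
        pub gpu_index: usize,   // physical GPU or MIG slice
        pub cuda_stream: usize, // stream within that device
    }

    /// Pick a placement by simple round-robin over devices and streams
    /// (assumes num_gpus and streams_per_gpu are >= 1). A real scheduler
    /// would also weigh queue depth, memory headroom, and the objective above.
    pub fn next_placement(counter: usize, num_gpus: usize, streams_per_gpu: usize) -> Placement {
        Placement {
            gpu_index: (counter / streams_per_gpu) % num_gpus,
            cuda_stream: counter % streams_per_gpu,
        }
    }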

Architecture (High Level)

Diagrams: KernelPort Architecture; KernelPort GPU Worker.


Local Development

CPU (macOS or Linux)

  • Install Rust components: rustup component add rustfmt clippy
  • Run checks: make check
  • Run the server: cargo run -p kernelport-server -- serve --device cpu
  • Run the ORT identity test (CPU): cargo test -p kernelport-backend-ort --test identity

CUDA (Linux + NVIDIA)

KernelPort supports CUDA via ONNX Runtime. For GPU inference:

  • Provide a CUDA-enabled ONNX Runtime shared library (build from source or use a CUDA-enabled package).
  • Set ORT_DYLIB_PATH to the CUDA-enabled libonnxruntime.so.
  • Build with the CUDA feature and select a device:
    • cargo run -p kernelport-server --features ort-cuda -- serve --device cuda:0
