On-device AI for mobile, desktop, edge.
Run speech, language, and vision models locally — private, offline, fast.
Perfect for any app, including games 🕹️
Documentation · SDKs · Models · Join Discord · Issues
Xybrid is a Rust-powered runtime with native bindings for every major platform. Pick your SDK:
| SDK | Platforms | Install | Status | Sample |
|---|---|---|---|---|
| Flutter | iOS, Android, macOS, Linux, Windows | pub.dev | Available | README |
| Unity | macOS, Windows, Linux | See below | Available | Unity 3D AI tavern |
| Swift | iOS, macOS | Swift Package Manager | Coming Soon | README |
| Kotlin | Android | Maven Central | Available | README |
| CLI | macOS, Linux, Windows | Download binary | Available | — |
| Rust | All | xybrid-core / xybrid-sdk | Available | — |
Every SDK wraps the same Rust core — identical model support and behavior across all platforms.
Unity — Package Manager → Add from git URL:

```
https://github.com/xybrid-ai/xybrid.git?path=bindings/unity
```

Flutter — add to your `pubspec.yaml`:

```yaml
dependencies:
  xybrid_flutter: ^0.1.0
```

Kotlin (Android) — add to your `build.gradle.kts`:

```kotlin
dependencies {
    implementation("ai.xybrid:xybrid-kotlin:0.1.0-alpha7")
}
```

See each SDK's README for platform-specific setup: Flutter · Unity · Swift · Kotlin · Rust
Run a model in one line from the CLI, or three lines from any SDK:
CLI:

```sh
xybrid run kokoro-82m --input "Hello world" -o output.wav
```

Flutter:

```dart
final model = await Xybrid.model(modelId: 'kokoro-82m').load();
final result = await model.run(envelope: Envelope.text(text: 'Hello world'));
// result → 24kHz WAV audio
```

Kotlin:

```kotlin
val model = Xybrid.model(modelId = "kokoro-82m").load()
val result = model.run(envelope = XybridEnvelope.Text("Hello world"))
// result → 24kHz WAV audio
```

Swift:

```swift
let model = try Xybrid.model(modelId: "kokoro-82m").load()
let result = try model.run(envelope: .text("Hello world"))
// result → 24kHz WAV audio
```

Unity (C#):

```csharp
var model = Xybrid.Model(modelId: "kokoro-82m").Load();
var result = model.Run(envelope: Envelope.Text("Hello world"));
// result → 24kHz WAV audio
```

Rust:

```rust
let model = Xybrid::model("kokoro-82m").load()?;
let result = model.run(&Envelope::text("Hello world"))?;
// result → 24kHz WAV audio
```

Chain models together — build a voice assistant in 3 lines of YAML:
```yaml
# voice-assistant.yaml
name: voice-assistant
stages:
  - model: whisper-tiny   # Speech → text
  - model: qwen2.5-0.5b   # Process with LLM
  - model: kokoro-82m     # Text → speech
```

CLI:

```sh
xybrid run voice-assistant.yaml --input question.wav -o response.wav
```

Flutter:

```dart
final pipeline = await Xybrid.pipeline(yamlContent: yamlString).load();
await pipeline.loadModels();
final result = await pipeline.run(envelope: Envelope.audio(bytes: audioBytes));
```

Kotlin:

```kotlin
val pipeline = Xybrid.pipeline(yamlContent = yamlString).load()
pipeline.loadModels()
val result = pipeline.run(envelope = XybridEnvelope.Audio(bytes = audioBytes))
```

Swift:

```swift
let pipeline = try Xybrid.pipeline(yamlContent: yamlString).load()
try pipeline.loadModels()
let result = try pipeline.run(envelope: .audio(bytes: audioBytes))
```

Unity (C#):

```csharp
var pipeline = Xybrid.Pipeline(yamlContent: yamlString).Load();
pipeline.LoadModels();
var result = pipeline.Run(envelope: Envelope.Audio(bytes: audioBytes));
```

Rust:

```rust
let pipeline = Xybrid::pipeline(&yaml_string).load()?;
pipeline.load_models()?;
let result = pipeline.run(&Envelope::audio(audio_bytes))?;
```

All models run entirely on-device. No cloud, no API keys required. Browse the full registry with `xybrid models list`.
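Conceptually, a pipeline threads each stage's output envelope into the next stage's input (ASR produces text, TTS consumes it). Here is a minimal, std-only Rust sketch of that fold pattern. The types and stages are hypothetical stand-ins, not the actual xybrid-core API:

```rust
// Hypothetical, simplified types for illustration -- NOT the real xybrid-core API.
// Shows the fold pattern a stage pipeline implies: each stage consumes the
// previous stage's output envelope.

#[derive(Debug, PartialEq)]
enum Envelope {
    Text(String),
    Audio(Vec<u8>),
}

trait Stage {
    fn run(&self, input: Envelope) -> Envelope;
}

// Stand-in stages; a real runtime would invoke Whisper / Qwen / Kokoro here.
struct Asr;
impl Stage for Asr {
    fn run(&self, input: Envelope) -> Envelope {
        match input {
            Envelope::Audio(_) => Envelope::Text("hello".into()), // pretend transcription
            other => other,
        }
    }
}

struct Tts;
impl Stage for Tts {
    fn run(&self, input: Envelope) -> Envelope {
        match input {
            Envelope::Text(t) => Envelope::Audio(t.into_bytes()), // pretend synthesis
            other => other,
        }
    }
}

// The whole pipeline is a fold over the stage list.
fn run_pipeline(stages: &[Box<dyn Stage>], input: Envelope) -> Envelope {
    stages.iter().fold(input, |env, stage| stage.run(env))
}

fn main() {
    let stages: Vec<Box<dyn Stage>> = vec![Box::new(Asr), Box::new(Tts)];
    let out = run_pipeline(&stages, Envelope::Audio(vec![0u8; 16]));
    assert_eq!(out, Envelope::Audio(b"hello".to_vec()));
    println!("pipeline output: {:?}", out);
}
```

The same shape explains why every SDK exposes a single `run(envelope)` call: the runtime only needs one input envelope and returns whatever the final stage emits.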
**Speech-to-Text**

| Model | Params | Format | Description |
|---|---|---|---|
| Whisper Tiny | 39M | SafeTensors | Multilingual transcription (Candle runtime) |
| Wav2Vec2 Base | 95M | ONNX | English ASR with CTC decoding |
**Text-to-Speech**

| Model | Params | Format | Description |
|---|---|---|---|
| Kokoro 82M | 82M | ONNX | High-quality, 24 natural voices |
| KittenTTS Nano | 15M | ONNX | Ultra-lightweight, 8 voices |
**Language Models**

| Model | Params | Format | Description |
|---|---|---|---|
| Gemma 3 1B | 1B | GGUF Q4_K_M | Google's mobile-optimized LLM |
| Llama 3.2 1B | 1B | GGUF Q4_K_M | Meta's general purpose, 128K context |
| Qwen 2.5 0.5B | 500M | GGUF Q4_K_M | Compact on-device chat |
| SmolLM2 360M | 360M | GGUF Q4_K_M | Best tiny LLM, excellent quality/size ratio |
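The GGUF builds above ship with Q4_K_M quantization, which averages roughly 4.8 bits per weight, so on-disk and in-memory footprints are a fraction of FP16. A back-of-envelope estimate (the bits-per-weight figure is approximate and varies slightly per model):

```rust
// Rough model size estimate: params * bits-per-weight / 8 bytes.
// Q4_K_M averages ~4.8 bits per weight (approximate; layer mix varies).
fn approx_size_mb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1_000_000.0
}

fn main() {
    // A 1B-parameter model (e.g. Llama 3.2 1B):
    println!("Q4_K_M: ~{:.0} MB", approx_size_mb(1.0e9, 4.8));  // ~600 MB
    println!("FP16:   ~{:.0} MB", approx_size_mb(1.0e9, 16.0)); // ~2000 MB
}
```

That ~3x reduction is what makes 1B-class models practical on phones.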
**Roadmap**

| Model | Type | Params | Priority | Status |
|---|---|---|---|---|
| Phi-4 Mini | LLM | 3.8B | P2 | Spec Ready (first multi-quant: Q4, Q8, FP16) |
| Qwen3 0.6B | LLM | 600M | P2 | Planned |
| Trinity Nano | LLM (MoE) | 6B (1B active) | P2 | Planned |
| LFM2 700M | LLM | 700M | P2 | Planned |
| Nomic Embed Text v1.5 | Embeddings | 137M | P1 | Blocked (needs Tokenize/MeanPool steps) |
| LFM2-VL 450M | Vision | 450M | P2 | Planned |
| Whisper Tiny CoreML | ASR | 39M | P2 | Planned |
| Qwen3-TTS 0.6B | TTS | 600M | P2 | Blocked (needs custom SafeTensors runtime) |
| Chatterbox Turbo | TTS | 350M | P3 | Blocked (needs ModelGraph template) |
| Capability | iOS | Android | macOS | Linux | Windows |
|---|---|---|---|---|---|
| Speech-to-Text | ✅ | ✅ | ✅ | ✅ | ✅ |
| Text-to-Speech | ✅ | ✅ | ✅ | ✅ | ✅ |
| Language Models | ✅ | ✅ | ✅ | ✅ | ✅ |
| Vision Models | ✅ | ✅ | ✅ | ✅ | ✅ |
| Embeddings | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pipeline Orchestration | ✅ | ✅ | ✅ | ✅ | ✅ |
| Model Download & Caching | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hardware Acceleration | Metal, ANE | CPU | Metal, ANE | CUDA | CUDA |
- Privacy first — All inference runs on-device. Your data never leaves the device.
- Offline capable — No internet required after initial model download.
- Cross-platform — One API across iOS, Android, macOS, Linux, and Windows.
- Pipeline orchestration — Chain models together (ASR → LLM → TTS) in a single call.
- Automatic optimization — Hardware acceleration on Apple Neural Engine, Metal, and CUDA.
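The cache-first behavior behind "offline capable" can be sketched in a few lines of std-only Rust. The cache layout and download stub below are hypothetical illustrations, not xybrid's actual on-disk format:

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Return the cached model file if present; otherwise "download" it (stubbed
// here) and cache it. Hypothetical layout: <cache_dir>/<model_id>.bin
fn fetch_model(cache_dir: &Path, model_id: &str) -> std::io::Result<PathBuf> {
    let path = cache_dir.join(format!("{model_id}.bin"));
    if path.exists() {
        return Ok(path); // cache hit: no network needed
    }
    fs::create_dir_all(cache_dir)?;
    let bytes = download_stub(model_id); // real code would fetch from the registry
    fs::write(&path, bytes)?;
    Ok(path)
}

fn download_stub(model_id: &str) -> Vec<u8> {
    format!("weights for {model_id}").into_bytes()
}

fn main() -> std::io::Result<()> {
    let cache = std::env::temp_dir().join("xybrid-cache-demo");
    let first = fetch_model(&cache, "kokoro-82m")?;  // downloads and caches
    let second = fetch_model(&cache, "kokoro-82m")?; // served from cache
    assert_eq!(first, second);
    println!("cached at {}", first.display());
    Ok(())
}
```

After the first fetch, every subsequent load resolves locally, which is why only the initial download requires connectivity.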
We welcome contributions! See CONTRIBUTING.md for guidelines on setting up your development environment, submitting pull requests, and adding new models.
Apache License 2.0 — see LICENSE for details.


