ADI plugin for local LLM inference on Apple Silicon using the Uzu engine.
- 🚀 Apple Silicon Optimized: ~35 tokens/sec on M2 (Llama-3.2-1B)
- 🔒 100% Local: No network, fully offline inference
- 📦 Pre-built Binaries: No build tools required for users
- ⚡ Fast Installation: `adi plugin install adi.llm.uzu`
- 🎯 Simple API: CLI and programmatic access
For Users (recommended):

```sh
# Install pre-built binary from plugin registry
adi plugin install adi.llm.uzu
```

For Developers:

```sh
# Requirements: Metal Toolchain
xcodebuild -downloadComponent MetalToolchain

# Build plugin
cargo build --release

# Install locally
adi plugin install --local target/release/libadi_llm_uzu_plugin.dylib
```

CLI usage:

```sh
# Load a model
adi llm-uzu load models/llama-3.2-1b.gguf

# Generate text
adi llm-uzu generate models/llama-3.2-1b.gguf "Explain Rust ownership"

# List loaded models
adi llm-uzu list

# Show model info
adi llm-uzu info models/llama-3.2-1b.gguf

# Unload a model
adi llm-uzu unload models/llama-3.2-1b.gguf
```

Use the inference service from other plugins or applications:
Register the service dependency in `plugin.toml`:

```toml
[[requires]]
id = "adi.llm.inference"
version = "^1.0.0"
```
Then call the service from your code:

```rust
let args = json!({
    "model_path": "models/llama-3.2-1b.gguf",
    "prompt": "Hello, world!",
    "max_tokens": 128,
    "temperature": 0.7
});
let result = service.invoke("generate", &args)?;
```

Download GGUF models from a model hub such as Hugging Face.
Recommended models:
- Llama 3.2 1B/3B - Fast, general purpose
- Qwen 2.5 1B/3B - Multilingual
- Gemma 2B - Efficient, high quality
Requirements:
- macOS with Apple Silicon (M1/M2/M3+)
- Model files in GGUF format
Benchmarks:

| Model | Apple M2 (tokens/sec) |
|---|---|
| Llama-3.2-1B | ~35 |
| Qwen-2.5-1B | ~33 |
| Gemma-2B | ~28 |
vs OpenAI/Anthropic:
- ✅ Free (no API costs)
- ✅ Private (100% local)
- ✅ Fast (no network latency)
- ❌ Smaller models (less capable)
vs lib-client-ollama:
- ✅ Faster on Apple Silicon
- ✅ Lower overhead (no server)
- ❌ macOS only
- ❌ Fewer features
Troubleshooting:

Install from the registry:

```sh
adi plugin install adi.llm.uzu
```

Check that the model file exists:

```sh
ls -lh models/llama-3.2-1b.gguf
```

Ensure:
- You're on Apple Silicon (M1/M2/M3)
- The model is in GGUF format
- The model fits in memory
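The checks above can be scripted. A minimal pre-flight sketch, assuming the example model path used elsewhere in this README; the format check relies on the fact that valid GGUF files begin with the four ASCII bytes `GGUF`:

```shell
MODEL="models/llama-3.2-1b.gguf"   # illustrative path; adjust to your model

# Architecture check: Apple Silicon Macs report "arm64"
uname -m

# Format check: the first four bytes of a valid GGUF file are "GGUF"
if [ -f "$MODEL" ]; then
  head -c 4 "$MODEL"
  echo
else
  echo "model file not found: $MODEL"
fi
```

If the format check prints anything other than `GGUF`, the file is not a GGUF model (for example, a partial download or a different format).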
License: MIT
Contributions welcome! Open an issue or PR on GitHub.
- Uzu - Inference engine
- lib-client-uzu - Rust client
- ADI - Agent development infrastructure