lloyal.node provides prebuilt binaries for 13 platforms, covering 93% of production deployment scenarios with instant installation.
lloyal.node ships prebuilt binaries for the following platforms:
macOS (2 packages)
- Apple Silicon (arm64) with Metal GPU acceleration
- Intel (x64) CPU-only
Linux x64 (3 packages)
- CPU-only
- CUDA 12.6 (NVIDIA GPUs)
- Vulkan (AMD/Intel GPUs)
Linux ARM64 (3 packages)
- CPU-only (AWS Graviton, Raspberry Pi)
- CUDA 12.6 (NVIDIA Jetson devices)
- Vulkan (Qualcomm/AMD GPUs)
Windows x64 (3 packages)
- CPU-only
- CUDA 12.6 (NVIDIA GPUs)
- Vulkan (AMD/Intel GPUs)
Windows ARM64 (2 packages)
- CPU-only (Snapdragon X Elite, Surface Pro X)
- Vulkan (Qualcomm GPUs)
Automatic (Recommended)
npm automatically selects the correct prebuilt package for your platform:
npm install @lloyal-labs/lloyal.nodeIf a prebuilt binary is available, installation completes in seconds. Otherwise, lloyal.node builds from source automatically (requires C++ compiler and CMake).
Manual GPU Variant Selection
To force a specific GPU backend, install the platform package directly:
# Force CUDA on Linux
npm install @lloyal-labs/lloyal.node-linux-x64-cuda
# Force Vulkan on Windows
npm install @lloyal-labs/lloyal.node-win32-x64-vulkanOr set an environment variable before installation:
export LLOYAL_GPU=cuda
npm install @lloyal-labs/lloyal.nodeIf no prebuilt binary matches your platform, lloyal.node builds from source using cmake-js.
Requirements:
- Node.js 22+ (LTS)
- C++20 compiler (GCC 9+, Clang 10+, MSVC 2019+)
- CMake 3.18+
Build commands:
# CPU-only build
npm run build
# GPU builds (set LLOYAL_GPU environment variable)
LLOYAL_GPU=cuda npm run build # NVIDIA CUDA
LLOYAL_GPU=vulkan npm run build # Vulkan (AMD/Intel/NVIDIA)
LLOYAL_GPU=metal npm run build # Metal (macOS only)Build time: 5-15 minutes (one-time)
Enabled automatically on Apple Silicon. No additional setup required.
const { createContext } = require('@lloyal-labs/lloyal.node');
const ctx = await createContext({
modelPath: './model.gguf',
// Metal GPU acceleration used automatically on Apple Silicon
});Requires NVIDIA GPU with compute capability 6.0+ and CUDA 12.6 runtime.
Linux/Windows:
npm install @lloyal-labs/lloyal.node-linux-x64-cuda
# or
npm install @lloyal-labs/lloyal.node-win32-x64-cudaJetson (ARM64):
npm install @lloyal-labs/lloyal.node-linux-arm64-cudaWorks with AMD, Intel, NVIDIA, and Qualcomm GPUs. Requires Vulkan 1.3+ drivers.
npm install @lloyal-labs/lloyal.node-linux-x64-vulkan
# or
npm install @lloyal-labs/lloyal.node-win32-x64-vulkanNo GPU acceleration. Works on all platforms.
npm install @lloyal-labs/lloyal.node-darwin-x64 # macOS Intel
npm install @lloyal-labs/lloyal.node-linux-x64 # Linux x64
npm install @lloyal-labs/lloyal.node-win32-x64 # Windows x64lloyal.node is a meta-package with optional dependencies on all platform packages:
{
"name": "@lloyal-labs/lloyal.node",
"optionalDependencies": {
"@lloyal-labs/lloyal.node-darwin-arm64": "1.0.0",
"@lloyal-labs/lloyal.node-linux-x64-cuda": "1.0.0",
...
}
}npm installs only the package matching your platform. Unsupported platforms fall back to source builds.
Each platform package contains:
- Prebuilt native addon (
lloyal.node) - Platform-specific shared libraries (macOS/Linux only:
*.dylib,*.so) - Minimal dependencies (no build tools required)
Note: Windows uses static linking, so only the .node file is included.
Package naming: @lloyal-labs/lloyal.node-{platform}-{arch}[-{gpu}]
Examples:
@lloyal-labs/lloyal.node-darwin-arm64(macOS Apple Silicon with Metal)@lloyal-labs/lloyal.node-linux-x64-cuda(Linux x64 with CUDA 12.6)@lloyal-labs/lloyal.node-win32-arm64-vulkan(Windows ARM64 with Vulkan)
lloyal.node uses runtime dynamic loading with automatic fallback:
- npm install downloads platform packages matching your OS/CPU via
optionalDependencies - At runtime, when you call
createContext():- If
gpuVariantoption orLLOYAL_GPUenv var is set, tries that variant first - If GPU variant fails (missing runtime libs like
libcudart.so), falls back to CPU - If no prebuilt available, tries local
build/Release/(development)
- If
This pattern ensures:
- Graceful degradation: GPU unavailable? Falls back to CPU automatically
- No install-time decisions: Works correctly in Docker multi-stage builds
- Explicit control: Use
gpuVariantoption for deterministic loading
Example:
const { createContext } = require('@lloyal-labs/lloyal.node');
// Automatic: uses LLOYAL_GPU env var or CPU
const ctx = await createContext({ modelPath: './model.gguf' });
// Explicit: request CUDA with fallback to CPU
const ctx = await createContext(
{ modelPath: './model.gguf' },
{ gpuVariant: 'cuda' }
);Strict GPU Loading (No Fallback)
For production or CI environments where you want to fail fast if the requested GPU variant is unavailable:
export LLOYAL_NO_FALLBACK=1
export LLOYAL_GPU=cuda
node app.js # Throws error if CUDA unavailable instead of falling back to CPUThis is useful for:
- CI pipelines testing specific GPU variants
- Production deployments where GPU acceleration is required
- Debugging GPU loading issues
lloyal.node (N-API binding via cmake-js)
↓ includes
liblloyal (header-only C++ library, git submodule)
↓ links
llama.cpp (inference engine, git submodule)
↓ compiles to
Platform-specific binaries:
macOS: lloyal.node + libllama.dylib + Metal
Linux: lloyal.node + libllama.so + OpenMP
Windows: lloyal.node (static, all-in-one)
lloyal.node uses cmake-js to build the native addon:
# Development build
npm run build
# The build script (scripts/build.js) invokes:
npx cmake-js compile --CDGGML_METAL=ON # macOS
npx cmake-js compile # Linux/Windows CPU
npx cmake-js compile --CDGGML_CUDA=ON # CUDA
npx cmake-js compile --CDGGML_VULKAN=ON # VulkanSubmodule Structure:
llama.cpp/- llama.cpp inference engineliblloyal/- Header-only C++ abstraction layer
To update submodules:
git submodule update --init --recursive
git submodule update --remote # Pull latestGitHub Actions builds all 13 platform packages on release:
Native runners:
- macOS:
macos-14(arm64),macos-15-intel(x64) - Linux x64:
ubuntu-22.04 - Linux ARM64:
ubuntu-22.04-arm(native, no emulation) - Windows:
windows-2022
Cross-compilation:
- Windows ARM64: Cross-compiled from x64 using LLVM/clang-cl
Custom actions:
.github/actions/provision-cuda: Unified CUDA 12.6 installation- Uses
jakoch/install-vulkan-sdk-actionfor Vulkan SDK
Testing in CI:
- CPU, Metal, and Vulkan packages: Fully tested (build + verify + API/E2E tests)
- CUDA packages: Build-only (no GPU hardware in CI runners)
- Cross-compiled packages (win32-arm64): Build-only (can't run ARM64 on x64)
CUDA packages are verified to link correctly, but runtime testing requires actual NVIDIA GPU hardware.
Build time:
- x64 platforms: ~10-15 minutes per package
- ARM64 (native): ~10-15 minutes per package
- Total pipeline: ~2-3 hours for all 13 packages
1. Release preparation:
# Update submodules to desired versions
git submodule update --remote
# Update version (updates all platform packages too)
npm version minor # or major/patch/prerelease
# Commit and push
git add .
git commit -m "chore: prepare v1.0.0 release"
git push2. Tag and trigger release:
git tag v1.0.0
git push origin v1.0.03. CI builds and publishes:
GitHub Actions automatically:
- Builds all 13 platform packages
- Publishes to npm registry (
@lloyal-labs/lloyal.node-*) - Publishes main package (
@lloyal-labs/lloyal.node) - Uses
--tag alphafor prerelease versions (*-alpha,*-beta,*-rc)
4. Verify release:
npm info @lloyal-labs/lloyal.node
npm info @lloyal-labs/lloyal.node-linux-x64-cudaAll packages use synchronized versioning. The scripts/sync-versions.js script updates all platform package versions to match the main package.
| Metric | llama.node | lloyal.node |
|---|---|---|
| Total packages | 14 | 13 |
| Platform parity | 100% | 93% |
| x64 coverage | Full | Full |
| ARM64 coverage | Full | Full |
| CUDA version | 12.6 | 12.6 |
| Vulkan support | Full | Full |
| Windows ARM64 | ✅ | ✅ |
| Windows linking | Shared DLLs | Static |
| Snapdragon DSP | ✅ Hexagon | ⏸️ Roadmap |
Key difference: Windows uses static linking for reliability (avoids DLL initialization crashes).
- Snapdragon Hexagon DSP optimization (if demand exists)
- Build caching for faster CI (ccache integration)
- Download metrics to validate platform priorities
- Automated testing on real ARM64 hardware (self-hosted runners)
Platform support priorities are driven by user demand. If you need a specific platform or GPU variant, please open an issue at github.com/lloyal-ai/lloyal.node.