## What's Changed
**Fixed: App no longer requires CUDA to launch.** Previously, `llama-server.exe` would fail to start without `cublas64_13.dll` and `cublasLt64_13.dll` present, even when running in CPU-only mode (`-ngl 0`). The CUDA backend is now loaded dynamically at runtime, so the app launches cleanly on systems without NVIDIA GPUs or CUDA installed.
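The idea behind the fix can be sketched roughly like this (a minimal Python illustration; the real implementation is C/C++ inside ggml, and the function name here is an assumption for the sketch):

```python
import ctypes

def try_load_backend(library_name: str):
    """Attempt to load a backend's shared library; return None if absent.

    Mirrors the idea of probing for cublas64_13.dll at runtime instead of
    linking against it at load time, so a missing DLL no longer aborts startup.
    """
    try:
        return ctypes.CDLL(library_name)
    except OSError:
        return None

# Probe the CUDA backend first; fall back to CPU if the DLL is missing.
cuda = try_load_backend("cublas64_13.dll")
backend = "cuda" if cuda is not None else "cpu"
```

With load-time linking, the OS loader refuses to start the process at all when a DLL is missing; with a runtime probe like this, the failure is caught and handled gracefully.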
**Improved: Automatic CPU optimization.** The build now ships multiple CPU backend variants and auto-detects your processor's best supported instruction set at runtime:
| CPU Feature | Example CPUs |
|---|---|
| SSE 4.2 | Nehalem+ (1st-gen Core i) |
| AVX | Sandy Bridge+ |
| AVX2 + FMA | Haswell, Ryzen+ |
| AVX-512 | Skylake-X, EPYC (Zen 4+) |
| AVX-VNNI | Alder Lake+ |
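The selection logic can be approximated as follows (illustrative Python; the variant names, required-feature sets, and ordering are assumptions for the sketch, and ggml's real detection runs CPUID checks in C):

```python
# Ordered from most to least capable; the loader picks the first variant
# whose required features the CPU reports (feature names are illustrative).
VARIANT_ORDER = [
    ("avx512", {"avx512f"}),
    ("avx_vnni", {"avx_vnni"}),
    ("avx2_fma", {"avx2", "fma"}),
    ("avx", {"avx"}),
    ("sse42", {"sse4_2"}),
]

def pick_cpu_variant(cpu_features: set[str]) -> str:
    for name, required in VARIANT_ORDER:
        if required <= cpu_features:  # all required features present
            return name
    return "baseline"  # generic build using no special instructions

# Example: a Haswell-class CPU (AVX2 + FMA, but no AVX-512)
print(pick_cpu_variant({"sse4_2", "avx", "avx2", "fma"}))
```

Scanning from most to least capable means newer CPUs automatically get the fastest variant, while older ones still run on a compatible fallback.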
## Download
Grab Unocpp-Setup-v1.1.2.exe below and run the installer.
First time? You'll also need a GGUF model file — download one from HuggingFace.
## Build Info
- Built with `GGML_BACKEND_DL=ON` and `GGML_CPU_ALL_VARIANTS=ON`
- CUDA 13.0 / MSVC 14.44
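For reference, a configure line with these flags looks roughly like the following (assuming a standard llama.cpp CMake build; the exact generator and CUDA options for this release may differ):

```shell
# Approximate CMake configuration matching the flags above.
# GGML_BACKEND_DL builds backends as runtime-loadable modules;
# GGML_CPU_ALL_VARIANTS builds one CPU backend per instruction-set tier.
cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_CUDA=ON
cmake --build build --config Release
```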