This repository serves as a meta-workspace for managing, building, and running various local AI tools and inference engines. It centralizes dependencies using Nix and orchestrates tasks using Just.
The goal is to provide a reproducible, one-click setup for compiling high-performance inference backends (like llama.cpp) and running services (like TTS and ASR) without managing individual environments for each submodule.
All external projects are managed as git submodules in the `external/` directory (see the initialization snippet after this list):
- llama.cpp: Inference of LLaMA models in pure C/C++.
- ik_llama.cpp: A fork of llama.cpp with optimizations.
- Ollama: Get up and running with large language models.
- Kokoro-FastAPI: A Dockerized/FastAPI wrapper for the Kokoro TTS model.
- agent-cli: A CLI agent tool (used here for its `faster-whisper` server script).
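If you cloned without `--recursive`, the submodules need to be initialized before anything will build. A minimal sketch using standard git commands:

```bash
# Fetch and check out every submodule under external/
git submodule update --init --recursive

# Move each submodule to its upstream tip (roughly what `just sync` automates)
git submodule update --remote --merge
```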
- Nix: Required for the environment.
- Direnv (Recommended): Automatically loads the Nix environment when you enter the directory.
- Git: To manage the repository and submodules.
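A quick way to confirm the prerequisites are in place:

```bash
nix --version     # Nix package manager
git --version     # Git, for the repository and its submodules
direnv --version  # optional, enables automatic environment loading
```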
1. Clone the repository:

   ```bash
   git clone --recursive git@github.com:basnijholt/ai.git
   cd ai
   ```

2. Enter the environment. If you have `direnv` installed:

   ```bash
   direnv allow
   ```

   Otherwise, drop into the Nix shell manually:

   ```bash
   nix-shell
   ```

   This provides `cmake`, `gcc`, `go`, `cuda`, `python`, `uv`, and `just` configured specifically for these projects.

3. Build everything (a quick smoke test follows these steps):

   ```bash
   just build
   ```
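If the build succeeds, the compiled binaries land inside each submodule's tree. The paths below are a sketch based on each project's default build layout (CMake for llama.cpp, `go build` for Ollama); if the `justfile` overrides the output directories, adjust accordingly:

```bash
# llama.cpp's CLI lands in build/bin with a default CMake build
./external/llama.cpp/build/bin/llama-cli --version

# Ollama's binary, assuming a standard `go build` in the submodule root
./external/ollama/ollama --version
```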
The `justfile` defines all available commands; a typical workflow example follows the first table.
| Command | Alias | Description |
|---|---|---|
| `just build` | `just b` | Compiles llama.cpp, ik_llama.cpp, and ollama from scratch. |
| `just rebuild` | `just r` | Incrementally recompiles all projects. |
| `just sync` | `just s` | Pulls the latest changes for all submodules from their upstream remotes. |
| `just commit-submodules` | `just cs` | Commits submodule updates with an auto-generated message (only updated modules). |
| `just clean` | `just c` | Removes build artifacts for all projects. |
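For example, a routine update cycle chains the sync, rebuild, and commit recipes (the aliases come from the table above):

```bash
just sync     # pull the latest upstream changes into every submodule
just rebuild  # incrementally recompile all projects (alias: just r)
just cs       # commit the bumped submodule pointers with a generated message
```

The remaining recipes start the bundled services (example requests follow the table):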
| Command | Description |
|---|---|
| `just start-kokoro` | Starts the Kokoro TTS server (GPU accelerated). Automatically handles the Python venv and model downloads. |
| `just start-faster-whisper` | Starts the Faster Whisper ASR server on port 8811 (CUDA, float16). |
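Once a server is running, you can exercise it over HTTP. The snippets below are assumptions based on each upstream project's OpenAI-compatible defaults: only the Faster Whisper port 8811 comes from the table above, and the Kokoro port 8880 is upstream's default, not something this repository pins down. Check the `justfile` for the authoritative values:

```bash
# Kokoro TTS: synthesize speech to an MP3 (endpoint, port, and voice are upstream defaults, assumed here)
curl -s http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello from Kokoro", "voice": "af_bella"}' \
  -o hello.mp3

# Faster Whisper ASR: transcribe the file we just generated (port from the table above)
curl -s http://localhost:8811/v1/audio/transcriptions -F file=@hello.mp3
```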
You can also target specific projects:

- llama.cpp: `build-llama`, `rebuild-llama`, `clean-llama`, `sync-llama`
- ik_llama.cpp: `build-ik`, `rebuild-ik`, `clean-ik`, `sync-ik`
- Ollama: `build-ollama`, `rebuild-ollama`, `clean-ollama`, `sync-ollama`
- Kokoro: `sync-kokoro`
- Agent CLI: `sync-agent-cli`
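For instance, to update and rebuild only llama.cpp without touching the other submodules:

```bash
just sync-llama     # pull the latest upstream llama.cpp
just rebuild-llama  # incrementally recompile just that project
```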
> [!NOTE]
> This setup is specifically tailored for a machine with NVIDIA CUDA-compatible hardware.
- Build Flags: Configured in the `justfile`. These include flags for CUDA support and hardware-specific architectures (e.g., targeting NVIDIA GPUs).
- Environment: Defined in `shell.nix`, which ensures `LD_LIBRARY_PATH` includes the necessary CUDA and C++ libraries for Python extensions.
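A quick way to verify the environment wiring from outside the shell; this sketch assumes `nvcc` is part of the CUDA toolchain that `shell.nix` exposes:

```bash
# Confirm the Nix shell puts CUDA on PATH and into LD_LIBRARY_PATH
nix-shell --run 'which nvcc; echo "$LD_LIBRARY_PATH" | tr ":" "\n" | grep -i cuda'
```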
My complete NixOS configuration, which powers this setup, can be found in my dotfiles.
This meta-repository is for personal organization. Each submodule retains its own license.