Feat/embeddings submodule#197

Merged
juligasa merged 82 commits into main from feat/embeddings-submodule
Feb 19, 2026

@juligasa
Collaborator

This PR is functionally identical to #152, but it uses git submodules instead of raw source files so that third-party code does not pollute the workspace. All submodule initialization is handled by direnv, so it is transparent to the developer.

Integrate llama.cpp via Go bindings for local embedding generation.
Add sqlite-vec for vector storage and similarity search.
Include schema migrations, daemon API changes, and proto updates.
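
For readers unfamiliar with vector similarity search, the following is a minimal Go sketch of the kind of nearest-neighbour lookup a vector store performs over stored embeddings. It is not the code in this PR and does not use the sqlite-vec API. The cosine metric, the brute-force scan, and the names `cosineDistance` and `nearest` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineDistance returns 1 - cos(a, b). It returns 1 (maximally distant
// for this sketch) when the lengths differ or either vector has zero norm.
func cosineDistance(a, b []float32) float64 {
	if len(a) != len(b) {
		return 1
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// nearest returns the indices of the k stored vectors closest to query,
// ordered from closest to farthest. This is a brute-force scan; a real
// vector store would normally do this work inside the database.
func nearest(query []float32, stored [][]float32, k int) []int {
	idx := make([]int, len(stored))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(i, j int) bool {
		return cosineDistance(query, stored[idx[i]]) < cosineDistance(query, stored[idx[j]])
	})
	if k < 0 {
		k = 0
	}
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}

func main() {
	// Hypothetical 2-dimensional embeddings, for illustration only.
	stored := [][]float32{{1, 0}, {0, 1}, {0.9, 0.1}}
	fmt.Println(nearest([]float32{1, 0}, stored, 2))
}
```

The embeddings produced by a real model have hundreds of dimensions, but the ranking logic is the same.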
…build

- Fix sqlite-vec compilation on Alpine/musl by guarding BSD type aliases with __GLIBC__
- Dockerfile: switch to CPU-only llama.cpp build (Vulkan shaders fail on Alpine)
- Dockerfile: add llama-go go.mod copy for replace directive support
- CI workflows: add GGUF model caching and download steps
- CI workflows: add llama.cpp build steps (CPU-only for tests, GPU for desktop releases)
- CI workflows: add LIBRARY_PATH/C_INCLUDE_PATH env vars for CGO linking
- ci-setup action: add Vulkan SDK and llama.cpp build per platform

Replace the vendored backend/util/llama-go directory (~1200 C/C++ files,
500K+ lines) with a git submodule pointing to seed-hypermedia/llama-go.

Changes:
- Remove vendored llama-go and add as git submodule
- Fix go.mod: use upstream tcpipuk/llama-go module path with replace
  directive pointing to ./backend/util/llama-go
- Update import in llamacpp.go to use upstream module path
- Add submodule init guard to .envrc (before mise activation)
- Add submodule existence check to mise.toml ensure-llama-libs task
- Remove sync_llama_go() and generate_gpu_build_files() from ./dev script
- Add submodules: recursive to 12 CI checkout steps across 10 workflows
- Fix wrapper.cpp in fork: use common_chat_parser_params matching pinned
  llama.cpp version (commit 2eee6c866)
- Use HTTPS URL in .gitmodules so cloning works without SSH keys
- Add ensure-submodule mise task to auto-init submodules
- Make ensure-llama-libs depend on ensure-submodule
- Move setup orchestration from mise enter hook (unreliable with
  direnv) to explicit mise run calls in .envrc
- Result: git clone + cd into repo does everything automatically

The llama-go submodule includes the full llama.cpp source tree (~2500
files, 148MB). The previous glob copied all of them into the Please
sandbox temp dir before building, causing massive disk I/O and memory
pressure that could freeze the machine.

Build in $WORKSPACE in-place (like seed-daemon already does) and copy
only the ~9 output .a files back to the sandbox. The Makefile is kept
as a src for change tracking.

Eliminate the SEED_CPU_ONLY / SEED_USE_GPU toggle that caused build
conflicts when ensure-llama-libs (CPU-only) and plz (GPU) built into
the same directory with different modes.

Now each platform always uses the same GPU mode everywhere:
- macOS: always Metal (built-in, zero deps)
- Linux: always CPU-only for local dev (no Vulkan packages needed)
- CI: handles per-platform GPU builds in ci-setup/action.yml

Changes:
- mise.toml: ensure-llama-libs detects OS and builds Metal on macOS,
  CPU-only on Linux. Detects stale CPU builds on macOS via missing
  libggml-blas.a and forces rebuild.
- backend/BUILD.plz: llama-cpp and seed-daemon genrules use OS
  detection instead of SEED_CPU_ONLY env var.
- dev: remove setup_gpu_build(), --cpu/--gpu flags from all commands.
- .plzconfig: remove SEED_USE_GPU/SEED_CPU_ONLY PassUnsafeEnv.
- Fork Makefile: add Metal mismatch detection alongside existing
  Vulkan detection in CMake cache checks.
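
The per-platform rule described above can be sketched in Go as follows. In the PR this logic lives in mise.toml and backend/BUILD.plz, not in Go, so this is only an illustration of the decision. The function names `backendFor` and `needsRebuild` and the library directory argument are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// backendFor maps an OS name (as reported by runtime.GOOS) to the local
// build mode described above: Metal on macOS, CPU-only on Linux.
// Other platforms are left to CI, so this sketch returns an error for them.
func backendFor(goos string) (string, error) {
	switch goos {
	case "darwin":
		return "metal", nil
	case "linux":
		return "cpu", nil
	default:
		return "", fmt.Errorf("no local llama.cpp build mode defined for %q", goos)
	}
}

// needsRebuild reports whether libDir looks like a stale CPU-only build
// on macOS, using the marker named in the commit message: a missing
// libggml-blas.a. On any other OS it always returns false.
func needsRebuild(goos, libDir string) bool {
	if goos != "darwin" {
		return false
	}
	_, err := os.Stat(filepath.Join(libDir, "libggml-blas.a"))
	return errors.Is(err, os.ErrNotExist)
}

func main() {
	mode, err := backendFor(runtime.GOOS)
	if err != nil {
		fmt.Println(err)
		return
	}
	// "build" is a hypothetical output directory used only for this demo.
	fmt.Println("build mode:", mode, "rebuild needed:", needsRebuild(runtime.GOOS, "build"))
}
```

Because the mode depends only on the OS, two tools building into the same directory can no longer disagree about it, which is the conflict the removed toggle caused.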
The seed-daemon genrule's glob(**/*.c, **/*.h, **/*.cpp, **/*.hpp)
captures ~2500 files from the llama.cpp nested submodule. Please
hashes and copies all of them into the sandbox, causing 10+ minute
builds and extreme CPU/memory usage.

Exclude util/llama-go/llama.cpp/** since the seed-daemon genrule
only needs the compiled .a libraries (via :llama-cpp dependency),
not the C/C++ source files.

plz build takes 10+ minutes for seed-daemon due to sandbox overhead
(copying files, hashing dependencies). go build directly takes ~12s
from cold cache, ~3s incremental.

Replace plz build //backend:seed-daemon with direct go build in all
./dev commands (build-desktop, test-desktop, run-backend, build-backend).
The BUILD.plz genrule is still used by CI workflows.

Also fix build-backend which still referenced the removed
setup_gpu_build() function.

The test waited for embedCalls==2 then immediately checked the DB,
but the INSERT transaction could still be in-flight. Now also waits
for runOnce to fully complete (task deleted from taskMgr) before
checking DB state.
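
The race and its fix can be illustrated with a small, self-contained Go sketch. It does not reproduce the actual test. The `waitUntil` helper, the `embedCalls` counter, and the `taskDone` flag (standing in for "task deleted from taskMgr") are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitUntil polls cond every interval until it returns true or the
// timeout elapses. It reports whether cond became true in time.
func waitUntil(timeout, interval time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// runDemo simulates a worker that finishes its embed calls before its
// database write, then waits for BOTH signals before declaring it safe
// to inspect the database.
func runDemo() bool {
	var embedCalls atomic.Int32 // hypothetical call counter
	var taskDone atomic.Bool    // stands in for "task deleted from taskMgr"

	go func() {
		embedCalls.Add(2)
		// The INSERT transaction is still in flight at this point, so a
		// check that only looks at embedCalls would read the DB too early.
		time.Sleep(20 * time.Millisecond)
		taskDone.Store(true)
	}()

	return waitUntil(time.Second, time.Millisecond, func() bool {
		return embedCalls.Load() == 2 && taskDone.Load()
	})
}

func main() {
	fmt.Println("safe to check DB:", runDemo())
}
```

Waiting on the completion signal, not just the call count, is what closes the window in which the transaction has not yet committed.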
@juligasa juligasa marked this pull request as ready for review February 19, 2026 08:52
@juligasa juligasa force-pushed the feat/embeddings-submodule branch from 3c35dda to 8e6ba0e February 19, 2026 08:55
… and remove test-gpu-build

- Add GGUF model cache + download steps to dev-desktop.yml (already in release-desktop.yml)
- Add Windows DLL verification step to both dev-desktop.yml and release-desktop.yml
- Delete test-gpu-build.yml as all its steps are now in the real workflows
@juligasa juligasa merged commit baed669 into main Feb 19, 2026
6 of 7 checks passed