The llama.cpp backend (backend/cpp/llama-cpp/grpc-server.cpp) is a gRPC adaptation of the upstream HTTP server (llama.cpp/tools/server/server.cpp). It uses the same underlying server infrastructure from llama.cpp/tools/server/server-context.cpp.
- Test llama.cpp backend compilation: `make backends/llama-cpp` (the backend is built as part of the main build process)
- Check `backend/cpp/llama-cpp/Makefile` for build configuration
- `grpc-server.cpp`: gRPC server implementation; adapts HTTP server patterns to gRPC
- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
- The gRPC server mirrors the HTTP server's functionality but uses gRPC instead of HTTP
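The shared infrastructure is essentially a producer/consumer task queue: handler threads post tasks, and a worker thread that owns the model consumes them. A minimal stdlib-only sketch of that pattern (names are illustrative, not the actual `server-queue.cpp` API):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

// Simplified stand-in for a server task; the real server-task.cpp
// carries far more state (sampling params, slot ids, etc.).
struct server_task_sketch {
    int id;
    std::string prompt;
};

class task_queue_sketch {
public:
    // Called by a gRPC (or HTTP) handler thread.
    void post(server_task_sketch task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Called by the worker thread that owns the model/context.
    server_task_sketch wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        server_task_sketch task = std::move(tasks_.front());
        tasks_.pop_front();
        return task;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<server_task_sketch> tasks_;
};
```

Because both front ends only ever touch the queue, the gRPC adaptation can swap the transport without touching the inference path.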
When fixing compilation errors after upstream changes:
- Check how `server.cpp` (the HTTP server) handles the same change
- Look for new public APIs or getter methods
- Store copies of needed data instead of accessing private members
- Update function calls to match new signatures
- Test with `make backends/llama-cpp`
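The "store copies" rule above can be illustrated with a plain C++ sketch (both classes here are hypothetical, not real llama.cpp types): when an upstream refactor makes a member private with no getter, keep your own copy of the data at the point where you still have it.

```cpp
#include <memory>
#include <string>

// Hypothetical upstream class: after a refactor, the model path became
// private and no getter is exposed.
class upstream_context_sketch {
public:
    explicit upstream_context_sketch(std::string path)
        : model_path_(std::move(path)) {}
private:
    std::string model_path_;
};

// Backend-side fix: copy the data while we still have it, instead of
// reaching into the now-private member (which no longer compiles).
class backend_state_sketch {
public:
    void load(const std::string& path) {
        model_path_copy_ = path;  // our own copy, under our control
        ctx_ = std::make_unique<upstream_context_sketch>(path);
    }
    const std::string& model_path() const { return model_path_copy_; }
private:
    std::string model_path_copy_;
    std::unique_ptr<upstream_context_sketch> ctx_;
};
```

This keeps the backend decoupled from upstream's encapsulation choices, so the next refactor is less likely to break compilation again.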
- gRPC uses the `BackendServiceImpl` class with gRPC service methods
- The HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
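The overall shape of the service class can be sketched as below. The gRPC types are mocked so the sketch stays self-contained, and the request/reply structs and method body are illustrative only (the real ones are protobuf messages and post work onto the shared task queue):

```cpp
#include <string>

// Minimal stand-ins for gRPC types so the sketch compiles without grpc++.
namespace grpc_mock {
struct Status {
    bool ok;
    static Status OK() { return Status{true}; }
};
}  // namespace grpc_mock

// Illustrative request/reply shapes; the real ones are generated protobufs.
struct PredictRequest { std::string prompt; };
struct PredictReply   { std::string text; };

// Sketch of the service layout: each RPC method (Predict, Embedding,
// Rerank, ...) translates its request into a server task, exactly as the
// HTTP handlers do, then waits for the result.
class BackendServiceImplSketch {
public:
    grpc_mock::Status Predict(const PredictRequest& req, PredictReply* reply) {
        // Real code: build a server_task, post it to the task queue,
        // collect the completion result. Here we just echo the call shape.
        reply->text = "echo: " + req.prompt;
        return grpc_mock::Status::OK();
    }
};
```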
When working on JSON/XML tool call parsing functionality, always check llama.cpp for the reference implementation and any updates:
- Review XML Format Definitions: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
- Review Parsing Logic: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
- Review Format Presets: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format`)
- Review Model Lists: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
  - `COMMON_CHAT_FORMAT_GLM_4_5`
  - `COMMON_CHAT_FORMAT_MINIMAX_M2`
  - `COMMON_CHAT_FORMAT_KIMI_K2`
  - `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
  - `COMMON_CHAT_FORMAT_APRIEL_1_5`
  - `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
  - Any new formats added
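As a rough illustration of what these parsers do, the sketch below extracts the payload of a single `<tool_call>` wrapper element. This is not llama.cpp's algorithm (the real parser handles streaming partial output, escaping, and per-model format variants driven by `xml_tool_call_format`); it only shows the basic extraction idea.

```cpp
#include <string>

// Minimal sketch: pull the payload out of one <tool_call>...</tool_call>
// element in a model's raw output. Returns "" when no complete call exists.
inline std::string extract_tool_call_sketch(const std::string& output) {
    const std::string open  = "<tool_call>";
    const std::string close = "</tool_call>";
    const size_t start = output.find(open);
    if (start == std::string::npos) return "";
    const size_t body = start + open.size();
    const size_t end = output.find(close, body);
    if (end == std::string::npos) return "";  // call not yet complete
    return output.substr(body, end - body);
}
```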
Always check llama.cpp for new model configuration options that should be supported in LocalAI:
- Check Server Context: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
- Check Chat Params: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
- Check Server Options: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
- Examples of options to check:
  - `ctx_shift` - Context shifting support
  - `parallel_tool_calls` - Parallel tool calling
  - `reasoning_format` - Reasoning format options
  - Any new flags or parameters
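When wiring a new option through, the usual pattern is a field with a safe default plus an override applied only when the caller set it. A hedged sketch (the field names follow the option names listed above, but the struct and helper are illustrative, not the real `common_chat_params` layout):

```cpp
#include <string>

// Illustrative options struct mirroring the kinds of parameters listed
// above; not the actual llama.cpp/LocalAI configuration types.
struct model_options_sketch {
    bool ctx_shift = true;              // context shifting support
    bool parallel_tool_calls = false;   // parallel tool calling
    std::string reasoning_format = "auto";
};

// Apply an override only when the request actually set one, so upstream
// defaults keep working for existing configs (backward compatibility).
inline void apply_reasoning_override(model_options_sketch& opts,
                                     const std::string& requested) {
    if (!requested.empty()) {
        opts.reasoning_format = requested;
    }
}
```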
- Feature Parity: Always aim for feature parity with llama.cpp's implementation
- Test Coverage: Add tests for new features matching llama.cpp's behavior
- Documentation: Update relevant documentation when adding new formats or options
- Backward Compatibility: Ensure changes don't break existing functionality
- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options