feat(llm): implement model-centric architecture with protocol-prefix dispatching #140
Wkstr wants to merge 5 commits into memovai:main from
Conversation
- Introduce protocol-prefix dispatching (e.g., openai/, anthropic/)
- Decouple logical model aliases from physical provider details
- Implement automatic endpoint resolution based on api_base
- Add unified support for cloud (HTTPS) and local (HTTP) models
- Support NVS/Config-driven model list for better extensibility
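The protocol-prefix dispatching summarized above can be sketched in plain C. This is a minimal illustration, not code from the PR: `split_model_spec` and its signature are assumed names, and the fallback provider stands in for the configured default.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of protocol-prefix dispatching: "openai/gpt-4o" is
 * split into a provider ("openai") and a model id ("gpt-4o"); a bare model
 * id falls back to the configured default provider. */
static void split_model_spec(const char *spec, const char *default_provider,
                             char *provider, size_t plen,
                             char *model, size_t mlen) {
    const char *slash = strchr(spec, '/');
    if (slash != NULL) {
        /* prefix present: copy the part before '/' as the provider */
        snprintf(provider, plen, "%.*s", (int)(slash - spec), spec);
        snprintf(model, mlen, "%s", slash + 1);
    } else {
        /* no prefix: keep the model as-is and use the default provider */
        snprintf(provider, plen, "%s", default_provider);
        snprintf(model, mlen, "%s", spec);
    }
}
```

With this split, the provider string alone selects the internal adapter (OpenAI-compatible vs Anthropic), which is what allows adding compatible vendors with zero code changes.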
📝 Walkthrough

Adds configurable LLM API base and protocol-aware proxy behavior. Introduces a CLI command to set the API base.
Sequence Diagram(s)

sequenceDiagram
actor User
participant CLI as CLI Handler
participant LLM as LLM Proxy
participant NVS as NVS Storage
participant API as Remote API
User->>CLI: set_api_base("https://api.example.com/v1")
CLI->>LLM: llm_set_api_base(api_base)
LLM->>LLM: llm_validate_api_base()
LLM->>LLM: llm_parse_api_base()
LLM->>NVS: persist MIMI_NVS_KEY_API_BASE
LLM->>LLM: llm_recompute_effective_config()
LLM-->>CLI: return success
CLI-->>User: confirmation
User->>LLM: trigger LLM request
LLM->>LLM: detect protocol & build headers/URL
LLM->>API: send request (protocol-specific endpoint & headers)
API-->>LLM: response
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
main/llm/llm_proxy.c (1)
208-225: Consider aligning enum values with array indices for clarity.

The `LLM_PROTOCOL_ANTHROPIC = 0` and `LLM_PROTOCOL_OPENAI = 1` enum values don't match the `PROTO_MAP` array layout, where OpenAI is at index 0 and Anthropic at index 1. The `get_current_proto` function correctly handles this via the ternary, but direct indexing with the enum would be clearer.

This is a minor readability concern and doesn't affect correctness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/llm/llm_proxy.c` around lines 208 - 225, The PROTO_MAP order doesn't match the enum ordering (LLM_PROTOCOL_ANTHROPIC vs LLM_PROTOCOL_OPENAI), which reduces clarity; either reorder the enum so LLM_PROTOCOL_OPENAI = 0 and LLM_PROTOCOL_ANTHROPIC = 1, or reorder PROTO_MAP to match the current enum ordering, then simplify get_current_proto to index PROTO_MAP directly using the enum (use LLM_PROTOCOL_OPENAI / LLM_PROTOCOL_ANTHROPIC as indices); update references to LLM_PROTOCOL_OPENAI, LLM_PROTOCOL_ANTHROPIC, PROTO_MAP, and get_current_proto accordingly.
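A minimal sketch of the alignment this comment suggests, assuming a two-entry descriptor table; the `proto_desc_t` fields are illustrative, not the project's actual layout:

```c
#include <assert.h>
#include <string.h>

/* Enum values double as PROTO_MAP indices, so lookup needs no ternary.
 * The descriptor fields here are illustrative assumptions. */
typedef enum {
    LLM_PROTOCOL_OPENAI = 0,    /* matches PROTO_MAP[0] */
    LLM_PROTOCOL_ANTHROPIC = 1  /* matches PROTO_MAP[1] */
} llm_protocol_t;

typedef struct {
    const char *name;
    const char *chat_path;
} proto_desc_t;

static const proto_desc_t PROTO_MAP[] = {
    [LLM_PROTOCOL_OPENAI]    = { "openai",    "/chat/completions" },
    [LLM_PROTOCOL_ANTHROPIC] = { "anthropic", "/messages" },
};

/* Direct indexing: the enum value is the array index by construction. */
static const proto_desc_t *get_current_proto(llm_protocol_t p) {
    return &PROTO_MAP[p];
}
```

Designated initializers keyed on the enum make the invariant self-enforcing: reordering the enum automatically reorders the table.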
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@main/cli/serial_cli.c`:
- Around line 144-154: The cmd_set_api_base function currently ignores
llm_set_api_base's return value; call llm_set_api_base and check its returned
esp_err_t (e.g., compare against ESP_ERR_INVALID_ARG and ESP_OK), print an error
message (including the problematic URL via api_base_args.base->sval[0]) and
return a non-zero status on failure, and only print "API base set." and return 0
when llm_set_api_base succeeds; update cmd_set_api_base to handle both
validation errors (ESP_ERR_INVALID_ARG) and any other non-OK returns
accordingly.
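The shape of the fix could look like the sketch below. The ESP-IDF error codes are stubbed so the example is self-contained, and the `llm_set_api_base` body here is a hypothetical validator, not the PR's implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimal stand-ins for the ESP-IDF types used in serial_cli.c; the real
 * code gets these from esp_err.h. */
typedef int esp_err_t;
#define ESP_OK              0
#define ESP_ERR_INVALID_ARG 0x102

/* Hypothetical validator standing in for llm_set_api_base(): accepts only
 * http:// or https:// URLs, as the validation comment implies. */
static esp_err_t llm_set_api_base(const char *base) {
    if (base == NULL || (strncmp(base, "http://", 7) != 0 &&
                         strncmp(base, "https://", 8) != 0)) {
        return ESP_ERR_INVALID_ARG;
    }
    return ESP_OK;
}

/* The fix the review asks for: check the return instead of ignoring it. */
static int cmd_set_api_base(const char *url) {
    esp_err_t err = llm_set_api_base(url);
    if (err == ESP_ERR_INVALID_ARG) {
        printf("Invalid API base URL: %s\n", url);
        return 1; /* non-zero status on validation failure */
    }
    if (err != ESP_OK) {
        printf("Failed to set API base (err=%d)\n", err);
        return 1; /* any other non-OK return also fails the command */
    }
    printf("API base set.\n");
    return 0;
}
```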
---
Nitpick comments:
In `@main/llm/llm_proxy.c`:
- Around line 208-225: The PROTO_MAP order doesn't match the enum ordering
(LLM_PROTOCOL_ANTHROPIC vs LLM_PROTOCOL_OPENAI), which reduces clarity; either
reorder the enum so LLM_PROTOCOL_OPENAI = 0 and LLM_PROTOCOL_ANTHROPIC = 1, or
reorder PROTO_MAP to match the current enum ordering, then simplify
get_current_proto to index PROTO_MAP directly using the enum (use
LLM_PROTOCOL_OPENAI / LLM_PROTOCOL_ANTHROPIC as indices); update references to
LLM_PROTOCOL_OPENAI, LLM_PROTOCOL_ANTHROPIC, PROTO_MAP, and get_current_proto
accordingly.
📒 Files selected for processing (5)
main/cli/serial_cli.c, main/llm/llm_proxy.c, main/llm/llm_proxy.h, main/mimi_config.h, main/mimi_secrets.h.example
Thank you for your contribution. At the moment, models may differ in which request parameters they support. Would it be possible to handle issues in this area?
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@main/llm/llm_proxy.c`:
- Around line 330-358: llm_recompute_effective_config currently mutates globals
(s_protocol, s_model_id, s_api_url, headers built by llm_build_request_targets,
etc.) without synchronization; to fix race windows either (A) add a dedicated
mutex (e.g., llm_config_lock) and take it around the entire body of
llm_recompute_effective_config and any request-path code that reads the config
(e.g., llm_chat_tools, functions that read
s_model_id/s_api_url/get_current_proto()/llm_build_request_targets callers), or
(B) build a single immutable snapshot struct (e.g., llm_effective_config) inside
llm_recompute_effective_config and then atomically swap the pointer so readers
can grab the snapshot pointer without seeing partially-updated fields; pick one
approach and apply it consistently to protect updates that currently race with
llm_chat_tools and the setters called from serial_cli.
- Around line 709-714: The code currently uses hard-coded prefix checks on
s_model_id to choose "max_tokens" vs "max_completion_tokens"; replace this logic
with a model-capability lookup so future o3/o4 variants and gpt-5 are handled
correctly: implement a small helper or config map that maps model family
patterns (e.g., regexes for ^o[0-9]+(-|$) to use "max_completion_tokens",
patterns for gpt-5 to use "max_output_tokens", and a default to "max_tokens"),
then use that chosen parameter name when calling cJSON_AddNumberToObject(body,
<param_name>, MIMI_LLM_MAX_TOKENS). Ensure you reference s_model_id and
MIMI_LLM_MAX_TOKENS and replace the existing strncasecmp checks with that
helper/lookup.
- Around line 404-406: The proxy path is adding Authorization headers even when
s_api_key is empty and llm_chat_tools() hard-rejects empty keys; update code to
only write auth headers when s_api_key[0] != '\0' (gate any Authorization:
Bearer <key> writes in the proxy OpenAI path and any x-api-key writes in the
Anthropic path), verify the direct Anthropic logic does not unconditionally add
x-api-key, and remove the hard-reject check in llm_chat_tools() so it no longer
rejects empty s_api_key but instead allows the request to proceed and let
upstream return 401 if required.
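The capability lookup suggested in the second comment above might look like the following sketch; the helper name is hypothetical and the family rules only mirror the patterns named in the comment (`^o[0-9]+(-|$)`, `gpt-5`), not a complete capability table:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* One helper decides which token-limit field a model takes, replacing
 * scattered strncasecmp checks; the family rules here are illustrative. */
static const char *token_param_for_model(const char *model_id) {
    /* ^o[0-9]+(-|$): o1, o3-mini, o4, ... take max_completion_tokens */
    if (model_id[0] == 'o' && isdigit((unsigned char)model_id[1])) {
        size_t i = 1;
        while (isdigit((unsigned char)model_id[i])) i++;
        if (model_id[i] == '\0' || model_id[i] == '-')
            return "max_completion_tokens";
    }
    /* gpt-5 family takes max_output_tokens */
    if (strncmp(model_id, "gpt-5", 5) == 0)
        return "max_output_tokens";
    /* everything else keeps the classic field */
    return "max_tokens";
}
```

The request builder would then call `cJSON_AddNumberToObject(body, token_param_for_model(s_model_id), MIMI_LLM_MAX_TOKENS)` once, so new model families only need a new table rule.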
Thank you for your valuable feedback. I've updated the logic to handle the parameter differences:
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
main/llm/llm_proxy.c (1)
960-970: ⚠️ Potential issue | 🟠 Major

Reject oversized model identifiers before persisting them.
With protocol-prefixed models, 64 bytes is easy to exceed.
`llm_set_model()` writes the full string to NVS, then `safe_copy()` truncates it in RAM; on the next boot, line 391 reads into `model_tmp[64]`, so the persisted longer value cannot be loaded and the device silently falls back. Validate the length before `nvs_set_str()`, or increase the model buffers.

Proposed fix:

```diff
 esp_err_t llm_set_model(const char *model) {
+    if (!model || strnlen(model, sizeof(s_model)) >= sizeof(s_model)) {
+        ESP_LOGE(TAG, "Model id is too long");
+        return ESP_ERR_INVALID_ARG;
+    }
+
     nvs_handle_t nvs;
     ESP_ERROR_CHECK(nvs_open(MIMI_NVS_LLM, NVS_READWRITE, &nvs));
     ESP_ERROR_CHECK(nvs_set_str(nvs, MIMI_NVS_KEY_MODEL, model));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/llm/llm_proxy.c` around lines 960 - 970, The llm_set_model function is persisting model strings longer than the in-RAM buffers can hold (safe_copy -> s_model and the boot-time model_tmp[64] read), causing silent fallbacks; before calling nvs_set_str you should validate the model length and either reject/log an error for strings longer than the in-memory buffer (e.g., sizeof(s_model) or defined MAX_MODEL_LEN) or increase the model buffer sizes and associated read buffer (model_tmp[64]) to accommodate protocol-prefixed identifiers; update llm_set_model to check strlen(model) against the chosen limit and return an error (and do not call nvs_set_str) if it exceeds the limit, ensuring NVS and in-memory sizes remain consistent.
♻️ Duplicate comments (1)
main/llm/llm_proxy.c (1)
330-358: ⚠️ Potential issue | 🟠 Major

Synchronize effective-config updates.

`llm_recompute_effective_config()` still rewrites `s_protocol`, `s_model_id`, and the derived `s_api_*` fields one by one. A concurrent `llm_chat_tools()` can still observe a mixed config and send a request with the wrong model/URL/header combination. Please guard recompute + request reads with a lock or switch readers to an immutable snapshot.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/llm/llm_proxy.c` around lines 330 - 358, llm_recompute_effective_config currently mutates shared globals (s_protocol, s_model_id, s_api_url, etc.) piecemeal allowing llm_chat_tools to observe an inconsistent config; protect the recompute and readers by either (A) adding a mutex (e.g., llm_config_lock) and acquiring it for the entire body of llm_recompute_effective_config and around reads in llm_chat_tools/other request paths, or (B) build a new immutable config struct (e.g., struct llm_config { protocol, model_id, api_url, headers... }) locally inside llm_recompute_effective_config, fully populate it using llm_parse_api_base and llm_build_request_targets equivalents, then atomically swap a pointer to the new struct (and free the old) so readers of the global pointer see either the old or the new config atomically; apply the same synchronization choice consistently to functions that read s_protocol, s_model_id, s_api_url (and any llm_*_targets) to eliminate mixed-state races.
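Option (B) from the comment could be sketched with C11 atomics as below. The struct fields and the double-buffering are illustrative assumptions; real firmware would also need a reclamation strategy (or a single writer task) so a reader never holds a slot across two recomputes:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Immutable snapshot of the effective config; field names are illustrative. */
typedef struct {
    char model_id[64];
    char api_url[128];
} llm_effective_config_t;

static llm_effective_config_t s_cfg_slots[2];
static _Atomic(llm_effective_config_t *) s_cfg = &s_cfg_slots[0];
static int s_next_slot = 1; /* next slot to fill (single-writer assumption) */

/* Build the new snapshot completely, then publish it with one atomic store,
 * so readers never observe a half-updated model/URL pair. */
static void llm_recompute_effective_config(const char *model, const char *url) {
    llm_effective_config_t *next = &s_cfg_slots[s_next_slot];
    snprintf(next->model_id, sizeof next->model_id, "%s", model);
    snprintf(next->api_url, sizeof next->api_url, "%s", url);
    atomic_store(&s_cfg, next); /* readers flip to the new config here */
    s_next_slot ^= 1;           /* double-buffer between the two slots */
}

/* One atomic load yields a consistent snapshot for the whole request. */
static const llm_effective_config_t *llm_get_config(void) {
    return atomic_load(&s_cfg);
}
```

A request path would call `llm_get_config()` once at the start and use only that pointer, never the individual globals.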
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@main/llm/llm_proxy.c`:
- Around line 404-406: Currently the code logs via ESP_LOGW(TAG, ...) whenever
s_api_key is empty which misleads when api_base intentionally points to a
keyless local backend; instead, suppress this boot-time warning and either (a)
only emit the warning when an actual authentication failure occurs by moving the
ESP_LOGW call into the HTTP response handling path and checking for a 401 status
code, or (b) skip the warning when api_base indicates a local/non-TLS endpoint
(e.g., starts with "http://" or contains "localhost"/"127.0.0.1"); update the
code that checks s_api_key and the HTTP response handling to implement one of
these behaviors (reference s_api_key, ESP_LOGW, TAG, and api_base to locate the
relevant checks and response handling).
- Around line 481-487: The loop that reads into tmp using proxy_conn_read and
calls resp_buf_append(rb, tmp, n) must treat a non-ESP_OK return from
resp_buf_append as a fatal OOM/error and propagate it immediately instead of
breaking and continuing parsing; update the read loop in llm_proxy.c to check
resp_buf_append's return, abort reading on failure, free/cleanup as needed and
return/propagate an appropriate error code (e.g., ESP_ERR_NO_MEM or the
function's error convention) so callers do not see a truncated response; locate
the while loop that uses proxy_conn_read, the resp_buf_append(rb, tmp, n) call,
and ensure the function returns the error up the stack rather than silently
breaking.
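The propagation this comment asks for could look like the sketch below; `resp_buf_t`, the error codes, and the chunk-array stand-in for `proxy_conn_read` are all simplified assumptions so the example runs on its own:

```c
#include <assert.h>
#include <string.h>

/* Stand-ins so the sketch compiles outside the firmware. */
typedef int esp_err_t;
#define ESP_OK         0
#define ESP_ERR_NO_MEM 0x101

typedef struct { char data[32]; size_t len; size_t cap; } resp_buf_t;

static esp_err_t resp_buf_append(resp_buf_t *rb, const char *src, size_t n) {
    if (rb->len + n > rb->cap)
        return ESP_ERR_NO_MEM; /* OOM: caller must abort, not continue */
    memcpy(rb->data + rb->len, src, n);
    rb->len += n;
    return ESP_OK;
}

/* The fix: a non-OK append aborts the loop and is propagated, so callers
 * never parse a truncated response. The chunk array stands in for the
 * proxy_conn_read() loop in the real code. */
static esp_err_t read_response(resp_buf_t *rb,
                               const char *chunks[], size_t n_chunks) {
    for (size_t i = 0; i < n_chunks; i++) {
        esp_err_t err = resp_buf_append(rb, chunks[i], strlen(chunks[i]));
        if (err != ESP_OK)
            return err; /* propagate instead of silently breaking */
    }
    return ESP_OK;
}
```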
---
Outside diff comments:
In `@main/llm/llm_proxy.c`:
- Around line 960-970: The llm_set_model function is persisting model strings
longer than the in-RAM buffers can hold (safe_copy -> s_model and the boot-time
model_tmp[64] read), causing silent fallbacks; before calling nvs_set_str you
should validate the model length and either reject/log an error for strings
longer than the in-memory buffer (e.g., sizeof(s_model) or defined
MAX_MODEL_LEN) or increase the model buffer sizes and associated read buffer
(model_tmp[64]) to accommodate protocol-prefixed identifiers; update
llm_set_model to check strlen(model) against the chosen limit and return an
error (and do not call nvs_set_str) if it exceeds the limit, ensuring NVS and
in-memory sizes remain consistent.
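The length check described above might look like the following sketch; the ESP-IDF types are stubbed and the actual `nvs_set_str` call is elided, since the point is only that validation happens before anything is persisted:

```c
#include <assert.h>
#include <string.h>

/* Stand-ins so the sketch compiles on its own; the firmware uses esp_err.h
 * and NVS. The 64-byte buffer mirrors the model_tmp[64] mentioned above. */
typedef int esp_err_t;
#define ESP_OK              0
#define ESP_ERR_INVALID_ARG 0x102

static char s_model[64];

/* Reject before persisting, so NVS and the in-RAM buffer never disagree. */
static esp_err_t llm_set_model(const char *model) {
    if (model == NULL || strnlen(model, sizeof(s_model)) >= sizeof(s_model))
        return ESP_ERR_INVALID_ARG; /* would not reach nvs_set_str */
    strcpy(s_model, model);         /* safe: length checked above */
    return ESP_OK;                  /* real code persists via nvs_set_str */
}
```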
---
Duplicate comments:
In `@main/llm/llm_proxy.c`:
- Around line 330-358: llm_recompute_effective_config currently mutates shared
globals (s_protocol, s_model_id, s_api_url, etc.) piecemeal allowing
llm_chat_tools to observe an inconsistent config; protect the recompute and
readers by either (A) adding a mutex (e.g., llm_config_lock) and acquiring it
for the entire body of llm_recompute_effective_config and around reads in
llm_chat_tools/other request paths, or (B) build a new immutable config struct
(e.g., struct llm_config { protocol, model_id, api_url, headers... }) locally
inside llm_recompute_effective_config, fully populate it using
llm_parse_api_base and llm_build_request_targets equivalents, then atomically
swap a pointer to the new struct (and free the old) so readers of the global
pointer see either the old or the new config atomically; apply the same
synchronization choice consistently to functions that read s_protocol,
s_model_id, s_api_url (and any llm_*_targets) to eliminate mixed-state races.
|
Cool, I will test it. |
Summary
This PR introduces a major architectural evolution to the LLM proxy: a model-centric configuration approach. By decoupling agent logic from specific providers, we now support adding new LLM providers simply by specifying a `protocol/model_id` format: zero code changes required for adding compatible vendors.

Core Advantages
The new configuration approach offers several key benefits:
- Protocol-prefix routing: the prefix (`openai`, `anthropic`) is used to automatically route requests to the correct internal adapter.
- Unified cloud/local support: `http://` vs `https://` in the `api_base` is detected to handle TLS/non-TLS connections seamlessly, making local models (Ollama, vLLM) first-class citizens.

Design Inspiration: LiteLLM
The architecture is heavily inspired by the LiteLLM design philosophy:
- Uses the `provider/model` format for unambiguous dispatching, falling back to the default `PROVIDER` variable if no prefix is detected.

Common Model Configurations
The system easily maps diverse vendors to standard provider interfaces:
| API Base (`MIMI_SECRET_API_BASE`) | Provider (`MIMI_SECRET_MODEL_PROVIDER`) |
| --- | --- |
| https://api.openai.com/v1 | openai |
| https://api.anthropic.com/v1 | anthropic |
| https://api.deepseek.com/v1 | openai |
| http://localhost:11434/v1 | openai |
| https://api.groq.com/openai/v1 | openai |
| https://api.minimaxi.com/anthropic/v1 | anthropic |
| https://openrouter.ai/api/v1 | openai |
| http://localhost:8000/v1 | openai |

Technical Implementation Details

- Parses the `api_base` to distinguish between `http://` (local/no-TLS) and `https://` (cloud/TLS), ensuring seamless connectivity for local inference servers.
- Protocol-specific endpoints (`/chat/completions` vs `/messages`) are handled internally by the adapters, preventing configuration errors.

Summary by CodeRabbit
New Features
Documentation