
feat(llm): implement model-centric architecture with protocol-prefix dispatching#140

Open
Wkstr wants to merge 5 commits into memovai:main from Wkstr:feat/model_url

Conversation

Wkstr commented Mar 10, 2026

Summary

This PR introduces a major architectural change to the LLM proxy: a model-centric configuration approach. By decoupling agent logic from specific providers, new LLM providers can now be added simply by specifying a protocol/model_id string; no code changes are required for compatible vendors.

Core Advantages

The new configuration approach offers several key benefits:

  • Zero-code provider addition: Add any OpenAI or Anthropic-compatible service (e.g., DeepSeek, MiniMax, Groq) through configuration alone.
  • Protocol-Prefix Dispatching: Use a provider prefix (e.g., openai, anthropic) to automatically route requests to the correct internal adapter.
  • Cloud & Local Transparency: Automatically detects http:// vs https:// in the api_base to handle TLS/Non-TLS connections seamlessly, making local models (Ollama, vLLM) first-class citizens.
  • Resolution Layer: Resolves effective configurations (URLs, headers, and protocol types) once during initialization, so runtime request handling avoids repeated string processing.

Design Inspiration: LiteLLM

The architecture is heavily inspired by the LiteLLM design philosophy:

  • Model-centric: Users care about the model, while the system handles the provider-specific implementation details.
  • Provider Prefix Support: The system supports the provider/model format for unambiguous dispatching, falling back to the default PROVIDER variable if no prefix is detected.
  • Configuration-driven: Adding support for new compatible vendors (like SiliconFlow or custom local gateways) requires only NVS or header changes, with no modifications to the C source code.
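The provider-prefix resolution described above can be sketched in C. This is a minimal illustration, not the PR's actual code: the names `parse_model_spec` and `proto_t` are assumptions.

```c
#include <stddef.h>
#include <string.h>

typedef enum { PROTO_OPENAI, PROTO_ANTHROPIC } proto_t;

/* Split "openai/gpt-4o" into protocol + bare model id.
 * Falls back to the default provider when no prefix is present. */
static int parse_model_spec(const char *spec, proto_t default_proto,
                            proto_t *out_proto, const char **out_model)
{
    const char *slash = strchr(spec, '/');
    if (!slash) {                     /* no prefix: use default provider */
        *out_proto = default_proto;
        *out_model = spec;
        return 0;
    }
    size_t n = (size_t)(slash - spec);
    if (n == 6 && strncmp(spec, "openai", n) == 0)
        *out_proto = PROTO_OPENAI;
    else if (n == 9 && strncmp(spec, "anthropic", n) == 0)
        *out_proto = PROTO_ANTHROPIC;
    else
        return -1;                    /* unknown protocol prefix */
    *out_model = slash + 1;
    return 0;
}
```

A spec without a slash keeps the configured default provider, matching the fallback behavior described above.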

Common Model Configurations

The system easily maps diverse vendors to standard provider interfaces:

| Vendor | API Base (`MIMI_SECRET_API_BASE`) | Provider (`MIMI_SECRET_MODEL_PROVIDER`) |
|---|---|---|
| OpenAI | `https://api.openai.com/v1` | `openai` |
| Anthropic | `https://api.anthropic.com/v1` | `anthropic` |
| DeepSeek | `https://api.deepseek.com/v1` | `openai` |
| Ollama (Local) | `http://localhost:11434/v1` | `openai` |
| Groq | `https://api.groq.com/openai/v1` | `openai` |
| MiniMax | `https://api.minimaxi.com/anthropic/v1` | `anthropic` |
| OpenRouter | `https://openrouter.ai/api/v1` | `openai` |
| VLLM (Local) | `http://localhost:8000/v1` | `openai` |
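For example, pointing the firmware at a local Ollama backend would need only two entries (values taken from the table above; the full macro set in `mimi_secrets.h.example` may contain more):

```c
/* Illustrative mimi_secrets.h fragment for a local Ollama backend */
#define MIMI_SECRET_API_BASE        "http://localhost:11434/v1"
#define MIMI_SECRET_MODEL_PROVIDER  "openai"
```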

Technical Implementation Details

  • Automatic Connection Handling: The system parses the api_base to distinguish between http:// (Local/No-TLS) and https:// (Cloud/TLS), ensuring seamless connectivity for local inference servers.
  • Endpoint Resolution: Protocol-specific paths (e.g., /chat/completions vs /messages) are handled internally by the adapters, preventing configuration errors.
  • Startup Compilation: Configuration is resolved at boot time to minimize runtime string processing and CPU overhead on the ESP32.
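The scheme-based connection handling in the first bullet can be sketched as follows. This is an illustrative helper, not the PR's exact parsing code:

```c
#include <stdbool.h>
#include <string.h>

/* Derive TLS usage and a default port from the api_base scheme, so
 * plain-http local backends (Ollama, vLLM) work without extra config. */
static bool api_base_uses_tls(const char *api_base, int *default_port)
{
    if (strncmp(api_base, "https://", 8) == 0) {
        if (default_port) *default_port = 443;
        return true;
    }
    /* http:// (and anything else) is treated as plain TCP */
    if (default_port) *default_port = 80;
    return false;
}
```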

Summary by CodeRabbit

  • New Features

    • New CLI command to set the LLM API base URL with validation, persistence, and user confirmation.
    • API Base now appears in the configuration view alongside other settings.
    • Added protocol-aware support for Anthropic and OpenAI, with dynamic request/header behavior based on chosen provider.
  • Documentation

    • Updated example secrets/config defaults to illustrate API base usage.

Wkstr added 2 commits March 11, 2026 01:20
- Introduce protocol-prefix dispatching (e.g., openai/, anthropic/)
- Decouple logical model aliases from physical provider details
- Implement automatic endpoint resolution based on api_base
- Add unified support for cloud (HTTPS) and local (HTTP) models
- Support NVS/Config-driven model list for better extensibility

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Adds configurable LLM API base and protocol-aware proxy behavior. Introduces a CLI set_api_base command, new llm_proxy API and logic to parse/validate API base, derive host/path/headers for Anthropic/OpenAI, persist to NVS, and recompute effective request targets.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CLI Interface: `main/cli/serial_cli.c`, `CMakeLists.txt` | Added `set_api_base` CLI subcommand, argument table, handler `cmd_set_api_base`, registration, and build list update. Updated config display to include API base. |
| Core LLM Proxy Implementation: `main/llm/llm_proxy.c` | Added protocol enum/state, API base validation/parsing, host/port/base-path/TLS/URL construction, `llm_recompute_effective_config()`, protocol-aware header/path building, proxy and response handling adjustments, NVS read on init, and updated request/response payload handling. |
| LLM Proxy Public API: `main/llm/llm_proxy.h` | Exposed `esp_err_t llm_set_api_base(const char *api_base)` with documentation; other setters now trigger recompute. |
| Configuration & Secrets: `main/mimi_config.h`, `main/mimi_secrets.h.example` | Added macros `MIMI_SECRET_API_BASE`, `MIMI_LLM_API_BASE_ANTHROPIC`, `MIMI_LLM_API_BASE_OPENAI`, `MIMI_NVS_KEY_API_BASE`; removed legacy URL macros; updated example secrets comment. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI Handler
    participant LLM as LLM Proxy
    participant NVS as NVS Storage
    participant API as Remote API

    User->>CLI: set_api_base("https://api.example.com/v1")
    CLI->>LLM: llm_set_api_base(api_base)
    LLM->>LLM: llm_validate_api_base()
    LLM->>LLM: llm_parse_api_base()
    LLM->>NVS: persist MIMI_NVS_KEY_API_BASE
    LLM->>LLM: llm_recompute_effective_config()
    LLM-->>CLI: return success
    CLI-->>User: confirmation

    User->>LLM: trigger LLM request
    LLM->>LLM: detect protocol & build headers/URL
    LLM->>API: send request (protocol-specific endpoint & headers)
    API-->>LLM: response
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble on bytes and map a new base,

Paths and headers stitched in their place,
Anthropic or OpenAI, I skip and decide,
Saved snug in NVS, the endpoints abide,
Hooray — the proxy sings, and I hop with pride!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 39.13%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title describes the overall architectural approach (model-centric with protocol-prefix dispatching), which is the primary design pattern across all changes, aligning with the PR's core objective of replacing provider-centric logic. |


coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
main/llm/llm_proxy.c (1)

208-225: Consider aligning enum values with array indices for clarity.

The LLM_PROTOCOL_ANTHROPIC = 0 and LLM_PROTOCOL_OPENAI = 1 enum values don't match the PROTO_MAP array layout where OpenAI is at index 0 and Anthropic at index 1. The get_current_proto function correctly handles this via the ternary, but direct indexing with the enum would be clearer.

This is a minor readability concern and doesn't affect correctness.
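The suggested alignment could look like this sketch. The names `PROTO_MAP` and `get_current_proto` follow the review, but the struct fields and strings are assumptions for illustration:

```c
#include <string.h>

/* Keep enum order and table order in lockstep so the enum can
 * index PROTO_MAP directly. */
typedef enum {
    LLM_PROTOCOL_OPENAI = 0,
    LLM_PROTOCOL_ANTHROPIC = 1,
} llm_protocol_t;

typedef struct {
    const char *chat_path;    /* protocol-specific endpoint path */
    const char *auth_header;  /* header that carries the API key */
} proto_info_t;

static const proto_info_t PROTO_MAP[] = {
    [LLM_PROTOCOL_OPENAI]    = { "/chat/completions", "Authorization" },
    [LLM_PROTOCOL_ANTHROPIC] = { "/messages",         "x-api-key" },
};

static const proto_info_t *get_current_proto(llm_protocol_t p)
{
    return &PROTO_MAP[p];  /* direct indexing, no ternary needed */
}
```

Designated initializers keep each table entry pinned to its enum value, so reordering either one cannot silently break the mapping.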

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/llm/llm_proxy.c` around lines 208 - 225, The PROTO_MAP order doesn't
match the enum ordering (LLM_PROTOCOL_ANTHROPIC vs LLM_PROTOCOL_OPENAI), which
reduces clarity; either reorder the enum so LLM_PROTOCOL_OPENAI = 0 and
LLM_PROTOCOL_ANTHROPIC = 1, or reorder PROTO_MAP to match the current enum
ordering, then simplify get_current_proto to index PROTO_MAP directly using the
enum (use LLM_PROTOCOL_OPENAI / LLM_PROTOCOL_ANTHROPIC as indices); update
references to LLM_PROTOCOL_OPENAI, LLM_PROTOCOL_ANTHROPIC, PROTO_MAP, and
get_current_proto accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main/cli/serial_cli.c`:
- Around line 144-154: The cmd_set_api_base function currently ignores
llm_set_api_base's return value; call llm_set_api_base and check its returned
esp_err_t (e.g., compare against ESP_ERR_INVALID_ARG and ESP_OK), print an error
message (including the problematic URL via api_base_args.base->sval[0]) and
return a non-zero status on failure, and only print "API base set." and return 0
when llm_set_api_base succeeds; update cmd_set_api_base to handle both
validation errors (ESP_ERR_INVALID_ARG) and any other non-OK returns
accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1683810-9114-4451-ba91-271fcca25c97

📥 Commits

Reviewing files that changed from the base of the PR and between 6ee8aff and 939cb0d.

📒 Files selected for processing (5)
  • main/cli/serial_cli.c
  • main/llm/llm_proxy.c
  • main/llm/llm_proxy.h
  • main/mimi_config.h
  • main/mimi_secrets.h.example

IRONICBo (Member) commented:

Thank you for your contribution. At the moment, some models differ in which parameters they support. Would it be possible to handle issues in this area?

coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main/llm/llm_proxy.c`:
- Around line 330-358: llm_recompute_effective_config currently mutates globals
(s_protocol, s_model_id, s_api_url, headers built by llm_build_request_targets,
etc.) without synchronization; to fix race windows either (A) add a dedicated
mutex (e.g., llm_config_lock) and take it around the entire body of
llm_recompute_effective_config and any request-path code that reads the config
(e.g., llm_chat_tools, functions that read
s_model_id/s_api_url/get_current_proto()/llm_build_request_targets callers), or
(B) build a single immutable snapshot struct (e.g., llm_effective_config) inside
llm_recompute_effective_config and then atomically swap the pointer so readers
can grab the snapshot pointer without seeing partially-updated fields; pick one
approach and apply it consistently to protect updates that currently race with
llm_chat_tools and the setters called from serial_cli.
- Around line 709-714: The code currently uses hard-coded prefix checks on
s_model_id to choose "max_tokens" vs "max_completion_tokens"; replace this logic
with a model-capability lookup so future o3/o4 variants and gpt-5 are handled
correctly: implement a small helper or config map that maps model family
patterns (e.g., regexes for ^o[0-9]+(-|$) to use "max_completion_tokens",
patterns for gpt-5 to use "max_output_tokens", and a default to "max_tokens"),
then use that chosen parameter name when calling cJSON_AddNumberToObject(body,
<param_name>, MIMI_LLM_MAX_TOKENS). Ensure you reference s_model_id and
MIMI_LLM_MAX_TOKENS and replace the existing strncasecmp checks with that
helper/lookup.
- Around line 404-406: The proxy path is adding Authorization headers even when
s_api_key is empty and llm_chat_tools() hard-rejects empty keys; update code to
only write auth headers when s_api_key[0] != '\0' (gate any Authorization:
Bearer <key> writes in the proxy OpenAI path and any x-api-key writes in the
Anthropic path), verify the direct Anthropic logic does not unconditionally add
x-api-key, and remove the hard-reject check in llm_chat_tools() so it no longer
rejects empty s_api_key but instead allows the request to proceed and let
upstream return 401 if required.
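The header gating described in the last comment can be sketched as follows; `build_auth_header` is a hypothetical helper, not code from the PR:

```c
#include <stdio.h>
#include <string.h>

/* Emit an auth header only when an API key is actually configured,
 * so keyless local backends (Ollama, vLLM) get no Authorization or
 * x-api-key header at all. Returns 1 if a header was written. */
static int build_auth_header(const char *api_key, int is_anthropic,
                             char *out, size_t out_len)
{
    if (!api_key || api_key[0] == '\0')
        return 0;  /* no header; upstream may still return 401 */
    if (is_anthropic)
        snprintf(out, out_len, "x-api-key: %s", api_key);
    else
        snprintf(out, out_len, "Authorization: Bearer %s", api_key);
    return 1;
}
```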

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3e6d8fa-21cd-4a8e-90ad-e0a3347d0922

📥 Commits

Reviewing files that changed from the base of the PR and between 939cb0d and 1c92c08.

📒 Files selected for processing (1)
  • main/llm/llm_proxy.c


Wkstr (Author) commented Mar 11, 2026

Thank you for your valuable feedback. I've updated the logic to handle the parameter differences:

  • Token Limits: I added a prefix check. Models like o1 and gpt-5 now use max_completion_tokens, while the rest keep using max_tokens. This should fix the compatibility issues with newer OpenAI models.
  • Other Parameters: Standard stuff like tools is already compatible. For things like temperature, top_p, or seed, we're letting the models use their default values for now since we don't support custom overrides yet.
  • Testing: I've already verified this with Qwen, Minimax, Kimi, GLM, and CLIProxyAPI. Everything is working smoothly.
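The prefix check described in the first bullet might look like this sketch; the exact prefix list and the helper name `token_limit_param` are assumptions:

```c
#include <string.h>
#include <strings.h>  /* strncasecmp */

/* Choose the token-limit parameter name by model family:
 * newer OpenAI models (o1*, gpt-5*) take max_completion_tokens,
 * everything else keeps max_tokens. */
static const char *token_limit_param(const char *model_id)
{
    if (strncasecmp(model_id, "o1", 2) == 0 ||
        strncasecmp(model_id, "gpt-5", 5) == 0)
        return "max_completion_tokens";
    return "max_tokens";
}
```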


coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main/llm/llm_proxy.c (1)

960-970: ⚠️ Potential issue | 🟠 Major

Reject oversized model identifiers before persisting them.

With protocol-prefixed models, 64 bytes is easy to exceed. llm_set_model() writes the full string to NVS, then safe_copy() truncates it in RAM; on the next boot, Line 391 reads into model_tmp[64], so the persisted longer value cannot be loaded and the device silently falls back. Validate the length before nvs_set_str(), or increase the model buffers.

Proposed fix

```diff
 esp_err_t llm_set_model(const char *model)
 {
+    if (!model || strnlen(model, sizeof(s_model)) >= sizeof(s_model)) {
+        ESP_LOGE(TAG, "Model id is too long");
+        return ESP_ERR_INVALID_ARG;
+    }
+
     nvs_handle_t nvs;
     ESP_ERROR_CHECK(nvs_open(MIMI_NVS_LLM, NVS_READWRITE, &nvs));
     ESP_ERROR_CHECK(nvs_set_str(nvs, MIMI_NVS_KEY_MODEL, model));
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/llm/llm_proxy.c` around lines 960 - 970, The llm_set_model function is
persisting model strings longer than the in-RAM buffers can hold (safe_copy ->
s_model and the boot-time model_tmp[64] read), causing silent fallbacks; before
calling nvs_set_str you should validate the model length and either reject/log
an error for strings longer than the in-memory buffer (e.g., sizeof(s_model) or
defined MAX_MODEL_LEN) or increase the model buffer sizes and associated read
buffer (model_tmp[64]) to accommodate protocol-prefixed identifiers; update
llm_set_model to check strlen(model) against the chosen limit and return an
error (and do not call nvs_set_str) if it exceeds the limit, ensuring NVS and
in-memory sizes remain consistent.
♻️ Duplicate comments (1)
main/llm/llm_proxy.c (1)

330-358: ⚠️ Potential issue | 🟠 Major

Synchronize effective-config updates.

llm_recompute_effective_config() still rewrites s_protocol, s_model_id, and the derived s_api_* fields one by one. A concurrent llm_chat_tools() can still observe a mixed config and send a request with the wrong model/URL/header combination. Please guard recompute + request reads with a lock or switch readers to an immutable snapshot.
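The immutable-snapshot option can be sketched with C11 atomics. All names here are illustrative, and the sketch deliberately leaks the old snapshot instead of implementing safe reclamation (a real build would reference-count it or quiesce readers):

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Readers load one pointer to an immutable config, so they never
 * observe half-updated fields during a recompute. */
typedef struct {
    int  protocol;
    char model_id[64];
    char api_url[128];
} llm_config_t;

static _Atomic(llm_config_t *) s_cfg;

static void publish_config(int proto, const char *model, const char *url)
{
    llm_config_t *next = calloc(1, sizeof(*next));
    if (!next) return;
    next->protocol = proto;
    snprintf(next->model_id, sizeof(next->model_id), "%s", model);
    snprintf(next->api_url, sizeof(next->api_url), "%s", url);
    /* Previous snapshot is leaked here; see the caveat above. */
    atomic_store(&s_cfg, next);
}

static const llm_config_t *current_config(void)
{
    return atomic_load(&s_cfg);
}
```

A request path would call `current_config()` once and use only that snapshot for the whole request, never the globals.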

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/llm/llm_proxy.c` around lines 330 - 358, llm_recompute_effective_config
currently mutates shared globals (s_protocol, s_model_id, s_api_url, etc.)
piecemeal allowing llm_chat_tools to observe an inconsistent config; protect the
recompute and readers by either (A) adding a mutex (e.g., llm_config_lock) and
acquiring it for the entire body of llm_recompute_effective_config and around
reads in llm_chat_tools/other request paths, or (B) build a new immutable config
struct (e.g., struct llm_config { protocol, model_id, api_url, headers... })
locally inside llm_recompute_effective_config, fully populate it using
llm_parse_api_base and llm_build_request_targets equivalents, then atomically
swap a pointer to the new struct (and free the old) so readers of the global
pointer see either the old or the new config atomically; apply the same
synchronization choice consistently to functions that read s_protocol,
s_model_id, s_api_url (and any llm_*_targets) to eliminate mixed-state races.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main/llm/llm_proxy.c`:
- Around line 404-406: Currently the code logs via ESP_LOGW(TAG, ...) whenever
s_api_key is empty which misleads when api_base intentionally points to a
keyless local backend; instead, suppress this boot-time warning and either (a)
only emit the warning when an actual authentication failure occurs by moving the
ESP_LOGW call into the HTTP response handling path and checking for a 401 status
code, or (b) skip the warning when api_base indicates a local/non-TLS endpoint
(e.g., starts with "http://" or contains "localhost"/"127.0.0.1"); update the
code that checks s_api_key and the HTTP response handling to implement one of
these behaviors (reference s_api_key, ESP_LOGW, TAG, and api_base to locate the
relevant checks and response handling).
- Around line 481-487: The loop that reads into tmp using proxy_conn_read and
calls resp_buf_append(rb, tmp, n) must treat a non-ESP_OK return from
resp_buf_append as a fatal OOM/error and propagate it immediately instead of
breaking and continuing parsing; update the read loop in llm_proxy.c to check
resp_buf_append's return, abort reading on failure, free/cleanup as needed and
return/propagate an appropriate error code (e.g., ESP_ERR_NO_MEM or the
function's error convention) so callers do not see a truncated response; locate
the while loop that uses proxy_conn_read, the resp_buf_append(rb, tmp, n) call,
and ensure the function returns the error up the stack rather than silently
breaking.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f766fea-0e31-4595-8648-3befbe41c9ad

📥 Commits

Reviewing files that changed from the base of the PR and between 1c92c08 and c5cb7d1.

📒 Files selected for processing (1)
  • main/llm/llm_proxy.c

IRONICBo (Member) commented:

Cool, I will test it.
