feat(llm): support OpenAI-compatible endpoints alongside ACP agents #37

@jmagar


Summary

Axon currently has two separate LLM integration paths that don't talk to each other:

  1. ACP agents (crates/services/acp/) — subprocess-based bridge to Claude Code, Codex, and Gemini CLI. Full agent capabilities (tool calls, sessions, permissions) but locked to those three CLI tools.
  2. OpenAI-compatible HTTP (crates/vector/ops/commands/streaming.rs) — raw POST /chat/completions used exclusively by axon ask, axon extract, and axon suggest. One hardcoded endpoint (OPENAI_BASE_URL + OPENAI_MODEL), no sessions, no tool calls.

Neither path is a superset of the other. The result: Axon can't use Ollama, LM Studio, vLLM, OpenRouter, Groq, Mistral API, or any other OpenAI-compatible service as an interactive chat agent — only as a dumb completion backend for ask/extract. First-class OpenAI-compatible endpoint support would unlock every self-hosted and cloud LLM behind a standard API.

What "First-Class" Means

An OpenAI-compatible endpoint should be available as a selectable agent in the Reboot shell — same UX as Claude/Codex/Gemini — with:

  • Chat sessions (multi-turn with history)
  • Streaming responses displayed progressively
  • Tool call support via the OpenAI function-calling protocol (tools array in the request)
  • Model selection from a dynamic list (fetched from GET /models)
  • Named configurations: ollama/llama3.2, openrouter/claude-3-5-sonnet, groq/llama-3.1-70b, etc.

Changes Needed

1. Multi-endpoint config

Replace the single OPENAI_BASE_URL + OPENAI_MODEL with a named endpoint registry:

# axon.toml
[[llm.endpoints]]
name    = "ollama-local"
url     = "http://localhost:11434/v1"
model   = "llama3.2"
api_key = ""            # empty = no auth

[[llm.endpoints]]
name    = "openrouter"
url     = "https://openrouter.ai/api/v1"
model   = "anthropic/claude-3-5-sonnet"
api_key = "sk-or-..."   # from env: LLM_OPENROUTER_API_KEY

[[llm.endpoints]]
name    = "groq"
url     = "https://api.groq.com/openai/v1"
model   = "llama-3.1-70b-versatile"
api_key = ""            # from env: LLM_GROQ_API_KEY

  • OPENAI_BASE_URL / OPENAI_MODEL / OPENAI_API_KEY kept as single-endpoint fallback (backwards compat)
  • Each endpoint's API key sourced from env: LLM_<NAME>_API_KEY or inline (dev only)
  • Default endpoint used by axon ask / axon extract when no --endpoint specified
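
A rough sketch of the endpoint entry and the key-resolution precedence described above (inline key for dev, env var otherwise). `LlmEndpointConfig`, `env_var_name`, and `resolve_api_key` are illustrative names, not the final types in `crates/core/config/types/config.rs`:

```rust
/// One [[llm.endpoints]] entry (hypothetical shape; real type would derive serde Deserialize).
#[derive(Debug, Clone)]
pub struct LlmEndpointConfig {
    pub name: String,
    pub url: String,
    pub model: String,
    pub api_key: String, // inline key; empty means "look up the env var"
}

impl LlmEndpointConfig {
    /// Map an endpoint name to its env var: "ollama-local" -> "LLM_OLLAMA_LOCAL_API_KEY".
    pub fn env_var_name(name: &str) -> String {
        format!("LLM_{}_API_KEY", name.to_uppercase().replace('-', "_"))
    }

    /// Inline key wins (dev only); otherwise fall back to LLM_<NAME>_API_KEY.
    /// None means neither is set, i.e. an unauthenticated endpoint like local Ollama.
    pub fn resolve_api_key(&self) -> Option<String> {
        if !self.api_key.is_empty() {
            return Some(self.api_key.clone());
        }
        std::env::var(Self::env_var_name(&self.name))
            .ok()
            .filter(|v| !v.is_empty())
    }
}

fn main() {
    let ep = LlmEndpointConfig {
        name: "groq".into(),
        url: "https://api.groq.com/openai/v1".into(),
        model: "llama-3.1-70b-versatile".into(),
        api_key: "dev-inline-key".into(), // inline (dev only)
    };
    println!("{}", ep.resolve_api_key().unwrap()); // prints "dev-inline-key"
    println!("{}", LlmEndpointConfig::env_var_name("ollama-local")); // prints "LLM_OLLAMA_LOCAL_API_KEY"
}
```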

2. OpenAI agent in the ACP session model

Add OpenAI as a first-class agent type alongside Claude, Codex, Gemini:

pub enum AcpAgent {
    Claude,
    Codex,
    Gemini,
    OpenAI { endpoint_name: String },   // NEW
}

The OpenAI agent uses the HTTP chat completions API directly (no subprocess) — same streaming.rs infrastructure, wrapped in the ACP session lifecycle:

  • Session create → start a new conversation (empty history)
  • Message send → POST /chat/completions with accumulated history
  • Stream → SSE/chunked response forwarded to the WS bridge
  • Tool calls → parse tool_calls from response, execute via existing ACP tool dispatcher, append results and continue
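
As a rough illustration of that lifecycle, the per-session state could look like this (`OpenAiSession`, `Message`, and `Role` are hypothetical names; HTTP, SSE, and tool-dispatch plumbing omitted):

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role { System, User, Assistant, Tool }

#[derive(Debug, Clone)]
pub struct Message { pub role: Role, pub content: String }

pub struct OpenAiSession {
    pub endpoint_name: String,
    history: Vec<Message>,
}

impl OpenAiSession {
    /// Session create: new conversation with empty history.
    pub fn new(endpoint_name: &str) -> Self {
        Self { endpoint_name: endpoint_name.to_string(), history: Vec::new() }
    }

    /// Message send: record the user turn; the accumulated history is what
    /// would be serialized into the POST /chat/completions body.
    pub fn push_user(&mut self, text: &str) {
        self.history.push(Message { role: Role::User, content: text.to_string() });
    }

    /// Stream end: record the assistant's completed reply so the next turn
    /// carries full context.
    pub fn push_assistant(&mut self, text: &str) {
        self.history.push(Message { role: Role::Assistant, content: text.to_string() });
    }

    pub fn history(&self) -> &[Message] { &self.history }
}

fn main() {
    let mut s = OpenAiSession::new("ollama-local");
    s.push_user("explain this code");
    s.push_assistant("It parses the endpoint config...");
    s.push_user("shorter please");
    println!("{}", s.history().len()); // prints 3
}
```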

3. Tool call support via OpenAI function-calling protocol

OpenAI-compatible tool calls use:

{
  "tools": [{ "type": "function", "function": { "name": "...", "parameters": {...} } }],
  "tool_choice": "auto"
}

Map Axon's existing ACP tool definitions to OpenAI function schemas. The execution path (running the actual tool) is already implemented — just needs the protocol adapter layer.
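
A minimal sketch of that adapter layer, assuming an ACP tool definition shaped roughly like `AcpToolDef` (hypothetical; a real implementation would build the JSON with serde_json rather than string formatting, and `parameters_json` is assumed to already be a valid JSON Schema string):

```rust
pub struct AcpToolDef {
    pub name: String,
    pub description: String,
    pub parameters_json: String, // JSON Schema for the tool's arguments
}

/// Wrap an ACP tool definition in the OpenAI "tools" array entry shape.
pub fn to_openai_tool(t: &AcpToolDef) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{}}}}}"#,
        t.name, t.description, t.parameters_json
    )
}

fn main() {
    let t = AcpToolDef {
        name: "read_file".into(),
        description: "Read a file from the workspace".into(),
        parameters_json: r#"{"type":"object","properties":{"path":{"type":"string"}}}"#.into(),
    };
    println!("{}", to_openai_tool(&t));
}
```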

4. Model list from /models

GET {endpoint_url}/models
→ [{ "id": "llama3.2", ... }, ...]

  • Fetch and cache available models for each configured endpoint
  • Expose via GET /api/llm/endpoints and GET /api/llm/endpoints/:name/models
  • Surface in the Reboot shell model picker alongside Claude/Codex/Gemini models
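
The fetch-and-cache step might look like this sketch, with a closure standing in for the actual GET {endpoint_url}/models HTTP call (`ModelCache` and `get_or_fetch` are hypothetical names):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-endpoint cache of model ids with a TTL, keyed by endpoint name.
pub struct ModelCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<String>)>,
}

impl ModelCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return cached model ids for `endpoint` if still fresh; otherwise call
    /// `fetch` (the /models request) and cache the result.
    pub fn get_or_fetch<F>(&mut self, endpoint: &str, fetch: F) -> Vec<String>
    where
        F: FnOnce() -> Vec<String>,
    {
        if let Some((at, models)) = self.entries.get(endpoint) {
            if at.elapsed() < self.ttl {
                return models.clone();
            }
        }
        let models = fetch();
        self.entries.insert(endpoint.to_string(), (Instant::now(), models.clone()));
        models
    }
}

fn main() {
    let mut cache = ModelCache::new(Duration::from_secs(300));
    let first = cache.get_or_fetch("ollama-local", || vec!["llama3.2".to_string()]);
    // Second call within the TTL hits the cache; the fetcher is not invoked.
    let second = cache.get_or_fetch("ollama-local", || panic!("should not refetch"));
    println!("{} {}", first[0], second[0]); // prints "llama3.2 llama3.2"
}
```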

5. --endpoint flag on axon ask / axon extract

axon ask "explain this code" --endpoint ollama-local
axon ask "translate this" --endpoint openrouter --model mistral/mistral-large
axon extract https://example.com --endpoint groq
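
The routing rule is simple: `--endpoint` overrides the configured default. A std-only sketch of that selection (`flag_value` and `select_endpoint` are hypothetical; the real CLI would reuse its existing argument parser):

```rust
/// Return the value following `flag` in the argument list, if present.
fn flag_value(args: &[String], flag: &str) -> Option<String> {
    args.iter()
        .position(|a| a == flag)
        .and_then(|i| args.get(i + 1).cloned())
}

/// Choose the endpoint name: --endpoint wins, else the configured default.
fn select_endpoint(args: &[String], default_endpoint: &str) -> String {
    flag_value(args, "--endpoint").unwrap_or_else(|| default_endpoint.to_string())
}

fn main() {
    let args: Vec<String> = ["ask", "explain this code", "--endpoint", "ollama-local"]
        .iter().map(|s| s.to_string()).collect();
    println!("{}", select_endpoint(&args, "openrouter")); // prints "ollama-local"
    println!("{}", select_endpoint(&[], "openrouter"));   // prints "openrouter"
}
```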

6. Reboot UI — OpenAI agent in session rail

  • OpenAI-compatible endpoints appear in the agent selector alongside Claude/Codex/Gemini
  • Custom icon/badge per endpoint (generic robot icon + endpoint name label)
  • Model picker shows models fetched from the endpoint's /models
  • Sessions with OpenAI agents behave identically to ACP sessions from the UI's perspective

Supported Endpoints (to test against)

| Service | Base URL | Notes |
| --- | --- | --- |
| Ollama | http://localhost:11434/v1 | Self-hosted, already in our stack |
| LM Studio | http://localhost:1234/v1 | Self-hosted |
| vLLM | http://localhost:8000/v1 | Self-hosted |
| OpenRouter | https://openrouter.ai/api/v1 | Cloud aggregator, 200+ models |
| Groq | https://api.groq.com/openai/v1 | Fast inference |
| Mistral API | https://api.mistral.ai/v1 | Mistral models |
| Together AI | https://api.together.xyz/v1 | Self-hosted and cloud |

Files

| File | Action |
| --- | --- |
| crates/core/config/types/config.rs | Replace single endpoint fields with Vec<LlmEndpointConfig> |
| crates/services/acp/ | Add OpenAI agent variant; HTTP session implementation |
| crates/vector/ops/commands/streaming.rs | Wire named endpoint lookup; add --endpoint flag |
| crates/web.rs / REST API | GET /api/llm/endpoints, GET /api/llm/endpoints/:name/models |
| axon.toml.example | Document [[llm.endpoints]] config |
| apps/web/components/reboot/ | OpenAI endpoints in agent/model picker |
| docs/DEPLOYMENT.md | Ollama + LM Studio + OpenRouter setup examples |

Acceptance Criteria

  • [[llm.endpoints]] config supports multiple named OpenAI-compatible endpoints
  • OPENAI_BASE_URL / OPENAI_MODEL / OPENAI_API_KEY still work as single-endpoint fallback
  • axon ask --endpoint <name> routes to the specified endpoint
  • OpenAI-compatible agent available in Reboot shell session rail
  • Multi-turn chat sessions work (history accumulated per session)
  • Streaming responses display progressively in the UI
  • Tool calls via OpenAI function-calling protocol work end-to-end
  • Model list fetched from GET /models and shown in picker
  • Tested against Ollama (self-hosted, already in stack)
  • GET /api/llm/endpoints returns configured endpoints + connection status
  • cargo clippy clean, all tests pass
