# feat(llm): support OpenAI-compatible endpoints alongside ACP agents #37

## Summary
Axon currently has two separate LLM integration paths that don't talk to each other:

- **ACP agents** (`crates/services/acp/`) — subprocess-based bridge to Claude Code, Codex, and Gemini CLI. Full agent capabilities (tool calls, sessions, permissions) but locked to those three CLI tools.
- **OpenAI-compatible HTTP** (`crates/vector/ops/commands/streaming.rs`) — raw `POST /chat/completions` used exclusively by `axon ask`, `axon extract`, and `axon suggest`. One hardcoded endpoint (`OPENAI_BASE_URL` + `OPENAI_MODEL`), no sessions, no tool calls.

Neither path is a superset of the other. The result: Axon can't use Ollama, LM Studio, vLLM, OpenRouter, Groq, Mistral API, or any other OpenAI-compatible service as an interactive chat agent — only as a dumb completion backend for ask/extract. First-class OpenAI-compatible endpoint support would unlock every self-hosted and cloud LLM behind a standard API.
## What "First-Class" Means

An OpenAI-compatible endpoint should be available as a selectable agent in the Reboot shell — same UX as Claude/Codex/Gemini — with:

- Chat sessions (multi-turn with history)
- Streaming responses displayed progressively
- Tool call support via the OpenAI function-calling protocol (`tools` array in the request)
- Model selection from a dynamic list (fetched from `GET /models`)
- Named configurations: `ollama/llama3.2`, `openrouter/claude-3-5-sonnet`, `groq/llama-3.1-70b`, etc.
## Changes Needed

### 1. Multi-endpoint config

Replace the single `OPENAI_BASE_URL` + `OPENAI_MODEL` with a named endpoint registry:
```toml
# axon.toml
[[llm.endpoints]]
name = "ollama-local"
url = "http://localhost:11434/v1"
model = "llama3.2"
api_key = ""  # empty = no auth

[[llm.endpoints]]
name = "openrouter"
url = "https://openrouter.ai/api/v1"
model = "anthropic/claude-3-5-sonnet"
api_key = "sk-or-..."  # from env: LLM_OPENROUTER_API_KEY

[[llm.endpoints]]
name = "groq"
url = "https://api.groq.com/openai/v1"
model = "llama-3.1-70b-versatile"
api_key = ""  # from env: LLM_GROQ_API_KEY
```

- `OPENAI_BASE_URL`/`OPENAI_MODEL`/`OPENAI_API_KEY` kept as single-endpoint fallback (backwards compat)
- Each endpoint's API key sourced from env: `LLM_<NAME>_API_KEY` or inline (dev only)
- Default endpoint used by `axon ask`/`axon extract` when no `--endpoint` specified
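A minimal sketch of the registry entry and its key-resolution rule. `LlmEndpointConfig` is the type named in the Files table below; everything else here (the `resolved_api_key` helper, the exact env-var normalization) is an illustrative assumption, not the final implementation:

```rust
// Sketch: one [[llm.endpoints]] entry. Field names mirror the TOML above;
// resolved_api_key() and the '-' → '_' normalization are assumptions.
#[derive(Debug, Clone)]
pub struct LlmEndpointConfig {
    pub name: String,
    pub url: String,
    pub model: String,
    pub api_key: String, // empty = no auth
}

impl LlmEndpointConfig {
    /// LLM_<NAME>_API_KEY from the environment wins over the inline key;
    /// empty inline key with no env var means "no auth header at all".
    pub fn resolved_api_key(&self) -> Option<String> {
        let env_var = format!(
            "LLM_{}_API_KEY",
            self.name.to_uppercase().replace('-', "_")
        );
        std::env::var(&env_var)
            .ok()
            .filter(|v| !v.is_empty())
            .or_else(|| {
                if self.api_key.is_empty() {
                    None
                } else {
                    Some(self.api_key.clone())
                }
            })
    }
}

fn main() {
    let ep = LlmEndpointConfig {
        name: "ollama-local".into(),
        url: "http://localhost:11434/v1".into(),
        model: "llama3.2".into(),
        api_key: String::new(),
    };
    // Empty inline key, no env var set → no auth.
    assert_eq!(ep.resolved_api_key(), None);
    println!("{}", ep.url);
}
```

The env-over-inline precedence keeps inline keys usable for local dev while letting deployments stay secret-free in `axon.toml`.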
### 2. OpenAI agent in the ACP session model

Add OpenAI as a first-class agent type alongside Claude, Codex, Gemini:

```rust
pub enum AcpAgent {
    Claude,
    Codex,
    Gemini,
    OpenAI { endpoint_name: String }, // NEW
}
```

The OpenAI agent uses the HTTP chat completions API directly (no subprocess) — same streaming.rs infrastructure, wrapped in the ACP session lifecycle:

- Session create → start a new conversation (empty history)
- Message send → `POST /chat/completions` with accumulated history
- Stream → SSE/chunked response forwarded to the WS bridge
- Tool calls → parse `tool_calls` from response, execute via existing ACP tool dispatcher, append results and continue
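The lifecycle above reduces to appending to a per-session message log and replaying the whole log on every `POST /chat/completions`. A hypothetical sketch of that session state (all names here are assumptions; the HTTP call itself is elided):

```rust
// Sketch: per-session state for an OpenAI-backed ACP agent. The next
// request's "messages" array is just this history, serialized in order.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone)]
struct ChatMessage {
    role: Role,
    content: String,
}

struct OpenAiSession {
    endpoint_name: String, // which [[llm.endpoints]] entry to POST against
    history: Vec<ChatMessage>,
}

impl OpenAiSession {
    /// Session create → new conversation with empty history.
    fn new(endpoint_name: &str) -> Self {
        Self { endpoint_name: endpoint_name.to_string(), history: Vec::new() }
    }

    /// Message send: record the user turn before issuing the request.
    fn push_user(&mut self, text: &str) {
        self.history.push(ChatMessage { role: Role::User, content: text.to_string() });
    }

    /// After the stream completes (or a tool round-trip finishes),
    /// record the assistant turn so the next request carries it.
    fn push_assistant(&mut self, text: &str) {
        self.history.push(ChatMessage { role: Role::Assistant, content: text.to_string() });
    }
}

fn main() {
    let mut s = OpenAiSession::new("ollama-local");
    s.push_user("explain this code");
    s.push_assistant("It parses TOML.");
    s.push_user("shorter, please");
    // Three accumulated turns → multi-turn context on the next POST.
    assert_eq!(s.history.len(), 3);
    assert_eq!(s.history[2].role, Role::User);
    println!("{}", s.endpoint_name);
}
```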
### 3. Tool call support via OpenAI function-calling protocol

OpenAI-compatible tool calls use:

```json
{
  "tools": [{ "type": "function", "function": { "name": "...", "parameters": {...} } }],
  "tool_choice": "auto"
}
```

Map Axon's existing ACP tool definitions to OpenAI function schemas. The execution path (running the actual tool) is already implemented — just needs the protocol adapter layer.
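The adapter layer is a thin mapping. An illustrative sketch (`AcpToolDef` and its fields are assumptions about the existing tool-definition shape; JSON is assembled by hand here to stay dependency-free, where the real code would use the project's serializer):

```rust
// Sketch: wrap an ACP-style tool definition in the OpenAI "tools" envelope
// shown above: {"type":"function","function":{name, description, parameters}}.
struct AcpToolDef {
    name: String,
    description: String,
    parameters_schema: String, // JSON Schema for the arguments, already serialized
}

fn to_openai_tool(def: &AcpToolDef) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{}}}}}"#,
        def.name, def.description, def.parameters_schema
    )
}

fn main() {
    let def = AcpToolDef {
        name: "read_file".into(),
        description: "Read a file from the workspace".into(),
        parameters_schema: r#"{"type":"object","properties":{"path":{"type":"string"}}}"#.into(),
    };
    let tool = to_openai_tool(&def);
    assert!(tool.contains(r#""type":"function""#));
    assert!(tool.contains(r#""name":"read_file""#));
    println!("{tool}");
}
```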
### 4. Model list from /models

```
GET {endpoint_url}/models
→ [{ "id": "llama3.2", ... }, ...]
```

- Fetch and cache available models for each configured endpoint
- Expose via `GET /api/llm/endpoints` and `GET /api/llm/endpoints/:name/models`
- Surface in the Reboot shell model picker alongside Claude/Codex/Gemini models
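A sketch of the fetch-and-cache side, assuming a simple TTL policy (the `ModelCache` name, the TTL value, and string model ids are all illustrative, not part of the spec above):

```rust
// Sketch: per-endpoint cache of GET /models results with a TTL, so the
// Reboot picker doesn't hit every endpoint each time it opens.
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct ModelCache {
    ttl: Duration,
    // endpoint name → (fetched_at, model ids)
    entries: HashMap<String, (Instant, Vec<String>)>,
}

impl ModelCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return cached model ids if the entry is still fresh.
    fn get(&self, endpoint: &str) -> Option<&[String]> {
        self.entries
            .get(endpoint)
            .filter(|(fetched_at, _)| fetched_at.elapsed() < self.ttl)
            .map(|(_, models)| models.as_slice())
    }

    /// Store a freshly fetched list (caller did the HTTP round-trip).
    fn put(&mut self, endpoint: &str, models: Vec<String>) {
        self.entries.insert(endpoint.to_string(), (Instant::now(), models));
    }
}

fn main() {
    let mut cache = ModelCache::new(Duration::from_secs(300));
    assert!(cache.get("ollama-local").is_none()); // nothing fetched yet
    cache.put("ollama-local", vec!["llama3.2".to_string()]);
    let models = cache.get("ollama-local").expect("fresh entry");
    assert_eq!(models[0], "llama3.2");
    println!("{}", models.len());
}
```

A stale entry simply misses, which signals the caller to re-fetch; endpoints that are down can then degrade to the last-known list or an empty picker.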
### 5. --endpoint flag on axon ask / axon extract

```shell
axon ask "explain this code" --endpoint ollama-local
axon ask "translate this" --endpoint openrouter --model mistral/mistral-large
axon extract https://example.com --endpoint groq
```

### 6. Reboot UI — OpenAI agent in session rail
- OpenAI-compatible endpoints appear in the agent selector alongside Claude/Codex/Gemini
- Custom icon/badge per endpoint (generic robot icon + endpoint name label)
- Model picker shows models fetched from the endpoint's `/models`
- Sessions with OpenAI agents behave identically to ACP sessions from the UI's perspective
## Supported Endpoints (to test against)

| Service | Base URL | Notes |
|---|---|---|
| Ollama | `http://localhost:11434/v1` | Self-hosted, already in our stack |
| LM Studio | `http://localhost:1234/v1` | Self-hosted |
| vLLM | `http://localhost:8000/v1` | Self-hosted |
| OpenRouter | `https://openrouter.ai/api/v1` | Cloud aggregator, 200+ models |
| Groq | `https://api.groq.com/openai/v1` | Fast inference |
| Mistral API | `https://api.mistral.ai/v1` | Mistral models |
| Together AI | `https://api.together.xyz/v1` | Self-hosted and cloud |
## Files

| File | Action |
|---|---|
| `crates/core/config/types/config.rs` | Replace single endpoint fields with `Vec<LlmEndpointConfig>` |
| `crates/services/acp/` | Add OpenAI agent variant; HTTP session implementation |
| `crates/vector/ops/commands/streaming.rs` | Wire named endpoint lookup; add `--endpoint` flag |
| `crates/web.rs` / REST API | `GET /api/llm/endpoints`, `GET /api/llm/endpoints/:name/models` |
| `axon.toml.example` | Document `[[llm.endpoints]]` config |
| `apps/web/components/reboot/` | OpenAI endpoints in agent/model picker |
| `docs/DEPLOYMENT.md` | Ollama + LM Studio + OpenRouter setup examples |
## Acceptance Criteria

- `[[llm.endpoints]]` config supports multiple named OpenAI-compatible endpoints
- `OPENAI_BASE_URL`/`OPENAI_MODEL`/`OPENAI_API_KEY` still work as single-endpoint fallback
- `axon ask --endpoint <name>` routes to the specified endpoint
- OpenAI-compatible agent available in Reboot shell session rail
- Multi-turn chat sessions work (history accumulated per session)
- Streaming responses display progressively in the UI
- Tool calls via OpenAI function-calling protocol work end-to-end
- Model list fetched from `GET /models` and shown in picker
- Tested against Ollama (self-hosted, already in stack)
- `GET /api/llm/endpoints` returns configured endpoints + connection status
- `cargo clippy` clean, all tests pass