feat(llm): support OpenAI-compatible endpoints alongside ACP agents #37

@jmagar


Summary

Axon currently has two separate LLM integration paths that don't talk to each other:

  1. ACP agents (crates/services/acp/) — subprocess-based bridge to Claude Code, Codex, and Gemini CLI. Full agent capabilities (tool calls, sessions, permissions) but locked to those three CLI tools.
  2. OpenAI-compatible HTTP (crates/vector/ops/commands/streaming.rs) — raw POST /chat/completions used exclusively by axon ask, axon extract, and axon suggest. One hardcoded endpoint (OPENAI_BASE_URL + OPENAI_MODEL), no sessions, no tool calls.

Neither path is a superset of the other. The result: Axon can't use Ollama, LM Studio, vLLM, OpenRouter, Groq, Mistral API, or any other OpenAI-compatible service as an interactive chat agent — only as a dumb completion backend for ask/extract. First-class OpenAI-compatible endpoint support would unlock every self-hosted and cloud LLM behind a standard API.

What "First-Class" Means

An OpenAI-compatible endpoint should be available as a selectable agent in the Reboot shell — same UX as Claude/Codex/Gemini — with:

  • Chat sessions (multi-turn with history)
  • Streaming responses displayed progressively
  • Tool call support via the OpenAI function-calling protocol (tools array in the request)
  • Model selection from a dynamic list (fetched from GET /models)
  • Named configurations: ollama/llama3.2, openrouter/claude-3-5-sonnet, groq/llama-3.1-70b, etc.

Changes Needed

1. Multi-endpoint config

Replace the single OPENAI_BASE_URL + OPENAI_MODEL with a named endpoint registry:

# axon.toml
[[llm.endpoints]]
name    = "ollama-local"
url     = "http://localhost:11434/v1"
model   = "llama3.2"
api_key = ""            # empty = no auth

[[llm.endpoints]]
name    = "openrouter"
url     = "https://openrouter.ai/api/v1"
model   = "anthropic/claude-3-5-sonnet"
api_key = "sk-or-..."   # from env: LLM_OPENROUTER_API_KEY

[[llm.endpoints]]
name    = "groq"
url     = "https://api.groq.com/openai/v1"
model   = "llama-3.1-70b-versatile"
api_key = ""            # from env: LLM_GROQ_API_KEY

  • OPENAI_BASE_URL / OPENAI_MODEL / OPENAI_API_KEY kept as single-endpoint fallback (backwards compat)
  • Each endpoint's API key sourced from env: LLM_<NAME>_API_KEY or inline (dev only)
  • Default endpoint used by axon ask / axon extract when no --endpoint specified
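
A rough sketch of the endpoint entry and the key-resolution precedence described above (inline key for dev, env var otherwise). `LlmEndpointConfig`, `env_var_name`, and `resolve_api_key` are illustrative names, not the final types in `crates/core/config/types/config.rs`:

```rust
/// One [[llm.endpoints]] entry (hypothetical shape; real type would derive serde Deserialize).
#[derive(Debug, Clone)]
pub struct LlmEndpointConfig {
    pub name: String,
    pub url: String,
    pub model: String,
    pub api_key: String, // inline key; empty means "look up the env var"
}

impl LlmEndpointConfig {
    /// Map an endpoint name to its env var: "ollama-local" -> "LLM_OLLAMA_LOCAL_API_KEY".
    pub fn env_var_name(name: &str) -> String {
        format!("LLM_{}_API_KEY", name.to_uppercase().replace('-', "_"))
    }

    /// Inline key wins (dev only); otherwise fall back to LLM_<NAME>_API_KEY.
    /// None means neither is set, i.e. an unauthenticated endpoint like local Ollama.
    pub fn resolve_api_key(&self) -> Option<String> {
        if !self.api_key.is_empty() {
            return Some(self.api_key.clone());
        }
        std::env::var(Self::env_var_name(&self.name))
            .ok()
            .filter(|v| !v.is_empty())
    }
}

fn main() {
    let ep = LlmEndpointConfig {
        name: "groq".into(),
        url: "https://api.groq.com/openai/v1".into(),
        model: "llama-3.1-70b-versatile".into(),
        api_key: "dev-inline-key".into(), // inline (dev only)
    };
    println!("{}", ep.resolve_api_key().unwrap()); // prints "dev-inline-key"
    println!("{}", LlmEndpointConfig::env_var_name("ollama-local")); // prints "LLM_OLLAMA_LOCAL_API_KEY"
}
```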

2. OpenAI agent in the ACP session model

Add OpenAI as a first-class agent type alongside Claude, Codex, Gemini:

pub enum AcpAgent {
    Claude,
    Codex,
    Gemini,
    OpenAI { endpoint_name: String },   // NEW
}

The OpenAI agent uses the HTTP chat completions API directly (no subprocess) — same streaming.rs infrastructure, wrapped in the ACP session lifecycle:

  • Session create → start a new conversation (empty history)
  • Message send → POST /chat/completions with accumulated history
  • Stream → SSE/chunked response forwarded to the WS bridge
  • Tool calls → parse tool_calls from response, execute via existing ACP tool dispatcher, append results and continue
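
As a rough illustration of that lifecycle, the per-session state could look like this (`OpenAiSession`, `Message`, and `Role` are hypothetical names; HTTP, SSE, and tool-dispatch plumbing omitted):

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role { System, User, Assistant, Tool }

#[derive(Debug, Clone)]
pub struct Message { pub role: Role, pub content: String }

pub struct OpenAiSession {
    pub endpoint_name: String,
    history: Vec<Message>,
}

impl OpenAiSession {
    /// Session create: new conversation with empty history.
    pub fn new(endpoint_name: &str) -> Self {
        Self { endpoint_name: endpoint_name.to_string(), history: Vec::new() }
    }

    /// Message send: record the user turn; the accumulated history is what
    /// would be serialized into the POST /chat/completions body.
    pub fn push_user(&mut self, text: &str) {
        self.history.push(Message { role: Role::User, content: text.to_string() });
    }

    /// Stream end: record the assistant's completed reply so the next turn
    /// carries full context.
    pub fn push_assistant(&mut self, text: &str) {
        self.history.push(Message { role: Role::Assistant, content: text.to_string() });
    }

    pub fn history(&self) -> &[Message] { &self.history }
}

fn main() {
    let mut s = OpenAiSession::new("ollama-local");
    s.push_user("explain this code");
    s.push_assistant("It parses the endpoint config...");
    s.push_user("shorter please");
    println!("{}", s.history().len()); // prints 3
}
```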

3. Tool call support via OpenAI function-calling protocol

OpenAI-compatible tool calls use:

{
  "tools": [{ "type": "function", "function": { "name": "...", "parameters": {...} } }],
  "tool_choice": "auto"
}

Map Axon's existing ACP tool definitions to OpenAI function schemas. The execution path (running the actual tool) is already implemented — just needs the protocol adapter layer.
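
A minimal sketch of that adapter layer, assuming an ACP tool definition shaped roughly like `AcpToolDef` (hypothetical; a real implementation would build the JSON with serde_json rather than string formatting, and `parameters_json` is assumed to already be a valid JSON Schema string):

```rust
pub struct AcpToolDef {
    pub name: String,
    pub description: String,
    pub parameters_json: String, // JSON Schema for the tool's arguments
}

/// Wrap an ACP tool definition in the OpenAI "tools" array entry shape.
pub fn to_openai_tool(t: &AcpToolDef) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{}}}}}"#,
        t.name, t.description, t.parameters_json
    )
}

fn main() {
    let t = AcpToolDef {
        name: "read_file".into(),
        description: "Read a file from the workspace".into(),
        parameters_json: r#"{"type":"object","properties":{"path":{"type":"string"}}}"#.into(),
    };
    println!("{}", to_openai_tool(&t));
}
```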

4. Model list from /models

GET {endpoint_url}/models
→ [{ "id": "llama3.2", ... }, ...]

  • Fetch and cache available models for each configured endpoint
  • Expose via GET /api/llm/endpoints and GET /api/llm/endpoints/:name/models
  • Surface in the Reboot shell model picker alongside Claude/Codex/Gemini models
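
The fetch-and-cache step might look like this sketch, with a closure standing in for the actual GET {endpoint_url}/models HTTP call (`ModelCache` and `get_or_fetch` are hypothetical names):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-endpoint cache of model ids with a TTL, keyed by endpoint name.
pub struct ModelCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<String>)>,
}

impl ModelCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return cached model ids for `endpoint` if still fresh; otherwise call
    /// `fetch` (the /models request) and cache the result.
    pub fn get_or_fetch<F>(&mut self, endpoint: &str, fetch: F) -> Vec<String>
    where
        F: FnOnce() -> Vec<String>,
    {
        if let Some((at, models)) = self.entries.get(endpoint) {
            if at.elapsed() < self.ttl {
                return models.clone();
            }
        }
        let models = fetch();
        self.entries.insert(endpoint.to_string(), (Instant::now(), models.clone()));
        models
    }
}

fn main() {
    let mut cache = ModelCache::new(Duration::from_secs(300));
    let first = cache.get_or_fetch("ollama-local", || vec!["llama3.2".to_string()]);
    // Second call within the TTL hits the cache; the fetcher is not invoked.
    let second = cache.get_or_fetch("ollama-local", || panic!("should not refetch"));
    println!("{} {}", first[0], second[0]); // prints "llama3.2 llama3.2"
}
```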

5. --endpoint flag on axon ask / axon extract

axon ask "explain this code" --endpoint ollama-local
axon ask "translate this" --endpoint openrouter --model mistral/mistral-large
axon extract https://example.com --endpoint groq
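
The routing rule is simple: `--endpoint` overrides the configured default. A std-only sketch of that selection (`flag_value` and `select_endpoint` are hypothetical; the real CLI would reuse its existing argument parser):

```rust
/// Return the value following `flag` in the argument list, if present.
fn flag_value(args: &[String], flag: &str) -> Option<String> {
    args.iter()
        .position(|a| a == flag)
        .and_then(|i| args.get(i + 1).cloned())
}

/// Choose the endpoint name: --endpoint wins, else the configured default.
fn select_endpoint(args: &[String], default_endpoint: &str) -> String {
    flag_value(args, "--endpoint").unwrap_or_else(|| default_endpoint.to_string())
}

fn main() {
    let args: Vec<String> = ["ask", "explain this code", "--endpoint", "ollama-local"]
        .iter().map(|s| s.to_string()).collect();
    println!("{}", select_endpoint(&args, "openrouter")); // prints "ollama-local"
    println!("{}", select_endpoint(&[], "openrouter"));   // prints "openrouter"
}
```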

6. Reboot UI — OpenAI agent in session rail

  • OpenAI-compatible endpoints appear in the agent selector alongside Claude/Codex/Gemini
  • Custom icon/badge per endpoint (generic robot icon + endpoint name label)
  • Model picker shows models fetched from the endpoint's /models
  • Sessions with OpenAI agents behave identically to ACP sessions from the UI's perspective

Supported Endpoints (to test against)

| Service | Base URL | Notes |
| --- | --- | --- |
| Ollama | http://localhost:11434/v1 | Self-hosted, already in our stack |
| LM Studio | http://localhost:1234/v1 | Self-hosted |
| vLLM | http://localhost:8000/v1 | Self-hosted |
| OpenRouter | https://openrouter.ai/api/v1 | Cloud aggregator, 200+ models |
| Groq | https://api.groq.com/openai/v1 | Fast inference |
| Mistral API | https://api.mistral.ai/v1 | Mistral models |
| Together AI | https://api.together.xyz/v1 | Self-hosted and cloud |

Files

| File | Action |
| --- | --- |
| crates/core/config/types/config.rs | Replace single endpoint fields with Vec<LlmEndpointConfig> |
| crates/services/acp/ | Add OpenAI agent variant; HTTP session implementation |
| crates/vector/ops/commands/streaming.rs | Wire named endpoint lookup; add --endpoint flag |
| crates/web.rs / REST API | GET /api/llm/endpoints, GET /api/llm/endpoints/:name/models |
| axon.toml.example | Document [[llm.endpoints]] config |
| apps/web/components/reboot/ | OpenAI endpoints in agent/model picker |
| docs/DEPLOYMENT.md | Ollama + LM Studio + OpenRouter setup examples |

Acceptance Criteria

  • [[llm.endpoints]] config supports multiple named OpenAI-compatible endpoints
  • OPENAI_BASE_URL / OPENAI_MODEL / OPENAI_API_KEY still work as single-endpoint fallback
  • axon ask --endpoint <name> routes to the specified endpoint
  • OpenAI-compatible agent available in Reboot shell session rail
  • Multi-turn chat sessions work (history accumulated per session)
  • Streaming responses display progressively in the UI
  • Tool calls via OpenAI function-calling protocol work end-to-end
  • Model list fetched from GET /models and shown in picker
  • Tested against Ollama (self-hosted, already in stack)
  • GET /api/llm/endpoints returns configured endpoints + connection status
  • cargo clippy clean, all tests pass
