
feat: Add Ollama API compatibility layer to proxy #167

@mmogr

Description


Add Ollama-native API support to the gglib proxy, making it a drop-in replacement for Ollama on port 11434.

Problem

Apps expecting an Ollama endpoint reject the gglib proxy even when configured on port 11434 because:

  • Proxy only served OpenAI /v1/* endpoints
  • Ollama clients hit /api/* endpoints (version, tags, chat, generate, embed, etc.) — all returned 404
  • Response formats were incompatible:
    • Ollama: NDJSON streaming + flat JSON responses
    • OpenAI: SSE streaming + nested choices array
  • Model names were strict (no :latest suffix handling)

Solution

Implement an adapter layer that serves two API surfaces simultaneously:

  • /v1/* — OpenAI-compatible (existing, unchanged)
  • /api/* — Ollama-native (new)
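The dual-surface idea can be sketched as a path dispatcher (std-only; the real proxy registers these routes on a single router in server.rs, and the names below are illustrative, not the actual API):

```rust
// One server matches both prefixes; there is no mode flag or toggle.
#[derive(Debug, PartialEq)]
enum Surface {
    OpenAi,   // existing /v1/* handlers, unchanged
    Ollama,   // new /api/* handlers plus the root probe
    NotFound,
}

fn route(path: &str) -> Surface {
    if path == "/" || path.starts_with("/api/") {
        Surface::Ollama
    } else if path.starts_with("/v1/") {
        Surface::OpenAi
    } else {
        Surface::NotFound
    }
}

fn main() {
    assert_eq!(route("/v1/chat/completions"), Surface::OpenAi);
    assert_eq!(route("/api/tags"), Surface::Ollama);
    assert_eq!(route("/"), Surface::Ollama);
    println!("ok");
}
```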

Core Features

Ollama Endpoints Implemented:

  • GET / — Root probe (returns "Ollama is running")
  • GET /api/version — Version info
  • GET /api/tags — List models
  • POST /api/show — Model metadata
  • GET /api/ps — Running models
  • POST /api/chat — Chat completions (streaming + non-streaming)
  • POST /api/generate — Text generation (streaming + non-streaming)
  • POST /api/embed — Embeddings
  • POST /api/embeddings — Legacy single-embedding endpoint
  • Stubs for /api/pull, /api/delete, /api/copy, /api/create (redirect to CLI)
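For the chat and generate endpoints above, the core work is reshaping the upstream response. A minimal sketch of the nested-vs-flat difference (hypothetical types; the real ones derive serde traits and carry more fields, e.g. chrono timestamps and token counts):

```rust
// OpenAI nests the text under choices[0].message.content;
// Ollama returns a flat object with the content at the top level.
struct OpenAiChoice {
    message_content: String, // stands in for choices[i].message.content
}

struct OpenAiResponse {
    model: String,
    choices: Vec<OpenAiChoice>,
}

struct OllamaChatResponse {
    model: String,
    content: String, // flat message content, no choices array
    done: bool,
}

fn to_ollama(resp: OpenAiResponse) -> OllamaChatResponse {
    OllamaChatResponse {
        model: resp.model,
        content: resp
            .choices
            .into_iter()
            .next()
            .map(|c| c.message_content)
            .unwrap_or_default(),
        done: true, // a non-streaming response is complete in one object
    }
}

fn main() {
    let resp = OpenAiResponse {
        model: "phi3".into(),
        choices: vec![OpenAiChoice { message_content: "hi".into() }],
    };
    let out = to_ollama(resp);
    assert_eq!(out.content, "hi");
    assert!(out.done);
    println!("ok");
}
```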

Format Translation:

  • Client request → OpenAI format → llama-server → OpenAI response → Ollama format
  • SSE ↔ NDJSON streaming adapter using futures_util::stream::unfold
  • Proper timestamp handling via chrono crate
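The per-event step of the SSE → NDJSON translation can be illustrated with std string handling alone (the real adapter in ollama_stream.rs drives an async stream via futures_util::stream::unfold and re-serializes the JSON; this helper name is hypothetical):

```rust
// Convert one SSE line ("data: {...}") into one NDJSON line.
fn sse_line_to_ndjson(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?.trim();
    // OpenAI streams terminate with a "[DONE]" sentinel; Ollama's NDJSON
    // stream instead ends with a final object carrying "done": true, so
    // the sentinel is dropped rather than forwarded.
    if payload == "[DONE]" {
        return None;
    }
    // Each remaining SSE event becomes one newline-delimited JSON line.
    Some(format!("{payload}\n"))
}

fn main() {
    assert_eq!(
        sse_line_to_ndjson(r#"data: {"id":"x"}"#).as_deref(),
        Some("{\"id\":\"x\"}\n")
    );
    assert_eq!(sse_line_to_ndjson("data: [DONE]"), None);
    println!("ok");
}
```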

Model Name Normalization:

  • Strips :latest suffix automatically (e.g., phi3:latest → phi3)
  • Preserves other tags and variants
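The normalization rule above amounts to a one-line suffix strip (sketch; the assumed function name `normalize_model_name` stands in for whatever the helper in ollama_models.rs is called):

```rust
// Ollama treats "phi3" and "phi3:latest" as the same model, so the
// ":latest" suffix is stripped before lookup; any other tag (e.g.
// "phi3:3.8b") is preserved unchanged.
fn normalize_model_name(name: &str) -> &str {
    name.strip_suffix(":latest").unwrap_or(name)
}

fn main() {
    assert_eq!(normalize_model_name("phi3:latest"), "phi3");
    assert_eq!(normalize_model_name("phi3:3.8b"), "phi3:3.8b");
    assert_eq!(normalize_model_name("phi3"), "phi3");
    println!("ok");
}
```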

Code Quality:

  • Single unified ProxyState struct shared by both API surfaces
  • Extracted helper functions (apply_openai_options, parse_upstream_completion)
  • Comprehensive unit tests (10 new tests in ollama_models.rs)
  • All 164 existing tests still pass

Files Changed

  • New: crates/gglib-proxy/src/ollama_models.rs — Ollama data types + normalization
  • New: crates/gglib-proxy/src/ollama_handlers.rs — Route handlers + translation logic
  • New: crates/gglib-proxy/src/ollama_stream.rs — SSE→NDJSON streaming adapter
  • Modified: crates/gglib-proxy/src/server.rs — Route registration, state unification
  • Modified: crates/gglib-proxy/src/lib.rs — Module exports
  • Modified: crates/gglib-proxy/Cargo.toml — Added chrono dependency

Testing

  • ✅ All 12 proxy unit tests pass
  • ✅ All 2 proxy doc-tests pass
  • ✅ All 67 runtime tests pass
  • ✅ All 34 CLI tests pass
  • ✅ All 49 axum tests pass
  • Total: 164 tests passing, 0 failures

Usage

# Start proxy on port 11434 (Ollama default)
gglib proxy --port 11434

# Now compatible with any Ollama-expecting client:
curl http://localhost:11434/api/tags
curl -X POST http://localhost:11434/api/chat -d '{"model":"phi3","messages":[...]}'

Breaking Changes

None. Both API surfaces coexist without configuration or toggling.

Closes

TBD (link to related issues if applicable)
