
feat: Add Ollama API compatibility layer to proxy #167

@mmogr

Description


Add Ollama-native API support to the gglib proxy, making it a drop-in replacement for Ollama on port 11434.

Problem

Apps expecting an Ollama endpoint reject the gglib proxy even when configured on port 11434 because:

  • Proxy only served OpenAI /v1/* endpoints
  • Ollama clients hit /api/* endpoints (version, tags, chat, generate, embed, etc.) — all returned 404
  • Response formats were incompatible:
    • Ollama: NDJSON streaming + flat JSON responses
    • OpenAI: SSE streaming + nested choices array
  • Model names were strict (no :latest suffix handling)

Solution

Implement an adapter layer that serves two API surfaces simultaneously:

  • /v1/* — OpenAI-compatible (existing, unchanged)
  • /api/* — Ollama-native (new)
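The dual-surface idea can be sketched as a path dispatcher (std-only; the real proxy registers these routes on a single router in server.rs, and the names below are illustrative, not the actual API):

```rust
// One server matches both prefixes; there is no mode flag or toggle.
#[derive(Debug, PartialEq)]
enum Surface {
    OpenAi,   // existing /v1/* handlers, unchanged
    Ollama,   // new /api/* handlers plus the root probe
    NotFound,
}

fn route(path: &str) -> Surface {
    if path == "/" || path.starts_with("/api/") {
        Surface::Ollama
    } else if path.starts_with("/v1/") {
        Surface::OpenAi
    } else {
        Surface::NotFound
    }
}

fn main() {
    assert_eq!(route("/v1/chat/completions"), Surface::OpenAi);
    assert_eq!(route("/api/tags"), Surface::Ollama);
    assert_eq!(route("/"), Surface::Ollama);
    println!("ok");
}
```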

Core Features

Ollama Endpoints Implemented:

  • GET / — Root probe (returns "Ollama is running")
  • GET /api/version — Version info
  • GET /api/tags — List models
  • POST /api/show — Model metadata
  • GET /api/ps — Running models
  • POST /api/chat — Chat completions (streaming + non-streaming)
  • POST /api/generate — Text generation (streaming + non-streaming)
  • POST /api/embed — Embeddings
  • POST /api/embeddings — Legacy single-embedding endpoint
  • Stubs for /api/pull, /api/delete, /api/copy, /api/create (redirect to CLI)
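For the chat and generate endpoints above, the core work is reshaping the upstream response. A minimal sketch of the nested-vs-flat difference (hypothetical types; the real ones derive serde traits and carry more fields, e.g. chrono timestamps and token counts):

```rust
// OpenAI nests the text under choices[0].message.content;
// Ollama returns a flat object with the content at the top level.
struct OpenAiChoice {
    message_content: String, // stands in for choices[i].message.content
}

struct OpenAiResponse {
    model: String,
    choices: Vec<OpenAiChoice>,
}

struct OllamaChatResponse {
    model: String,
    content: String, // flat message content, no choices array
    done: bool,
}

fn to_ollama(resp: OpenAiResponse) -> OllamaChatResponse {
    OllamaChatResponse {
        model: resp.model,
        content: resp
            .choices
            .into_iter()
            .next()
            .map(|c| c.message_content)
            .unwrap_or_default(),
        done: true, // a non-streaming response is complete in one object
    }
}

fn main() {
    let resp = OpenAiResponse {
        model: "phi3".into(),
        choices: vec![OpenAiChoice { message_content: "hi".into() }],
    };
    let out = to_ollama(resp);
    assert_eq!(out.content, "hi");
    assert!(out.done);
    println!("ok");
}
```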

Format Translation:

  • Client request → OpenAI format → llama-server → OpenAI response → Ollama format
  • SSE ↔ NDJSON streaming adapter using futures_util::stream::unfold
  • Proper timestamp handling via chrono crate
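The per-event step of the SSE → NDJSON translation can be illustrated with std string handling alone (the real adapter in ollama_stream.rs drives an async stream via futures_util::stream::unfold and re-serializes the JSON; this helper name is hypothetical):

```rust
// Convert one SSE line ("data: {...}") into one NDJSON line.
fn sse_line_to_ndjson(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?.trim();
    // OpenAI streams terminate with a "[DONE]" sentinel; Ollama's NDJSON
    // stream instead ends with a final object carrying "done": true, so
    // the sentinel is dropped rather than forwarded.
    if payload == "[DONE]" {
        return None;
    }
    // Each remaining SSE event becomes one newline-delimited JSON line.
    Some(format!("{payload}\n"))
}

fn main() {
    assert_eq!(
        sse_line_to_ndjson(r#"data: {"id":"x"}"#).as_deref(),
        Some("{\"id\":\"x\"}\n")
    );
    assert_eq!(sse_line_to_ndjson("data: [DONE]"), None);
    println!("ok");
}
```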

Model Name Normalization:

  • Strips :latest suffix automatically (e.g., phi3:latest → phi3)
  • Preserves other tags and variants
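The normalization rule above amounts to a one-line suffix strip (sketch; the assumed function name `normalize_model_name` stands in for whatever the helper in ollama_models.rs is called):

```rust
// Ollama treats "phi3" and "phi3:latest" as the same model, so the
// ":latest" suffix is stripped before lookup; any other tag (e.g.
// "phi3:3.8b") is preserved unchanged.
fn normalize_model_name(name: &str) -> &str {
    name.strip_suffix(":latest").unwrap_or(name)
}

fn main() {
    assert_eq!(normalize_model_name("phi3:latest"), "phi3");
    assert_eq!(normalize_model_name("phi3:3.8b"), "phi3:3.8b");
    assert_eq!(normalize_model_name("phi3"), "phi3");
    println!("ok");
}
```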

Code Quality:

  • Single unified ProxyState struct shared by both API surfaces
  • Extracted helper functions (apply_openai_options, parse_upstream_completion)
  • Comprehensive unit tests (10 new tests in ollama_models.rs)
  • All 164 existing tests still pass

Files Changed

  • New: crates/gglib-proxy/src/ollama_models.rs — Ollama data types + normalization
  • New: crates/gglib-proxy/src/ollama_handlers.rs — Route handlers + translation logic
  • New: crates/gglib-proxy/src/ollama_stream.rs — SSE→NDJSON streaming adapter
  • Modified: crates/gglib-proxy/src/server.rs — Route registration, state unification
  • Modified: crates/gglib-proxy/src/lib.rs — Module exports
  • Modified: crates/gglib-proxy/Cargo.toml — Added chrono dependency

Testing

  • ✅ All 12 proxy unit tests pass
  • ✅ All 2 proxy doc-tests pass
  • ✅ All 67 runtime tests pass
  • ✅ All 34 CLI tests pass
  • ✅ All 49 axum tests pass
  • Total: 164 tests passing, 0 failures

Usage

# Start proxy on port 11434 (Ollama default)
gglib proxy --port 11434

# Now compatible with any Ollama-expecting client:
curl http://localhost:11434/api/tags
curl -X POST http://localhost:11434/api/chat -d '{"model":"phi3","messages":[...]}'

Breaking Changes

None. Both API surfaces coexist without configuration or toggling.

Closes

TBD (link to related issues if applicable)
