feat: Add Ollama API compatibility layer to proxy#168
- Implement OllamaChatRequest/Response with streaming support
- Implement OllamaGenerateRequest/Response for text generation
- Implement OllamaEmbeddingRequest/Response for embeddings
- Add legacy /api/embeddings endpoint support
- Implement normalize_model_name() to strip :latest suffix
- Add chrono-based timestamp helpers for RFC3339 format
- Include comprehensive unit tests (10 test cases)
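The exact body of normalize_model_name() is not shown in this PR summary; a minimal std-only sketch of what such a helper could look like (the signature is an assumption):

```rust
/// Strip a trailing ":latest" tag so "phi3:latest" and "phi3" resolve to the
/// same catalog entry. Other tags (e.g. ":q4_0") are left untouched.
fn normalize_model_name(name: &str) -> &str {
    name.strip_suffix(":latest").unwrap_or(name)
}
```

Returning a borrowed `&str` avoids an allocation when no suffix is present.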
- Implement sse_to_ndjson() converter for real-time streaming translation
- Support Chat and Generate stream kinds with proper chunking
- Use futures_util::stream::unfold for efficient state machine
- Emit content chunks with done=false during streaming
- Emit final chunk with timing statistics when done
- Handle upstream stream errors gracefully
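Ollama streaming is NDJSON: one JSON object per line, intermediate objects with done=false and a trailing object with done=true plus statistics. A std-only sketch of the two chunk shapes (hand-built JSON strings here; the PR presumably uses serde types, and the exact field set is an assumption):

```rust
/// Build an intermediate Ollama NDJSON chat chunk (done=false).
fn content_chunk(model: &str, created_at: &str, content: &str) -> String {
    format!(
        "{{\"model\":\"{model}\",\"created_at\":\"{created_at}\",\"message\":{{\"role\":\"assistant\",\"content\":\"{content}\"}},\"done\":false}}\n"
    )
}

/// Build the final chunk carrying token/timing statistics (done=true).
fn final_chunk(model: &str, created_at: &str, eval_count: u64, total_ns: u64) -> String {
    format!(
        "{{\"model\":\"{model}\",\"created_at\":\"{created_at}\",\"message\":{{\"role\":\"assistant\",\"content\":\"\"}},\"done\":true,\"eval_count\":{eval_count},\"total_duration\":{total_ns}}}\n"
    )
}
```

Each chunk ends in `\n`, which is what makes the stream valid NDJSON for line-oriented Ollama clients.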
- Implement ProxyState struct for shared state (client, runtime, catalog)
- Add GET / endpoint (Ollama root probe)
- Add GET /api/version endpoint
- Add GET /api/tags endpoint listing all models
- Add POST /api/show endpoint for model metadata
- Add GET /api/ps endpoint for running models
- Add POST /api/chat handler with streaming and non-streaming modes
- Add POST /api/generate handler with streaming support
- Add POST /api/embed and POST /api/embeddings endpoints
- Add management endpoint stubs (/pull, /delete, /copy, /create)
- Implement model translation from Ollama to OpenAI format
- Add error handling with Ollama-format error responses
- Extract apply_openai_options() and parse_upstream_completion() helpers
- Add chrono = { workspace = true } to gglib-proxy dependencies
- Enables RFC3339 timestamp generation for Ollama API responses
- Export three new modules: ollama_models, ollama_handlers, ollama_stream
- Unify AppState and OllamaState into single ProxyState struct
- Register 13 Ollama routes via Router::merge() alongside OpenAI routes
- Both /v1/* (OpenAI) and /api/* (Ollama) served simultaneously
- Apply model name normalization to both API surfaces
- Update server startup logs to show both API endpoints
- Maintain backward compatibility with OpenAI-only clients
- Update overview to mention dual OpenAI + Ollama API support
- Add new module architecture diagram showing both API surfaces
- Document all 13 new Ollama endpoints in API table
- Add module descriptions for ollama_models.rs, ollama_handlers.rs, ollama_stream.rs
- Module table automatically updated with 3 new modules (LOC/complexity/coverage badges)
- Maintains comprehensive documentation strategy with GitHub CI integration
- Add inference.rs entry to domain module table
- Reorder entries in ports module table (alphabetical consistency)
- Generated by scripts/generate_module_tables.sh
…client compatibility
- Change version format from 'gglib-0.3.3' to '0.3.3'
- Fixes VSCode Ollama extension validation ('Unable to verify ollama server version')
- Ollama clients expect semantic versioning format without prefix
…ments
- VSCode Ollama extension requires version >= 0.6.4
- Changed from returning gglib version (0.3.3) to Ollama-compatible 0.6.4
- Satisfies the extension's minimum-version requirement while keeping a stable version claim
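Per the commits above, /api/version must report a bare semver string (no "gglib-" prefix) that is at least 0.6.4. A std-only sketch of the response body; the constant name is hypothetical and the real handler presumably returns this via the web framework's JSON response type:

```rust
/// Minimum Ollama version accepted by clients such as the VSCode extension.
const OLLAMA_COMPAT_VERSION: &str = "0.6.4";

/// Body for GET /api/version. Must be a bare semver string: Ollama clients
/// parse it as a version number, so a "gglib-" prefix fails validation.
fn version_body() -> String {
    format!("{{\"version\":\"{OLLAMA_COMPAT_VERSION}\"}}")
}
```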
…kward compatibility for 'name'
The normalize_model_name helper (which strips ':latest') was applied to the OpenAI /v1/chat/completions endpoint, breaking clients that use version-tagged model names. Scope it to Ollama handlers only.
- Handle negative num_predict correctly: -1 (unlimited) omits max_tokens, -2 (fill context) logs a warning; zero and other negative values are likewise omitted
- Forward top_k, seed, repeat_penalty to llama-server
- Map Ollama format:'json' to OpenAI response_format:{type:'json_object'}
- Warn when messages contain unsupported images field
- Request stream_options:{include_usage:true} for accurate token counts
- Add doc comments for synthetic timing approximations
- Remove redundant content-type header (.json() sets it)
- Collapse nested if (clippy::collapsible_if)
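The num_predict handling described above can be sketched as a small std-only mapping function (the function name is hypothetical; the real code presumably logs through tracing rather than eprintln!):

```rust
/// Map Ollama's num_predict option onto OpenAI's max_tokens.
/// -1 means "unlimited" in Ollama, so max_tokens is omitted entirely;
/// -2 ("fill context") has no OpenAI equivalent, so it is omitted with a warning;
/// zero and other negatives are also omitted.
fn map_num_predict(num_predict: Option<i64>) -> Option<u64> {
    match num_predict {
        Some(n) if n > 0 => Some(n as u64),
        Some(-2) => {
            eprintln!("warning: num_predict=-2 (fill context) unsupported; omitting max_tokens");
            None
        }
        // -1, 0, other negatives, or absent: omit max_tokens.
        _ => None,
    }
}
```

Returning Option<u64> lets the serializer skip the field entirely instead of sending an invalid value upstream.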
Parse prompt_tokens and completion_tokens from the OpenAI usage
chunk (emitted when stream_options:{include_usage:true} is set).
Falls back to the previous chunk-counting heuristic when usage
data is not available. Also passes prompt_eval_count through to
the final Ollama NDJSON chunk instead of hardcoding 0.
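The prefer-usage-then-fall-back logic can be sketched std-only; the tuple shapes and function name are assumptions for illustration:

```rust
/// Token counts (prompt, completion) for the final Ollama chunk. Prefer the
/// exact counts from the OpenAI usage chunk; fall back to the number of
/// streamed content chunks, which approximates completion tokens only
/// (the prompt count stays 0 in that case).
fn token_counts(usage: Option<(u64, u64)>, chunks_seen: u64) -> (u64, u64) {
    match usage {
        Some((prompt_tokens, completion_tokens)) => (prompt_tokens, completion_tokens),
        None => (0, chunks_seen),
    }
}
```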
- Add synthetic_digest() helper that hashes (name, id) into a deterministic 'gglib-' prefixed hex string, used by both /api/tags and /api/ps for consistency
- Remove dead OpenAiEmbeddingRequest struct (never referenced)
- Strip redundant skip_serializing_if from Deserialize-only types
- Add unit tests for digest determinism and uniqueness
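The PR does not show which hash function synthetic_digest() uses; a std-only sketch with the standard library's DefaultHasher illustrates the shape of the helper:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic pseudo-digest for locally served models: hash the
/// (name, id) pair and render it as a "gglib-"-prefixed hex string so that
/// /api/tags and /api/ps report the same digest for the same model.
fn synthetic_digest(name: &str, id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    (name, id).hash(&mut hasher);
    format!("gglib-{:016x}", hasher.finish())
}
```

The prefix makes clear to clients that this is not a real registry digest, while determinism keeps the two endpoints consistent.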
Overview
This PR implements Ollama API compatibility for the gglib proxy, making it a drop-in replacement for Ollama. Both OpenAI (/v1/*) and Ollama (/api/*) endpoints are served simultaneously on the same port.

Changes
New Files (446 lines)
Modified Files
chrono = { workspace = true }

API Endpoints Now Supported
✅ GET / — Root probe ("Ollama is running")
✅ GET /api/version — Version info
✅ GET /api/tags — List models
✅ POST /api/show — Model metadata
✅ GET /api/ps — Running models
✅ POST /api/chat — Chat completions (streaming + non-streaming)
✅ POST /api/generate — Text generation (streaming + non-streaming)
✅ POST /api/embed — Embeddings
✅ POST /api/embeddings — Legacy single-embedding
✅ POST /api/pull, DELETE /api/delete, POST /api/copy, POST /api/create — Stubs

Format Translation
- Strips the :latest suffix automatically (e.g., phi3:latest → phi3)

Code Quality
- Unified ProxyState struct eliminates state duplication
- Extracted helpers (apply_openai_options, parse_upstream_completion) reduce duplication

Usage
Testing
Closes #167
Breaking Changes
None. Both API surfaces coexist seamlessly without configuration.