feat: Add Ollama API compatibility layer to proxy#168

Open
mmogr wants to merge 15 commits into main from feat/ollama-api-compatibility

Conversation


@mmogr mmogr commented Feb 9, 2026

Overview

This PR implements Ollama API compatibility for the gglib proxy, making it a drop-in replacement for Ollama. Both OpenAI (/v1/*) and Ollama (/api/*) endpoints are served simultaneously on the same port.

Changes

New Files (~1,390 lines)

  • ollama_models.rs (~520 lines): Data types, normalization, chrono-based timestamps, 10 unit tests
  • ollama_stream.rs (~250 lines): SSE↔NDJSON streaming adapter using futures_util
  • ollama_handlers.rs (~620 lines): All 13 Ollama route handlers with proper format translation

Modified Files

  • lib.rs: Export 3 new modules
  • server.rs: Merge Ollama routes, unify state struct, apply model name normalization
  • Cargo.toml: Add chrono = { workspace = true }
  • Cargo.lock: Updated for new dependency

API Endpoints Now Supported

GET / — Root probe ("Ollama is running")
GET /api/version — Version info
GET /api/tags — List models
POST /api/show — Model metadata
GET /api/ps — Running models
POST /api/chat — Chat completions (streaming + non-streaming)
POST /api/generate — Text generation (streaming + non-streaming)
POST /api/embed — Embeddings
POST /api/embeddings — Legacy single-embedding
POST /api/pull, DELETE /api/delete, POST /api/copy, POST /api/create — Stubs

Format Translation

  • Client (Ollama NDJSON) → Internal (OpenAI JSON) → llama-server → Internal (OpenAI JSON) → Client (Ollama NDJSON)
  • Streaming: Server-Sent Events (SSE) ↔ Newline-Delimited JSON (NDJSON)
  • Model names: Strips the :latest suffix automatically (e.g., phi3:latest → phi3)
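The normalization rule can be sketched as follows (hypothetical signature; the real helper lives in ollama_models.rs and, per a later commit, is applied only in the Ollama handlers):

```rust
/// Strip a trailing `:latest` tag; leave every other tag intact.
/// Sketch only -- the actual helper in ollama_models.rs may differ.
fn normalize_model_name(name: &str) -> &str {
    name.strip_suffix(":latest").unwrap_or(name)
}
```

Note that version-style tags such as `phi3:3.8b` pass through unchanged; only the literal `:latest` suffix is removed.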

Code Quality

  • Single ProxyState struct eliminates state duplication
  • Extracted helpers (apply_openai_options, parse_upstream_completion) reduce duplication
  • 164 existing tests all pass (0 failures)
  • 10 new unit tests for normalization, parsing, and timestamps

Usage

# Start proxy on Ollama's default port
gglib proxy --port 11434

# Now works with any Ollama-expecting app
curl http://localhost:11434/api/tags
curl -X POST http://localhost:11434/api/chat -d '{"model":"phi3","messages":[...]}'

Testing

cargo test --package gglib-proxy        # All 12 tests pass
cargo test --package gglib-runtime      # All 67 tests pass
cargo test --package gglib-cli          # All 34 tests pass
cargo test --package gglib-axum         # All 49 tests pass
# Total: 164 tests passing, 0 failures

Closes

#167

Breaking Changes

None. Both API surfaces coexist on the same port without extra configuration.

mmogr added 6 commits February 9, 2026 23:59
- Implement OllamaChatRequest/Response with streaming support
- Implement OllamaGenerateRequest/Response for text generation
- Implement OllamaEmbeddingRequest/Response for embeddings
- Add legacy /api/embeddings endpoint support
- Implement normalize_model_name() to strip :latest suffix
- Add chrono-based timestamp helpers for RFC3339 format
- Include comprehensive unit tests (10 test cases)
- Implement sse_to_ndjson() converter for real-time streaming translation
- Support Chat and Generate stream kinds with proper chunking
- Use futures_util::stream::unfold for efficient state machine
- Emit content chunks with done=false during streaming
- Emit final chunk with timing statistics when done
- Handle upstream stream errors gracefully
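The per-line translation at the core of sse_to_ndjson() can be sketched as below (assumed shapes; the real adapter wraps this in a futures_util::stream::unfold state machine and rewrites each OpenAI payload into an Ollama chunk before appending the NDJSON newline):

```rust
/// Extract the JSON payload from one SSE event line, dropping the
/// OpenAI-style `[DONE]` sentinel and non-data lines (e.g. keepalives).
fn sse_data_payload(line: &str) -> Option<&str> {
    let payload = line.strip_prefix("data:")?.trim_start();
    if payload == "[DONE]" { None } else { Some(payload) }
}
```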
- Implement ProxyState struct for shared state (client, runtime, catalog)
- Add GET / endpoint (Ollama root probe)
- Add GET /api/version endpoint
- Add GET /api/tags endpoint listing all models
- Add POST /api/show endpoint for model metadata
- Add GET /api/ps endpoint for running models
- Add POST /api/chat handler with streaming and non-streaming modes
- Add POST /api/generate handler with streaming support
- Add POST /api/embed and POST /api/embeddings endpoints
- Add management endpoint stubs (/pull, /delete, /copy, /create)
- Implement model translation from Ollama to OpenAI format
- Add error handling with Ollama-format error responses
- Extract apply_openai_options() and parse_upstream_completion() helpers
- Add chrono = { workspace = true } to gglib-proxy dependencies
- Enables RFC3339 timestamp generation for Ollama API responses
- Export three new modules: ollama_models, ollama_handlers, ollama_stream
- Unify AppState and OllamaState into single ProxyState struct
- Register 13 Ollama routes via Router::merge() alongside OpenAI routes
- Both /v1/* (OpenAI) and /api/* (Ollama) served simultaneously
- Apply model name normalization to both API surfaces
- Update server startup logs to show both API endpoints
- Maintain backward compatibility with OpenAI-only clients
@mmogr added labels on Feb 9, 2026: type: feature, arch: integration, arch: ports-adapters, component: proxy, enhancement
- Update overview to mention dual OpenAI + Ollama API support
- Add new module architecture diagram showing both API surfaces
- Document all 13 new Ollama endpoints in API table
- Add module descriptions for ollama_models.rs, ollama_handlers.rs, ollama_stream.rs
- Module table automatically updated with 3 new modules (LOC/complexity/coverage badges)
- Maintains comprehensive documentation strategy with GitHub CI integration
- Add inference.rs entry to domain module table
- Reorder entries in ports module table (alphabetical consistency)
- Generated by scripts/generate_module_tables.sh
…client compatibility

- Change version format from 'gglib-0.3.3' to '0.3.3'
- Fixes VSCode Ollama extension validation ('Unable to verify ollama server version')
- Ollama clients expect semantic versioning format without prefix
…ments

- VSCode Ollama extension requires version >= 0.6.4
- Changed from returning gglib version (0.3.3) to Ollama-compatible 0.6.4
- Satisfies minimum version requirement while maintaining stable version claim
The normalize_model_name helper (which strips ':latest') was applied
to the OpenAI /v1/chat/completions endpoint, breaking clients that
use version-tagged model names. Scope it to Ollama handlers only.
- Handle negative num_predict correctly: -1 (unlimited) omits max_tokens,
  -2 (fill context) logs a warning, and zero/other negatives are also omitted
- Forward top_k, seed, repeat_penalty to llama-server
- Map Ollama format:'json' to OpenAI response_format:{type:'json_object'}
- Warn when messages contain unsupported images field
- Request stream_options:{include_usage:true} for accurate token counts
- Add doc comments for synthetic timing approximations
- Remove redundant content-type header (.json() sets it)
- Collapse nested if (clippy::collapsible_if)
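The num_predict rules above can be sketched as a small pure mapping (hypothetical helper name; the real code also emits the -2 warning):

```rust
/// Map Ollama's num_predict onto OpenAI's max_tokens per the rules above:
/// positive values pass through; -1 (unlimited), -2 (fill context), zero,
/// and any other negative value omit max_tokens entirely.
fn map_num_predict(num_predict: Option<i64>) -> Option<u32> {
    match num_predict {
        Some(n) if n > 0 => u32::try_from(n).ok(),
        _ => None,
    }
}
```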
Parse prompt_tokens and completion_tokens from the OpenAI usage
chunk (emitted when stream_options:{include_usage:true} is set).
Falls back to the previous chunk-counting heuristic when usage
data is not available. Also passes prompt_eval_count through to
the final Ollama NDJSON chunk instead of hardcoding 0.
- Add synthetic_digest() helper that hashes (name, id) into a
  deterministic 'gglib-' prefixed hex string, used by both
  /api/tags and /api/ps for consistency
- Remove dead OpenAiEmbeddingRequest struct (never referenced)
- Strip redundant skip_serializing_if from Deserialize-only types
- Add unit tests for digest determinism and uniqueness
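A deterministic digest of this shape can be built from std alone; the commit does not name the hash, so this sketch substitutes std's DefaultHasher (which is deterministic for a fixed std version):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch of synthetic_digest(): hash (name, id) into a deterministic
/// 'gglib-'-prefixed hex string so /api/tags and /api/ps agree.
fn synthetic_digest(name: &str, id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    (name, id).hash(&mut hasher);
    format!("gglib-{:016x}", hasher.finish())
}
```

Hashing the (name, id) pair rather than the name alone keeps digests unique when two entries share a display name.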
@mmogr added labels on Mar 6, 2026: priority: medium, size: l (1-3 days), type: test, status: needs-review, integration: llama.cpp
