feat: Add Ollama API compatibility layer to proxy#168
- Implement OllamaChatRequest/Response with streaming support
- Implement OllamaGenerateRequest/Response for text generation
- Implement OllamaEmbeddingRequest/Response for embeddings
- Add legacy /api/embeddings endpoint support
- Implement normalize_model_name() to strip :latest suffix
- Add chrono-based timestamp helpers for RFC3339 format
- Include comprehensive unit tests (10 test cases)
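The exact body of normalize_model_name() is not shown in this PR summary; a minimal std-only sketch of what such a helper could look like (the signature is an assumption):

```rust
/// Strip a trailing ":latest" tag so "phi3:latest" and "phi3" resolve to the
/// same catalog entry. Other tags (e.g. ":q4_0") are left untouched.
fn normalize_model_name(name: &str) -> &str {
    name.strip_suffix(":latest").unwrap_or(name)
}
```

Returning a borrowed `&str` avoids an allocation when no suffix is present.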
- Implement sse_to_ndjson() converter for real-time streaming translation
- Support Chat and Generate stream kinds with proper chunking
- Use futures_util::stream::unfold for efficient state machine
- Emit content chunks with done=false during streaming
- Emit final chunk with timing statistics when done
- Handle upstream stream errors gracefully
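Ollama streaming is NDJSON: one JSON object per line, intermediate objects with done=false and a trailing object with done=true plus statistics. A std-only sketch of the two chunk shapes (hand-built JSON strings here; the PR presumably uses serde types, and the exact field set is an assumption):

```rust
/// Build an intermediate Ollama NDJSON chat chunk (done=false).
fn content_chunk(model: &str, created_at: &str, content: &str) -> String {
    format!(
        "{{\"model\":\"{model}\",\"created_at\":\"{created_at}\",\"message\":{{\"role\":\"assistant\",\"content\":\"{content}\"}},\"done\":false}}\n"
    )
}

/// Build the final chunk carrying token/timing statistics (done=true).
fn final_chunk(model: &str, created_at: &str, eval_count: u64, total_ns: u64) -> String {
    format!(
        "{{\"model\":\"{model}\",\"created_at\":\"{created_at}\",\"message\":{{\"role\":\"assistant\",\"content\":\"\"}},\"done\":true,\"eval_count\":{eval_count},\"total_duration\":{total_ns}}}\n"
    )
}
```

Each chunk ends in `\n`, which is what makes the stream valid NDJSON for line-oriented Ollama clients.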
- Implement ProxyState struct for shared state (client, runtime, catalog)
- Add GET / endpoint (Ollama root probe)
- Add GET /api/version endpoint
- Add GET /api/tags endpoint listing all models
- Add POST /api/show endpoint for model metadata
- Add GET /api/ps endpoint for running models
- Add POST /api/chat handler with streaming and non-streaming modes
- Add POST /api/generate handler with streaming support
- Add POST /api/embed and POST /api/embeddings endpoints
- Add management endpoint stubs (/pull, /delete, /copy, /create)
- Implement model translation from Ollama to OpenAI format
- Add error handling with Ollama-format error responses
- Extract apply_openai_options() and parse_upstream_completion() helpers
- Add chrono = { workspace = true } to gglib-proxy dependencies
- Enables RFC3339 timestamp generation for Ollama API responses
- Export three new modules: ollama_models, ollama_handlers, ollama_stream
- Unify AppState and OllamaState into single ProxyState struct
- Register 13 Ollama routes via Router::merge() alongside OpenAI routes
- Both /v1/* (OpenAI) and /api/* (Ollama) served simultaneously
- Apply model name normalization to both API surfaces
- Update server startup logs to show both API endpoints
- Maintain backward compatibility with OpenAI-only clients
- Update overview to mention dual OpenAI + Ollama API support
- Add new module architecture diagram showing both API surfaces
- Document all 13 new Ollama endpoints in API table
- Add module descriptions for ollama_models.rs, ollama_handlers.rs, ollama_stream.rs
- Module table automatically updated with 3 new modules (LOC/complexity/coverage badges)
- Maintains comprehensive documentation strategy with GitHub CI integration
- Add inference.rs entry to domain module table
- Reorder entries in ports module table (alphabetical consistency)
- Generated by scripts/generate_module_tables.sh
…client compatibility
- Change version format from 'gglib-0.3.3' to '0.3.3'
- Fixes VSCode Ollama extension validation ('Unable to verify ollama server version')
- Ollama clients expect semantic versioning format without prefix
…ments
- VSCode Ollama extension requires version >= 0.6.4
- Changed from returning gglib version (0.3.3) to Ollama-compatible 0.6.4
- Satisfies the extension's minimum-version requirement while keeping a stable version claim
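Per the commits above, /api/version must report a bare semver string (no "gglib-" prefix) that is at least 0.6.4. A std-only sketch of the response body; the constant name is hypothetical and the real handler presumably returns this via the web framework's JSON response type:

```rust
/// Minimum Ollama version accepted by clients such as the VSCode extension.
const OLLAMA_COMPAT_VERSION: &str = "0.6.4";

/// Body for GET /api/version. Must be a bare semver string: Ollama clients
/// parse it as a version number, so a "gglib-" prefix fails validation.
fn version_body() -> String {
    format!("{{\"version\":\"{OLLAMA_COMPAT_VERSION}\"}}")
}
```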
…kward compatibility for 'name'
The normalize_model_name helper (which strips ':latest') was applied to the OpenAI /v1/chat/completions endpoint, breaking clients that use version-tagged model names. Scope it to Ollama handlers only.
- Handle negative num_predict correctly: -1 (unlimited) omits max_tokens, -2 (fill context) logs a warning; zero and other negative values are likewise omitted
- Forward top_k, seed, repeat_penalty to llama-server
- Map Ollama format:'json' to OpenAI response_format:{type:'json_object'}
- Warn when messages contain unsupported images field
- Request stream_options:{include_usage:true} for accurate token counts
- Add doc comments for synthetic timing approximations
- Remove redundant content-type header (.json() sets it)
- Collapse nested if (clippy::collapsible_if)
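The num_predict handling described above can be sketched as a small std-only mapping function (the function name is hypothetical; the real code presumably logs through tracing rather than eprintln!):

```rust
/// Map Ollama's num_predict option onto OpenAI's max_tokens.
/// -1 means "unlimited" in Ollama, so max_tokens is omitted entirely;
/// -2 ("fill context") has no OpenAI equivalent, so it is omitted with a warning;
/// zero and other negatives are also omitted.
fn map_num_predict(num_predict: Option<i64>) -> Option<u64> {
    match num_predict {
        Some(n) if n > 0 => Some(n as u64),
        Some(-2) => {
            eprintln!("warning: num_predict=-2 (fill context) unsupported; omitting max_tokens");
            None
        }
        // -1, 0, other negatives, or absent: omit max_tokens.
        _ => None,
    }
}
```

Returning Option<u64> lets the serializer skip the field entirely instead of sending an invalid value upstream.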
Parse prompt_tokens and completion_tokens from the OpenAI usage
chunk (emitted when stream_options:{include_usage:true} is set).
Falls back to the previous chunk-counting heuristic when usage
data is not available. Also passes prompt_eval_count through to
the final Ollama NDJSON chunk instead of hardcoding 0.
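The prefer-usage-then-fall-back logic can be sketched std-only; the tuple shapes and function name are assumptions for illustration:

```rust
/// Token counts (prompt, completion) for the final Ollama chunk. Prefer the
/// exact counts from the OpenAI usage chunk; fall back to the number of
/// streamed content chunks, which approximates completion tokens only
/// (the prompt count stays 0 in that case).
fn token_counts(usage: Option<(u64, u64)>, chunks_seen: u64) -> (u64, u64) {
    match usage {
        Some((prompt_tokens, completion_tokens)) => (prompt_tokens, completion_tokens),
        None => (0, chunks_seen),
    }
}
```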
- Add synthetic_digest() helper that hashes (name, id) into a deterministic 'gglib-' prefixed hex string, used by both /api/tags and /api/ps for consistency
- Remove dead OpenAiEmbeddingRequest struct (never referenced)
- Strip redundant skip_serializing_if from Deserialize-only types
- Add unit tests for digest determinism and uniqueness
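The PR does not show which hash function synthetic_digest() uses; a std-only sketch with the standard library's DefaultHasher illustrates the shape of the helper:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic pseudo-digest for locally served models: hash the
/// (name, id) pair and render it as a "gglib-"-prefixed hex string so that
/// /api/tags and /api/ps report the same digest for the same model.
fn synthetic_digest(name: &str, id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    (name, id).hash(&mut hasher);
    format!("gglib-{:016x}", hasher.finish())
}
```

The prefix makes clear to clients that this is not a real registry digest, while determinism keeps the two endpoints consistent.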
Overview
This PR implements Ollama API compatibility for the gglib proxy, making it a drop-in replacement for Ollama. Both OpenAI (/v1/*) and Ollama (/api/*) endpoints are served simultaneously on the same port.

Changes
New Files (446 lines)
Modified Files
chrono = { workspace = true }

API Endpoints Now Supported
✅ GET / — Root probe ("Ollama is running")
✅ GET /api/version — Version info
✅ GET /api/tags — List models
✅ POST /api/show — Model metadata
✅ GET /api/ps — Running models
✅ POST /api/chat — Chat completions (streaming + non-streaming)
✅ POST /api/generate — Text generation (streaming + non-streaming)
✅ POST /api/embed — Embeddings
✅ POST /api/embeddings — Legacy single-embedding
✅ POST /api/pull, DELETE /api/delete, POST /api/copy, POST /api/create — Stubs

Format Translation
- Strips the :latest suffix automatically (e.g., phi3:latest → phi3)

Code Quality
- Unified ProxyState struct eliminates state duplication
- Extracted helpers (apply_openai_options, parse_upstream_completion) reduce duplication

Usage
Testing
Closes #167
Breaking Changes
None. Both API surfaces coexist seamlessly without configuration.