refactor: replace litellm with lightweight provider catalog + native SDK adapters by Nanguage · Pull Request #60 · aristoteleo/PantheonOS

Nanguage · 2026-04-03T22:01:33Z

Summary

Remove litellm dependency entirely and replace with a catalog-driven provider abstraction layer using native SDKs (openai, anthropic, google-genai)
14 providers, 80+ models supported via JSON catalog (llm_catalog.json)
Codex OAuth support for free ChatGPT backend-api access
Ollama auto-detected as local provider
Error propagation from backend to frontend via NATS

Architecture

llm_catalog.json (config) → provider_registry.py (metadata/cost)
                           → adapters/ (per-SDK: openai, anthropic, gemini, codex)
                           → stream_chunk_builder (unified streaming)

New files

File	Purpose
`pantheon/utils/llm_catalog.json`	Provider & model catalog (pricing, capabilities, token limits)
`pantheon/utils/provider_registry.py`	`get_model_info()`, `completion_cost()`, `token_counter()`, `models_by_provider()`
`pantheon/utils/adapters/openai_adapter.py`	OpenAI + 10 compatible providers (DeepSeek, Groq, Zhipu, MiniMax, Moonshot, Qwen, Mistral, Together, OpenRouter, Ollama)
`pantheon/utils/adapters/anthropic_adapter.py`	Anthropic native SDK (message format conversion, thinking support)
`pantheon/utils/adapters/gemini_adapter.py`	Google GenAI native SDK (thinking support)
`pantheon/utils/adapters/codex_adapter.py`	ChatGPT backend-api via OAuth (Responses API format)
`pantheon/utils/oauth/codex.py`	OAuth 2.0 + PKCE flow, token storage, auto-refresh, Codex CLI import
`tests/test_provider_adapters.py`	Integration tests for all providers (52 tests)

Key changes

Area	Change
Dependencies	`litellm` → `anthropic`, `google-genai`, `tiktoken`
Provider routing	`ProviderType.LITELLM` → `ProviderType.NATIVE`, catalog-based detection
Streaming	New `stream_chunk_builder()` with `reasoning_content` support
Message cleanup	Whitelist-based field sanitization (strict providers like Groq)
Tool call recovery	Partial response on server-side validation errors
Error propagation	Backend errors shown in frontend chat via NATS `chat_finished` event
Naming	All `litellm` references removed from variable names, comments, docs

Providers

Provider	SDK	Auth	Models
OpenAI	openai	API key	20 (incl. Responses API for pro/codex)
Anthropic	anthropic	API key	7
Gemini	google-genai	API key	9
DeepSeek, Zhipu, MiniMax, Moonshot, Qwen, Groq, Mistral, Together, OpenRouter	openai (compat)	API key	46
Codex	codex	OAuth 2.0	5 (free via ChatGPT Plus)
Ollama	openai (compat)	none	auto-detected from localhost:11434

Test plan

Unit tests: provider_registry, stream_chunk_builder (15 tests)
Integration tests: all 37 models across 7 providers with real API calls (52 tests total)
Responses API: gpt-5.4-pro, gpt-5.2-pro, gpt-5.2-codex
Thinking/reasoning: Anthropic, Gemini, DeepSeek, Zhipu, Groq
Tool calling: OpenAI, Anthropic
Codex OAuth: login, import, token refresh
Ollama: auto-detection, streaming, model listing
Error propagation: OAuth errors shown in frontend with action button
Chatroom UI: model selector, OAuth settings panel, Ollama config

🤖 Generated with Claude Code

…SDK adapters Remove litellm dependency entirely and replace with a catalog-driven provider abstraction layer using native SDKs (openai, anthropic, google-genai). New architecture: - llm_catalog.json: single source of truth for 12 providers, 80+ models (OpenAI, Anthropic, Gemini, DeepSeek, Zhipu, MiniMax, Moonshot, Qwen, Groq, Mistral, Together AI, OpenRouter) - provider_registry.py: catalog loader + get_model_info(), completion_cost(), token_counter(), models_by_provider() - adapters/: per-SDK adapters (openai, anthropic, gemini) with unified interface - OpenAI adapter handles all OpenAI-compatible providers - Anthropic adapter converts message format + normalizes streaming events - Gemini adapter wraps google-genai SDK - stream_chunk_builder(): local replacement for litellm.stream_chunk_builder() with reasoning_content support Key changes: - All litellm imports removed from codebase - pyproject.toml: litellm → anthropic, google-genai, tiktoken - Proxy mode: backward-compat LITELLM_PROXY_* env vars + new LLM_PROXY_* - remove_metadata(): whitelist-based field cleanup (strict providers like Groq reject any non-standard fields) - Null field cleanup: tool_calls=null → field removed - Tool call error recovery: stream interruptions from server-side validation (e.g. Groq hallucinated tool names) return partial text instead of crashing - stream_chunk_builder: handles usage=null from partial/interrupted streams - Responses API support via OpenAI adapter for gpt-5.x-pro and codex models Tested with real API calls across all providers (52/52 tests passing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clean up all remaining litellm references in variable names, function names, enum values, parameters, comments, and documentation: - ProviderType.LITELLM → ProviderType.NATIVE - force_litellm parameter → relaxed_schema (Agent, detect_provider) - acompletion_litellm() → acompletion() - litellm_mode parameter → removed (only relaxed_schema remains) - _convert_functions(litellm_mode=) → _convert_functions(relaxed_schema=) - get_litellm_proxy_kwargs() backward-compat alias deleted - litellm_model variable → resolved_model - All comments and docstrings updated - Documentation updated (agent.rst, utils.rst, models.rst, etc.) - Test names updated (test_agent_force_litellm → test_agent_relaxed_schema) Only remaining "LITELLM" references are env var names in get_proxy_kwargs() for backward compatibility (LITELLM_PROXY_ENABLED/URL/KEY). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ters Anthropic: thinking_delta events now written into collected_chunks (previously only sent via process_chunk callback, lost in stream_chunk_builder) Gemini: add include_thoughts=True to ThinkingConfig, capture thought=True parts as reasoning_content chunks (previously thinking parts were ignored) Both adapters now emit reasoning_content in the standard delta format, compatible with stream_chunk_builder's reasoning_content accumulation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Groq gpt-oss models use 'reasoning' (not 'reasoning_content') for thinking output. stream_chunk_builder now accumulates both field names. OpenAI gpt-5 does not expose reasoning content at all (by design). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New OAuth infrastructure for browser-based authentication: - pantheon/utils/oauth/codex.py: CodexOAuthManager with login(), refresh(), import_from_codex_cli(), and persistent token storage (~/.pantheon/oauth/) - OAuth 2.0 Authorization Code + PKCE flow, local callback server - Auto-refresh expired tokens, import from Codex CLI (~/.codex/auth.json) New Codex adapter: - pantheon/utils/adapters/codex_adapter.py: calls chatgpt.com/backend-api using Responses API format with OAuth bearer tokens - Handles SSE streaming, tool calls, usage extraction Integration: - llm_catalog.json: new "codex" provider with sdk="codex", auth_mode="oauth" - acompletion(): detects codex provider, auto-fetches OAuth token - call_llm_provider(): routes codex/ models to dedicated adapter - Models: gpt-5.4, gpt-5.4-mini, gpt-5.2-codex, gpt-5, o4-mini (free via OAuth) Usage: # Import from Codex CLI (if installed) from pantheon.utils.oauth import CodexOAuthManager CodexOAuthManager().import_from_codex_cli() # Or browser login CodexOAuthManager().login() # Then use codex/ prefix await acompletion(model="codex/gpt-5.4-mini", messages=[...]) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…gration CLI commands (pantheon-chatroom oauth): - oauth status: check auth status - oauth login: browser-based OAuth login - oauth import: import from Codex CLI (~/.codex/auth.json) - oauth logout: remove stored tokens NATS RPC tools for frontend: - oauth_status(): returns all OAuth provider statuses - oauth_login(provider): start browser-based login - oauth_import(provider): import from native CLI Model selector: - Detects codex as available provider when OAuth tokens exist - Added codex to DEFAULT_PROVIDER_MODELS and PROVIDER_API_KEYS - codex/ models appear in list_available_models() when authenticated acompletion(): - Routes codex/ models through OAuth token + CodexAdapter - Passes account_id for chatgpt-account-id header - Returns message dict directly (no stream_chunk_builder) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

OpenAI refresh_tokens are single-use. If Codex CLI already used the refresh_token, our refresh attempt fails with "refresh_token_reused". Now import_from_codex_cli() copies tokens as-is without refreshing. get_access_token() handles lazy refresh when actually needed. Only attempt refresh if there's no access_token at all. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

oauth_status() now returns supports_import=true only when Codex CLI auth file is detected. Frontend hides the import button otherwise. Also renamed button to "Import from Codex CLI" for clarity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ollama is detected automatically when running at localhost:11434. No API key or manual configuration needed. - llm_catalog.json: new "ollama" provider with local=true, sdk=openai - model_selector.py: _detect_ollama() pings /api/tags to check availability, _list_ollama_models() fetches model names (cached 30s), _get_provider_models() returns dynamic ollama model list - llm.py: auto-fills dummy api_key="ollama" for local providers Models appear in the UI model selector as ollama/model-name. Usage: just run `ollama serve` and models show up automatically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When a chat fails (e.g. OAuth token expired, model error), the error was silently swallowed — frontend just saw the model stop responding. Now chat_finished event includes status="error" and metadata.message when thread.response indicates failure. Frontend ChatManager shows the error as an assistant message in the chat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Previously is_authenticated() returned true if refresh_token existed in the file, even if both access_token and refresh_token were expired/reused. Now oauth_status() calls get_access_token(auto_refresh=True) to actually verify the token works before reporting "Connected". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Improved error messages for Codex OAuth failures to be user-friendly and include [OAUTH_REQUIRED] tag for frontend to detect and show actionable UI (settings button). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolved conflict in pantheon/repl/__main__.py: - main added _update_litellm_cost_map() wrapper - our branch removed all litellm code - kept our version (no litellm) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test workflow referenced --extra slack but pyproject.toml has no slack optional-dependency group (slack-sdk/slack-bolt are in main deps). This caused all CI jobs to fail with "Extra slack is not defined". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR #60 removed force_litellm from Agent.__init__ but left references in agent.py (get_tools_for_llm) and test_background.py. Also add missing gpt-5.4-nano to llm_catalog.json so model_selector defaults are covered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Nanguage and others added 14 commits March 30, 2026 23:53

Merge branch 'main' into feature/replace-litellm

efe5b9f

Resolved conflict in pantheon/repl/__main__.py: - main added _update_litellm_cost_map() wrapper - our branch removed all litellm code - kept our version (no litellm) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Nanguage merged commit eaa4053 into main Apr 3, 2026
0 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: replace litellm with lightweight provider catalog + native SDK adapters#60

refactor: replace litellm with lightweight provider catalog + native SDK adapters#60
Nanguage merged 14 commits intomainfrom
feature/replace-litellm

Nanguage commented Apr 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Nanguage commented Apr 3, 2026

Summary

Architecture

New files

Key changes

Providers

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant