refactor: replace litellm with lightweight provider catalog + native SDK adapters#60
Merged
Conversation
…SDK adapters

Remove the litellm dependency entirely and replace it with a catalog-driven provider abstraction layer using native SDKs (openai, anthropic, google-genai).

New architecture:
- llm_catalog.json: single source of truth for 12 providers and 80+ models (OpenAI, Anthropic, Gemini, DeepSeek, Zhipu, MiniMax, Moonshot, Qwen, Groq, Mistral, Together AI, OpenRouter)
- provider_registry.py: catalog loader plus get_model_info(), completion_cost(), token_counter(), models_by_provider()
- adapters/: per-SDK adapters (openai, anthropic, gemini) with a unified interface
  - The OpenAI adapter handles all OpenAI-compatible providers
  - The Anthropic adapter converts the message format and normalizes streaming events
  - The Gemini adapter wraps the google-genai SDK
- stream_chunk_builder(): local replacement for litellm.stream_chunk_builder() with reasoning_content support

Key changes:
- All litellm imports removed from the codebase
- pyproject.toml: litellm → anthropic, google-genai, tiktoken
- Proxy mode: backward-compatible LITELLM_PROXY_* env vars plus new LLM_PROXY_*
- remove_metadata(): whitelist-based field cleanup (strict providers like Groq reject any non-standard fields)
- Null field cleanup: tool_calls=null → field removed
- Tool call error recovery: stream interruptions from server-side validation (e.g. Groq hallucinated tool names) return partial text instead of crashing
- stream_chunk_builder: handles usage=null from partial/interrupted streams
- Responses API support via the OpenAI adapter for gpt-5.x-pro and codex models

Tested with real API calls across all providers (52/52 tests passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
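The local stream_chunk_builder replacement described above might look roughly like this. This is a minimal sketch, not the actual pantheon implementation: the chunk shapes follow the OpenAI streaming delta format, and the handling of reasoning_content and of usage=null mirrors the behaviors the commit message describes.

```python
# Sketch of a local stream_chunk_builder: assemble streamed delta chunks
# into a single message dict, accumulating reasoning_content and tolerating
# usage=null from partial/interrupted streams. Names are illustrative.

def stream_chunk_builder(chunks):
    content_parts = []
    reasoning_parts = []
    usage = None
    for chunk in chunks:
        delta = (chunk.get("choices") or [{}])[0].get("delta", {})
        if delta.get("content"):
            content_parts.append(delta["content"])
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        # usage may be null on every chunk of an interrupted stream
        if chunk.get("usage"):
            usage = chunk["usage"]
    message = {"role": "assistant", "content": "".join(content_parts)}
    if reasoning_parts:
        message["reasoning_content"] = "".join(reasoning_parts)
    return {"choices": [{"message": message}], "usage": usage}
```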
Clean up all remaining litellm references in variable names, function names, enum values, parameters, comments, and documentation:
- ProviderType.LITELLM → ProviderType.NATIVE
- force_litellm parameter → relaxed_schema (Agent, detect_provider)
- acompletion_litellm() → acompletion()
- litellm_mode parameter → removed (only relaxed_schema remains)
- _convert_functions(litellm_mode=) → _convert_functions(relaxed_schema=)
- get_litellm_proxy_kwargs() backward-compat alias deleted
- litellm_model variable → resolved_model
- All comments and docstrings updated
- Documentation updated (agent.rst, utils.rst, models.rst, etc.)
- Test names updated (test_agent_force_litellm → test_agent_relaxed_schema)

The only remaining "LITELLM" references are env var names in get_proxy_kwargs() for backward compatibility (LITELLM_PROXY_ENABLED/URL/KEY).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ters

Anthropic: thinking_delta events are now written into collected_chunks (previously they were only sent via the process_chunk callback and lost in stream_chunk_builder).

Gemini: add include_thoughts=True to ThinkingConfig and capture thought=True parts as reasoning_content chunks (previously thinking parts were ignored).

Both adapters now emit reasoning_content in the standard delta format, compatible with stream_chunk_builder's reasoning_content accumulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
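The Anthropic-side normalization can be sketched as a small event mapper. The input event shapes below follow the public Anthropic Messages streaming API (content_block_delta with thinking_delta/text_delta); the output chunk shape and the function name are illustrative assumptions, not pantheon's real adapter code.

```python
# Sketch: map Anthropic streaming events to OpenAI-style delta chunks,
# so thinking output lands in the standard reasoning_content field.

def anthropic_event_to_chunk(event):
    if event.get("type") != "content_block_delta":
        return None  # message_start, content_block_start, etc. carry no text
    delta = event.get("delta", {})
    if delta.get("type") == "thinking_delta":
        return {"choices": [{"delta": {"reasoning_content": delta.get("thinking", "")}}]}
    if delta.get("type") == "text_delta":
        return {"choices": [{"delta": {"content": delta.get("text", "")}}]}
    return None
```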
Groq gpt-oss models use 'reasoning' (not 'reasoning_content') for thinking output. stream_chunk_builder now accumulates both field names. OpenAI gpt-5 does not expose reasoning content at all (by design).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
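The dual-field accumulation amounts to a one-line lookup; a hypothetical helper (not pantheon's actual code) might read:

```python
def extract_reasoning(delta):
    """Pull thinking text from a streamed delta, whatever the field name.

    Most providers emit 'reasoning_content'; Groq gpt-oss models emit
    'reasoning'. Models that expose no reasoning yield an empty string.
    """
    return delta.get("reasoning_content") or delta.get("reasoning") or ""
```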
New OAuth infrastructure for browser-based authentication:
- pantheon/utils/oauth/codex.py: CodexOAuthManager with login(), refresh(), import_from_codex_cli(), and persistent token storage (~/.pantheon/oauth/)
- OAuth 2.0 Authorization Code + PKCE flow, local callback server
- Auto-refresh of expired tokens, import from Codex CLI (~/.codex/auth.json)

New Codex adapter:
- pantheon/utils/adapters/codex_adapter.py: calls chatgpt.com/backend-api using the Responses API format with OAuth bearer tokens
- Handles SSE streaming, tool calls, usage extraction

Integration:
- llm_catalog.json: new "codex" provider with sdk="codex", auth_mode="oauth"
- acompletion(): detects the codex provider and auto-fetches the OAuth token
- call_llm_provider(): routes codex/ models to the dedicated adapter
- Models: gpt-5.4, gpt-5.4-mini, gpt-5.2-codex, gpt-5, o4-mini (free via OAuth)

Usage:

    # Import from Codex CLI (if installed)
    from pantheon.utils.oauth import CodexOAuthManager
    CodexOAuthManager().import_from_codex_cli()

    # Or browser login
    CodexOAuthManager().login()

    # Then use the codex/ prefix
    await acompletion(model="codex/gpt-5.4-mini", messages=[...])

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
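The PKCE half of the Authorization Code flow is standardized (RFC 7636), so the verifier/challenge generation can be shown concretely. This is a generic sketch of the S256 method, not CodexOAuthManager's internals:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636).

    The verifier goes in the token exchange; the challenge goes in the
    authorization URL. 32 random bytes yield a 43-char base64url verifier,
    within the RFC's 43-128 character bounds.
    """
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```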
…gration

CLI commands (pantheon-chatroom oauth):
- oauth status: check auth status
- oauth login: browser-based OAuth login
- oauth import: import from Codex CLI (~/.codex/auth.json)
- oauth logout: remove stored tokens

NATS RPC tools for the frontend:
- oauth_status(): returns all OAuth provider statuses
- oauth_login(provider): start browser-based login
- oauth_import(provider): import from the native CLI

Model selector:
- Detects codex as an available provider when OAuth tokens exist
- Added codex to DEFAULT_PROVIDER_MODELS and PROVIDER_API_KEYS
- codex/ models appear in list_available_models() when authenticated

acompletion():
- Routes codex/ models through the OAuth token + CodexAdapter
- Passes account_id for the chatgpt-account-id header
- Returns the message dict directly (no stream_chunk_builder)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenAI refresh_tokens are single-use. If the Codex CLI has already used the refresh_token, our refresh attempt fails with "refresh_token_reused".

Now import_from_codex_cli() copies tokens as-is without refreshing, and get_access_token() performs a lazy refresh only when one is actually needed, i.e. when there is no usable access_token.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
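The lazy-refresh pattern can be sketched as follows. The token-dict fields (access_token, expires_at, refresh_token) and the function shape are assumptions for illustration; the real CodexOAuthManager also persists tokens to disk:

```python
import time

def get_access_token(tokens, refresh_fn, auto_refresh=True):
    """Return a usable access token, refreshing only when needed (sketch).

    Because refresh tokens are single-use, we never refresh eagerly (e.g.
    on import); the stored access_token is used as long as it is valid.
    """
    access = tokens.get("access_token")
    if access and time.time() < tokens.get("expires_at", 0):
        return access  # still valid: no refresh, refresh_token stays unused
    if not auto_refresh:
        return None
    # Access token missing or expired: spend the single-use refresh token now.
    new_tokens = refresh_fn(tokens.get("refresh_token"))
    tokens.update(new_tokens)
    return tokens.get("access_token")
```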
oauth_status() now returns supports_import=true only when the Codex CLI auth file is detected; the frontend hides the import button otherwise. Also renamed the button to "Import from Codex CLI" for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ollama is detected automatically when running at localhost:11434. No API key or manual configuration is needed.

- llm_catalog.json: new "ollama" provider with local=true, sdk=openai
- model_selector.py: _detect_ollama() pings /api/tags to check availability; _list_ollama_models() fetches model names (cached 30s); _get_provider_models() returns the dynamic ollama model list
- llm.py: auto-fills a dummy api_key="ollama" for local providers

Models appear in the UI model selector as ollama/model-name. Usage: just run `ollama serve` and models show up automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
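Detection via /api/tags (a real Ollama endpoint that lists installed models) can be sketched with the standard library alone; the function name and return convention are illustrative, not pantheon's actual _detect_ollama():

```python
import json
import urllib.request

def detect_ollama(base_url="http://localhost:11434", timeout=1.0):
    """Return the list of local Ollama model names, or None if not running.

    GET /api/tags responds with {"models": [{"name": ...}, ...]} when the
    Ollama daemon is up; any connection error means Ollama is unavailable.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.loads(resp.read())
        return [m["name"] for m in data.get("models", [])]
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return None
```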
When a chat fails (e.g. OAuth token expired, model error), the error was silently swallowed: the frontend just saw the model stop responding. Now the chat_finished event includes status="error" and metadata.message when thread.response indicates failure, and the frontend ChatManager shows the error as an assistant message in the chat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
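The event shape can be sketched like this. Field names (status, metadata.message) come from the commit message above; the function name and the assumption that the response is dict-like are hypothetical:

```python
def build_chat_finished_event(response):
    """Shape a chat_finished event that surfaces failures to the frontend
    instead of silently swallowing them (illustrative sketch)."""
    error = response.get("error")
    if error:
        return {
            "event": "chat_finished",
            "status": "error",
            "metadata": {"message": str(error)},
        }
    return {"event": "chat_finished", "status": "ok", "metadata": {}}
```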
Previously is_authenticated() returned true if a refresh_token existed in the file, even when both the access_token and refresh_token were expired or already used. Now oauth_status() calls get_access_token(auto_refresh=True) to verify the token actually works before reporting "Connected".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Improved error messages for Codex OAuth failures to be user-friendly and to include an [OAUTH_REQUIRED] tag the frontend can detect to show actionable UI (settings button).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolved conflict in pantheon/repl/__main__.py:
- main added an _update_litellm_cost_map() wrapper
- our branch removed all litellm code
- kept our version (no litellm)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test workflow referenced --extra slack, but pyproject.toml has no slack optional-dependency group (slack-sdk/slack-bolt are in the main deps). This caused all CI jobs to fail with "Extra slack is not defined".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Starlitnightly added a commit that referenced this pull request on Apr 3, 2026
PR #60 removed force_litellm from Agent.__init__ but left references in agent.py (get_tools_for_llm) and test_background.py. Also add the missing gpt-5.4-nano to llm_catalog.json so model_selector defaults are covered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Replace litellm with a catalog-driven provider layer (single source of truth: llm_catalog.json)

Architecture
New files
- pantheon/utils/llm_catalog.json
- pantheon/utils/provider_registry.py: get_model_info(), completion_cost(), token_counter(), models_by_provider()
- pantheon/utils/adapters/openai_adapter.py
- pantheon/utils/adapters/anthropic_adapter.py
- pantheon/utils/adapters/gemini_adapter.py
- pantheon/utils/adapters/codex_adapter.py
- pantheon/utils/oauth/codex.py
- tests/test_provider_adapters.py

Key changes
- litellm → anthropic, google-genai, tiktoken
- ProviderType.LITELLM → ProviderType.NATIVE, catalog-based detection
- stream_chunk_builder() with reasoning_content support
- chat_finished event
- litellm references removed from variable names, comments, docs

Providers
Test plan
🤖 Generated with Claude Code