Skip to content

refactor: replace litellm with lightweight provider catalog + native SDK adapters#60

Merged
Nanguage merged 14 commits intomainfrom
feature/replace-litellm
Apr 3, 2026
Merged

refactor: replace litellm with lightweight provider catalog + native SDK adapters#60
Nanguage merged 14 commits intomainfrom
feature/replace-litellm

Conversation

@Nanguage
Copy link
Copy Markdown
Member

@Nanguage Nanguage commented Apr 3, 2026

Summary

  • Remove litellm dependency entirely and replace with a catalog-driven provider abstraction layer using native SDKs (openai, anthropic, google-genai)
  • 14 providers, 80+ models supported via JSON catalog (llm_catalog.json)
  • Codex OAuth support for free ChatGPT backend-api access
  • Ollama auto-detected as local provider
  • Error propagation from backend to frontend via NATS

Architecture

llm_catalog.json (config) → provider_registry.py (metadata/cost)
                           → adapters/ (per-SDK: openai, anthropic, gemini, codex)
                           → stream_chunk_builder (unified streaming)

New files

File Purpose
pantheon/utils/llm_catalog.json Provider & model catalog (pricing, capabilities, token limits)
pantheon/utils/provider_registry.py get_model_info(), completion_cost(), token_counter(), models_by_provider()
pantheon/utils/adapters/openai_adapter.py OpenAI + 10 compatible providers (DeepSeek, Groq, Zhipu, MiniMax, Moonshot, Qwen, Mistral, Together, OpenRouter, Ollama)
pantheon/utils/adapters/anthropic_adapter.py Anthropic native SDK (message format conversion, thinking support)
pantheon/utils/adapters/gemini_adapter.py Google GenAI native SDK (thinking support)
pantheon/utils/adapters/codex_adapter.py ChatGPT backend-api via OAuth (Responses API format)
pantheon/utils/oauth/codex.py OAuth 2.0 + PKCE flow, token storage, auto-refresh, Codex CLI import
tests/test_provider_adapters.py Integration tests for all providers (52 tests)

Key changes

Area Change
Dependencies litellmanthropic, google-genai, tiktoken
Provider routing ProviderType.LITELLMProviderType.NATIVE, catalog-based detection
Streaming New stream_chunk_builder() with reasoning_content support
Message cleanup Whitelist-based field sanitization (strict providers like Groq)
Tool call recovery Partial response on server-side validation errors
Error propagation Backend errors shown in frontend chat via NATS chat_finished event
Naming All litellm references removed from variable names, comments, docs

Providers

Provider SDK Auth Models
OpenAI openai API key 20 (incl. Responses API for pro/codex)
Anthropic anthropic API key 7
Gemini google-genai API key 9
DeepSeek, Zhipu, MiniMax, Moonshot, Qwen, Groq, Mistral, Together, OpenRouter openai (compat) API key 46
Codex codex OAuth 2.0 5 (free via ChatGPT Plus)
Ollama openai (compat) none auto-detected from localhost:11434

Test plan

  • Unit tests: provider_registry, stream_chunk_builder (15 tests)
  • Integration tests: all 37 models across 7 providers with real API calls (52 tests total)
  • Responses API: gpt-5.4-pro, gpt-5.2-pro, gpt-5.2-codex
  • Thinking/reasoning: Anthropic, Gemini, DeepSeek, Zhipu, Groq
  • Tool calling: OpenAI, Anthropic
  • Codex OAuth: login, import, token refresh
  • Ollama: auto-detection, streaming, model listing
  • Error propagation: OAuth errors shown in frontend with action button
  • Chatroom UI: model selector, OAuth settings panel, Ollama config

🤖 Generated with Claude Code

Nanguage and others added 14 commits March 30, 2026 23:53
…SDK adapters

Remove litellm dependency entirely and replace with a catalog-driven provider
abstraction layer using native SDKs (openai, anthropic, google-genai).

New architecture:
- llm_catalog.json: single source of truth for 12 providers, 80+ models
  (OpenAI, Anthropic, Gemini, DeepSeek, Zhipu, MiniMax, Moonshot, Qwen,
   Groq, Mistral, Together AI, OpenRouter)
- provider_registry.py: catalog loader + get_model_info(), completion_cost(),
  token_counter(), models_by_provider()
- adapters/: per-SDK adapters (openai, anthropic, gemini) with unified interface
  - OpenAI adapter handles all OpenAI-compatible providers
  - Anthropic adapter converts message format + normalizes streaming events
  - Gemini adapter wraps google-genai SDK
- stream_chunk_builder(): local replacement for litellm.stream_chunk_builder()
  with reasoning_content support

Key changes:
- All litellm imports removed from codebase
- pyproject.toml: litellm → anthropic, google-genai, tiktoken
- Proxy mode: backward-compat LITELLM_PROXY_* env vars + new LLM_PROXY_*
- remove_metadata(): whitelist-based field cleanup (strict providers like Groq
  reject any non-standard fields)
- Null field cleanup: tool_calls=null → field removed
- Tool call error recovery: stream interruptions from server-side validation
  (e.g. Groq hallucinated tool names) return partial text instead of crashing
- stream_chunk_builder: handles usage=null from partial/interrupted streams
- Responses API support via OpenAI adapter for gpt-5.x-pro and codex models

Tested with real API calls across all providers (52/52 tests passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean up all remaining litellm references in variable names, function names,
enum values, parameters, comments, and documentation:

- ProviderType.LITELLM → ProviderType.NATIVE
- force_litellm parameter → relaxed_schema (Agent, detect_provider)
- acompletion_litellm() → acompletion()
- litellm_mode parameter → removed (only relaxed_schema remains)
- _convert_functions(litellm_mode=) → _convert_functions(relaxed_schema=)
- get_litellm_proxy_kwargs() backward-compat alias deleted
- litellm_model variable → resolved_model
- All comments and docstrings updated
- Documentation updated (agent.rst, utils.rst, models.rst, etc.)
- Test names updated (test_agent_force_litellm → test_agent_relaxed_schema)

Only remaining "LITELLM" references are env var names in get_proxy_kwargs()
for backward compatibility (LITELLM_PROXY_ENABLED/URL/KEY).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ters

Anthropic: thinking_delta events now written into collected_chunks
(previously only sent via process_chunk callback, lost in stream_chunk_builder)

Gemini: add include_thoughts=True to ThinkingConfig, capture thought=True
parts as reasoning_content chunks (previously thinking parts were ignored)

Both adapters now emit reasoning_content in the standard delta format,
compatible with stream_chunk_builder's reasoning_content accumulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Groq gpt-oss models use 'reasoning' (not 'reasoning_content') for thinking
output. stream_chunk_builder now accumulates both field names.

OpenAI gpt-5 does not expose reasoning content at all (by design).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New OAuth infrastructure for browser-based authentication:
- pantheon/utils/oauth/codex.py: CodexOAuthManager with login(), refresh(),
  import_from_codex_cli(), and persistent token storage (~/.pantheon/oauth/)
- OAuth 2.0 Authorization Code + PKCE flow, local callback server
- Auto-refresh expired tokens, import from Codex CLI (~/.codex/auth.json)

New Codex adapter:
- pantheon/utils/adapters/codex_adapter.py: calls chatgpt.com/backend-api
  using Responses API format with OAuth bearer tokens
- Handles SSE streaming, tool calls, usage extraction

Integration:
- llm_catalog.json: new "codex" provider with sdk="codex", auth_mode="oauth"
- acompletion(): detects codex provider, auto-fetches OAuth token
- call_llm_provider(): routes codex/ models to dedicated adapter
- Models: gpt-5.4, gpt-5.4-mini, gpt-5.2-codex, gpt-5, o4-mini (free via OAuth)

Usage:
  # Import from Codex CLI (if installed)
  from pantheon.utils.oauth import CodexOAuthManager
  CodexOAuthManager().import_from_codex_cli()

  # Or browser login
  CodexOAuthManager().login()

  # Then use codex/ prefix
  await acompletion(model="codex/gpt-5.4-mini", messages=[...])

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gration

CLI commands (pantheon-chatroom oauth):
- oauth status: check auth status
- oauth login: browser-based OAuth login
- oauth import: import from Codex CLI (~/.codex/auth.json)
- oauth logout: remove stored tokens

NATS RPC tools for frontend:
- oauth_status(): returns all OAuth provider statuses
- oauth_login(provider): start browser-based login
- oauth_import(provider): import from native CLI

Model selector:
- Detects codex as available provider when OAuth tokens exist
- Added codex to DEFAULT_PROVIDER_MODELS and PROVIDER_API_KEYS
- codex/ models appear in list_available_models() when authenticated

acompletion():
- Routes codex/ models through OAuth token + CodexAdapter
- Passes account_id for chatgpt-account-id header
- Returns message dict directly (no stream_chunk_builder)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenAI refresh_tokens are single-use. If Codex CLI already used the
refresh_token, our refresh attempt fails with "refresh_token_reused".

Now import_from_codex_cli() copies tokens as-is without refreshing.
get_access_token() handles lazy refresh when actually needed.
Only attempt refresh if there's no access_token at all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oauth_status() now returns supports_import=true only when Codex CLI
auth file is detected. Frontend hides the import button otherwise.
Also renamed button to "Import from Codex CLI" for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ollama is detected automatically when running at localhost:11434.
No API key or manual configuration needed.

- llm_catalog.json: new "ollama" provider with local=true, sdk=openai
- model_selector.py: _detect_ollama() pings /api/tags to check availability,
  _list_ollama_models() fetches model names (cached 30s),
  _get_provider_models() returns dynamic ollama model list
- llm.py: auto-fills dummy api_key="ollama" for local providers

Models appear in the UI model selector as ollama/model-name.
Usage: just run `ollama serve` and models show up automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a chat fails (e.g. OAuth token expired, model error), the error
was silently swallowed — frontend just saw the model stop responding.

Now chat_finished event includes status="error" and metadata.message
when thread.response indicates failure. Frontend ChatManager shows
the error as an assistant message in the chat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously is_authenticated() returned true if refresh_token existed in
the file, even if both access_token and refresh_token were expired/reused.
Now oauth_status() calls get_access_token(auto_refresh=True) to actually
verify the token works before reporting "Connected".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Improved error messages for Codex OAuth failures to be user-friendly
and include [OAUTH_REQUIRED] tag for frontend to detect and show
actionable UI (settings button).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolved conflict in pantheon/repl/__main__.py:
- main added _update_litellm_cost_map() wrapper
- our branch removed all litellm code
- kept our version (no litellm)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test workflow referenced --extra slack but pyproject.toml has no
slack optional-dependency group (slack-sdk/slack-bolt are in main deps).
This caused all CI jobs to fail with "Extra slack is not defined".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Nanguage Nanguage merged commit eaa4053 into main Apr 3, 2026
0 of 5 checks passed
Starlitnightly added a commit that referenced this pull request Apr 3, 2026
PR #60 removed force_litellm from Agent.__init__ but left references in
agent.py (get_tools_for_llm) and test_background.py. Also add missing
gpt-5.4-nano to llm_catalog.json so model_selector defaults are covered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant