Refactor to replace litellm with provider catalog and SDK adapters#61

Merged
Starlitnightly merged 18 commits into dev from main on Apr 3, 2026
Conversation

@Starlitnightly
Collaborator

No description provided.

Nanguage and others added 18 commits on March 30, 2026
…SDK adapters

Remove litellm dependency entirely and replace with a catalog-driven provider
abstraction layer using native SDKs (openai, anthropic, google-genai).

New architecture:
- llm_catalog.json: single source of truth for 12 providers, 80+ models
  (OpenAI, Anthropic, Gemini, DeepSeek, Zhipu, MiniMax, Moonshot, Qwen,
   Groq, Mistral, Together AI, OpenRouter)
- provider_registry.py: catalog loader + get_model_info(), completion_cost(),
  token_counter(), models_by_provider()
- adapters/: per-SDK adapters (openai, anthropic, gemini) with unified interface
  - OpenAI adapter handles all OpenAI-compatible providers
  - Anthropic adapter converts message format + normalizes streaming events
  - Gemini adapter wraps google-genai SDK
- stream_chunk_builder(): local replacement for litellm.stream_chunk_builder()
  with reasoning_content support
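
The catalog-driven lookup could be sketched roughly as follows. This is a minimal illustration, not the real provider_registry.py: the catalog entries, model names, and cost field names here are assumptions, and the actual llm_catalog.json schema may differ.

```python
# Minimal sketch of catalog-driven model lookup; the embedded catalog and
# its field names are illustrative, not the real llm_catalog.json schema.
import json

CATALOG = json.loads("""
{
  "providers": {
    "openai":    {"sdk": "openai",    "models": {"gpt-4o": {"input_cost_per_token": 2.5e-06, "output_cost_per_token": 1e-05}}},
    "anthropic": {"sdk": "anthropic", "models": {"claude-sonnet": {"input_cost_per_token": 3e-06, "output_cost_per_token": 1.5e-05}}}
  }
}
""")

def get_model_info(model: str) -> dict:
    """Resolve 'provider/model' (or a bare model name) against the catalog."""
    provider, _, name = model.partition("/")
    if not name:  # bare model name: search every provider
        for p, entry in CATALOG["providers"].items():
            if model in entry["models"]:
                return {"provider": p, **entry["models"][model]}
        raise KeyError(model)
    return {"provider": provider, **CATALOG["providers"][provider]["models"][name]}

def completion_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    info = get_model_info(model)
    return (prompt_tokens * info["input_cost_per_token"]
            + completion_tokens * info["output_cost_per_token"])

def models_by_provider(provider: str) -> list[str]:
    return sorted(CATALOG["providers"][provider]["models"])
```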

Key changes:
- All litellm imports removed from codebase
- pyproject.toml: litellm → anthropic, google-genai, tiktoken
- Proxy mode: backward-compat LITELLM_PROXY_* env vars + new LLM_PROXY_*
- remove_metadata(): whitelist-based field cleanup (strict providers like Groq
  reject any non-standard fields)
- Null field cleanup: tool_calls=null → field removed
- Tool call error recovery: stream interruptions from server-side validation
  (e.g. Groq hallucinated tool names) return partial text instead of crashing
- stream_chunk_builder: handles usage=null from partial/interrupted streams
- Responses API support via OpenAI adapter for gpt-5.x-pro and codex models
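
The whitelist-based cleanup plus null-field removal could look like this sketch. The exact field whitelist is an assumption; the real remove_metadata() may keep a different set.

```python
# Sketch of whitelist-based message cleanup for strict providers (e.g. Groq):
# keep only known fields and drop null values such as tool_calls=null.
# ALLOWED_FIELDS is a guess, not the whitelist actually used in remove_metadata().
ALLOWED_FIELDS = {"role", "content", "name", "tool_calls", "tool_call_id"}

def remove_metadata(messages: list[dict]) -> list[dict]:
    cleaned = []
    for msg in messages:
        cleaned.append({k: v for k, v in msg.items()
                        if k in ALLOWED_FIELDS and v is not None})
    return cleaned
```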

Tested with real API calls across all providers (52/52 tests passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean up all remaining litellm references in variable names, function names,
enum values, parameters, comments, and documentation:

- ProviderType.LITELLM → ProviderType.NATIVE
- force_litellm parameter → relaxed_schema (Agent, detect_provider)
- acompletion_litellm() → acompletion()
- litellm_mode parameter → removed (only relaxed_schema remains)
- _convert_functions(litellm_mode=) → _convert_functions(relaxed_schema=)
- get_litellm_proxy_kwargs() backward-compat alias deleted
- litellm_model variable → resolved_model
- All comments and docstrings updated
- Documentation updated (agent.rst, utils.rst, models.rst, etc.)
- Test names updated (test_agent_force_litellm → test_agent_relaxed_schema)

Only remaining "LITELLM" references are env var names in get_proxy_kwargs()
for backward compatibility (LITELLM_PROXY_ENABLED/URL/KEY).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ters

Anthropic: thinking_delta events now written into collected_chunks
(previously only sent via process_chunk callback, lost in stream_chunk_builder)

Gemini: add include_thoughts=True to ThinkingConfig, capture thought=True
parts as reasoning_content chunks (previously thinking parts were ignored)

Both adapters now emit reasoning_content in the standard delta format,
compatible with stream_chunk_builder's reasoning_content accumulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Groq gpt-oss models use 'reasoning' (not 'reasoning_content') for thinking
output. stream_chunk_builder now accumulates both field names.

OpenAI gpt-5 does not expose reasoning content at all (by design).
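
Accumulating both field names can be sketched as below. This is a simplified stand-in for stream_chunk_builder(), using an assumed chunk/delta dict shape rather than the actual SDK objects.

```python
# Simplified sketch of stream_chunk_builder()'s reasoning accumulation:
# Groq gpt-oss emits 'reasoning', most other providers emit 'reasoning_content'.
# Chunk shape is an assumption (plain dicts, not SDK objects).
def build_message(chunks: list[dict]) -> dict:
    content, reasoning = [], []
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        if delta.get("content"):
            content.append(delta["content"])
        for field in ("reasoning_content", "reasoning"):
            if delta.get(field):
                reasoning.append(delta[field])
    msg = {"role": "assistant", "content": "".join(content)}
    if reasoning:
        msg["reasoning_content"] = "".join(reasoning)
    return msg
```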

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New OAuth infrastructure for browser-based authentication:
- pantheon/utils/oauth/codex.py: CodexOAuthManager with login(), refresh(),
  import_from_codex_cli(), and persistent token storage (~/.pantheon/oauth/)
- OAuth 2.0 Authorization Code + PKCE flow, local callback server
- Auto-refresh expired tokens, import from Codex CLI (~/.codex/auth.json)
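
The PKCE part of the flow (RFC 7636) can be sketched with the standard library alone; endpoint URLs and client details are deliberately omitted, and this is not the actual CodexOAuthManager code.

```python
# Sketch of PKCE pair generation for the OAuth 2.0 Authorization Code flow
# (RFC 7636): a random code_verifier plus its S256 code_challenge.
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for an OAuth PKCE flow."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The verifier is sent only in the final token exchange, so the local callback server never needs to hold a client secret.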

New Codex adapter:
- pantheon/utils/adapters/codex_adapter.py: calls chatgpt.com/backend-api
  using Responses API format with OAuth bearer tokens
- Handles SSE streaming, tool calls, usage extraction

Integration:
- llm_catalog.json: new "codex" provider with sdk="codex", auth_mode="oauth"
- acompletion(): detects codex provider, auto-fetches OAuth token
- call_llm_provider(): routes codex/ models to dedicated adapter
- Models: gpt-5.4, gpt-5.4-mini, gpt-5.2-codex, gpt-5, o4-mini (free via OAuth)

Usage:
  # Import from Codex CLI (if installed)
  from pantheon.utils.oauth import CodexOAuthManager
  CodexOAuthManager().import_from_codex_cli()

  # Or browser login
  CodexOAuthManager().login()

  # Then use codex/ prefix
  await acompletion(model="codex/gpt-5.4-mini", messages=[...])

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gration

CLI commands (pantheon-chatroom oauth):
- oauth status: check auth status
- oauth login: browser-based OAuth login
- oauth import: import from Codex CLI (~/.codex/auth.json)
- oauth logout: remove stored tokens

NATS RPC tools for frontend:
- oauth_status(): returns all OAuth provider statuses
- oauth_login(provider): start browser-based login
- oauth_import(provider): import from native CLI

Model selector:
- Detects codex as available provider when OAuth tokens exist
- Added codex to DEFAULT_PROVIDER_MODELS and PROVIDER_API_KEYS
- codex/ models appear in list_available_models() when authenticated

acompletion():
- Routes codex/ models through OAuth token + CodexAdapter
- Passes account_id for chatgpt-account-id header
- Returns message dict directly (no stream_chunk_builder)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenAI refresh_tokens are single-use. If Codex CLI already used the
refresh_token, our refresh attempt fails with "refresh_token_reused".

Now import_from_codex_cli() copies tokens as-is without refreshing.
get_access_token() handles lazy refresh when actually needed.
Only attempt refresh if there's no access_token at all.
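
The lazy-refresh policy above could be sketched like this. Class and field names are illustrative, not the real CodexOAuthManager API, and the refresh call is stubbed out.

```python
# Sketch of lazy token refresh: since OpenAI refresh_tokens are single-use,
# never refresh eagerly; only refresh when the access token is missing/expired.
# Names are assumptions; _refresh() is a stub for the real token-endpoint POST.
import time

class TokenStore:
    def __init__(self, access_token=None, refresh_token=None, expires_at=0.0):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_at = expires_at

    def _refresh(self):
        # Placeholder for POSTing refresh_token to the token endpoint.
        self.access_token = "new-token"
        self.expires_at = time.time() + 3600

    def get_access_token(self, auto_refresh=True):
        if self.access_token and time.time() < self.expires_at:
            return self.access_token  # still valid: don't burn the refresh_token
        if auto_refresh and self.refresh_token:
            self._refresh()
            return self.access_token
        raise RuntimeError("not authenticated")
```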

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oauth_status() now returns supports_import=true only when Codex CLI
auth file is detected. Frontend hides the import button otherwise.
Also renamed button to "Import from Codex CLI" for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Provide structured project documentation for AI assistants (Claude Code,
Cursor, Copilot, etc.) covering architecture, conventions, module reference,
team templates, and task toolset mechanism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: add .agents/ directory for AI coding tool context
Ollama is detected automatically when running at localhost:11434.
No API key or manual configuration needed.

- llm_catalog.json: new "ollama" provider with local=true, sdk=openai
- model_selector.py: _detect_ollama() pings /api/tags to check availability,
  _list_ollama_models() fetches model names (cached 30s),
  _get_provider_models() returns dynamic ollama model list
- llm.py: auto-fills dummy api_key="ollama" for local providers

Models appear in the UI model selector as ollama/model-name.
Usage: just run `ollama serve` and models show up automatically.
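
The detection ping could be sketched with the standard library as below; the real _detect_ollama() also caches results for 30 s, which this sketch omits.

```python
# Sketch of local-Ollama auto-detection via GET /api/tags: returns installed
# model names, or [] when no server is listening on localhost:11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def detect_ollama(timeout: float = 0.5) -> list[str]:
    """Return installed Ollama model names, or [] if Ollama is not running."""
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:
        return []
```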

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a chat fails (e.g. OAuth token expired, model error), the error
was silently swallowed — frontend just saw the model stop responding.

Now chat_finished event includes status="error" and metadata.message
when thread.response indicates failure. Frontend ChatManager shows
the error as an assistant message in the chat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously is_authenticated() returned true if refresh_token existed in
the file, even if both access_token and refresh_token were expired/reused.
Now oauth_status() calls get_access_token(auto_refresh=True) to actually
verify the token works before reporting "Connected".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Improved error messages for Codex OAuth failures to be user-friendly
and include [OAUTH_REQUIRED] tag for frontend to detect and show
actionable UI (settings button).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolved conflict in pantheon/repl/__main__.py:
- main added _update_litellm_cost_map() wrapper
- our branch removed all litellm code
- kept our version (no litellm)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test workflow referenced --extra slack but pyproject.toml has no
slack optional-dependency group (slack-sdk/slack-bolt are in main deps).
This caused all CI jobs to fail with "Extra slack is not defined".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor: replace litellm with lightweight provider catalog + native SDK adapters
feat: CC-aligned token optimization pipeline (5-stage)
@Starlitnightly Starlitnightly merged commit b624140 into dev Apr 3, 2026
0 of 5 checks passed