
Migrate to LiteLLM Responses API; default gpt-5; fast uses gpt-5-nano#4

Open
shouryamaanjain wants to merge 1 commit into main from capy/set-default-model-to-a9340cba

Conversation

@shouryamaanjain
Contributor

Summary (why)

  • Default to gpt-5 for improved capability and alignment with OpenAI Responses API.
  • Fast mode now uses gpt-5-nano to provide a lower-latency option while keeping the same UX.
  • Migrate from litellm.completion (Chat Completions) to litellm.responses (OpenAI Responses API) to support reasoning effort control and modern tool-call semantics.

What changed

  • Default model set to gpt-5 (emplode/emplode.py).
  • --fast now maps to gpt-5-nano and help text updated (emplode/cli.py).
  • All cloud calls now use litellm.responses with:
    • input=messages (instead of messages=...)
    • tools=[{"type":"custom","name":"run_code", ...}] mirroring existing run_code(language, code) schema
    • stream=True
    • reasoning={"effort":"high"}
    • temperature preserved
    • Azure and custom api_base branches preserved
  • Streaming/event handling adapted to the Responses API: streamed events are normalized into the existing merge_deltas buffer, preserving the tool/code-execution flow.
  • Local/llama path unchanged.
  • Dependency: bump litellm to ^1.63.8 to ensure Responses API support.
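A minimal sketch of how the migrated call could be assembled from the bullets above. The tool schema and helper name are illustrative, not the actual code in emplode/emplode.py; the keyword names follow litellm's `responses()` entry point as described in this PR:

```python
# Illustrative tool schema mirroring the existing run_code(language, code) tool.
TOOLS = [{
    "type": "custom",
    "name": "run_code",
    "description": "Execute a code snippet in the given language.",
}]

def build_responses_kwargs(model, messages, temperature=0.0, api_base=None):
    """Assemble keyword arguments for a litellm.responses() call.

    Hypothetical helper; shown only to summarize the parameter changes
    described in this PR (input= instead of messages=, reasoning effort,
    the custom/ model prefix for a user-supplied api_base).
    """
    kwargs = {
        "model": f"custom/{model}" if api_base else model,
        "input": messages,                 # Responses API uses input=, not messages=
        "tools": TOOLS,
        "stream": True,
        "reasoning": {"effort": "high"},
        "temperature": temperature,
    }
    if api_base:
        kwargs["api_base"] = api_base
    return kwargs

kwargs = build_responses_kwargs("gpt-5", [{"role": "user", "content": "hi"}])
# response = litellm.responses(**kwargs)  # streamed events are then normalized
```

The actual call site additionally branches for Azure, which this sketch omits.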

Impact

  • CLI UX unchanged except model names.
  • Streaming output and code execution tool-calls continue to work.
  • Azure mode and custom api_base still function with responses().

Testing

  • Default: run emplode → model gpt-5, responses() streaming, reasoning={effort:high}.
  • Fast: run emplode --fast → model gpt-5-nano with same behavior.
  • Tool call: ask it to run a short Python snippet → confirm run_code executes and follow-up continues.
  • Azure: set EMPLODE_CLI_USE_AZURE=true with env vars (AZURE_API_KEY/BASE/VERSION/DEPLOYMENT_NAME) → confirm streaming works.
  • Custom base: pass --api_base <url> → confirm call uses model="custom/"+model and streams.
  • Local: --local unchanged.
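The streaming normalization being tested above could look roughly like this. Event type names follow the OpenAI Responses streaming events; the exact delta shape fed into merge_deltas is an assumption, not the repository's actual implementation:

```python
def normalize_event(event):
    """Map a Responses API streamed event to a Chat-Completions-style delta.

    Returns None for event types the merge buffer does not consume
    (e.g. response.created, response.completed).
    """
    etype = event.get("type")
    if etype == "response.output_text.delta":
        # Plain assistant text.
        return {"role": "assistant", "content": event.get("delta", "")}
    if etype == "response.custom_tool_call_input.delta":
        # Incremental arguments for the run_code custom tool call.
        return {"role": "assistant",
                "function_call": {"arguments": event.get("delta", "")}}
    return None
```

Keeping this mapping in one function is what lets the existing merge_deltas flow stay unchanged downstream.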

Files

  • emplode/emplode.py: default model + responses() migration + streaming normalization for Responses events.
  • emplode/cli.py: --fast maps to gpt-5-nano; help updated.
  • pyproject.toml: litellm ^1.63.8.

Notes

  • The normalized event handling is encapsulated for minimal churn, so if more complex sessions later require chaining tool output via previous_response_id, it can be extended in one place.
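If that chaining becomes necessary, the extension could thread the prior response id into follow-up calls. A sketch under the assumption that litellm forwards the Responses API's previous_response_id field; the helper name is hypothetical:

```python
def with_chaining(kwargs, last_response_id=None):
    """Return call kwargs, linking to the prior turn when one exists."""
    if last_response_id is not None:
        kwargs = {**kwargs, "previous_response_id": last_response_id}
    return kwargs

first = with_chaining({"model": "gpt-5", "input": [{"role": "user", "content": "hi"}]})
follow_up = with_chaining({"model": "gpt-5", "input": []}, "resp_123")
```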

₍ᐢ•(ܫ)•ᐢ₎ Generated by Capy

…sponses with high reasoning + streaming/tool support; bump litellm to ^1.63.8
@shouryamaanjain shouryamaanjain added the capy PR created by Capy label Sep 13, 2025