
fix(streaming): restore real-time chunk delivery in tool/chat paths #261

Merged: veithly merged 2 commits into XSpoonAi:main from veithly:fix/realtime-streaming-gemini-async on Mar 4, 2026

fix(streaming): restore real-time chunk delivery in tool/chat paths#261
veithly merged 2 commits intoXSpoonAi:mainfrom
veithly:fix/realtime-streaming-gemini-async

Conversation

@veithly (Collaborator) commented Mar 3, 2026

Summary

  • forward output_queue through tool-call chat path so provider deltas are emitted upstream
  • stream tool-call responses directly in OpenAI-compatible / Anthropic / Gemini / Ollama providers
  • switch Gemini streaming loops to client.aio.models.generate_content_stream(...) + async for to avoid event-loop blocking
  • prevent duplicate full-content enqueue when provider already streamed text (streamed_content metadata)
  • add focused regression tests for provider/tool streaming and incremental delivery behavior
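The queue-forwarding and duplicate-suppression behavior described above can be sketched as follows. This is an illustrative minimal version, not the repository's actual code: the `output_queue` name and the `streamed_content` metadata flag come from the PR description, while `stream_response` and its return shape are hypothetical.

```python
import asyncio


async def stream_response(chunks, output_queue=None):
    """Forward provider deltas upstream as they arrive, and mark the
    response so callers do not re-enqueue the full content afterwards."""
    parts = []
    for delta in chunks:
        parts.append(delta)
        if output_queue is not None:
            await output_queue.put(delta)  # real-time chunk delivery
    full = "".join(parts)
    # Flag that content was already streamed so the caller skips the
    # duplicate full-content enqueue.
    return {"content": full, "metadata": {"streamed_content": output_queue is not None}}


async def demo():
    q = asyncio.Queue()
    result = await stream_response(["Hel", "lo"], output_queue=q)
    deltas = [q.get_nowait() for _ in range(q.qsize())]
    return result, deltas
```

The key design point is that the same coroutine both emits each delta upstream and reports, via metadata, that it did so.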

Verification

  • uv run python -m pytest tests/test_tool_streaming_output.py tests/test_chatbot_integration.py::TestChatBotIntegration::test_ask_tool_with_manager tests/test_agent_llm_integration.py::TestAgentLLMIntegration::test_streamed_tool_response_does_not_enqueue_full_content_twice -q
    • 9 passed


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e16f651871



if delta.tool_calls:
    for tc_chunk in delta.tool_calls:
        tc_id = tc_chunk.id or f"call_{tc_chunk.index}"


P1: Preserve streamed tool-call fragments under one call ID

In the new streaming chat_with_tools path, the accumulator key is recomputed as tc_chunk.id or f"call_{tc_chunk.index}" for every delta, so when later deltas omit id (a common streaming pattern after the first tool-call chunk), argument/name fragments are split across two entries instead of being merged. That can produce malformed tool_calls (e.g., one entry with partial args and another with missing name), which then breaks downstream tool execution in ToolCallAgent.act() when output_queue is enabled.
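The fix suggested here is to key the accumulator by the chunk's `index` and remember the first non-empty `id`, so id-less later deltas merge into the same entry. A minimal sketch, using plain dicts rather than the provider's actual delta types (`accumulate_tool_calls` and the field names are illustrative):

```python
def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls.

    Keyed by the delta's index so fragments stay together even when later
    deltas omit the id (a common streaming pattern after the first chunk).
    """
    calls = {}  # index -> {"id", "name", "arguments"}
    for chunk in deltas:
        entry = calls.setdefault(chunk["index"], {"id": "", "name": "", "arguments": ""})
        if chunk.get("id"):  # usually only the first delta carries the id
            entry["id"] = chunk["id"]
        if chunk.get("name"):
            entry["name"] = chunk["name"]
        entry["arguments"] += chunk.get("arguments", "")
    # Fall back to a synthetic, but stable, id only once merging is done.
    for index, entry in calls.items():
        entry["id"] = entry["id"] or f"call_{index}"
    return list(calls.values())
```

With the buggy per-delta recomputation, the second delta below would open a second entry; keying by index keeps both fragments in one call.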


    and part_response.candidates
    and getattr(part_response.candidates[0], "finish_reason", None)
):
    finish_reason = str(part_response.candidates[0].finish_reason)


P2: Normalize Gemini streamed finish reasons before returning

The streamed Gemini tool path now returns finish_reason as str(part_response.candidates[0].finish_reason) (for example "STOP") without mapping it to the normalized values used elsewhere ("stop", "length", "tool_calls"). In streaming agent flows (output_queue set), this can prevent ToolCallAgent._should_terminate_on_finish_reason() from recognizing terminal no-tool responses, causing extra think/act iterations instead of clean termination.
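A normalization helper along these lines would address the mismatch. The target values ("stop", "length", "tool_calls") come from the review comment; the specific raw strings handled (e.g. "MAX_TOKENS", enum reprs like "FinishReason.STOP") and the function itself are assumptions for illustration:

```python
def normalize_finish_reason(raw, has_tool_calls=False):
    """Map a provider finish reason (e.g. Gemini's "STOP") to the
    normalized values used downstream: "stop", "length", "tool_calls"."""
    if has_tool_calls:
        # Function calls take precedence over the provider's raw reason.
        return "tool_calls"
    # Strip enum prefixes like "FinishReason.STOP" before lookup.
    key = str(raw).rsplit(".", 1)[-1].upper()
    mapping = {"STOP": "stop", "MAX_TOKENS": "length"}
    return mapping.get(key, key.lower())
```

This lets termination checks such as `_should_terminate_on_finish_reason()` compare against a single canonical vocabulary regardless of provider.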


@veithly (Collaborator, Author) commented Mar 3, 2026

Addressed the automated review items in commit e578bf7:

  • OpenAI-compatible streaming tool-call accumulation now keeps a stable per-index call ID even when later deltas omit id, so argument/name fragments merge into the same ToolCall.
  • Gemini streaming paths now normalize finish reasons (for example STOP -> stop) and preserve tool_calls as the standardized finish reason when function calls are emitted.

Validation:

  • uv run python -m pytest tests/test_tool_streaming_output.py tests/test_chatbot_integration.py::TestChatBotIntegration::test_ask_tool_with_manager tests/test_agent_llm_integration.py::TestAgentLLMIntegration::test_streamed_tool_response_does_not_enqueue_full_content_twice -q
  • 11 passed

@veithly veithly merged commit 3d0a2fa into XSpoonAi:main Mar 4, 2026
1 check passed
