Description
- I have looked for existing issues (including closed) about this
Bug Report
The tool call deduplication heuristic added in rig-core 0.32 (src/providers/openai/completion/streaming.rs, around line 201) breaks streaming tool calls from providers/models that send a unique id on every SSE delta chunk for the same logical tool call.
The heuristic (designed for API gateways like LiteLLM/OneAPI that send multiple distinct tool calls all sharing index 0) compares the id field between chunks at the same index. When a new, non-empty id differs from the existing one, the previous entry is evicted as a "completed" tool call:
// streaming.rs ~line 201-214
if let Some(new_id) = &tool_call.id
&& !new_id.is_empty()
&& let Some(existing) = tool_calls.get(&index)
&& !existing.id.is_empty()
&& existing.id != *new_id
{
let evicted = tool_calls.remove(&index).expect("checked above");
yield Ok(streaming::RawStreamingChoice::ToolCall(evicted));
}

Some providers (observed with GLM-4 family models via OpenAI-compatible endpoints) stream a single tool call as multiple SSE chunks where each chunk carries a unique id (e.g., chatcmpl-tool-<uuid>) but the same index. In rig 0.29, all chunks accumulated at that index into one complete tool call. In rig 0.32, each chunk triggers an eviction, yielding incomplete fragments as "completed" tool calls.
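To see why the guard fires on every GLM-style chunk, here is a condensed, standalone sketch of the condition (a hypothetical simplification of rig's internal state, not its real types):

```rust
// Standalone form of the 0.32 eviction condition (hypothetical simplification):
// evict when the incoming chunk carries a non-empty id that differs from the
// non-empty id already accumulated at the same index.
fn evicts(existing_id: &str, new_id: Option<&str>) -> bool {
    match new_id {
        Some(new) => !new.is_empty() && !existing_id.is_empty() && existing_id != new,
        None => false,
    }
}

fn main() {
    // GLM-style stream: every chunk at index 0 carries a fresh id, so each
    // chunk after the first evicts the (still incomplete) accumulated entry.
    assert!(evicts("chatcmpl-tool-aaa", Some("chatcmpl-tool-bbb")));
    assert!(evicts("chatcmpl-tool-bbb", Some("chatcmpl-tool-ccc")));
    // Continuation deltas that omit the id entirely never evict.
    assert!(!evicts("call_1", None));
    println!("ok");
}
```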
Reproduction
This is model/provider dependent but can be reproduced with any provider that sends unique IDs per chunk. The SSE stream looks like:
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-aaa","function":{"name":"web_search","arguments":"null"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-bbb","function":{"name":"","arguments":"{\"query\": \"META"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-ccc","function":{"name":"","arguments":" Platforms news\""}}]}}]}
...
Each chunk has a different id (aaa, bbb, ccc) but they all represent delta fragments of the same tool call at index 0.
With rig 0.29: all fragments accumulate → one valid tool call with name="web_search" and complete arguments.
With rig 0.32: each fragment evicts the previous one → multiple broken tool calls: first has name="web_search" but arguments=null, subsequent have name="" with JSON fragments.
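For illustration, a minimal accumulator (hypothetical names, not rig's actual internals) that merges same-index fragments the way 0.29 effectively did: keep the first id and name seen, concatenate argument fragments in arrival order. One assumption is labeled in the comments: the first chunk's literal "null" arguments string is treated as an empty fragment rather than concatenated.

```rust
// Hypothetical minimal accumulator for same-index tool-call deltas,
// mirroring the pre-0.32 accumulation behavior described above.
#[derive(Default, Debug)]
struct ToolCallAcc {
    id: String,
    name: String,
    arguments: String,
}

impl ToolCallAcc {
    // Merge one SSE delta fragment: keep the first non-empty id/name,
    // append argument fragments in arrival order.
    fn merge(&mut self, id: &str, name: &str, args: &str) {
        if self.id.is_empty() {
            self.id = id.to_string();
        }
        if self.name.is_empty() {
            self.name = name.to_string();
        }
        self.arguments.push_str(args);
    }
}

fn main() {
    let mut acc = ToolCallAcc::default();
    // The three fragments from the reproduction above (ids differ, index is 0
    // for all of them). Passing "" for the first chunk's literal "null"
    // arguments is an assumption about how that field is meant to be read.
    acc.merge("chatcmpl-tool-aaa", "web_search", "");
    acc.merge("chatcmpl-tool-bbb", "", "{\"query\": \"META");
    acc.merge("chatcmpl-tool-ccc", "", " Platforms news\"");
    assert_eq!(acc.name, "web_search");
    assert_eq!(acc.arguments, "{\"query\": \"META Platforms news\"");
    println!("{acc:?}");
}
```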
Expected behavior
Streaming tool call chunks at the same index should accumulate into a single tool call, regardless of whether the id field changes between chunks. The deduplication heuristic should use a different signal to distinguish "multiple distinct tool calls at the same index" from "one tool call streamed with changing IDs."
Possible approaches:
- Only evict when both id and name are non-empty and differ (a new name is a stronger signal of a genuinely new tool call)
- Use a provider-specific flag to opt into the deduplication behavior
- Only evict after the previous entry has accumulated non-null arguments (don't evict incomplete entries)
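As a rough sketch of the first approach, assuming a simplified tool-call shape (the names and types here are hypothetical, not rig's real streaming state), the eviction guard could additionally require a conflicting non-empty function name:

```rust
// Hypothetical stand-in for the accumulated entry at a given index.
struct PartialToolCall {
    id: String,
    name: String,
    arguments: String,
}

// Sketch of approach 1: a differing non-empty id alone is not enough to
// evict; the incoming chunk must also carry a non-empty name that differs
// from the accumulated one. GLM-style chunks (new id, empty name) then keep
// accumulating, while gateway-style chunks that start a genuinely new tool
// call (new id AND new name) still trigger eviction.
fn should_evict(existing: &PartialToolCall, new_id: &str, new_name: &str) -> bool {
    !new_id.is_empty()
        && !existing.id.is_empty()
        && existing.id != new_id
        && !new_name.is_empty()
        && !existing.name.is_empty()
        && existing.name != new_name
}

fn main() {
    let acc = PartialToolCall {
        id: "chatcmpl-tool-aaa".into(),
        name: "web_search".into(),
        arguments: String::new(),
    };
    // GLM-style follow-up chunk: new id, empty name -> keep accumulating.
    assert!(!should_evict(&acc, "chatcmpl-tool-bbb", ""));
    // Gateway-style chunk: new id and new name -> genuinely new tool call.
    assert!(should_evict(&acc, "call_2", "get_weather"));
    println!("ok");
}
```

One trade-off of this sketch: a gateway sending two back-to-back calls to the same function at index 0 would no longer be split, which is why combining it with the non-null-arguments check may be preferable.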
Screenshots
N/A
Additional context
- Regression from rig-core 0.29 → 0.32
- Observed with zai-org/GLM-4.7 via an OpenAI-compatible Chat Completions endpoint
- The same model + endpoint works correctly with rig 0.29
- Every tool call in the response is affected, causing 100% failure rate for any agentic workflow
- Related PRs: fix: OpenAI provider streaming tool call response for local LLM #442 (earlier fix for LM Studio streaming), fix(rig-1163): ollama stream tool calls get ignored #1309 (Ollama stream tool calls)