Description
The LlamaStack server does not emit the response.output_text.done streaming event, diverging from the official OpenAI Responses API behaviour observed with OpenAI's GPT models.
The streaming event sequence for text output should be:
output_item.added → content_part.added → output_text.delta (xN) → output_text.done → content_part.done → output_item.done
LlamaStack currently skips output_text.done: the event type is defined in openai_responses.py but is never emitted in streaming.py.
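To make the gap concrete, here is a minimal, hypothetical model of the expected ordering. The dict-based events and the generator name are illustrative only, not LlamaStack's actual streaming.py code: the point is that output_text.done, carrying the accumulated text, should fire after the last delta and before content_part.done.

```python
def stream_text_events(chunks):
    """Toy model of the expected text-output event sequence (names assumed)."""
    accumulated = []
    yield {"type": "response.output_item.added"}
    yield {"type": "response.content_part.added"}
    for chunk in chunks:
        accumulated.append(chunk)
        yield {"type": "response.output_text.delta", "delta": chunk}
    # The event LlamaStack currently skips: final accumulated text,
    # emitted before content_part.done.
    yield {"type": "response.output_text.done", "text": "".join(accumulated)}
    yield {"type": "response.content_part.done"}
    yield {"type": "response.output_item.done"}

types = [e["type"] for e in stream_text_events(["2 + 2 ", "= 4"])]
```

With two chunks, types contains output_text.done exactly once, sandwiched between the last delta and content_part.done.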
How to reproduce
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8321/v1", api_key="fake")
events = list(client.responses.create(
    model="ollama/gpt-oss:20b",  # or any registered model
    input="What is 2 + 2?",
    stream=True,
))
types = [e.type for e in events]
print("response.output_text.done" in types)  # False, should be True
Expected vs actual
Expected (matches OpenAI ground truth with gpt-5.1):
...
response.output_text.delta (xN)
response.output_text.done ← contains final accumulated text
response.content_part.done
...
Actual (LlamaStack server):
...
response.output_text.delta (xN)
← output_text.done MISSING
response.content_part.done
...
Impact
Beyond diverging from official Responses API behaviour, clients that rely on response.output_text.done as the signal to capture the final accumulated text (rather than accumulating deltas themselves) will never receive it.
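Until the server emits the event, an affected client can reconstruct the final text by accumulating deltas itself. A minimal sketch follows; the Event dataclass is a stand-in for the SDK's event objects (which expose .type and .delta), and the simulated stream mirrors the actual LlamaStack sequence shown above:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Stand-in for an SDK streaming event (real objects carry more fields)."""
    type: str
    delta: str = ""

def accumulate_text(events):
    """Join the text deltas manually, since output_text.done never arrives."""
    return "".join(e.delta for e in events if e.type == "response.output_text.delta")

# Simulated LlamaStack stream: deltas arrive, but no output_text.done.
stream = [
    Event("response.output_item.added"),
    Event("response.content_part.added"),
    Event("response.output_text.delta", "2 + 2 "),
    Event("response.output_text.delta", "= 4"),
    Event("response.content_part.done"),
    Event("response.output_item.done"),
]
print(accumulate_text(stream))  # prints "2 + 2 = 4"
```

This is exactly the manual accumulation that output_text.done is meant to make unnecessary.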