Responses API streaming parity: missing response.output_text.done event #5309

@robinnarsinghranabhat

Description


The LlamaStack server does not emit the response.output_text.done streaming event, diverging from the official OpenAI Responses API behaviour observed with OpenAI's GPT models.

The streaming event sequence for text output should be:

output_item.added → content_part.added → output_text.delta (xN) → output_text.done → content_part.done → output_item.done

LlamaStack currently skips output_text.done — the event type is defined in openai_responses.py but never emitted in streaming.py.
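A minimal sketch of what the missing emission could look like, using hypothetical stand-in dataclasses rather than LlamaStack's actual event classes: the streaming handler accumulates delta text and, after the last delta, emits a response.output_text.done event carrying the full text.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the event types defined in openai_responses.py.
@dataclass
class OutputTextDelta:
    type: str
    delta: str

@dataclass
class OutputTextDone:
    type: str
    text: str

def stream_text(chunks):
    """Yield a delta event per chunk, then a done event with the full text."""
    accumulated = []
    for chunk in chunks:
        accumulated.append(chunk)
        yield OutputTextDelta(type="response.output_text.delta", delta=chunk)
    # The missing piece: emit output_text.done (with the accumulated text)
    # before content_part.done closes the part.
    yield OutputTextDone(type="response.output_text.done",
                         text="".join(accumulated))

events = list(stream_text(["2 + 2 ", "equals ", "4."]))
print(events[-1].type)  # response.output_text.done
print(events[-1].text)  # 2 + 2 equals 4.
```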

How to reproduce

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8321/v1", api_key="fake")
events = list(client.responses.create(
    model="ollama/gpt-oss:20b",  # or any registered model
    input="What is 2 + 2?",
    stream=True,
))

types = [e.type for e in events]
print("response.output_text.done" in types)  # False — should be True

Expected vs actual

Expected (matches OpenAI ground truth with gpt-5.1):

...
response.output_text.delta (xN)
response.output_text.done          ← contains final accumulated text
response.content_part.done
...

Actual (LlamaStack server):

...
response.output_text.delta (xN)
                                   ← output_text.done MISSING
response.content_part.done
...
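The ordering constraint above can be checked in a parity test without a live server. A sketch (event types as plain strings): response.output_text.done must appear after the last text delta and before response.content_part.done.

```python
def has_text_done_in_order(event_types):
    """Return True iff response.output_text.done appears after the last
    text delta and before response.content_part.done."""
    try:
        last_delta = max(i for i, t in enumerate(event_types)
                         if t == "response.output_text.delta")
        done = event_types.index("response.output_text.done")
        part_done = event_types.index("response.content_part.done")
    except ValueError:  # one of the required events is missing entirely
        return False
    return last_delta < done < part_done

expected = ["response.output_text.delta", "response.output_text.delta",
            "response.output_text.done", "response.content_part.done"]
actual = ["response.output_text.delta", "response.output_text.delta",
          "response.content_part.done"]  # LlamaStack today: done missing
print(has_text_done_in_order(expected))  # True
print(has_text_done_in_order(actual))    # False
```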

Impact

Beyond the divergence from official Responses API behaviour, clients that rely on response.output_text.done as the signal to capture the final accumulated text (rather than accumulating deltas manually) will never receive it.
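Until the event is emitted, affected clients can fall back to accumulating deltas themselves. A minimal sketch, with stream events mocked as simple namespaces (a real client would iterate the stream returned by `client.responses.create(..., stream=True)`):

```python
from types import SimpleNamespace

def final_text(events):
    """Prefer the text carried by output_text.done when present;
    otherwise fall back to joining the deltas manually."""
    parts = []
    for event in events:
        if event.type == "response.output_text.done":
            return event.text  # authoritative final text, when emitted
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)

# Simulated LlamaStack stream: deltas only, output_text.done missing.
stream = [
    SimpleNamespace(type="response.output_text.delta", delta="2 + 2 "),
    SimpleNamespace(type="response.output_text.delta", delta="is 4."),
    SimpleNamespace(type="response.content_part.done"),
]
print(final_text(stream))  # 2 + 2 is 4.
```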
