
OpenAI/LLM integration: capture time_to_first_token_ms from streaming responses #31

@piotrekno1

Description


Summary

The GenerationOutputMeta dataclass has a time_to_first_token_ms field, but the OpenAI integration never populates it. The integration times the full round-trip (t0 to response) but has no way to measure TTFT from a completed non-streaming response.

Current behavior

build_output_meta in integrations/openai.py constructs GenerationOutputMeta with tokens, cached tokens, reasoning tokens, tps, and stop reason — but time_to_first_token_ms is always None.

Streaming calls (stream=True) are detected but skipped entirely — no inference event is recorded.

Expected behavior

When stream=True, the integration should:

  1. Wrap the returned iterator
  2. Record the elapsed time when the first chunk arrives as time_to_first_token_ms
  3. Accumulate token counts from chunks
  4. Record the inference event when the stream is exhausted
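The four steps above can be sketched as a generator wrapper. This is a minimal sketch, not the SDK's implementation: `wrap_stream` and the `on_complete` callback are hypothetical names, and real chunk/token accounting would read the provider's usage fields rather than count chunks.

```python
import time

def wrap_stream(chunks, on_complete):
    """Wrap a streaming iterator to capture time-to-first-token.

    chunks: any iterator of response chunks (e.g. what stream=True returns).
    on_complete: hypothetical callback invoked with (ttft_ms, n_chunks)
    once the stream is exhausted -- the point where the inference event
    would be recorded.
    """
    t0 = time.monotonic()
    ttft_ms = None
    n_chunks = 0
    for chunk in chunks:
        if ttft_ms is None:
            # First chunk: elapsed time so far is the TTFT.
            ttft_ms = (time.monotonic() - t0) * 1000.0
        n_chunks += 1
        yield chunk
    # Stream exhausted: hand the measurements to the recorder.
    on_complete(ttft_ms, n_chunks)
```

Because the wrapper yields each chunk unchanged, the caller's iteration behavior is preserved; the only observable difference is the callback firing after the last chunk.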

Notes

  • The field and serialization path already exist — this is purely a capture gap in the integration
  • Non-streaming calls cannot produce a meaningful TTFT (the whole response arrives at once), so this only applies to streamed requests
  • The same gap exists in the Transformers integration, but that's harder to address without a streaming hook

Workaround

Users can pass TTFT manually today:

output_meta = GenerationOutputMeta(
    tokens_out=n,                    # completed-token count, tallied by the caller
    time_to_first_token_ms=ttft_ms,  # TTFT measured by the caller while iterating the stream
)
wildedge.track(model, duration_ms=total_ms, output_meta=output_meta)
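Producing `ttft_ms` and `total_ms` for that call can be sketched like this, assuming the caller times the iteration themselves. `measure_stream` is a hypothetical helper, and `chunks` stands in for the iterator returned by a `stream=True` request:

```python
import time

def measure_stream(chunks):
    """Eagerly consume a streamed response.

    Returns (collected_chunks, ttft_ms, total_ms), where ttft_ms is the
    elapsed time to the first chunk and total_ms covers the full stream.
    """
    t0 = time.monotonic()
    ttft_ms = None
    collected = []
    for chunk in chunks:
        if ttft_ms is None:
            ttft_ms = (time.monotonic() - t0) * 1000.0
        collected.append(chunk)
    total_ms = (time.monotonic() - t0) * 1000.0
    return collected, ttft_ms, total_ms
```

Note this consumes the stream before any chunk is handed to application code, which is acceptable for a workaround but is exactly what the lazy wrapper proposed above would avoid.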

Environment

  • SDK version: 0.1.1
