
Conversation

@tc3oliver

Summary

Add support for the reasoning_content field in streaming responses from OpenAI-compatible APIs.

Problem

Some LLM inference engines (e.g., vLLM with reasoning models like DeepSeek-R1 or QwQ)
return streaming content in the reasoning_content field instead of content.
This causes the benchmark to incorrectly report output_tokens = 1 because
the actual generated text is not captured.
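
For illustration, a streaming delta from such an engine might look like the sketch below. Field names follow the OpenAI streaming format; the exact payload depends on the engine and model.

```python
# Hypothetical streaming chunk from a reasoning model: the generated text
# arrives in delta.reasoning_content, while delta.content stays empty.
chunk = {
    "choices": [
        {
            "delta": {
                "content": None,
                "reasoning_content": "First, consider the prompt...",
            },
            "finish_reason": None,
        }
    ]
}
```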

Solution

Check both the content and reasoning_content fields when processing streaming chunks,
as sketched below. This maintains backward compatibility with standard responses while
adding support for reasoning models.
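
A minimal sketch of the check, assuming chunks are parsed into dicts; the function name and structure here are illustrative, not the benchmark's actual code:

```python
def extract_delta_text(chunk: dict) -> str:
    """Return the text emitted in one streaming chunk (illustrative only).

    Prefers the standard `content` field and falls back to
    `reasoning_content`, so both standard and reasoning-model
    responses are counted toward output tokens.
    """
    choices = chunk.get("choices") or [{}]
    delta = choices[0].get("delta", {})
    return delta.get("content") or delta.get("reasoning_content") or ""
```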

Testing

Tested with:

  • vLLM serving a 120B reasoning model (uses reasoning_content)
  • Ollama serving llama3:70b (uses standard content)

Both scenarios now correctly capture output tokens and metrics.

