Fix GLM-5 and other reasoning models appearing to hang via OpenAI shim#365
Conversation
This looks great. Just one more thing: can we add a unit test for this change?
gnanam1990
left a comment
Good fix for a real hang — the streaming state machine is sound and the root cause is correctly identified. Two things to address before merge:
1. Non-streaming path: `||` fallback treats `content: ""` as missing

```ts
const rawContent = choice?.message?.content || choice?.message?.reasoning_content
```

If a model returns `content: ""` (empty string) alongside `reasoning_content`, the `||` will use `reasoning_content` as the visible text, so the full chain-of-thought becomes the user-visible reply. The streaming path correctly uses `!== ''` guards; the non-streaming path should match:

```ts
const rawContent = (choice?.message?.content || '') !== ''
  ? choice?.message?.content
  : choice?.message?.reasoning_content
```

or simply use `??` with an explicit empty-string check.
2. No test for the non-streaming path
The 6 focused tests cover streaming only. Please add at least one test for the non-streaming case where `content` is `null`/`""` and `reasoning_content` carries the reply — this is the exact scenario that caused the original hang.
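A minimal, self-contained sketch of the requested non-streaming test. `pickVisibleContent` is a hypothetical helper that mirrors the explicit empty-string fallback suggested above; the shim's real function and type names will differ.

```typescript
// Hypothetical helper mirroring the suggested non-streaming fallback.
interface ShimMessage {
  content?: string | null;
  reasoning_content?: string;
}

function pickVisibleContent(message: ShimMessage): string | undefined {
  const content = message.content ?? '';
  // Only fall back to reasoning_content when content is truly null/empty.
  return content !== '' ? content : message.reasoning_content;
}

// content: null -> reasoning_content carries the reply
console.assert(
  pickVisibleContent({ content: null, reasoning_content: 'thinking' }) === 'thinking'
);
// content: "" -> same fallback, no empty visible reply
console.assert(
  pickVisibleContent({ content: '', reasoning_content: 'thinking' }) === 'thinking'
);
// real content wins; chain-of-thought never leaks into the visible text
console.assert(
  pickVisibleContent({ content: 'answer', reasoning_content: 'thinking' }) === 'answer'
);
```

The three assertions correspond to the three cases the review asks the test suite to cover.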
Minor note (not a blocker): if tool calls arrive while a thinking block is open (`hasEmittedThinkingStart && !hasClosedThinking`), the thinking block won't be closed before the tool call block starts, producing malformed stream output. Tool calling with GLM-5 is acknowledged as untested — worth a follow-up issue to track.
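A sketch of the guard the follow-up would need, under the assumption that the shim tracks the two flags named above. `makeEmitter` and its event shapes are illustrative stand-ins for the shim's actual Anthropic event writer, not its real API.

```typescript
// Illustrative state machine: close any open thinking block before a
// tool_use block starts, so the Anthropic stream stays well-formed.
type StreamEvent = { type: string; index: number };

function makeEmitter(events: StreamEvent[]) {
  let hasEmittedThinkingStart = false;
  let hasClosedThinking = false;
  let blockIndex = 0;

  return {
    thinkingDelta() {
      if (!hasEmittedThinkingStart) {
        events.push({ type: 'content_block_start', index: blockIndex });
        hasEmittedThinkingStart = true;
      }
      events.push({ type: 'thinking_delta', index: blockIndex });
    },
    toolCall() {
      // The guard the review note asks for: stop the thinking block first.
      if (hasEmittedThinkingStart && !hasClosedThinking) {
        events.push({ type: 'content_block_stop', index: blockIndex });
        hasClosedThinking = true;
        blockIndex += 1;
      }
      events.push({ type: 'content_block_start', index: blockIndex });
    },
  };
}

const events: StreamEvent[] = [];
const emitter = makeEmitter(events);
emitter.thinkingDelta();
emitter.toolCall();
// The thinking block is stopped before the tool_use block begins.
console.assert(
  events.map((e) => e.type).join(',') ===
    'content_block_start,thinking_delta,content_block_stop,content_block_start'
);
```

Without the guard, the tool call's `content_block_start` would arrive while the thinking block is still open, which is the malformed output described above.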
@kevincodex1 @gnanam1990 Hi, team! Addressed your PR feedback; please check again: b5b0cda. I also have some IPv6 changes needed (I am running openclaude in a VM without IPv4), but I will open a subsequent PR for that. P.S. Note that the second commit was done by interacting with GLM-5.1 itself (using the Z.ai Lite plan for now).
Force-pushed b5b0cda to 1295d5f.
Vasanthdev2004
left a comment
Rechecked the latest head 1295d5f215539a22a85ae2f9f39244cb7d25366d.
The two requested changes from the earlier review are fixed on this revision, and I did not find a remaining blocker.
What I verified on this head:
- non-streaming fallback now uses the explicit empty-string/null check instead of the old `||` behavior
- dedicated non-stream tests were added for:
  - `content: null` + `reasoning_content`
  - `content: ""` + `reasoning_content`
  - real content taking precedence over `reasoning_content`
- direct repro of the original hang path now works:
  - a GLM-style streaming response with only `reasoning_content` emits thinking deltas and completes cleanly instead of yielding an empty/frozen stream
- direct repro of the mixed reasoning/tool path is also sane:
  - streaming `reasoning_content` followed by a tool call closes the thinking block before the `tool_use` block starts
  - non-streaming responses with reasoning + tool call now produce structured content instead of hanging
- `bun test src/services/api/openaiShim.test.ts` -> 12 pass
- `bun run build` -> success
- `bun run smoke` -> success
Residual risk is mostly scope rather than a demonstrated bug: the PR still doesn't add broad provider-matrix coverage for every reasoning-model/tool-calling variant, but the current head does fix the reported GLM/DeepSeek-style reasoning-content hang and the earlier review blockers.
This looks good already, @otaviocarvalho. Kindly rebase to main and fix the failing tests.
Reasoning models like GLM-5 and DeepSeek stream chain-of-thought in
`reasoning_content` while `content` stays empty (""). The OpenAI shim
only read `delta.content`, so it saw empty strings and never emitted
any Anthropic stream events — causing the UI to appear frozen.
- Add `reasoning_content` to streaming chunk and non-streaming response types
- Emit `reasoning_content` as thinking blocks (thinking_delta) in streaming mode
- Properly transition from thinking to text blocks when content phase begins
- Fall back to `reasoning_content` in non-streaming mode when content is null
Fixes Gitlawb#214
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix non-streaming reasoning_content fallback and add tests

- Use explicit empty-string check instead of `||` for content fallback so `content: ""` doesn't leak `reasoning_content` as visible text
- Close thinking block before tool call blocks in streaming path
- Add non-streaming and streaming `reasoning_content` tests

Co-Authored-By: GLM-5.1 <noreply@openclaude.dev>
Fix flaky Ink reconciler tests caused by react-compiler memoization

Remove hard throw in createTextInstance that crashed when hostContext.isInsideText was stale due to react-compiler element caching. Add timeout guards to prevent test hangs when render errors prevent exit() from firing.

Co-Authored-By: Claude GLM-5.1 <noreply@openclaude.dev>
Force-pushed 1295d5f to 8410977.
Thanks, @kevincodex1! Looks like some flaky tests were causing the build failure. Adjusted them here; please check again: 8410977
Vasanthdev2004
left a comment
Rechecked the latest head 84109778504826c51f52d1d347eb7a6523e38de5 against current origin/main aff2bd89e22161f57ef330737b7f149208b96097.
I did not find a remaining blocker on this revision.
What I verified on this head:
- direct repro of the original streaming hang path now works:
  - GLM-style stream with only `reasoning_content` emits a thinking block and completes cleanly
  - current `origin/main` emits no content block at all for the same repro
- direct streaming reasoning -> text transition works:
  - reasoning chunk emits `thinking_delta`
  - later `content` chunk closes the thinking block and opens a text block
- non-streaming reasoning fallback works for the requested edge cases:
  - `content: null` + `reasoning_content` -> thinking block plus visible text
  - `content: ""` + `reasoning_content` -> thinking block plus visible text
  - real `content` still takes precedence over `reasoning_content`
- `bun test src/services/api/openaiShim.test.ts src/commands/provider/provider.test.tsx` -> 22 pass
- `bun run build` -> success
- `bun run smoke` -> success
Residual risk / note:
- the extra `reconciler.ts`/`staticRender.tsx` changes do alter failure behavior for render-helper paths: on uncaught Ink render errors, the helpers now time out and return captured error output after ~3s instead of hanging indefinitely. I didn't find a concrete branch-specific correctness regression from that, but it is a broader failure-mode change beyond the original OpenAI shim fix.
Fix GLM-5 and other reasoning models appearing to hang via OpenAI shim (Gitlawb#365)

* Fix GLM-5 and other reasoning models appearing to hang via OpenAI shim

  Reasoning models like GLM-5 and DeepSeek stream chain-of-thought in `reasoning_content` while `content` stays empty (""). The OpenAI shim only read `delta.content`, so it saw empty strings and never emitted any Anthropic stream events — causing the UI to appear frozen.

  - Add `reasoning_content` to streaming chunk and non-streaming response types
  - Emit `reasoning_content` as thinking blocks (thinking_delta) in streaming mode
  - Properly transition from thinking to text blocks when content phase begins
  - Fall back to `reasoning_content` in non-streaming mode when content is null

  Fixes Gitlawb#214

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix non-streaming reasoning_content fallback and add tests

  - Use explicit empty-string check instead of `||` for content fallback so `content: ""` doesn't leak `reasoning_content` as visible text
  - Close thinking block before tool call blocks in streaming path
  - Add non-streaming and streaming `reasoning_content` tests

  Co-Authored-By: GLM-5.1 <noreply@openclaude.dev>

* Fix flaky Ink reconciler tests caused by react-compiler memoization

  Remove hard throw in createTextInstance that crashed when hostContext.isInsideText was stale due to react-compiler element caching. Add timeout guards to prevent test hangs when render errors prevent exit() from firing.

  Co-Authored-By: Claude GLM-5.1 <noreply@openclaude.dev>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: GLM-5.1 <noreply@openclaude.dev>
Summary

- Adds `reasoning_content` support to the OpenAI shim's streaming and non-streaming response handlers
- Reasoning models like GLM-5 and DeepSeek stream chain-of-thought in `reasoning_content` while `content` stays empty (`""`), causing the shim to emit nothing and the UI to appear frozen indefinitely

Impact

- GLM-5, DeepSeek, and similar reasoning models now work via `CLAUDE_CODE_USE_OPENAI=1`. Reasoning is displayed as thinking blocks, the actual reply as text. Previously these models appeared to hang forever.
- `OpenAIStreamChunk` and non-streaming response types now include `reasoning_content`. The streaming parser has a new thinking/content phase transition. No API changes to the shim's public interface.

Testing

- `bun run build`
- `bun run smoke`
- `bun test src/services/api/openaiShim.test.ts` — 6/6 pass

Notes

- `OPENAI_BASE_URL=https://opencode.ai/zen/v1` + `OPENAI_MODEL=glm-5` — streaming (`-p` flag) and non-streaming verified
- `OPENAI_BASE_URL=https://api.z.ai/api/coding/paas/v4` + `OPENAI_MODEL=GLM-5.1` — streaming verified
- `GLM-4.7` — confirmed working with `reasoning_content`
- `pass` instead of `paas` in the URL caused the original 500/404 — unrelated to this fix
- `reasoning_content` with no `content` phase will show thinking as the visible reply (fallback behavior, matches user expectation)
- Applies to any model that emits the `reasoning_content` field (DeepSeek, QwQ, etc.)

Fixes #214
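The thinking/content phase transition described above can be sketched as a small, self-contained state machine. This is an illustrative model of the behavior the PR describes, not the shim's actual code; field names follow the OpenAI delta shape and event names follow the Anthropic stream protocol.

```typescript
// Sketch of the streaming phase transition: reasoning_content deltas
// become thinking_delta events, and the first non-empty content delta
// closes the thinking block and opens a text block.
interface Delta {
  content?: string | null;
  reasoning_content?: string | null;
}

function streamEvents(deltas: Delta[]): string[] {
  const events: string[] = [];
  let inThinking = false;
  let inText = false;

  for (const delta of deltas) {
    if (delta.reasoning_content) {
      if (!inThinking) {
        events.push('content_block_start(thinking)');
        inThinking = true;
      }
      events.push('thinking_delta');
    }
    // Empty-string content chunks must NOT end the thinking phase;
    // treating them as real content was the original hang's root cause.
    if (delta.content !== undefined && delta.content !== null && delta.content !== '') {
      if (inThinking) {
        events.push('content_block_stop(thinking)');
        inThinking = false;
      }
      if (!inText) {
        events.push('content_block_start(text)');
        inText = true;
      }
      events.push('text_delta');
    }
  }
  if (inThinking) events.push('content_block_stop(thinking)');
  if (inText) events.push('content_block_stop(text)');
  return events;
}

// GLM-style stream: reasoning only, content stays "" — previously the
// shim emitted no events at all and the UI appeared frozen.
console.assert(
  streamEvents([{ reasoning_content: 'hmm', content: '' }]).join(',') ===
    'content_block_start(thinking),thinking_delta,content_block_stop(thinking)'
);
```

A reasoning chunk followed by a real content chunk produces the full thinking-then-text sequence, matching the reviewer's verified "reasoning -> text transition" repro.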