
Fix GLM-5 and other reasoning models appearing to hang via OpenAI shim #365

Merged: kevincodex1 merged 3 commits into Gitlawb:main from otaviocarvalho:fix/openai-shim-reasoning-content on Apr 6, 2026

Conversation

@otaviocarvalho
Contributor

@otaviocarvalho otaviocarvalho commented Apr 4, 2026

Summary

  • Added reasoning_content support to the OpenAI shim's streaming and non-streaming response handlers
  • Reasoning models (GLM-5, GLM-5.1, DeepSeek) stream chain-of-thought in reasoning_content while content stays an empty string (""), so the shim emitted nothing and the UI appeared frozen indefinitely
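The fix described above can be sketched as a small state machine. This is a minimal illustration, assuming simplified shapes: `OpenAIDelta`, `StreamEvent`, and `handleDelta` are illustrative names, not the shim's actual API.

```typescript
// Sketch of the thinking/content phase transition: reasoning_content opens a
// thinking block; the first non-empty content chunk closes it and starts text.
interface OpenAIDelta {
  content?: string | null;
  reasoning_content?: string | null;
}

type StreamEvent =
  | { type: "thinking_start" }
  | { type: "thinking_delta"; text: string }
  | { type: "thinking_stop" }
  | { type: "text_delta"; text: string };

class ReasoningStreamState {
  private thinkingOpen = false;
  events: StreamEvent[] = [];

  handleDelta(delta: OpenAIDelta): void {
    // Reasoning phase: chain-of-thought arrives in reasoning_content.
    if (delta.reasoning_content) {
      if (!this.thinkingOpen) {
        this.events.push({ type: "thinking_start" });
        this.thinkingOpen = true;
      }
      this.events.push({ type: "thinking_delta", text: delta.reasoning_content });
    }
    // Content phase: a non-empty content chunk closes any open thinking block.
    // The explicit !== "" check is what prevents the original hang, where
    // content was "" the whole time and nothing was ever emitted.
    if (delta.content != null && delta.content !== "") {
      if (this.thinkingOpen) {
        this.events.push({ type: "thinking_stop" });
        this.thinkingOpen = false;
      }
      this.events.push({ type: "text_delta", text: delta.content });
    }
  }
}
```

A GLM-style stream that sends only `reasoning_content` now produces thinking deltas instead of an empty event stream.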

Impact

  • user-facing impact: GLM-5/5.1 and other reasoning models now work correctly via CLAUDE_CODE_USE_OPENAI=1. Reasoning is displayed as thinking blocks, the actual reply as text. Previously these models appeared to hang indefinitely.
  • developer/maintainer impact: OpenAIStreamChunk and non-streaming response types now include reasoning_content. The streaming parser has a new thinking/content phase transition. No API changes to the shim's public interface.

Testing

  • bun run build
  • bun run smoke
  • focused tests: bun test src/services/api/openaiShim.test.ts — 6/6 pass

Notes

  • provider/model paths tested:
    • OpenCode Zen: OPENAI_BASE_URL=https://opencode.ai/zen/v1 + OPENAI_MODEL=glm-5 — streaming (-p flag) and non-streaming verified
    • z.ai Coding Lite plan: OPENAI_BASE_URL=https://api.z.ai/api/coding/paas/v4 + OPENAI_MODEL=GLM-5.1 — streaming verified
    • z.ai also tested with GLM-4.7 — confirmed working with reasoning_content
    • Note: the reporter of z.ai issue GLM unable to connect #214 used pass instead of paas in the URL, which caused the original 500/404; that is unrelated to this fix
  • screenshots attached: N/A (no UI changes, terminal output only)
  • follow-up work or known limitations:
    • Models that only emit reasoning_content with no content phase will show thinking as the visible reply (fallback behavior, matches user expectation)
    • Tool calling with GLM-5/5.1 not yet verified — may need separate testing
    • Benefits any OpenAI-compatible provider using the reasoning_content field (DeepSeek, QwQ, etc.)

Fixes #214

@kevincodex1
Contributor

This looks great. Just one more thing: can we add a unit test for this change?

@kevincodex1 kevincodex1 requested review from auriti and gnanam1990 and removed request for gnanam1990 April 5, 2026 03:42
Collaborator

@gnanam1990 gnanam1990 left a comment


Good fix for a real hang — the streaming state machine is sound and the root cause is correctly identified. Two things to address before merge:

1. Non-streaming path: || fallback treats content: "" as missing

const rawContent = choice?.message?.content || choice?.message?.reasoning_content

If a model returns content: "" (empty string) alongside reasoning_content, the || will use reasoning_content as the visible text — so the full chain-of-thought becomes the user-visible reply. The streaming path correctly uses !== '' guards; the non-streaming path should match:

const rawContent = (choice?.message?.content || '') !== ''
  ? choice?.message?.content
  : choice?.message?.reasoning_content

or simply use ?? with an explicit empty-string check.

2. No test for the non-streaming path

The 6 focused tests cover streaming only. Please add at least one test for the non-streaming case where content is null/"" and reasoning_content carries the reply — this is the exact scenario that caused the original hang.
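The guard such a test would pin down can be sketched as follows. `pickVisibleText` is a hypothetical stand-in for the shim's non-streaming fallback logic, not its real function name; it implements the explicit empty-string check the review asks for.

```typescript
// Hypothetical helper mirroring the non-streaming fallback: real content wins,
// but null or "" content falls back to reasoning_content.
function pickVisibleText(message: {
  content?: string | null;
  reasoning_content?: string | null;
}): string {
  // ?? plus an explicit empty-string check, so content: "" does not leak the
  // chain-of-thought into the visible reply while null/"" still fall back.
  const content = message.content ?? "";
  return content !== "" ? content : message.reasoning_content ?? "";
}
```

In `openaiShim.test.ts` each of the three cases (content null, content "", and real content taking precedence) would become one assertion against this behavior.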


Minor note (not a blocker): if tool calls arrive while a thinking block is open (hasEmittedThinkingStart && !hasClosedThinking), the thinking block won't be closed before the tool call block starts, producing malformed stream output. Tool calling with GLM-5 is acknowledged as untested — worth a follow-up issue to track.
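The suggested guard can be sketched as below. The event names follow the Anthropic stream vocabulary mentioned in this thread, but the state fields and `emit` callback are illustrative, not the shim's actual interface.

```typescript
// Sketch: close an open thinking block before a tool_use block starts, so the
// stream never has two content blocks open at once.
interface ThinkingState {
  hasEmittedThinkingStart: boolean;
  hasClosedThinking: boolean;
}

function beginToolUseBlock(state: ThinkingState, emit: (event: string) => void): void {
  // Without this guard, a tool call arriving mid-thinking would open a
  // tool_use block while the thinking block is still unclosed.
  if (state.hasEmittedThinkingStart && !state.hasClosedThinking) {
    emit("content_block_stop"); // close the thinking block first
    state.hasClosedThinking = true;
  }
  emit("content_block_start"); // then open the tool_use block
}
```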

@otaviocarvalho
Contributor Author

otaviocarvalho commented Apr 5, 2026

@kevincodex1 @gnanam1990 Hi, team! I addressed your PR feedback; please check again: b5b0cda

I also have some IPv6 changes pending (I am running openclaude in a VM without IPv4), but I will open a follow-up PR for those.

p.s. Please note the second commit was made by interacting with GLM-5.1 itself (using the Z.ai Lite plan for now): Co-Authored-By: GLM-5.1 <noreply@openclaude.dev>

@otaviocarvalho otaviocarvalho force-pushed the fix/openai-shim-reasoning-content branch from b5b0cda to 1295d5f on April 6, 2026 10:09
Vasanthdev2004 previously approved these changes Apr 6, 2026
Collaborator

@Vasanthdev2004 Vasanthdev2004 left a comment


Rechecked the latest head 1295d5f215539a22a85ae2f9f39244cb7d25366d.

The two requested changes from the earlier review are fixed on this revision, and I did not find a remaining blocker.

What I verified on this head:

  • non-streaming fallback now uses the explicit empty-string/null check instead of the old || behavior
  • dedicated non-stream tests were added for:
    • content: null + reasoning_content
    • content: "" + reasoning_content
    • real content taking precedence over reasoning_content
  • direct repro of the original hang path now works:
    • a GLM-style streaming response with only reasoning_content emits thinking deltas and completes cleanly instead of yielding an empty/frozen stream
  • direct repro of the mixed reasoning/tool path is also sane:
    • streaming reasoning_content followed by a tool call closes the thinking block before the tool_use block starts
    • non-streaming responses with reasoning + tool call now produce structured content instead of hanging
  • bun test src/services/api/openaiShim.test.ts -> 12 pass
  • bun run build -> success
  • bun run smoke -> success

Residual risk is mostly scope rather than a demonstrated bug: the PR still doesn't add broad provider-matrix coverage for every reasoning-model/tool-calling variant, but the current head does fix the reported GLM/DeepSeek-style reasoning-content hang and the earlier review blockers.

@kevincodex1
Contributor

This looks good already, @otaviocarvalho. Kindly rebase to main and fix the failing tests.

otaviocarvalho and others added 3 commits April 6, 2026 13:26

Fix GLM-5 and other reasoning models appearing to hang via OpenAI shim

Reasoning models like GLM-5 and DeepSeek stream chain-of-thought in
`reasoning_content` while `content` stays empty (""). The OpenAI shim
only read `delta.content`, so it saw empty strings and never emitted
any Anthropic stream events — causing the UI to appear frozen.

- Add `reasoning_content` to streaming chunk and non-streaming response types
- Emit `reasoning_content` as thinking blocks (thinking_delta) in streaming mode
- Properly transition from thinking to text blocks when content phase begins
- Fall back to `reasoning_content` in non-streaming mode when content is null

Fixes Gitlawb#214

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix non-streaming reasoning_content fallback and add tests

- Use explicit empty-string check instead of || for content fallback
  so content: "" doesn't leak reasoning_content as visible text
- Close thinking block before tool call blocks in streaming path
- Add non-streaming and streaming reasoning_content tests

Co-Authored-By: GLM-5.1 <noreply@openclaude.dev>

Fix flaky Ink reconciler tests caused by react-compiler memoization

Remove hard throw in createTextInstance that crashed when hostContext.isInsideText
was stale due to react-compiler element caching. Add timeout guards to prevent
test hangs when render errors prevent exit() from firing.

Co-Authored-By: Claude GLM-5.1 <noreply@openclaude.dev>
@otaviocarvalho
Contributor Author

This looks good already, @otaviocarvalho. Kindly rebase to main and fix the failing tests.

Thanks, @kevincodex1! Looks like some flaky tests were causing the build failure. Adjusted them here, please check again: 8410977

Collaborator

@Vasanthdev2004 Vasanthdev2004 left a comment


Rechecked the latest head 84109778504826c51f52d1d347eb7a6523e38de5 against current origin/main aff2bd89e22161f57ef330737b7f149208b96097.

I did not find a remaining blocker on this revision.

What I verified on this head:

  • direct repro of the original streaming hang path now works:
    • GLM-style stream with only reasoning_content emits a thinking block and completes cleanly
    • current origin/main emits no content block at all for the same repro
  • direct streaming reasoning -> text transition works:
    • reasoning chunk emits thinking_delta
    • later content chunk closes the thinking block and opens a text block
  • non-streaming reasoning fallback works for the requested edge cases:
    • content: null + reasoning_content -> thinking block plus visible text
    • content: "" + reasoning_content -> thinking block plus visible text
    • real content still takes precedence over reasoning_content
  • bun test src/services/api/openaiShim.test.ts src/commands/provider/provider.test.tsx -> 22 pass
  • bun run build -> success
  • bun run smoke -> success

Residual risk / note:

  • the extra reconciler.ts / staticRender.tsx changes do alter failure behavior for render-helper paths: on uncaught Ink render errors, the helpers now time out and return captured error output after ~3s instead of hanging indefinitely. I didn't find a concrete branch-specific correctness regression from that, but it is a broader failure-mode change beyond the original OpenAI shim fix.

@kevincodex1 kevincodex1 merged commit 1e05702 into Gitlawb:main Apr 6, 2026
1 check passed
euxaristia pushed a commit to euxaristia/openclaude that referenced this pull request Apr 13, 2026 (Gitlawb#365)


Development

Successfully merging this pull request may close these issues: GLM unable to connect (#214)

4 participants