
OpenClaw + codex-lb: visible intermediate assistant messages can stop before tool calls #285

@SHAREN

Description


TL;DR

I initially thought this was a pure codex-lb transport bug. After spending a lot more time debugging OpenClaw itself, my current understanding is more nuanced:

  • I did hit real OpenClaw-side integration bugs first (No API key for provider: codex-lb, then Failed to extract accountId from token)
  • I locally patched those in OpenClaw and moved from the older HTTP /v1 path to the real Codex websocket path (openai-codex-responses -> /backend-api/codex)
  • that improved the transport/auth layer, but it did not fully fix the original symptom
  • the remaining symptom now looks more like an OpenClaw continuation/orchestration problem than a pure codex-lb transport problem

I am posting this because I may still be misunderstanding something, and I would really appreciate feedback from people who know this stack better.


Full story

About a month ago I installed codex-lb for the first time and started using it with OpenClaw.

Very quickly I noticed a strange behavior: the agent could send a visible intermediate message like:

"OK, I'll open the file and check it"

or

"First I'll send an intermediate update, then I'll run ping"

but after that, nothing happened. The next step, where the model was supposed to make a tool call, simply never came. The reasoning chain just stopped on a normal assistant text response.

An important detail: this was not a case where tool calls never worked at all. Tool calls did work in general. The problem was narrower: when the model first emitted a visible intermediate message and was then supposed to continue into a tool call, that chain often broke.

At first I assumed this was a codex-lb problem. I spent a long time digging into codex-lb itself, wondering whether it was dropping tool calls, mishandling websocket traffic, or doing something wrong in the proxy layer. At that time I never got to the real root cause, so I dropped the investigation.

Recently I came back to it, but this time I focused much more on OpenClaw itself.


What I found first: a separate OpenClaw-side auth bug

The first confirmed issue I hit was:

No API key for provider: codex-lb

At first this looked like codex-lb itself was rejecting the key. But after tracing the code, it turned out the problem was deeper and not actually in codex-lb.

What I found was roughly this:

  • OpenClaw resolved the custom provider key correctly in its model registry / auth path
  • somewhere in the embedded runtime session path, that key was effectively lost before the call reached pi-ai
  • so pi-ai ended up seeing provider codex-lb without an apiKey
  • and it threw No API key for provider: codex-lb before any real network request even reached codex-lb

So there was at least one real OpenClaw-side bug before the request even got far enough for codex-lb to matter.

I worked around that locally by restoring runtime API key propagation for the embedded agent session.
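The shape of that workaround can be sketched like this. All names here (`ProviderConfig`, `SessionContext`, `resolveApiKey`) are illustrative assumptions, not OpenClaw's actual internals; the point is only that the call path now falls back to the session-level key that the embedded runtime path was dropping:

```typescript
// Hypothetical sketch: names are illustrative, not OpenClaw's real API.
interface ProviderConfig {
  apiKey?: string;
}

interface SessionContext {
  apiKey?: string; // key resolved earlier by the model registry / auth path
}

// Resolve the key at call time, falling back to the session-level key
// instead of letting it get lost before the call reaches pi-ai.
function resolveApiKey(
  provider: string,
  config: ProviderConfig,
  session: SessionContext,
): string {
  const key = config.apiKey ?? session.apiKey;
  if (!key) {
    // This is the error surfaced by pi-ai in the report.
    throw new Error(`No API key for provider: ${provider}`);
  }
  return key;
}
```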


The next issue: Codex path expected an OpenAI-style token

Once the API key propagation issue was fixed, I hit another blocker:

Failed to extract accountId from token

This was a different problem.

At this point the key was reaching the transport layer, but the openai-codex-responses path expected something that looked like an OpenAI/Codex-style token and tried to extract chatgpt_account_id from it.

That makes sense for the official direct chatgpt.com/backend-api/codex path. But codex-lb uses its own client-side key format like sk-clb-..., and that key is not required to be an OpenAI JWT.

So the situation became:

  • the apiKey was now reaching the transport layer
  • but OpenClaw / pi-ai was still trying to interpret sk-clb-... as if it were an OpenAI-style token
  • it could not extract an accountId
  • and the Codex path still failed
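The mismatch can be illustrated with a small sketch. The exact JWT claim path used for `chatgpt_account_id` is an assumption here, and the fallback id is the synthetic-account-id idea from my local patch, not upstream behavior:

```typescript
// Illustrative only: an OpenAI-style token is a JWT whose payload carries
// chatgpt_account_id, but a codex-lb key like "sk-clb-..." is opaque and
// has no JWT structure to parse. The claim path below is an assumption.
function extractAccountId(token: string): string | undefined {
  const parts = token.split(".");
  if (parts.length !== 3) return undefined; // not a JWT (e.g. sk-clb-...)
  try {
    const payload = JSON.parse(
      Buffer.from(parts[1], "base64url").toString("utf8"),
    );
    return payload["https://api.openai.com/auth"]?.chatgpt_account_id;
  } catch {
    return undefined;
  }
}

// Local workaround: provide a synthetic account id instead of failing
// the whole call with "Failed to extract accountId from token".
function accountIdFor(token: string): string {
  return extractAccountId(token) ?? "synthetic-account-id";
}
```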

At that point I started moving away from the older generic OpenAI-compatible HTTP path and tried to use the more native Codex route instead.


Why I switched from HTTP /v1 to the real Codex websocket path

Originally, OpenClaw was talking to codex-lb through the simple compatibility route:

  • baseUrl = /v1
  • api = openai-completions

That path did work, and this detail matters a lot: basic tool calls also worked there.

This is important because switching to websocket did not “turn tools on from zero”. Tools were already partially working before.

However, I started suspecting that the harder failure case — “visible intermediate message first, then expected tool call never comes” — might be related to the fact that OpenClaw was not using a true Codex-native transport, but only a generic HTTP compatibility path.

So I moved the integration toward:

  • openai-codex-responses
  • codex-lb
  • /backend-api/codex
  • websocket transport

The hope was that if I made OpenClaw use the same class of transport as the native Codex client, it would handle the pattern “intermediate visible message -> tool call -> continuation” correctly.
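Concretely, the switch amounted to something like the following. The field names and host are a hypothetical config shape, not OpenClaw's documented schema; only the `api` values, the route, and the key format come from this report:

```typescript
// Hypothetical provider config shape; field names and host are assumptions.
const before = {
  provider: "codex-lb",
  api: "openai-completions", // generic OpenAI-compatible HTTP path
  baseUrl: "https://codex-lb.example/v1",
  apiKey: "sk-clb-...", // codex-lb client-side key format
};

const after = {
  provider: "codex-lb",
  api: "openai-codex-responses", // Codex-native path
  baseUrl: "https://codex-lb.example/backend-api/codex",
  transport: "websocket",
  apiKey: "sk-clb-...",
};
```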


What I changed locally in OpenClaw

I ended up making local OpenClaw source changes in a few places.

The important ideas were:

  1. restore runtime API key propagation for custom providers
  2. add a proxy-aware Codex transport path for non-OpenAI base URLs
  3. keep downstream auth as Bearer sk-clb-...
  4. provide a synthetic chatgpt-account-id separately instead of trying to parse sk-clb-... as an OpenAI JWT

I want to be careful here: I am not claiming these patches are the architecturally correct upstream fix. I am only saying this is the path I used to isolate the problem and get much farther than before.

After those changes, a few important things became true:

  • the API key propagation problem was gone
  • the Failed to extract accountId from token problem was gone for the proxy Codex path
  • OpenClaw really did start talking to codex-lb over websocket, not just through /v1
  • tool-call transport itself became operational on that path

So at that point it was no longer correct to say “the websocket transport is broken” or “codex-lb cannot accept Codex-style traffic from OpenClaw”.


The crucial realization: basic tool calls already worked before websocket

This turned out to be really important.

Once I switched to websocket, I initially assumed that if the transport was now “correct”, then the original problem should disappear. But that turned out not to be true.

Basic tool calls had already worked before, on the older HTTP route /v1 + openai-completions.

So:

  • the websocket migration did not magically “enable tools”
  • it fixed transport/auth mismatches
  • it made the path more Codex-native
  • but it did not by itself solve the main remaining symptom

That is why, from the outside, it could still look like “nothing changed”, even though some very real lower-layer issues had actually been fixed.


What still remained broken

After all the transport/auth fixes, the original complaint was still there:

  • the model sends a visible intermediate message
  • it should then continue and call a tool
  • but the run ends like an ordinary assistant response with a stop finish reason

At first I still suspected websocket or codex-lb, but more testing made that explanation weaker.

Here is what I was able to confirm:

1. In the failing sessions there were no new runtime/tool errors

During the failing windows I was no longer seeing things like:

  • No API key for provider: codex-lb
  • Failed to extract accountId from token
  • Unexpected server response: 500

So the lower transport/auth layer already looked clean.

2. A direct websocket probe to codex-lb could return message -> completed with no tool call at all

This was an important experiment.

I opened a direct websocket to codex-lb at /backend-api/codex/responses and used a prompt like “first send a short intermediate update, then call a tool”.

The model really could return:

  • message
  • text
  • completed

with no function_call.

So websocket transport by itself does not guarantee “after visible text there will definitely be a tool call”.
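The probe itself was roughly like this. The request shape below is an assumption modeled loosely on the Responses API, and the host is a placeholder; only the endpoint path and the prompt idea come from the experiment described above:

```typescript
// Sketch of the direct websocket probe. The payload shape is an assumption,
// not a documented codex-lb schema; "ping" is a stand-in tool.
function buildProbeRequest(prompt: string) {
  return {
    type: "response.create",
    input: [{ role: "user", content: [{ type: "input_text", text: prompt }] }],
    tools: [
      {
        type: "function",
        name: "ping",
        description: "Run a ping probe",
        parameters: { type: "object", properties: {} },
      },
    ],
  };
}

// Sending it would look roughly like this (requires the `ws` package):
//   const socket = new WebSocket(
//     "wss://codex-lb.example/backend-api/codex/responses",
//     { headers: { Authorization: "Bearer sk-clb-..." } },
//   );
//   socket.on("open", () => socket.send(JSON.stringify(buildProbeRequest(
//     "first send a short intermediate update, then call a tool"))));
//   socket.on("message", (data) => console.log(data.toString()));
```

Even with a tool declared in the request, the stream could still end at `completed` with only text, which is the point of the experiment.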

3. The same transport could also produce tool calls if the orchestration pressure changed

So the problem is not that websocket “cannot do tools”. It clearly can.

4. Codex CLI / Codex App through the same codex-lb behaved differently

This may be the most important observation.

When I tested similar scenarios through Codex CLI / Codex App, it looked like they could continue the workflow after an intermediate visible message: not necessarily within the exact same provider response, but by making additional websocket requests.

In other words, the native Codex client appears able to do something like:

  • show an intermediate message to the user
  • then make another model step
  • then call a tool
  • then continue again

OpenClaw, in the analogous situation, often behaves differently:

  • it receives a normal assistant text
  • it sees stop
  • it assumes the turn is over
  • and the loop does not continue

My current interpretation

Based on all of that, my current interpretation is:

  • codex-lb is not simply “dropping” tool calls
  • the websocket route in codex-lb is working
  • OpenClaw’s auth path can be made to work with it
  • basic tools are possible
  • but the remaining bug is higher-level, in agent-loop / continuation logic inside OpenClaw

So the problem now looks more like this:

if the model emits a visible intermediate assistant text without a tool call in the same response, OpenClaw too often treats that as the final end of the turn and does not continue, even though Codex App / CLI may continue with additional model calls.
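The behavioral difference I am describing can be reduced to a minimal sketch. Everything here is invented for illustration (`Step`, `runTurn`, the flag): a strict loop ends the turn on any plain-text `stop`, while a Codex-client-like loop may issue another model step and still reach the tool call:

```typescript
// Minimal model of the two orchestration behaviors; names are assumptions.
type Step = { finishReason: "stop" | "tool_calls"; toolCall?: string };

// `steps` stands in for successive model responses in one logical turn.
function runTurn(steps: Step[], continueAfterPlainText: boolean): string[] {
  const executed: string[] = [];
  for (const step of steps) {
    if (step.finishReason === "tool_calls" && step.toolCall) {
      executed.push(step.toolCall); // tool call arrived: execute and go on
      continue;
    }
    // Plain assistant text with `stop`:
    if (!continueAfterPlainText) break; // OpenClaw-like: turn is over
    // Codex-client-like: make another model request and keep going.
  }
  return executed;
}
```

With a turn consisting of "intermediate text + stop" followed by a tool call, the strict loop executes no tools at all, while the continuing loop reaches the tool, which matches the symptom above.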

That is a very important shift in focus.

Without making this explicit, it is very easy to go back to the wrong explanation again and say “then websocket must still be broken” or “then codex-lb must still be cutting something off”.


One more nuance: I have also seen something similar with the normal OpenClaw OpenAI Codex provider

I want to be careful here.

Through codex-lb this problem appeared much more often for me, and that is why I started the whole investigation.

But later I noticed that similar behavior could sometimes happen even with the ordinary built-in OpenClaw OpenAI Codex provider.

So my cautious conclusion is:

  • I do not think codex-lb is necessarily creating this behavior entirely from scratch
  • but through codex-lb it showed up much more frequently and much more clearly
  • so either the proxy path makes the model more likely to produce a plain text + stop turn
  • or the native provider path is simply better aligned with OpenClaw’s assumptions
  • or the native stack more often returns the tool call inside the same turn

But based on what I have so far, I would not say “this is only a proxy problem”. It looks more like the proxy path exposed an existing weakness in OpenClaw’s orchestration logic.


Why I am posting this

I want to be honest: I am not fully confident that my local OpenClaw patches were architecturally correct.

Part of this investigation and part of the code changes were done with the help of an agent, so while I think the symptom tracing and the resulting hypothesis are fairly strong, I absolutely accept that:

  • I may still be misunderstanding something
  • I may have patched the wrong layer in places
  • there may be a much simpler and more correct solution
  • this may already be a known OpenClaw behavior that I rediscovered in a messy way

That is why I am posting this not as “here is my final fix”, but as a detailed investigation path. I would really appreciate feedback from people who know this stack better.

I would especially appreciate comments from anyone who has seen this exact pattern:

  • the model first emits a visible intermediate message
  • then it is supposed to call a tool
  • but instead the reasoning chain stops
  • and the run ends as an ordinary assistant text response

Final short summary

If I compress the whole story into a sequence, it looks like this:

  • first I saw the symptom that tools did not fire after an intermediate visible message
  • at first I thought this was a codex-lb problem
  • I spent a long time unsuccessfully digging into codex-lb itself
  • then I switched to investigating OpenClaw
  • I found and locally worked around an OpenClaw issue where apiKey was lost for custom providers
  • I found and locally worked around a Codex transport issue involving chatgpt_account_id extraction from an opaque bearer token
  • I moved OpenClaw from the older HTTP /v1 route to websocket / openai-codex-responses / /backend-api/codex
  • I confirmed that the transport/auth layer really became functional
  • but the original symptom still remained
  • and my current conclusion is that the remaining bug is much more likely to be an OpenClaw continuation/orchestration issue than a pure codex-lb transport issue

If anyone has seen this before, or if I am misunderstanding some part of the intended integration, I would be very grateful for any feedback.
