feat: add headless gRPC server for external agent integration #278

kevincodex1 merged 17 commits into Gitlawb:main
Conversation
Vasanthdev2004
left a comment
This is a promising direction, but I don't think it's safe to merge as-is.
Two blocker-level issues from my side:
1. The PR adds new dependencies without updating `bun.lock`.

   `package.json:56-57` adds `@grpc/grpc-js` and `@grpc/proto-loader`, but the PR does not include a `bun.lock` update. In the review worktree, `bun install --frozen-lockfile` fails immediately and requires regenerating the lockfile. Since this repo uses Bun and commits `bun.lock`, I think the lockfile update needs to be part of the PR.
2. The server binds an unauthenticated gRPC endpoint to `0.0.0.0` by default, while the README says `localhost:50051`.

   `README.md:211` says the service starts on `localhost:50051`; `src/grpc/server.ts:36-43` actually binds to `0.0.0.0:${port}` with `createInsecure()`.

   That's a real security/behavior mismatch. Exposing an unauthenticated tool-executing endpoint on all interfaces by default is not something we should ship casually. At minimum this should default to `127.0.0.1`/`localhost`, with any broader bind requiring an explicit opt-in.
I also tried the actual startup path locally via `bun run scripts/start-grpc.ts`, and the current implementation failed to bind with `Failed to listen at 0.0.0.0`, so I would definitely want the bind behavior tightened up before merge.
- Update bun.lock for new dependencies (frozen-lockfile CI fix)
- Add multi-turn session persistence via initialMessages
- Replace hardcoded done payload with real token counts
- Default bind to localhost instead of 0.0.0.0
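The "default bind to localhost" change above could be enforced with a small pure helper so any non-local bind requires an explicit opt-in. This is only a sketch: `resolveBindAddress` and the `OPENCLAUDE_GRPC_HOST` / `OPENCLAUDE_GRPC_ALLOW_EXTERNAL` variables are hypothetical names, not part of the PR.

```typescript
// Hypothetical helper: default to loopback; binding all interfaces must be
// an explicit, documented opt-in rather than the silent default.
function resolveBindAddress(
  port: number,
  env: Record<string, string | undefined> = process.env,
): string {
  const host = env.OPENCLAUDE_GRPC_HOST ?? "127.0.0.1"; // safe default
  if (host === "0.0.0.0" && env.OPENCLAUDE_GRPC_ALLOW_EXTERNAL !== "1") {
    // Refuse the dangerous bind unless the operator opted in explicitly.
    throw new Error(
      "Refusing to bind 0.0.0.0 without OPENCLAUDE_GRPC_ALLOW_EXTERNAL=1",
    );
  }
  return `${host}:${port}`;
}
```

The resulting string would then be passed straight to `server.bindAsync(...)`, so the README claim of `localhost:50051` and the actual bind can no longer drift apart.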
@Vasanthdev2004 Thanks for the thorough review! All four issues addressed.

Ready for re-review
Vasanthdev2004
left a comment
Two blocker-level issues still stand out for me on the latest #278 head:
1. `scripts/start-grpc.ts:13-21` only calls `enableConfigs()` and then starts the gRPC server. It skips the normal startup prep that the main CLI does before running: applying safe managed env vars, loading saved provider profiles, hydrating stored credentials/tokens, and validating the resulting provider config. In practice that means the headless server can start in a different auth/provider state from normal OpenClaude, especially for saved-profile flows and any credential path that relies on startup hydration. Since the README positions this as a usable headless entrypoint, I think it needs startup parity with the normal CLI bootstrap before merge.

2. `src/grpc/server.ts:179-180` handles cancel by just calling `call.end()`, but it never interrupts the in-flight `QueryEngine`. `QueryEngine` explicitly exposes `interrupt()` for this (`src/QueryEngine.ts:1158-1159`). As written, a client cancel closes the stream but can still leave the underlying model/tool execution running in the server process. For a headless integration API, that is a real behavior bug, not just a missing refinement.
The earlier blockers around lockfile/default bind/done payload look addressed, but I'd want these two fixed before calling the gRPC path mergeable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace enableConfigs() with await init() in start-grpc.ts for full bootstrap parity with the main CLI (env vars, CA certs, mTLS, proxy, OAuth, Windows shell)
- Call engine.interrupt() before call.end() in the cancel handler so in-flight model/tool execution is actually stopped
- Show done.full_text in the CLI client when no text_chunk was received, preventing silent drops when streaming is unavailable
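The ordering in the cancel fix above matters: interrupt first, then close the stream. A minimal sketch of that handler shape, where `Engine` and `Call` are stand-ins for the PR's `QueryEngine` and the gRPC duplex stream (only `interrupt()` and `end()` are assumed, both named in the review):

```typescript
// Stand-in interfaces for the review's QueryEngine and gRPC call objects.
interface Engine { interrupt(): void; }
interface Call { end(): void; }

// On a protocol-level cancel: stop the in-flight model/tool work *before*
// closing the stream. Calling call.end() alone (the pre-fix behavior)
// would close the stream but orphan the running engine.
function handleCancel(engine: Engine | null, call: Call): void {
  engine?.interrupt();
  call.end();
}
```

A quick trace: with an active engine, the client sees the stream close only after the engine has been told to stop; with no active engine, the stream is simply closed.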
Vasanthdev2004
left a comment
I rechecked the latest head after the recent fixes, and the earlier startup/cancel blockers look addressed. I still have two blocker-level API-contract issues before I can approve this:
1. `ChatRequest.provider` is still ignored by the server.

   The proto exposes `provider` as an optional request field in `src/proto/openclaude.proto:35`, but `src/grpc/server.ts:64-107` never reads `req.provider` at all. The only request-scoped routing that currently happens is `userSpecifiedModel`/`fallbackModel` from `req.model`, and the file still has a comment saying provider should be configured elsewhere. For an external integration surface, exposing a provider field that the server silently ignores is a correctness bug.

2. `session_id` is still a dead field, so the published session contract does not actually work.

   `ClientMessage` defines `session_id` at the top level in `src/proto/openclaude.proto:15-27`, but `src/grpc/server.ts` never uses it. Session history is kept only in the per-stream `previousMessages` array (`src/grpc/server.ts:56-57`), so reconnecting with the same `session_id` does not resume anything. The bundled CLI client also writes `session_id` into `request` instead of the top-level `ClientMessage` envelope (`scripts/grpc-cli.ts:107-110`), which reinforces that the field is not currently wired end-to-end.
The recent fixes definitely moved this closer, but I still need the request/session fields in the proto to either be implemented authoritatively or removed from the public contract before I can call it mergeable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move session_id from ClientMessage into ChatRequest to fix proto-loader oneofs encoding bug and make the field functional
- Implement in-memory session store so reconnecting with the same session_id resumes conversation context across streams
- Remove ChatRequest.provider — per-request provider routing requires global process.env mutation, unsafe for concurrent clients; provider is configured via env vars at server startup
Vasanthdev2004
left a comment
Rechecked the current head. The earlier provider/session field problems do look fixed, and I also reran `bun run build` plus `bun run smoke` in the review worktree. I also verified the gRPC server now binds on `localhost:50051` when started locally.
I still see two blocker-level issues before I'd call this mergeable:
1. `scripts/start-grpc.ts` still does not mirror the normal CLI bootstrap path. It only calls `init()` (`scripts/start-grpc.ts:13-21`), but the normal CLI also hydrates Gemini/GitHub secure tokens, resolves the saved provider profile into startup env, and validates provider configuration before first use (`src/entrypoints/cli.tsx:71-96`). That means the headless server can still start in the wrong auth/provider state for saved-profile setups that rely on secure token hydration or startup profile resolution.

2. The gRPC contract for tool results is still inconsistent. The proto says `ToolCallResult.tool_name` (`src/proto/openclaude.proto:71-75`), but the server is filling that field with `block.tool_use_id` (`src/grpc/server.ts:146-151`). So clients receive a UUID in the field named `tool_name`, and the bundled CLI displays that value as if it were the tool name. If you want correlation, the API needs a separate `tool_use_id` field; otherwise this field should carry the actual tool name.
…field

scripts/start-grpc.ts now runs the same provider/auth bootstrap as the normal CLI entrypoint: enableConfigs, safe env vars, Gemini/GitHub token hydration, saved-profile resolution with warn-and-fallback, and provider validation before the server binds.

ToolCallResult.tool_name was being populated with the tool_use_id UUID. Added a toolNameById map (filled in canUseTool) so tool_name now carries the actual tool name (e.g. "Bash"). The UUID moves to a new tool_use_id field (proto field 4) for client-side correlation.
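The `toolNameById` map mentioned in the commit above can be sketched in isolation: `canUseTool` records the human-readable name under the tool-use ID, and the `tool_result` emitter looks it up so `tool_name` carries the real name while the UUID moves to `tool_use_id`. The function names here (`onCanUseTool`, `buildToolResult`) are illustrative, not the PR's actual signatures.

```typescript
// Correlation map: tool_use_id (UUID) -> human-readable tool name.
const toolNameById = new Map<string, string>();

// Called from the permission path (canUseTool in the PR), which is the one
// place that sees both the tool name and its tool_use_id together.
function onCanUseTool(toolUseID: string, toolName: string): void {
  toolNameById.set(toolUseID, toolName);
}

// When emitting a tool_result message, resolve the name instead of
// stuffing the UUID into tool_name (the pre-fix bug).
function buildToolResult(toolUseID: string, contentJson: string) {
  return {
    tool_name: toolNameById.get(toolUseID) ?? "unknown",
    tool_use_id: toolUseID, // correlation ID matching ToolCallStart
    content_json: contentJson,
  };
}
```

With this split, a client that runs `Bash` twice can match each result to its start event by `tool_use_id` while still displaying a sensible `tool_name`.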
Vasanthdev2004
left a comment
Rechecked the current head.
I reran:

- `bun run build`
- `bun run smoke`
Both pass, and the earlier startup-profile / tool-result-name issues are definitely improved. I still see two blocker-level problems before I'd merge it:
- The gRPC contract still doesn't actually support end-to-end tool call correlation.
ToolCallResult now includes tool_use_id and comments that it is a correlation ID matching ToolCallStart (src/proto/openclaude.proto:71-76), but ToolCallStart still has no corresponding tool_use_id field at all (src/proto/openclaude.proto:64-68).
That means an external client still can't reliably match a tool_result back to a specific tool_start when the same tool is used multiple times. Right now the server only emits:

- `tool_start.tool_name`
- `tool_start.arguments_json`

If correlation is part of the public contract, ToolCallStart needs the same ID too.
- The server still keeps work running when the client closes the stream without sending an explicit CancelSignal.
The only place that interrupts the QueryEngine is the protocol-level cancel branch (src/grpc/server.ts:197-201). In call.on('end'), the current code just nulls the local reference and clears pending requests (src/grpc/server.ts:215-219).
So if a client disconnects or closes the stream mid-generation without first sending a CancelSignal, the underlying model/tool work can keep running in the server process even though the client is gone.
Those two feel like real API/lifecycle issues rather than polish, so I'd still want another pass here before merge.
…tream close
Two blocker-level issues flagged in code review:
- ToolCallStart was missing tool_use_id, making it impossible for clients
to correlate tool_start events with tool_result when the same tool runs
multiple times. Added tool_use_id = 3 to the proto message and populated
it from the toolUseID parameter in canUseTool.
- On stream close without an explicit CancelSignal the server only nulled
the engine reference, leaving the underlying model/tool work running
as an orphan. Added engine.interrupt() in the call.on('end') handler
to stop work immediately when the client disconnects.
…el writes

Four lifecycle and contract issues identified during proactive review:

- Pending permission Promises in canUseTool would hang forever if the client disconnected mid-stream. On call 'end', all pending resolvers are now called with 'no' so the engine can unblock and terminate.
- The done message and session save could fire after call.end() when a CancelSignal arrived mid-generation. Added an `interrupted` flag set on both cancel and stream close to gate all post-loop writes.
- The session map had no eviction policy, allowing unbounded memory growth. Capped at MAX_SESSIONS=1000 with FIFO eviction of the oldest entry.
- Field 3 was silently absent from ChatRequest. Added `reserved 3` to document the gap and prevent accidental reuse in future.
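The capped session store described above is easy to sketch: a JavaScript `Map` iterates keys in insertion order, so FIFO eviction is "delete the first key" once the cap is reached. This is a minimal illustration of the MAX_SESSIONS=1000 policy, not the PR's actual class; the `SessionStore` name and method signatures are assumptions.

```typescript
const MAX_SESSIONS = 1000;

// Minimal in-memory session store with FIFO eviction. Map preserves
// insertion order, so the first key is always the oldest session.
class SessionStore<M> {
  private sessions = new Map<string, M[]>();

  save(sessionId: string, messages: M[]): void {
    // Only evict when inserting a *new* session at capacity. Note this is
    // FIFO by first save, not LRU: re-saving does not refresh a session's
    // position in the eviction order.
    if (!this.sessions.has(sessionId) && this.sessions.size >= MAX_SESSIONS) {
      const oldest = this.sessions.keys().next().value;
      if (oldest !== undefined) this.sessions.delete(oldest);
    }
    this.sessions.set(sessionId, messages);
  }

  load(sessionId: string): M[] {
    return this.sessions.get(sessionId) ?? [];
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

FIFO is the simplest policy that bounds memory; if long-lived sessions matter more than implementation simplicity, an LRU variant (delete-then-reinsert on each save) would be the natural next step.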
```ts
sessionId = req.session_id || ''

// Load previous messages from session store (cross-stream persistence)
if (sessionId && this.sessions.has(sessionId)) {
```
This still leaks conversation history across different session IDs on the same stream. On the current head, `previousMessages` is only replaced when the incoming `session_id` already exists in `this.sessions`; otherwise it keeps whatever was accumulated from the prior request on the stream. Direct repro with a mocked QueryEngine: send one request with `session_id = 's1'`, let it persist one message, then send a second request on the same stream with a brand-new `session_id = 's2'`. The second QueryEngine is constructed with the first request's history in initialMessages instead of starting fresh. That breaks the published session contract because changing to a new session ID should not inherit another session's transcript.
Vasanthdev2004
left a comment
Rechecked the latest head 98339c33ba3480b484bd30752e70ffaf6c9602c6 against current origin/main.
A lot of the earlier blocker feedback is addressed on this head:
- bun.lock is updated and `bun install --frozen-lockfile` succeeds
- the server now defaults to localhost
- session_id and tool_use_id are wired into the proto/server path
- startup bootstrap is much closer to the normal CLI path
- disconnect cleanup now interrupts the engine and resolves pending permission prompts
I still can't approve it because there is one real session-contract bug left:
- `src/grpc/server.ts` leaks conversation history across different session IDs on the same stream.

  On the current head, `previousMessages` is only replaced when the incoming `session_id` already exists in `this.sessions`; otherwise it keeps whatever was accumulated from the prior request on the stream.

  Direct repro on this head with a mocked QueryEngine:

  - send one request with `session_id = 's1'`
  - let it persist one message
  - send a second request on the same stream with a brand-new `session_id = 's2'`
  - the second QueryEngine is constructed with the first request's history in initialMessages instead of starting fresh

  That breaks the published session contract because changing to a new session ID should not inherit another session's transcript.
Fresh verification on this head:
- `bun install --frozen-lockfile` -> success
- `bun test src/commands/model/model.test.tsx src/utils/model/providers.test.ts` -> 12 pass
- `bun run build` -> success
- `bun run smoke` -> success
- direct mocked repro of the session leak above -> reproduced
I also checked basic startup separately: this Windows machine cannot bind even a minimal standalone @grpc/grpc-js server on localhost, so I am not using local bind failure as evidence against this PR. The blocker above is branch-specific and directly reproduced.
…ion history leak

previousMessages was declared at stream scope and only overwritten when the incoming session_id already existed in the session store. A second request on the same stream with a new session_id would silently inherit the first request's conversation history in initialMessages instead of starting fresh, violating the session contract.

Fix: reset previousMessages to [] at the start of each ChatRequest before the session-store lookup.
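The fix in the commit above amounts to deriving history fresh per request instead of mutating stream-scoped state. A minimal sketch of that per-request resolution (the `resolveInitialMessages` name is hypothetical; the PR performs the equivalent inline):

```typescript
// Per-request history resolution: always start from [], and let only a
// session_id that is actually present in the store contribute history.
// Nothing carries over from the previous request on the same stream.
function resolveInitialMessages<M>(
  sessions: Map<string, M[]>,
  sessionId: string,
): M[] {
  return sessionId ? (sessions.get(sessionId) ?? []) : [];
}
```

Because the function takes only the store and the incoming `session_id`, the failure mode from the review (a new `session_id` inheriting the prior request's transcript) is structurally impossible.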
…concurrent ChatRequest

Two stream-scoped state bugs found during proactive audit:

- The `interrupted` flag was never reset between requests on the same stream. If the first request was cancelled, all subsequent requests would silently skip the done message, causing the client to hang.
- A second ChatRequest arriving while the first was still processing would overwrite the engine reference, corrupting the lifecycle of both requests. Now returns ALREADY_EXISTS error instead. Engine is nulled after the for-await loop completes so subsequent requests can proceed normally.
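Both stream-scoped fixes above can be captured in one small request-lifecycle sketch: reset `interrupted` when a request starts, and refuse to start while another is in flight. The `StreamState` shape and function names are illustrative; `6` is the standard gRPC `ALREADY_EXISTS` status code.

```typescript
const ALREADY_EXISTS = 6; // standard gRPC status code for ALREADY_EXISTS

interface StreamState {
  engine: object | null; // the active engine for this stream, if any
  interrupted: boolean;  // set by cancel/stream-close for the current request
}

function beginRequest(state: StreamState): { error?: number } {
  if (state.engine !== null) {
    // A request is already in flight on this stream: reject instead of
    // overwriting the engine reference (the pre-fix corruption bug).
    return { error: ALREADY_EXISTS };
  }
  // Do not inherit a prior request's cancel: a cancelled first request
  // must not make later requests silently skip their done message.
  state.interrupted = false;
  state.engine = {};
  return {};
}

function finishRequest(state: StreamState): void {
  // Null the engine after the for-await loop so the next sequential
  // request on this stream can proceed.
  state.engine = null;
}
```

Usage: call `beginRequest` on each incoming ChatRequest, bail out with the error status if set, and call `finishRequest` when the generation loop completes.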
Vasanthdev2004
left a comment
Rechecked the latest head 466d4cf33222f3e0529c8a1b9f85bbd9f77274a8 against current origin/main.
The previous session-contract blocker is fixed on this revision.
What I verified on this head:
- direct mocked repro of the old leak is fixed:
  - first request on a stream with `session_id = 's1'`
  - second request on the same stream with a brand-new `session_id = 's2'`
  - QueryEngine initial messages are now `[]` then `[]`, so the new session starts fresh instead of inheriting the prior transcript
- direct mocked repro of the intended persistence path still works:
  - first stream stores history for `session_id = 's1'`
  - second stream with the same `session_id = 's1'` receives that history in initialMessages
- direct mocked repro of normal sequential requests on a single stream still works:
  - first request completes with `done`
  - second request on the same stream also completes with its own `done`
- direct mocked repro of overlapping requests returns the expected ALREADY_EXISTS error while allowing the first request to finish normally
- `bun install --frozen-lockfile` -> success
- `bun test src/commands/model/model.test.tsx src/utils/model/providers.test.ts` -> 12 pass
- `bun run build` -> success
- `bun run smoke` -> success
I didn't find a remaining branch-specific blocker on the current head.
Residual risk is still coverage rather than demonstrated breakage: there are no dedicated automated tests for src/grpc/server.ts or the session/lifecycle paths, so the behavior above is currently protected by direct repro rather than a checked-in test file.
…b#278)

* gRPC Server
* gRPC fix
* UpdProto
* fix: address PR review feedback for gRPC server
  - Update bun.lock for new dependencies (frozen-lockfile CI fix)
  - Add multi-turn session persistence via initialMessages
  - Replace hardcoded done payload with real token counts
  - Default bind to localhost instead of 0.0.0.0
* fix(grpc): startup parity, cancel interrupt, and cli text fallback
  - Replace enableConfigs() with await init() in start-grpc.ts for full bootstrap parity with the main CLI (env vars, CA certs, mTLS, proxy, OAuth, Windows shell)
  - Call engine.interrupt() before call.end() in the cancel handler so in-flight model/tool execution is actually stopped
  - Show done.full_text in the CLI client when no text_chunk was received, preventing silent drops when streaming is unavailable
* fix(grpc): wire session_id end-to-end and remove dead provider field
  - Move session_id from ClientMessage into ChatRequest to fix proto-loader oneofs encoding bug and make the field functional
  - Implement in-memory session store so reconnecting with the same session_id resumes conversation context across streams
  - Remove ChatRequest.provider — per-request provider routing requires global process.env mutation, unsafe for concurrent clients; provider is configured via env vars at server startup
* fix(grpc): mirror CLI auth bootstrap in start-grpc and fix tool_name field
  - scripts/start-grpc.ts now runs the same provider/auth bootstrap as the normal CLI entrypoint: enableConfigs, safe env vars, Gemini/GitHub token hydration, saved-profile resolution with warn-and-fallback, and provider validation before the server binds.
  - ToolCallResult.tool_name was being populated with the tool_use_id UUID. Added a toolNameById map (filled in canUseTool) so tool_name now carries the actual tool name (e.g. "Bash"). The UUID moves to a new tool_use_id field (proto field 4) for client-side correlation.
* fix(grpc): add tool_use_id to ToolCallStart and interrupt engine on stream close
  - ToolCallStart was missing tool_use_id, making it impossible for clients to correlate tool_start events with tool_result when the same tool runs multiple times. Added tool_use_id = 3 to the proto message and populated it from the toolUseID parameter in canUseTool.
  - On stream close without an explicit CancelSignal the server only nulled the engine reference, leaving the underlying model/tool work running as an orphan. Added engine.interrupt() in the call.on('end') handler to stop work immediately when the client disconnects.
* fix(grpc): resolve pending promises on disconnect and guard post-cancel writes
  - Pending permission Promises in canUseTool would hang forever if the client disconnected mid-stream. On call 'end', all pending resolvers are now called with 'no' so the engine can unblock and terminate.
  - The done message and session save could fire after call.end() when a CancelSignal arrived mid-generation. Added an `interrupted` flag set on both cancel and stream close to gate all post-loop writes.
  - The session map had no eviction policy, allowing unbounded memory growth. Capped at MAX_SESSIONS=1000 with FIFO eviction of the oldest entry.
  - Field 3 was silently absent from ChatRequest. Added `reserved 3` to document the gap and prevent accidental reuse in future.
* fix(grpc): reset previousMessages on each new request to prevent session history leak
  - previousMessages was declared at stream scope and only overwritten when the incoming session_id already existed in the session store. A second request on the same stream with a new session_id would silently inherit the first request's conversation history in initialMessages instead of starting fresh, violating the session contract. Fix: reset previousMessages to [] at the start of each ChatRequest before the session-store lookup.
* fix(grpc): reset interrupted flag between requests and guard against concurrent ChatRequest
  - The `interrupted` flag was never reset between requests on the same stream. If the first request was cancelled, all subsequent requests would silently skip the done message, causing the client to hang.
  - A second ChatRequest arriving while the first was still processing would overwrite the engine reference, corrupting the lifecycle of both requests. Now returns ALREADY_EXISTS error instead. Engine is nulled after the for-await loop completes so subsequent requests can proceed normally.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>