Enhance BrowserOS compatibility and add tool-calling mode#1
Enhance BrowserOS compatibility and add tool-calling mode#1copsys wants to merge 6 commits intovkop007:mainfrom
Conversation
Agent-Logs-Url: https://github.com/copsys/codex-app-proxy/sessions/41e79ae7-148a-4f6d-970c-4f5ad05386fd Co-authored-by: copsys <31281180+copsys@users.noreply.github.com>
Agent-Logs-Url: https://github.com/copsys/codex-app-proxy/sessions/41e79ae7-148a-4f6d-970c-4f5ad05386fd Co-authored-by: copsys <31281180+copsys@users.noreply.github.com>
…rowseros Add BrowserOS tool-calling mode to prevent environment-refusal responses
There was a problem hiding this comment.
Pull request overview
This PR extends the local Codex proxy to better support OpenAI-compatible tool-calling flows and “BrowserOS-style” agentic workflows, including emitting tool-call events during streaming and documenting the new request parameters.
Changes:
- Adds request parameter handling for
tools,tool_choice, andbrowseros_modeand maps tool-call events into OpenAI’s streaming/non-streaming response shapes. - Introduces tool-call instruction injection +
<tool_call>parsing utilities to translate model output into OpenAI tool-call objects. - Updates README with BrowserOS/tool-calling configuration guidance.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/index.ts | Parses tool-related request fields; emits tool_calls in SSE streaming chunks and returns tool calls in non-streaming responses. |
| src/codex.ts | Adds tool-related options + introduces parseToolCalls and buildToolInstructions helpers; extends stream event union with tool_calls. |
| src/codex-client.ts | Injects tool instructions into base instructions; buffers output to parse tool calls; adds agentic approval-handling behavior. |
| README.md | Documents tools, tool_choice, and browseros_mode, plus BrowserOS troubleshooting/config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const pushCall = (raw: any) => { | ||
| const name = raw?.name || raw?.toolName || raw?.function?.name || ""; | ||
| const argsRaw = | ||
| raw?.arguments ?? raw?.input ?? raw?.parameters ?? raw?.function?.arguments; | ||
| if (!name) return; | ||
| const args = | ||
| typeof argsRaw === "string" | ||
| ? argsRaw | ||
| : JSON.stringify(argsRaw ?? {}); | ||
| const key = `${name}::${args}`; |
There was a problem hiding this comment.
parseToolCalls is very permissive: any JSON object/array (including fenced blocks or the whole response) that happens to contain a name field will be interpreted as a tool call, even if it’s just ordinary JSON output. In tool mode this can incorrectly flip responses into tool_calls and break clients. Consider restricting parsing to explicit <tool_call>…</tool_call> blocks (or objects that match the OpenAI tool call schema: type:function + function.name + function.arguments), and require an arguments/function.arguments field before accepting a call.
| approvalPolicy: "never", | ||
| sandboxPolicy: { | ||
| type: "readOnly", | ||
| access: { type: "fullAccess" }, | ||
| }, | ||
| sandboxPolicy: hasTools | ||
| ? { type: "readOnly", access: { type: "fullAccess" } } | ||
| : { type: "dangerFullAccess" }, | ||
| model: options.model, |
There was a problem hiding this comment.
The sandbox policy now falls back to { type: "dangerFullAccess" } when hasTools is false. This is a significant security/operational change from a read-only sandbox and means normal chat requests can potentially execute with full host access. Consider keeping the default read-only sandbox and making any elevated sandbox mode an explicit, opt-in request parameter (and ideally disabled by default).
| } else if (method === "commandExecution/requestApproval") { | ||
| // Auto-approve command executions for agentic behavior | ||
| const approvalId = params?.approvalId; | ||
| if (approvalId) { | ||
| console.log( | ||
| `[CodexClient] Auto-approving command execution: ${params?.command || "unknown"}`, | ||
| ); | ||
| this.request("commandExecution/sendApproval", { | ||
| approvalId, | ||
| decision: "accept", | ||
| }).catch(() => {}); | ||
| } | ||
| } else if (method === "fileChange/requestApproval") { | ||
| // Auto-approve file changes for agentic behavior | ||
| const approvalId = params?.approvalId; | ||
| if (approvalId) { | ||
| console.log(`[CodexClient] Auto-approving file change`); | ||
| this.request("fileChange/sendApproval", { | ||
| approvalId, | ||
| decision: "accept", | ||
| }).catch(() => {}); | ||
| } |
There was a problem hiding this comment.
commandExecution/requestApproval and fileChange/requestApproval events are auto-approved unconditionally. This effectively bypasses any human/host-side approval gate and can allow the model to execute commands or modify files without explicit operator consent, especially when combined with permissive sandboxing. Please gate this behavior behind an explicit, opt-in option (e.g., only when browseros_mode === true), and consider logging + declining by default.
| // If tools are present, check for tool calls in accumulated text | ||
| if (hasTools && accumulatedText) { | ||
| const toolCalls = parseToolCalls(accumulatedText); | ||
| if (toolCalls.length > 0) { | ||
| // Strip tool_call tags from text, emit remaining as content | ||
| const textWithoutToolCalls = accumulatedText | ||
| .replace(/<tool_call>[\s\S]*?<\/tool_call>/g, "") | ||
| .trim(); | ||
| if (textWithoutToolCalls) { | ||
| eventQueue.push({ | ||
| type: "message", | ||
| text: textWithoutToolCalls, | ||
| }); | ||
| } | ||
| eventQueue.push({ type: "tool_calls", calls: toolCalls }); | ||
| } else { | ||
| console.warn( | ||
| `[CodexClient] Tools provided but no tool calls parsed. Assistant preview: ${accumulatedText.slice(0, 300).replace(/\s+/g, " ")}`, | ||
| ); | ||
| // No tool calls found, emit as plain message | ||
| eventQueue.push({ type: "message", text: accumulatedText }); | ||
| } | ||
| } | ||
| turnDone = true; |
There was a problem hiding this comment.
Tool-call parsing/emission runs both on the turn/completed notification and again on the task_complete event, with no guard to prevent emitting message/tool_calls twice for the same turn. This can lead to duplicate tool call events or duplicated assistant content depending on which events fire. Track a per-turn flag (e.g., finalized = true) so final parsing/emission only happens once.
| // If tools are present, check for tool calls in accumulated text | |
| if (hasTools && accumulatedText) { | |
| const toolCalls = parseToolCalls(accumulatedText); | |
| if (toolCalls.length > 0) { | |
| // Strip tool_call tags from text, emit remaining as content | |
| const textWithoutToolCalls = accumulatedText | |
| .replace(/<tool_call>[\s\S]*?<\/tool_call>/g, "") | |
| .trim(); | |
| if (textWithoutToolCalls) { | |
| eventQueue.push({ | |
| type: "message", | |
| text: textWithoutToolCalls, | |
| }); | |
| } | |
| eventQueue.push({ type: "tool_calls", calls: toolCalls }); | |
| } else { | |
| console.warn( | |
| `[CodexClient] Tools provided but no tool calls parsed. Assistant preview: ${accumulatedText.slice(0, 300).replace(/\s+/g, " ")}`, | |
| ); | |
| // No tool calls found, emit as plain message | |
| eventQueue.push({ type: "message", text: accumulatedText }); | |
| } | |
| } | |
| turnDone = true; | |
| if (!turnDone) { | |
| // If tools are present, check for tool calls in accumulated text | |
| if (hasTools && accumulatedText) { | |
| const toolCalls = parseToolCalls(accumulatedText); | |
| if (toolCalls.length > 0) { | |
| // Strip tool_call tags from text, emit remaining as content | |
| const textWithoutToolCalls = accumulatedText | |
| .replace(/<tool_call>[\s\S]*?<\/tool_call>/g, "") | |
| .trim(); | |
| if (textWithoutToolCalls) { | |
| eventQueue.push({ | |
| type: "message", | |
| text: textWithoutToolCalls, | |
| }); | |
| } | |
| eventQueue.push({ type: "tool_calls", calls: toolCalls }); | |
| } else { | |
| console.warn( | |
| `[CodexClient] Tools provided but no tool calls parsed. Assistant preview: ${accumulatedText.slice(0, 300).replace(/\s+/g, " ")}`, | |
| ); | |
| // No tool calls found, emit as plain message | |
| eventQueue.push({ type: "message", text: accumulatedText }); | |
| } | |
| } | |
| turnDone = true; | |
| } |
This pull request adds robust support for OpenAI-compatible tool-calling and BrowserOS agentic workflows to the proxy, both in the backend (
src/codex-client.ts,src/codex.ts,src/index.ts) and in the documentation (README.md). It introduces new parameters (tools,tool_choice,browseros_mode), ensures correct prompt formatting and output parsing for agentic clients, and updates the streaming API to emit tool call events in the OpenAI tool-calling format. These changes enable seamless integration with agentic clients like BrowserOS and improve the developer experience for tool-based workflows.Agentic Tool-Calling and BrowserOS Support
tools,tool_choice, andbrowseros_modeparameters throughout the proxy, including automatic strict mode for BrowserOS when tools are provided, and clear documentation of these features inREADME.md. [1] [2] [3] [4]CodexClientto handle agentic tool-calling, including new instruction blocks, parsing<tool_call>tags, and handling tool call output in both streaming and non-streaming modes. [1] [2] [3] [4] [5]Streaming API Enhancements
tool_callsas the finish reason when tool calls are detected.Developer Experience and Documentation
README.mdto explain BrowserOS configuration, tool-calling fields, and new parameters, including troubleshooting for agentic clients. [1] [2]These changes collectively make the proxy compatible with modern OpenAI agentic workflows and improve usability for tool-based LLM applications.