
feat: enable Computer Use with macOS + Windows + Linux support #98

Merged
claude-code-best merged 3 commits into claude-code-best:main from amDosion:feat/computer-use-windows
Apr 3, 2026

Contributor

amDosion commented Apr 3, 2026

Summary

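Computer Use support on three platforms. The reference project supports macOS only; this PR extends it to macOS + Windows + Linux.

Phase 1: Replace the @ant/computer-use-mcp stub (12 files, 6517 lines)
Phase 2: Remove 8 hard-coded macOS assumptions in src/ (clipboard dispatch, paste shortcut, CFRunLoop, ESC hotkey, permission check, platform identifier)
Phase 3: Add a Linux backend (xdotool/scrot/xrandr/wmctrl)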

Architecture

packages/@ant/computer-use-{input,swift}/src/
├── index.ts              ← dispatcher
├── types.ts              ← shared interfaces
└── backends/
    ├── darwin.ts          ← macOS AppleScript (unchanged)
    ├── win32.ts           ← Windows PowerShell
    └── linux.ts           ← Linux xdotool/scrot (new)
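
The dispatcher role of index.ts in the tree above can be pictured roughly as follows. This is a minimal sketch, not the PR's code: `InputBackendSketch`, `registry`, and `selectBackend` are hypothetical names, and the stand-in backends return the command they would run instead of spawning it.

```typescript
// Hypothetical, trimmed-down backend interface; the real types.ts is richer.
interface InputBackendSketch {
  name: string;
  moveMouse(x: number, y: number): string;
}

type SupportedPlatform = "darwin" | "win32" | "linux";

// Stand-ins for backends/darwin.ts, backends/win32.ts and backends/linux.ts.
// Each one only describes the command it would run.
const registry: Record<SupportedPlatform, InputBackendSketch> = {
  darwin: { name: "darwin", moveMouse: (x, y) => `osascript <move to ${x},${y}>` },
  win32: { name: "win32", moveMouse: (x, y) => `powershell <move to ${x},${y}>` },
  linux: { name: "linux", moveMouse: (x, y) => `xdotool mousemove ${x} ${y}` },
};

// Pick the backend for a platform string (e.g. process.platform),
// or return null when no backend exists for it.
function selectBackend(platform: string): InputBackendSketch | null {
  const known = Object.prototype.hasOwnProperty.call(registry, platform);
  return known ? registry[platform as SupportedPlatform] : null;
}
```

Returning null for unknown platforms lets callers fall back to "unsupported" stubs rather than throwing at import time.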

Windows verification (x64)

  • isSupported: true
  • Mouse position + foreground window
  • Dual monitors, 2560x1440 × 2
  • Full-screen screenshot, 3 MB
  • bun run build: 463 files

Test plan

  • Windows: bun run dev starts without errors
  • macOS: backends/darwin.ts logic unchanged
  • Linux: input works once xdotool is installed

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Computer Use now supports Windows and Linux (input, screenshots, window/window-level capture, OCR, UI automation, and cross-platform clipboard/paste behavior).
    • Improved click validation and app permission tiering for safer interactions.
  • Documentation

    • Added multi-phase rollout docs and a Windows enhancement guide.
  • Refactor

    • Modular platform-dispatch backends and reorganized MCP server/tools; Chicago MCP enabled by default in dev builds.

Phase 1: Replace @ant/computer-use-mcp stub (12 files, 6517 lines).

Phase 2: Remove 8 macOS-only guards in src/:
- main.tsx: remove getPlatform()==='macos' check
- swiftLoader.ts: remove darwin-only throw
- executor.ts: extend platform guard, clipboard dispatch, paste key
- drainRunLoop.ts: skip CFRunLoop pump on non-darwin
- escHotkey.ts: non-darwin returns false (Ctrl+C fallback)
- hostAdapter.ts: non-darwin permissions granted
- common.ts: dynamic platform + screenshotFiltering
- gates.ts: enabled:true, subscription check removed

Phase 3: Add Linux backends (xdotool/scrot/xrandr/wmctrl):
- computer-use-input/backends/linux.ts (173 lines)
- computer-use-swift/backends/linux.ts (278 lines)

Verified on Windows x64: mouse, screenshot, displays, foreground app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 3, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9c379b8-549c-4fa3-9a24-2bd222d35230

📥 Commits

Reviewing files that changed from the base of the PR and between 3707c3c and ca086b0.

📒 Files selected for processing (2)
  • src/utils/computerUse/executor.ts
  • src/utils/computerUse/swiftLoader.ts

📝 Walkthrough

Reworks Computer Use from macOS-only to a platform-dispatch architecture: adds Windows and Linux backends for input and swift, introduces MCP executor/tools/permissions/pixel-compare/OCR/UIA modules, enables CHICAGO_MCP in defaults, and removes macOS-only runtime guards across client utilities.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Docs & Build: DEV-LOG.md, docs/features/computer-use.md, docs/features/computer-use-windows-enhancement.md, build.ts, scripts/dev.ts | Adds cross-platform design and Windows enhancement docs; enables CHICAGO_MCP in default build/dev feature lists. |
| Input API & Types: packages/@ant/computer-use-input/src/types.ts, packages/@ant/computer-use-input/src/index.ts | Introduce InputBackend/FrontmostAppInfo types and replace inline macOS input logic with a runtime backend dispatcher and isSupported binding. |
| Input Backends: packages/@ant/computer-use-input/src/backends/darwin.ts, .../linux.ts, .../win32.ts | Add platform-specific input backends: macOS (osascript/JXA), Linux (xdotool), Windows (PowerShell/Win32 interop) implementing mouse/keyboard/clipboard/frontmost app APIs. |
| Swift API & Types: packages/@ant/computer-use-swift/src/types.ts, packages/@ant/computer-use-swift/src/index.ts | Define Swift backend interfaces and replace prior macOS-only implementation with backend loading/dispatch and fallback stubs. |
| Swift Backends: packages/@ant/computer-use-swift/src/backends/darwin.ts, linux.ts, win32.ts | Add display/apps/screenshot implementations per platform (macOS: JXA/screencapture; Linux: scrot/xrandr; Windows: PowerShell/Win32 with PrintWindow and PNG capture). |
| MCP Core & Types: packages/@ant/computer-use-mcp/src/{types,executor}.ts, packages/@ant/computer-use-mcp/src/index.ts | Replace stub entry with detailed executor and host/session type definitions; change index to re-export modular implementations. |
| MCP Tools & Server: packages/@ant/computer-use-mcp/src/{tools,mcpServer}.ts | Add tool schema builder and MCP server/session binding including per-dispatch overrides, permission dialog merging, and dispatch gating. |
| MCP Utilities: packages/@ant/computer-use-mcp/src/{deniedApps,keyBlocklist,sentinelApps,subGates,imageResize,pixelCompare}.ts | Implement denied-app categorization and tiers, key normalization/blocklist, sentinel sets, sub-gates presets, image resize algorithm, pixel-staleness compare and click validation. |
| Windows Utilities: src/utils/computerUse/win32/windowEnum.ts, windowCapture.ts, uiAutomation.ts, ocr.ts | Add Win32 window enumeration (EnumWindows), HWND/title-based capture via PrintWindow, UI Automation (tree/find/click/setValue/elementAtPoint), and OCR via Windows.Media.Ocr (PowerShell wrappers). |
| Client Runtime & Guards: src/main.tsx, src/utils/computerUse/{common,drainRunLoop,escHotkey,executor,gates,hostAdapter,swiftLoader}.ts | Remove or relax macOS-only guards; add cross-platform clipboard implementations and paste key selection, runloop/hotkey fallbacks for non-darwin, permission short-circuits on non-darwin, and changed swift loader initialization/caching. |
| Exports & Rewiring: various packages (packages/@ant/..., src/utils/...) | Rewire exports to delegate to new modules, update public type surfaces, and remove many inline stub implementations. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Dispatcher as Platform Dispatcher
    participant Backend as Platform Backend
    participant OSTools as OS Tooling
    participant OS as Operating System

    Client->>Dispatcher: call input/display/screenshot API
    Dispatcher->>Dispatcher: detect process.platform & load backend
    Dispatcher->>Backend: invoke normalized API (moveMouse/screenshot/ocr/...)
    Backend->>OSTools: spawn platform command (osascript/xdotool/PowerShell)
    OSTools->>OS: native API calls (CoreGraphics/Win32/X11)
    OS-->>OSTools: bytes/json/stdout
    OSTools-->>Backend: parse & normalize result
    Backend-->>Dispatcher: return platform-agnostic result
    Dispatcher-->>Client: deliver response

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 I hopped from Darwin's soft green code,
To Windows panes and Linux road.
Backends stitched where calls belong,
Dispatch sings the same old song.
A tiny carrot for cross‑platform dawn.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely describes the primary architectural change: enabling Computer Use feature with cross-platform support (macOS, Windows, Linux). |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/utils/computerUse/gates.ts (1)

12-21: ⚠️ Potential issue | 🟠 Major

Fail closed when the Chicago config is absent.

readConfig() overlays a partial payload on top of DEFAULTS, so switching this default to true means any missing or stale enabled field now enables Computer Use for every eligible user. That turns config misses into an implicit rollout.

Suggested change
 const DEFAULTS: ChicagoConfig = {
-  enabled: true,
+  enabled: false,
   pixelValidation: false,
   clipboardPasteMultiline: true,
   mouseAnimation: true,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/gates.ts` around lines 12 - 21, DEFAULTS currently
enables Chicago by default which causes readConfig (which overlays partial
payloads onto DEFAULTS) to implicitly turn features on when the config is
missing or stale; change DEFAULTS.enabled to false so the system fails closed,
and confirm the behavior in the overlay code that uses readConfig (refer to
DEFAULTS, ChicagoConfig, and readConfig) still merges partial payloads but will
not enable Computer Use when the config is absent or incomplete.
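
The overlay behaviour described above can be illustrated with a small sketch. The field names are taken from the suggested change; `ChicagoConfigSketch` and `readConfigSketch` are hypothetical stand-ins, and the real readConfig() may merge differently.

```typescript
// Reduced, hypothetical version of ChicagoConfig (fields from the diff above).
interface ChicagoConfigSketch {
  enabled: boolean;
  pixelValidation: boolean;
  clipboardPasteMultiline: boolean;
  mouseAnimation: boolean;
}

const DEFAULTS: ChicagoConfigSketch = {
  enabled: false, // fail closed: a missing field must not enable the feature
  pixelValidation: false,
  clipboardPasteMultiline: true,
  mouseAnimation: true,
};

// Overlay a partial (possibly absent) payload on top of DEFAULTS.
function readConfigSketch(
  payload: Partial<ChicagoConfigSketch> | undefined,
): ChicagoConfigSketch {
  return { ...DEFAULTS, ...(payload ?? {}) };
}
```

With `enabled: false` in the defaults, a payload that omits `enabled` leaves Computer Use off, while an explicit `enabled: true` still turns it on.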
build.ts (1)

13-19: ⚠️ Potential issue | 🟠 Major

Don't ship CHICAGO_MCP in DEFAULT_BUILD_FEATURES.

This makes the flag effectively always-on in release builds, so FEATURE_CHICAGO_MCP=1 no longer controls rollout. Keep the build default to AGENT_TRIGGERS_REMOTE and let Chicago stay opt-in via env/dev tooling.

Suggested change
-const DEFAULT_BUILD_FEATURES = ["AGENT_TRIGGERS_REMOTE", "CHICAGO_MCP"];
+const DEFAULT_BUILD_FEATURES = ["AGENT_TRIGGERS_REMOTE"];

Based on learnings, build mode enables AGENT_TRIGGERS_REMOTE only, and specific features can be enabled via FEATURE_<FLAG_NAME>=1 environment variables.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build.ts` around lines 13 - 19, DEFAULT_BUILD_FEATURES currently includes
"CHICAGO_MCP", which forces that feature on in builds; remove "CHICAGO_MCP" so
only "AGENT_TRIGGERS_REMOTE" remains in DEFAULT_BUILD_FEATURES, leaving Chicago
opt-in via the FEATURE_CHICAGO_MCP env var flow that uses envFeatures and
features; update the DEFAULT_BUILD_FEATURES declaration (and any nearby comment)
to reflect this change so rollouts remain controlled by FEATURE_<FLAG_NAME>
environment variables.
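
One way to picture the opt-in flow is the sketch below. It assumes that any FEATURE_<FLAG_NAME>=1 variable adds <FLAG_NAME> to the build defaults; `resolveFeatures` is a hypothetical helper, and the actual parsing in build.ts may differ.

```typescript
const DEFAULT_BUILD_FEATURES = ["AGENT_TRIGGERS_REMOTE"];

// Merge build defaults with flags enabled through FEATURE_<FLAG_NAME>=1.
// `env` is passed in (rather than reading process.env) to keep this testable.
function resolveFeatures(
  defaults: string[],
  env: Record<string, string | undefined>,
): string[] {
  const prefix = "FEATURE_";
  const fromEnv = Object.keys(env)
    .filter((key) => key.indexOf(prefix) === 0 && env[key] === "1")
    .map((key) => key.slice(prefix.length));
  // De-duplicate while preserving order: defaults first, then env flags.
  return Array.from(new Set(defaults.concat(fromEnv)));
}
```

Under this scheme CHICAGO_MCP appears in the feature list only when FEATURE_CHICAGO_MCP=1 is set for the build.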
src/utils/computerUse/drainRunLoop.ts (1)

61-79: ⚠️ Potential issue | 🟠 Major

Keep the 30s timeout on non-macOS calls.

Line 62 skips not just the CFRunLoop pump but also the timeout/unhandled-rejection guard. A hung Windows/Linux backend call will now block forever instead of surfacing computer-use native call exceeded 30000ms.

Suggested change
 export async function drainRunLoop<T>(fn: () => Promise<T>): Promise<T> {
-  if (process.platform !== 'darwin') return fn()
-  retain()
+  const needsPump = process.platform === 'darwin'
+  if (needsPump) retain()
   let timer: ReturnType<typeof setTimeout> | undefined
   try {
     // If the timeout wins the race, fn()'s promise is orphaned — a late
@@
     const timeout = withResolvers<never>()
     timer = setTimeout(timeoutReject, TIMEOUT_MS, timeout.reject)
     return await Promise.race([work, timeout.promise])
   } finally {
     clearTimeout(timer)
-    release()
+    if (needsPump) release()
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/drainRunLoop.ts` around lines 61 - 79, The current
early return in drainRunLoop skips the timeout/unhandled-rejection guard on
non-darwin platforms; change drainRunLoop so the timeout logic (withResolvers,
timer via setTimeout(timeoutReject, TIMEOUT_MS, ...), attaching work.catch(() =>
{}), and Promise.race) always runs, but only call retain() and release() (and
any CFRunLoop-specific logic) when process.platform === 'darwin'; update the
implementation around the symbols drainRunLoop, retain, release, withResolvers,
timeoutReject, TIMEOUT_MS and ensure timer is cleared in finally so non-macOS
calls still get the 30s timeout and unhandled-rejection protection.
src/main.tsx (1)

1605-1623: ⚠️ Potential issue | 🔴 Critical

Don't inject Chicago MCP after the enterprise MCP policy gate.

This block appends a type: 'stdio' server to dynamicMcpConfig after doesEnterpriseMcpConfigExist() / areMcpConfigsAllowedWithEnterpriseMcpConfig() have already run. That means orgs that intentionally forbid dynamic MCP servers still get Computer Use injected. Please re-run the policy check after this merge, or make the exemption explicit in the policy layer instead of bypassing it here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.tsx` around lines 1605 - 1623, The Chicago MCP injection currently
appends to dynamicMcpConfig and allowedTools after the enterprise MCP policy
checks, bypassing org-level bans; modify the flow so you re-run the MCP policy
check (use areMcpConfigsAllowedWithEnterpriseMcpConfig() or
doesEnterpriseMcpConfigExist() / the existing policy gating functions)
immediately after calling setupComputerUseMCP() and only merge mcpConfig / push
cuTools when that re-check returns allowed, or alternatively move the Chicago
setup to run before the policy decision and let the policy layer explicitly
exempt or allow it; reference getChicagoEnabled, setupComputerUseMCP,
dynamicMcpConfig, allowedTools, doesEnterpriseMcpConfigExist, and
areMcpConfigsAllowedWithEnterpriseMcpConfig when making the change.
🟠 Major comments (16)
packages/@ant/computer-use-mcp/src/keyBlocklist.ts-124-128 (1)

124-128: ⚠️ Potential issue | 🟠 Major

Handle Linux system shortcuts in this API.

This package is now three-platform, but isSystemKeyCombo() still only accepts "darwin" | "win32". That leaves the Linux backend without a safe way to enforce the systemKeyCombos grant, so common window-manager shortcuts can slip through.

Suggested change
+const BLOCKED_LINUX = new Set([
+  "alt+f4",
+  "alt+tab",
+  "meta+l",
+  "meta+d",
+]);
+
 export function isSystemKeyCombo(
   seq: string,
-  platform: "darwin" | "win32",
+  platform: "darwin" | "win32" | "linux",
 ): boolean {
-  const blocklist = platform === "darwin" ? BLOCKED_DARWIN : BLOCKED_WIN32;
+  const blocklist =
+    platform === "darwin"
+      ? BLOCKED_DARWIN
+      : platform === "linux"
+        ? BLOCKED_LINUX
+        : BLOCKED_WIN32;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-mcp/src/keyBlocklist.ts` around lines 124 - 128,
Update isSystemKeyCombo to accept Linux as a platform and enforce Linux
shortcuts: change the platform type union to include "linux" (or use the broader
platform type), add or reference a BLOCKED_LINUX set, and branch similarly to
the existing BLOCKED_DARWIN/BLOCKED_WIN32 logic so the function uses
BLOCKED_LINUX when platform === "linux"; ensure callers are updated where
necessary to pass "linux" so the Linux backend correctly enforces
systemKeyCombos.
src/utils/computerUse/executor.ts-300-302 (1)

300-302: ⚠️ Potential issue | 🟠 Major

Windows/Linux still inherit the macOS host-ID path.

Turning on win32/linux here sends those platforms through the existing getTerminalBundleId()/CLI_HOST_BUNDLE_ID flow unchanged. That flow only produces macOS bundle IDs or the darwin sentinel, but the new cross-platform input contract already uses non-bundle identifiers on Windows/Linux. prepareDisplay, previewHideSet, and screenshot exclusion therefore can't reliably recognize the host terminal on the new platforms.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/executor.ts` around lines 300 - 302, createCliExecutor
currently lets win32/linux flow through the macOS bundle-ID path
(getTerminalBundleId()/CLI_HOST_BUNDLE_ID), causing host identification
mismatches; update createCliExecutor to branch by platform so that for 'darwin'
it uses getTerminalBundleId()/CLI_HOST_BUNDLE_ID as before, but for 'win32' and
'linux' it assigns the new non-bundle host identifier used by the cross-platform
input contract (and does not call getTerminalBundleId), and ensure callers that
rely on host identity (prepareDisplay, previewHideSet, screenshot exclusion
logic) check both macOS bundle IDs and the new Windows/Linux host identifier so
the terminal is recognized correctly on all platforms.
packages/@ant/computer-use-mcp/src/sentinelApps.ts-13-36 (1)

13-36: ⚠️ Potential issue | 🟠 Major

Extend sentinel IDs beyond macOS.

This catalog is still bundle-id-only. With the new Windows/Linux backends, shell/file-system/system-settings apps on those platforms will never match these sets, so the approval UI loses its escalation warning exactly where the PR is adding support. Please add Windows/Linux sentinel IDs or normalize app identities before categorization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-mcp/src/sentinelApps.ts` around lines 13 - 36, The
current sentinel sets (SHELL_ACCESS_BUNDLE_IDS, FILESYSTEM_ACCESS_BUNDLE_IDS,
SYSTEM_SETTINGS_BUNDLE_IDS) and exported SENTINEL_BUNDLE_IDS only contain macOS
bundle IDs so non-macOS backends won't match; update sentinel logic by either
(A) expanding these sets to include Windows and Linux identifiers (executable
names, package IDs, WSL/Flatpak/Snap IDs) for common
shells/file-managers/system-settings, and/or (B) add a normalization layer when
categorizing apps that maps platform-specific identity fields (e.g., process
executable name, binary path, package name) into a canonical key before
membership checks against SENTINEL_BUNDLE_IDS; modify the categorization code
that uses SentinelCategory ("shell" | "filesystem" | "system_settings") to
consult the normalized identity or the extended sets so approvals on
Windows/Linux produce the same escalation warnings.
scripts/dev.ts-18-18 (1)

18-18: ⚠️ Potential issue | 🟠 Major

Keep CHICAGO_MCP opt-in in dev.

This widens the fixed dev baseline and makes Computer Use come up on every bun run dev with no opt-out path. Please keep DEFAULT_FEATURES to the documented four defaults and require FEATURE_CHICAGO_MCP=1 for this flag.

♻️ Proposed fix
-const DEFAULT_FEATURES = ["BUDDY", "TRANSCRIPT_CLASSIFIER", "BRIDGE_MODE", "AGENT_TRIGGERS_REMOTE", "CHICAGO_MCP"];
+const DEFAULT_FEATURES = [
+    "BUDDY",
+    "TRANSCRIPT_CLASSIFIER",
+    "BRIDGE_MODE",
+    "AGENT_TRIGGERS_REMOTE",
+];

As per coding guidelines, scripts/dev.ts should enable only BUDDY, TRANSCRIPT_CLASSIFIER, BRIDGE_MODE, and AGENT_TRIGGERS_REMOTE by default, and other features should be enabled via FEATURE_<FLAG_NAME>=1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/dev.ts` at line 18, Remove "CHICAGO_MCP" from the DEFAULT_FEATURES
array so DEFAULT_FEATURES only contains "BUDDY", "TRANSCRIPT_CLASSIFIER",
"BRIDGE_MODE", and "AGENT_TRIGGERS_REMOTE"; then add a runtime check for
process.env.FEATURE_CHICAGO_MCP === "1" and, if true, push "CHICAGO_MCP" onto
the features list (e.g. in the same initialization logic that uses
DEFAULT_FEATURES) so the flag is opt-in via FEATURE_CHICAGO_MCP=1.
src/utils/computerUse/executor.ts-71-79 (1)

71-79: ⚠️ Potential issue | 🟠 Major

Use stdin/stdout for Windows clipboard operations.

The Windows write path at line 100 embeds clipboard text directly in the PowerShell -Command argument, exposing it to process listing and audit logs. The read path at line 72 is safe. Both Linux (xclip) and macOS (pbcopy) already use stdin/stdout correctly. Adopt the same pattern for Windows to keep the payload out of argv and ensure consistent behavior across platforms.

Proposed changes
   if (process.platform === 'win32') {
     const { stdout, code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Get-Clipboard'], {
       useCwd: false,
     })

For the write path, replace the template literal approach with stdin:

   if (process.platform === 'win32') {
-    const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', `Set-Clipboard -Value '${text.replace(/'/g, "''")}'`], {
+    const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Set-Clipboard -Value ([Console]::In.ReadToEnd())'], {
+      input: text,
       useCwd: false,
     })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/executor.ts` around lines 71 - 79, The Windows
clipboard write path currently embeds the payload into the PowerShell '-Command'
argv (exposing it in process lists); instead, modify the write branch to call
execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Set-Clipboard -Value
([Console]::In.ReadToEnd())'], { useCwd: false, input: text }) (or equivalent)
so the clipboard text is sent via stdin rather than an argv template; keep the
same exit code check and error handling used in the read path (which calls
execFileNoThrow with 'Get-Clipboard') and reuse the same options (useCwd: false)
to maintain consistent behavior.
packages/@ant/computer-use-mcp/src/tools.ts-40-103 (1)

40-103: ⚠️ Potential issue | 🟠 Major

Batch actions never tell the model which coordinate mode to use.

Because BATCH_ACTION_ITEM_SCHEMA is hard-coded outside buildComputerUseTools(), computer_batch, teach_step.actions, and teach_batch.steps[].actions only expose generic (x, y) tuples. In normalized_0_100 mode, batched clicks and drags are ambiguous even though the non-batch tools are parameterized by coordinateMode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-mcp/src/tools.ts` around lines 40 - 103,
BATCH_ACTION_ITEM_SCHEMA is defined globally and omits any coordinateMode, so
batch endpoints (computer_batch, teach_step.actions,
teach_batch.steps[].actions) only accept raw (x,y) tuples and become ambiguous
for normalized_0_100 coordinates; move or extend the schema inside
buildComputerUseTools() (or augment it there) to include a coordinateMode field
(e.g., "coordinateMode" enum with values like "pixels" and "normalized_0_100")
and update BATCH_ACTION_ITEM_SCHEMA (or the schema returned by
buildComputerUseTools) to accept coordinateMode and interpret
coordinate/start_coordinate accordingly so batched clicks/drags are unambiguous.
Ensure the tools that produce/consume these actions (referenced by
buildComputerUseTools, computer_batch, teach_step.actions,
teach_batch.steps[].actions) validate and propagate coordinateMode with each
action.
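
A parameterized batch schema might look like the sketch below. It is only an illustration: `buildBatchActionItemSchema`, `describeCoordinate`, and the schema shape are hypothetical, and the two mode names follow the examples given in the prompt above rather than the actual tools.ts.

```typescript
type CoordinateMode = "pixels" | "normalized_0_100";

// Human-readable description the model sees for each coordinate tuple.
function describeCoordinate(mode: CoordinateMode): string {
  return mode === "normalized_0_100"
    ? "(x, y) as percentages of the screenshot, each in the range 0-100"
    : "(x, y) in pixels relative to the top-left corner of the screenshot";
}

// Build the per-action schema inside the tool builder so that batched
// actions carry the same coordinate semantics as the non-batch tools.
function buildBatchActionItemSchema(mode: CoordinateMode) {
  const tuple = {
    type: "array",
    items: { type: "number" },
    minItems: 2,
    maxItems: 2,
    description: describeCoordinate(mode),
  };
  return {
    type: "object",
    properties: {
      action: { type: "string" },
      coordinate: tuple,
      start_coordinate: { ...tuple },
    },
    required: ["action"],
  };
}
```

Generating the schema per call, instead of sharing one module-level constant, keeps the description in step with whichever coordinateMode the session uses.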
packages/@ant/computer-use-input/src/backends/linux.ts-14-27 (1)

14-27: ⚠️ Potential issue | 🟠 Major

Don't treat failed helper commands as success.

These wrappers ignore exit status and stderr. A failed child process currently looks like an empty, successful result, so callers can no-op or return bogus data while the tool call still appears to have worked.

🛠️ Suggested fix
 function run(cmd: string[]): string {
   const result = Bun.spawnSync({
     cmd,
     stdout: 'pipe',
     stderr: 'pipe',
   })
+  if (result.exitCode !== 0) {
+    throw new Error(
+      `${cmd.join(' ')} exited ${result.exitCode}: ${new TextDecoder().decode(result.stderr).trim()}`,
+    )
+  }
   return new TextDecoder().decode(result.stdout).trim()
 }

Mirror the same exit-code check in runAsync().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-input/src/backends/linux.ts` around lines 14 - 27,
The current helpers run and runAsync ignore exit status and stderr, treating
failed child processes as empty successes; update both functions (run and
runAsync) to check the process exit code (and/or error properties) and stderr,
and throw an Error (including the command, exit code and stderr/stdout) when the
child exits non‑zero or reports an error so callers no longer receive silent
bogus results; for run, inspect the Bun.spawnSync return (status/code/stderr)
and throw on non-zero, and for runAsync, await proc.exited, inspect
proc.exitCode or the resolved exit status and stderr, then throw with contextual
details if the command failed.
packages/@ant/computer-use-input/src/backends/win32.ts-199-201 (1)

199-201: ⚠️ Potential issue | 🟠 Major

Escape SendKeys metacharacters before typing arbitrary text.

SendWait() treats characters like +, ^, %, ~, (, ), {, and } as control tokens, so raw text containing them is not typed literally. Enclose each in braces: {+}, {^}, {%}, {~}, {(}, {)}, and write literal braces as {{} and {}}.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-input/src/backends/win32.ts` around lines 199 -
201, The typeText function currently only escapes single quotes for the
PowerShell string; update it to also escape SendKeys metacharacters so they are
typed literally: in typeText, before calling ps(...SendWait('${escaped}') ),
transform the input text so every one of these characters + ^ % ~ ( ) { } is
replaced with its braced literal ({+}, {^}, {%}, {~}, {(}, {)}, {{}, {}}) in a
single pass, then escape single quotes for PowerShell and use that transformed
value in the existing escaped variable passed to ps to preserve the PowerShell
quoting.
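
The escaping step could be sketched as below. This is an illustration under assumptions, not the backend's code: the helper names are hypothetical, square brackets are included because SendKeys also reserves them, and literal braces follow the SendKeys convention of {{} and {}}. Loading the System.Windows.Forms assembly is left out.

```typescript
// Characters SendKeys treats specially: + ^ % ~ ( ) { } [ ]
const SENDKEYS_SPECIAL = /[+^%~(){}\[\]]/g;

// Wrap every special character in braces in a single pass, so braces that
// the replacement itself adds are never escaped a second time.
function escapeSendKeys(text: string): string {
  return text.replace(SENDKEYS_SPECIAL, "{$&}");
}

// Quote a string for a PowerShell single-quoted literal ('' escapes ').
function toPowerShellSingleQuoted(text: string): string {
  return `'${text.replace(/'/g, "''")}'`;
}

// Build the command string; SendKeys escaping first, PowerShell quoting last.
function buildSendWaitCommand(text: string): string {
  const literal = toPowerShellSingleQuoted(escapeSendKeys(text));
  return `[System.Windows.Forms.SendKeys]::SendWait(${literal})`;
}
```

The order matters: SendKeys escaping operates on the text to be typed, while the quote doubling only protects the PowerShell string that carries it.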
packages/@ant/computer-use-input/src/index.ts-50-59 (1)

50-59: ⚠️ Potential issue | 🟠 Major

isSupported should not be based on module load alone.

This flips true as soon as the platform file can be required, even when runtime prerequisites are missing. On Linux, for example, the feature will advertise support before we've verified the xdotool/X11 path actually works. Similarly, macOS relies on osascript and Windows on powershell—all of which may be unavailable at runtime, but the backends perform no upfront verification. Failures only occur when functions are invoked.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-input/src/index.ts` around lines 50 - 59,
isSupported currently flips true when the platform module loads (backend !==
null) even if runtime prerequisites are missing; change the contract so the
platform backends perform an explicit runtime verification and expose it, and
update the top-level isSupported to reflect that verification. Concretely: have
each platform backend export an isSupported boolean or async function that
actually checks required runtime tools (e.g., xdotool/osascript/powershell),
then replace the current isSupported = backend !== null with a call to
backend.isSupported (or await backend.isSupported() if async) so exports like
moveMouse, key, keys, mouseLocation, mouseButton, mouseScroll, typeText,
getFrontmostAppInfo still default to unsupported but isSupported accurately
reflects runtime capability.
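
A runtime check along these lines might look like the following sketch. The tool list comes from the comment above (xdotool, osascript, powershell); `checkSupported` and the injected `hasCommand` probe are hypothetical, and a real probe would have to look the command up on the PATH or run it.

```typescript
type Platform = "darwin" | "win32" | "linux";

// Runtime prerequisites per platform, as named in the review comment.
const REQUIRED_TOOLS: Record<Platform, string[]> = {
  darwin: ["osascript"],
  win32: ["powershell"],
  linux: ["xdotool"],
};

// `hasCommand` is injected so the logic can be exercised without spawning
// processes; it should return true only if the tool is actually usable.
function checkSupported(
  platform: string,
  hasCommand: (name: string) => boolean,
): boolean {
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_TOOLS, platform)) {
    return false; // no backend for this platform at all
  }
  const tools = REQUIRED_TOOLS[platform as Platform];
  return tools.every((tool) => hasCommand(tool));
}
```

With this shape, isSupported stays false on a Linux host without xdotool even though backends/linux.ts loads successfully.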
packages/@ant/computer-use-input/src/backends/darwin.ts-50-58 (1)

50-58: ⚠️ Potential issue | 🟠 Major

key() doesn't support held keys on macOS — release is a no-op.

AppleScript's System Events key code and keystroke commands synthesize a complete key press (down + up) and do not provide separate key-down and key-up events for regular character keys. The release action has no effect, so any caller expecting to hold a key will only trigger a tap on press.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-input/src/backends/darwin.ts` around lines 50 -
58, The current key function ignores the 'release' action and only synthesizes a
full key press via AppleScript, so held keys cannot be represented; update the
key(InputBackend['key']) implementation to stop returning early for 'release'
and instead emit explicit key-down and key-up events using macOS Quartz/CGEvent
APIs (create a helper like sendKeyEvent(keyCode: number, down: boolean) and for
non-key-code characters implement sendUnicodeKeyEvent/unicode string via
CGEventKeyboardSetUnicodeString), call sendKeyEvent(KEY_MAP[lower], true) on
'press' and sendKeyEvent(..., false) on 'release', and for characters that
require Unicode use the unicode helper for both down/up; ensure the helper is
awaited and preserves existing KEY_MAP lookup logic.
packages/@ant/computer-use-swift/src/backends/darwin.ts-141-143 (1)

141-143: ⚠️ Potential issue | 🟠 Major

Don't hardcode every macOS window to display 1.

Returning [1] for every bundle ID breaks multi-monitor display resolution, and it doesn't even line up with this backend's listAll() IDs, which are CG display IDs rather than a fixed 1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 141 -
143, The findWindowDisplays implementation incorrectly returns a hardcoded [1]
for every bundleId; update the findWindowDisplays(bundleIds) function to
retrieve actual CG display IDs for each app's windows (e.g., via the same system
APIs used in this backend's listAll() or CGWindowList/CGDisplay APIs), map each
window to its CG display ID, de-duplicate the IDs, and return { bundleId,
displayIds: [...] } with those real display IDs so results line up with
listAll() and support multi-monitor setups.
packages/@ant/computer-use-swift/src/backends/linux.ts-244-277 (1)

244-277: ⚠️ Potential issue | 🟠 Major

Don't reuse a global screenshot path.

Both capture paths write to /tmp/cu-screenshot.png. If scrot fails, the next read can return stale bytes from a previous capture, and overlapping captures can clobber each other. Use a per-call temp file and clean it up in finally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-swift/src/backends/linux.ts` around lines 244 -
277, The code currently reuses the global SCREENSHOT_PATH which can return stale
data or cause clobbering; change both captureExcluding and captureRegion to
create a unique per-call temp file (instead of using SCREENSHOT_PATH) when
invoking runAsync (e.g., include a UUID/timestamp in the filename), read that
temp file into base64 as before, and ensure you delete the temp file in a
finally block so it’s always cleaned up; update any references to
SCREENSHOT_PATH inside captureExcluding and captureRegion to use the new
per-call temp path and keep the existing error handling/return shapes.
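
The per-call temp file pattern can be sketched as follows. It is a simplified, synchronous illustration using Node's fs/os/path modules: `captureToBase64` is a hypothetical helper, and the injected `capture` callback stands in for the scrot invocation, which in the real backend is asynchronous.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// `capture` must write the screenshot to `outPath` (stand-in for scrot).
function captureToBase64(capture: (outPath: string) => void): string {
  // A fresh directory per call: overlapping captures cannot clobber each
  // other, and a failed capture cannot expose bytes from an earlier one.
  const dir = mkdtempSync(join(tmpdir(), "cu-screenshot-"));
  const outPath = join(dir, "shot.png");
  try {
    capture(outPath);
    if (!existsSync(outPath)) {
      throw new Error("capture produced no file");
    }
    return readFileSync(outPath).toString("base64");
  } finally {
    // Always clean up, whether the capture succeeded or threw.
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage with a fake capture that writes placeholder bytes.
const demoBase64 = captureToBase64((p) => writeFileSync(p, "fake png bytes"));
```

Throwing when no file appears turns a silent scrot failure into a visible error instead of an empty or stale screenshot.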
packages/@ant/computer-use-swift/src/backends/linux.ts-96-98 (1)

96-98: ⚠️ Potential issue | 🟠 Major

Implement actual window-to-display mapping on Linux.

Returning displayIds: [0] for every app makes multi-monitor targeting incorrect as soon as the window is not on display 0.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-swift/src/backends/linux.ts` around lines 96 - 98,
The current findWindowDisplays stub returns
displayIds: [0] for every bundle and must be replaced with real Linux logic: for
each bundleId, enumerate windows (using X11/XCB or Wayland APIs), match windows
to the application (e.g., WM_CLASS/NET_WM_PID or process name), get each
window's geometry and map it to the monitor using XRandR/Monitor output or
Wayland output geometry, build WindowDisplayInfo objects with the actual display
IDs (or monitor indices) and return them; ensure the async signature is
preserved, handle missing windows by returning an empty displayIds array, and
keep error handling and type compatibility with WindowDisplayInfo.
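
The geometry-to-monitor step can be illustrated with a small pure function. This is a sketch under stated assumptions: `Rect` and `displayIdsForWindows` are hypothetical names, window and monitor rectangles are assumed to have been parsed already (for example from wmctrl and xrandr output), and a window is assigned to the monitor it overlaps most.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Area of overlap between two rectangles (0 when they do not intersect).
function overlapArea(a: Rect, b: Rect): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Hypothetical helper: returns the sorted, de-duplicated monitor indices
// for an app's windows. An app with no windows yields an empty array.
export function displayIdsForWindows(windows: Rect[], monitors: Rect[]): number[] {
  const ids = new Set<number>();
  for (const win of windows) {
    let best = -1;
    let bestArea = 0;
    monitors.forEach((monitor, index) => {
      const area = overlapArea(win, monitor);
      if (area > bestArea) {
        bestArea = area;
        best = index;
      }
    });
    // Windows that are entirely off-screen match no monitor and are skipped.
    if (best >= 0) ids.add(best);
  }
  return Array.from(ids).sort((a, b) => a - b);
}
```

Using overlap area rather than the window's top-left corner keeps a window that straddles two monitors attributed to the one showing most of it.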
packages/@ant/computer-use-swift/src/backends/win32.ts-91-93 (1)

91-93: ⚠️ Potential issue | 🟠 Major

Implement real window-to-display resolution here.

Returning displayIds: [0] for every app makes any window on a secondary monitor look like it's on the primary display, so auto-targeting and display-aware screenshots will pick the wrong monitor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-swift/src/backends/win32.ts` around lines 91 - 93,
The current stub in findWindowDisplays always returns displayIds: [0], which
misreports windows on secondary monitors; replace it with real Win32-based
resolution: for each bundleId use EnumWindows to enumerate top-level windows,
match windows to processes via GetWindowThreadProcessId, check the process
executable/command to correlate with bundleId (or compare process PID mapping
you already maintain), then for each matched window call MonitorFromWindow (or
MonitorFromRect/MonitorFromPoint) and GetMonitorInfo to obtain a monitor
identifier/index; collect unique monitor IDs per bundleId and return them as
displayIds. Ensure findWindowDisplays handles multiple windows per app
(de-duplicate display IDs), ignores invisible/minimized windows, and returns an
empty array if no windows are found.
packages/@ant/computer-use-swift/src/backends/darwin.ts-145-153 (1)

145-153: ⚠️ Potential issue | 🟠 Major

Use the supplied point in appUnderPoint().

This ignores x and y and always reports frontmostApplication. A click on a background window or another monitor will be attributed to the wrong app, weakening the per-app permission gate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 145 -
153, The appUnderPoint(_x, _y) implementation ignores the supplied coordinates
and always returns the frontmost app; update the jxa snippet in appUnderPoint to
hit-test the screen point and resolve the app owning the window under that
point: call CoreGraphics window APIs (e.g., CGWindowListCreateWindowAtPoint /
CGWindowListCreateDescriptionFromArray or CGWindowListCopyWindowInfo filtered by
point) to obtain the window's owner PID, then use
NSRunningApplication.runningApplicationWithProcessIdentifier or NSWorkspace to
get the bundleIdentifier/localizedName for that PID and JSON.stringify that
result instead of using NSWorkspace.sharedWorkspace.frontmostApplication. Ensure
the code uses the _x/_y CGPoint (pt) you already construct.
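
Independent of which system API supplies the window list, the hit-test itself is simple. The sketch below assumes the list is already ordered front to back and uses hypothetical names (`WindowEntry`, `appAtPoint`); it is not the JXA snippet from the PR.

```typescript
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface WindowEntry {
  bundleId: string;
  displayName: string;
  bounds: Bounds;
}

// Hypothetical helper: returns the app owning the frontmost window that
// contains (x, y), or null when the point is over no window.
// Assumes `windowsFrontToBack` is sorted with the topmost window first.
export function appAtPoint(
  windowsFrontToBack: WindowEntry[],
  x: number,
  y: number,
): { bundleId: string; displayName: string } | null {
  for (const win of windowsFrontToBack) {
    const b = win.bounds;
    const inside =
      x >= b.x && x < b.x + b.width && y >= b.y && y < b.y + b.height;
    if (inside) {
      return { bundleId: win.bundleId, displayName: win.displayName };
    }
  }
  return null;
}
```

With this shape, a click on a background window is attributed to that window's app rather than to whichever app happens to be frontmost.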
packages/@ant/computer-use-mcp/src/executor.ts-45-48 (1)

45-48: ⚠️ Potential issue | 🟠 Major

Add Linux to the public platform union.

This PR adds Linux support, but ComputerExecutorCapabilities.platform still only allows "darwin" | "win32". That makes a correct Linux executor impossible to type.

💡 Minimal fix
 export interface ComputerExecutorCapabilities {
   screenshotFiltering: 'native' | 'none'
-  platform: 'darwin' | 'win32'
+  platform: 'darwin' | 'win32' | 'linux'
   hostBundleId: string
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/@ant/computer-use-mcp/src/executor.ts` around lines 45 - 48, The
ComputerExecutorCapabilities interface's platform union is missing Linux; update
the platform type in ComputerExecutorCapabilities to include 'linux' (i.e.,
'darwin' | 'win32' | 'linux') so Linux executors can be correctly typed, and
search for any usages of ComputerExecutorCapabilities or platform checks to
ensure they handle the new 'linux' variant where necessary (e.g., switch/case or
conditional logic in executor.ts).
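
One way to make sure platform checks keep up with the widened union is an exhaustive switch. The sketch below is illustrative only: `Platform` mirrors the proposed union, while `pasteModifier` is a hypothetical function, not one from the PR.

```typescript
type Platform = "darwin" | "win32" | "linux";

// Hypothetical example of platform dispatch. The `never` assignment in the
// default branch makes the compiler reject this function if a new member is
// added to `Platform` without a matching case.
export function pasteModifier(platform: Platform): "cmd" | "ctrl" {
  switch (platform) {
    case "darwin":
      return "cmd";
    case "win32":
    case "linux":
      return "ctrl";
    default: {
      const unreachable: never = platform;
      throw new Error(`Unhandled platform: ${String(unreachable)}`);
    }
  }
}
```

If `'linux'` were added to the union but not to the switch, the `never` assignment would fail to compile, which surfaces every call site that still assumes two platforms.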
🟡 Minor comments (2)
docs/features/computer-use.md-178-195 (1)

178-195: ⚠️ Potential issue | 🟡 Minor

Add a language to this fenced block.

markdownlint flags unlabeled fences here; text is enough if this is meant to stay a plain execution plan.

Suggested change
-```
+```text
 Phase 2(解锁 macOS + Windows)
   ├── 2.1-2.3  移除 3 处硬编码 throw/skip
   ├── 2.4-2.5  剪贴板 + 粘贴快捷键平台分发
@@
 Phase 4(集成验证 + PR)
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/features/computer-use.md` around lines 178 - 195, The fenced code block
in the docs (the triple-backtick block showing Phase 2/3/4) is unlabeled and
triggers markdownlint; update the opening fence to include a language tag (e.g.,
change the opening "```" to "```text") so the block is treated as plain text and
linting passes, leaving the block contents and closing "```" unchanged.
DEV-LOG.md-25-33 (1)

25-33: ⚠️ Potential issue | 🟡 Minor

Add a language to this fenced block.

Markdownlint is already flagging this block with MD040. Adding text here keeps docs lint clean.

📝 Proposed fix
-```
+```text
 packages/@ant/computer-use-{input,swift}/src/
 ├── index.ts          ← dispatcher
 ├── types.ts          ← 共享接口
 └── backends/
     ├── darwin.ts      ← macOS AppleScript(原样拆出,不改逻辑)
     ├── win32.ts       ← Windows PowerShell
     └── linux.ts       ← Linux xdotool/scrot/xrandr/wmctrl
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @DEV-LOG.md around lines 25 - 33, The fenced code block containing the file
tree starting with "packages/@ant/computer-use-{input,swift}/src/" should
include a language tag to satisfy markdownlint MD040; change the opening fence
from "```" to "```text" and leave the block contents unchanged
(the tree lines including index.ts, types.ts and backends/*).


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (1)</summary><blockquote>

<details>
<summary>src/utils/computerUse/gates.ts (1)</summary><blockquote>

`39-55`: **Remove the dead entitlement helper.**

`hasRequiredSubscription()` is now a constant `true`, so `getChicagoEnabled()` still reads like it enforces entitlements when it doesn't. Inlining the condition will make future rollout changes easier to reason about.

<details>
<summary>Suggested change</summary>

```diff
-function hasRequiredSubscription(): boolean {
-  return true
-}
-
 export function getChicagoEnabled(): boolean {
@@
-  return hasRequiredSubscription() && readConfig().enabled
+  return readConfig().enabled
 }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@src/utils/computerUse/gates.ts` around lines 39 - 55, Remove the dead helper
hasRequiredSubscription() and inline its result into getChicagoEnabled(): delete
or stop calling hasRequiredSubscription() and replace the final return to
directly use the true/entitlement expression (i.e., remove the
hasRequiredSubscription() call and rely on readConfig().enabled plus the
existing ant/monorepo check); update references to hasRequiredSubscription() in
this file (if any) and keep isEnvTruthy(process.env.ALLOW_ANT_COMPUTER_USE_MCP)
and readConfig().enabled as the gating conditions in getChicagoEnabled().
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @packages/@ant/computer-use-mcp/src/mcpServer.ts:

  • Around line 272-287: The CallTool handler currently calls dispatch regardless
    of adapter state, so disabled adapters still execute tools; update the handler
    registered via server.setRequestHandler(CallToolRequestSchema, ...) to first
    check adapter.isDisabled() (or the same gating used for ListTools) and
    short-circuit with an appropriate failure/empty response (e.g., a blocked error
    or empty result) when disabled; ensure this check is applied before calling
    bindSessionContext/dispatch so bindSessionContext and dispatch are never invoked
    for disabled adapters.

In @packages/@ant/computer-use-mcp/src/pixelCompare.ts:

  • Around line 102-106: comparePixelAtLocation currently returns false when the
    rect or cropped patches are missing (e.g., from crop failures), which makes
    validateClickTarget treat internal errors as real mismatches; change
    comparePixelAtLocation to return a distinct "skip" sentinel (null) instead of
    false when rect/crop/patches are unavailable (see the checks around
    crop(lastScreenshot.base64, rect) and crop(freshScreenshot.base64, rect)), and
    update the other identical block later (lines ~147-164) to do the same so
    callers like validateClickTarget can detect null and skip validation rather than
    aborting the click.

In @packages/@ant/computer-use-swift/src/backends/darwin.ts:

  • Around line 160-181: The listInstalled() implementation is fabricating
    bundleIds and only scans top-level Applications; update it to enumerate real app
    bundles (include /Applications, /System/Applications and subfolders) and for
    each app use a system query (e.g., mdls or reading Contents/Info.plist) via the
    existing osascript call or a child-process call to extract CFBundleIdentifier,
    then return that identifier in the returned objects; ensure the mapping in
    listInstalled() returns bundleId, displayName and path using the real
    CFBundleIdentifier so it matches open() and listRunning().

In @packages/@ant/computer-use-swift/src/backends/linux.ts:

  • Around line 124-157: listInstalled() currently sets bundleId to the .desktop
    filename, which is inconsistent with listRunning() and appUnderPoint() that use
    the executable path or PID fallback; change listInstalled() (in the InstalledApp
    creation) to derive bundleId from the Exec entry (the same value you set into
    path: exec.split(/\s+/)[0]) so bundleId matches running-app IDs, and only fall
    back to the .desktop basename if Exec is missing or empty; keep displayName from
    Name and preserve the existing NoDisplay filter.

In @packages/@ant/computer-use-swift/src/backends/win32.ts:

  • Around line 123-146: listInstalled() currently sets bundleId to PSChildName
    which doesn't match how listRunning(), appUnderPoint(), and open() identify apps
    (by executable path); update listInstalled() to use the install executable path
    as the bundleId (i.e., set bundleId to the parsed InstallLocation/executable
    path instead of PSChildName), ensure the mapped object uses path as the
    canonical identifier (falling back to name if path is missing), and keep
    displayName populated from DisplayName so grants and open() will match
    running/frontmost apps across listRunning(), appUnderPoint(), and open().

Outside diff comments:
In @build.ts:

  • Around line 13-19: DEFAULT_BUILD_FEATURES currently includes "CHICAGO_MCP",
    which forces that feature on in builds; remove "CHICAGO_MCP" so only
    "AGENT_TRIGGERS_REMOTE" remains in DEFAULT_BUILD_FEATURES, leaving Chicago
    opt-in via the FEATURE_CHICAGO_MCP env var flow that uses envFeatures and
    features; update the DEFAULT_BUILD_FEATURES declaration (and any nearby comment)
    to reflect this change so rollouts remain controlled by FEATURE_<FLAG_NAME>
    environment variables.

In @src/main.tsx:

  • Around line 1605-1623: The Chicago MCP injection currently appends to
    dynamicMcpConfig and allowedTools after the enterprise MCP policy checks,
    bypassing org-level bans; modify the flow so you re-run the MCP policy check
    (use areMcpConfigsAllowedWithEnterpriseMcpConfig() or
    doesEnterpriseMcpConfigExist() / the existing policy gating functions)
    immediately after calling setupComputerUseMCP() and only merge mcpConfig / push
    cuTools when that re-check returns allowed, or alternatively move the Chicago
    setup to run before the policy decision and let the policy layer explicitly
    exempt or allow it; reference getChicagoEnabled, setupComputerUseMCP,
    dynamicMcpConfig, allowedTools, doesEnterpriseMcpConfigExist, and
    areMcpConfigsAllowedWithEnterpriseMcpConfig when making the change.

In @src/utils/computerUse/drainRunLoop.ts:

  • Around line 61-79: The current early return in drainRunLoop skips the
    timeout/unhandled-rejection guard on non-darwin platforms; change drainRunLoop
    so the timeout logic (withResolvers, timer via setTimeout(timeoutReject,
    TIMEOUT_MS, ...), attaching work.catch(() => {}), and Promise.race) always runs,
    but only call retain() and release() (and any CFRunLoop-specific logic) when
    process.platform === 'darwin'; update the implementation around the symbols
    drainRunLoop, retain, release, withResolvers, timeoutReject, TIMEOUT_MS and
    ensure timer is cleared in finally so non-macOS calls still get the 30s timeout
    and unhandled-rejection protection.

In @src/utils/computerUse/gates.ts:

  • Around line 12-21: DEFAULTS currently enables Chicago by default which causes
    readConfig (which overlays partial payloads onto DEFAULTS) to implicitly turn
    features on when the config is missing or stale; change DEFAULTS.enabled to
    false so the system fails closed, and confirm the behavior in the overlay code
    that uses readConfig (refer to DEFAULTS, ChicagoConfig, and readConfig) still
    merges partial payloads but will not enable Computer Use when the config is
    absent or incomplete.
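
    The fail-closed overlay can be sketched like this. The field names other than `enabled` are hypothetical, and `readConfig` here takes the remote payload as a parameter for illustration; the real function's signature is not shown in this review.

    ```typescript
    // Hypothetical config shape: only `enabled` is taken from the review text.
    interface ChicagoConfig {
      enabled: boolean;
      coordinateMode: "pixels" | "normalized_0_100";
    }

    // Fail closed: a missing or partial payload must not turn the feature on.
    const DEFAULTS: ChicagoConfig = {
      enabled: false,
      coordinateMode: "pixels",
    };

    // Overlay a partial payload onto DEFAULTS. Keys absent from the payload
    // keep their default value, so `enabled` stays false unless set explicitly.
    export function readConfig(remote?: Partial<ChicagoConfig>): ChicagoConfig {
      return { ...DEFAULTS, ...(remote ?? {}) };
    }
    ```

    With `enabled: false` in `DEFAULTS`, only an explicit `enabled: true` in the payload switches the feature on.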

Major comments:
In @packages/@ant/computer-use-input/src/backends/darwin.ts:

  • Around line 50-58: The current key function ignores the 'release' action and
    only synthesizes a full key press via AppleScript, so held keys cannot be
    represented; update the key(InputBackend['key']) implementation to stop
    returning early for 'release' and instead emit explicit key-down and key-up
    events using macOS Quartz/CGEvent APIs (create a helper like
    sendKeyEvent(keyCode: number, down: boolean) and for non-key-code characters
    implement sendUnicodeKeyEvent/unicode string via
    CGEventKeyboardSetUnicodeString), call sendKeyEvent(KEY_MAP[lower], true) on
    'press' and sendKeyEvent(..., false) on 'release', and for characters that
    require Unicode use the unicode helper for both down/up; ensure the helper is
    awaited and preserves existing KEY_MAP lookup logic.

In @packages/@ant/computer-use-input/src/backends/linux.ts:

  • Around line 14-27: The current helpers run and runAsync ignore exit status and
    stderr, treating failed child processes as empty successes; update both
    functions (run and runAsync) to check the process exit code (and/or error
    properties) and stderr, and throw an Error (including the command, exit code and
    stderr/stdout) when the child exits non‑zero or reports an error so callers no
    longer receive silent bogus results; for run, inspect the Bun.spawnSync return
    (status/code/stderr) and throw on non-zero, and for runAsync, await proc.exited,
    inspect proc.exitCode or the resolved exit status and stderr, then throw with
    contextual details if the command failed.
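
    A synchronous helper that surfaces failures might look like the sketch below. It deliberately uses Node's `child_process.spawnSync` rather than `Bun.spawnSync`, which the backend reportedly uses, so the property names differ from the real code; treat it as an illustration of the error contract only.

    ```typescript
    import { spawnSync } from "node:child_process";

    // Runs a command and returns trimmed stdout. Throws when the process
    // cannot be started or exits non-zero, so callers never mistake a
    // failure for an empty result.
    export function run(cmd: string[]): string {
      const file = cmd[0];
      if (file === undefined) throw new Error("run: empty command");
      const result = spawnSync(file, cmd.slice(1), { encoding: "utf8" });
      if (result.error) {
        throw new Error(`${cmd.join(" ")} failed to start: ${result.error.message}`);
      }
      if (result.status !== 0) {
        const stderr = (result.stderr ?? "").trim();
        throw new Error(`${cmd.join(" ")} exited with ${result.status}: ${stderr}`);
      }
      return result.stdout.trim();
    }
    ```

    A missing xdotool binary then shows up as a thrown error naming the command, instead of an empty string that downstream parsing silently accepts.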

In @packages/@ant/computer-use-input/src/backends/win32.ts:

  • Around line 199-201: The typeText function currently only escapes single
    quotes for the PowerShell string; update it to also escape SendKeys
    metacharacters so they are typed literally: in typeText, before calling
    ps(...SendWait('${escaped}') ), transform the input text so every one of these
    characters + ^ % ~ ( ) is replaced with a braced literal ({+}, {^}, {%}, {~},
    {(}, {)}), and ensure literal braces become doubled (replace { with {{ and }
    with }} or equivalent) before you escape single quotes for PowerShell; then use
    that transformed value in the existing escaped variable passed to ps to preserve
    the PowerShell quoting.
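
    The escaping order can be sketched as two small functions. `escapeSendKeys` and `toPowerShellLiteral` are hypothetical names. One deliberate difference from the wording above: rather than doubling braces, the sketch wraps every special character, braces included, in a brace pair (`{{}` and `{}}`), which is the form the SendKeys documentation describes. Square brackets are wrapped as well, since that documentation asks for it.

    ```typescript
    // Wrap each SendKeys metacharacter in braces so it is typed literally.
    // A single pass over the original text avoids re-escaping inserted braces.
    export function escapeSendKeys(text: string): string {
      return text.replace(/[+^%~(){}[\]]/g, (ch) => `{${ch}}`);
    }

    // Quote a value as a PowerShell single-quoted string literal:
    // inside single quotes, the only escape is doubling the quote itself.
    export function toPowerShellLiteral(value: string): string {
      return `'${value.replace(/'/g, "''")}'`;
    }

    // SendKeys escaping first, PowerShell quoting second.
    export function sendKeysArgument(text: string): string {
      return toPowerShellLiteral(escapeSendKeys(text));
    }
    ```

    Applying the SendKeys pass before the PowerShell pass matters: the quoting layer should see the final string exactly as SendWait will receive it.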

In @packages/@ant/computer-use-input/src/index.ts:

  • Around line 50-59: isSupported currently flips true when the platform module
    loads (backend !== null) even if runtime prerequisites are missing; change the
    contract so the platform backends perform an explicit runtime verification and
    expose it, and update the top-level isSupported to reflect that verification.
    Concretely: have each platform backend export an isSupported boolean or async
    function that actually checks required runtime tools (e.g.,
    xdotool/osascript/powershell), then replace the current isSupported = backend
    !== null with a call to backend.isSupported (or await backend.isSupported() if
    async) so exports like moveMouse, key, keys, mouseLocation, mouseButton,
    mouseScroll, typeText, getFrontmostAppInfo still default to unsupported but
    isSupported accurately reflects runtime capability.

In @packages/@ant/computer-use-mcp/src/executor.ts:

  • Around line 45-48: The ComputerExecutorCapabilities interface's platform union
    is missing Linux; update the platform type in ComputerExecutorCapabilities to
    include 'linux' (i.e., 'darwin' | 'win32' | 'linux') so Linux executors can be
    correctly typed, and search for any usages of ComputerExecutorCapabilities or
    platform checks to ensure they handle the new 'linux' variant where necessary
    (e.g., switch/case or conditional logic in executor.ts).

In @packages/@ant/computer-use-mcp/src/keyBlocklist.ts:

  • Around line 124-128: Update isSystemKeyCombo to accept Linux as a platform and
    enforce Linux shortcuts: change the platform type union to include "linux" (or
    use the broader platform type), add or reference a BLOCKED_LINUX set, and branch
    similarly to the existing BLOCKED_DARWIN/BLOCKED_WIN32 logic so the function
    uses BLOCKED_LINUX when platform === "linux"; ensure callers are updated where
    necessary to pass "linux" so the Linux backend correctly enforces
    systemKeyCombos.

In @packages/@ant/computer-use-mcp/src/sentinelApps.ts:

  • Around line 13-36: The current sentinel sets (SHELL_ACCESS_BUNDLE_IDS,
    FILESYSTEM_ACCESS_BUNDLE_IDS, SYSTEM_SETTINGS_BUNDLE_IDS) and exported
    SENTINEL_BUNDLE_IDS only contain macOS bundle IDs so non-macOS backends won't
    match; update sentinel logic by either (A) expanding these sets to include
    Windows and Linux identifiers (executable names, package IDs, WSL/Flatpak/Snap
    IDs) for common shells/file-managers/system-settings, and/or (B) add a
    normalization layer when categorizing apps that maps platform-specific identity
    fields (e.g., process executable name, binary path, package name) into a
    canonical key before membership checks against SENTINEL_BUNDLE_IDS; modify the
    categorization code that uses SentinelCategory ("shell" | "filesystem" |
    "system_settings") to consult the normalized identity or the extended sets so
    approvals on Windows/Linux produce the same escalation warnings.

In @packages/@ant/computer-use-mcp/src/tools.ts:

  • Around line 40-103: BATCH_ACTION_ITEM_SCHEMA is defined globally and omits any
    coordinateMode, so batch endpoints (computer_batch, teach_step.actions,
    teach_batch.steps[].actions) only accept raw (x,y) tuples and become ambiguous
    for normalized_0_100 coordinates; move or extend the schema inside
    buildComputerUseTools() (or augment it there) to include a coordinateMode field
    (e.g., "coordinateMode" enum with values like "pixels" and "normalized_0_100")
    and update BATCH_ACTION_ITEM_SCHEMA (or the schema returned by
    buildComputerUseTools) to accept coordinateMode and interpret
    coordinate/start_coordinate accordingly so batched clicks/drags are unambiguous.
    Ensure the tools that produce/consume these actions (referenced by
    buildComputerUseTools, computer_batch, teach_step.actions,
    teach_batch.steps[].actions) validate and propagate coordinateMode with each
    action.

In @packages/@ant/computer-use-swift/src/backends/darwin.ts:

  • Around line 141-143: The findWindowDisplays implementation incorrectly returns
    a hardcoded [1] for every bundleId; update the findWindowDisplays(bundleIds)
    function to retrieve actual CG display IDs for each app's windows (e.g., via the
    same system APIs used in this backend's listAll() or CGWindowList/CGDisplay
    APIs), map each window to its CG display ID, de-duplicate the IDs, and return {
    bundleId, displayIds: [...] } with those real display IDs so results line up
    with listAll() and support multi-monitor setups.
  • Around line 145-153: The appUnderPoint(_x, _y) implementation ignores the
    supplied coordinates and always returns the frontmost app; update the jxa
    snippet in appUnderPoint to hit-test the screen point and resolve the app owning
    the window under that point: call CoreGraphics window APIs (e.g.,
    CGWindowListCreateWindowAtPoint / CGWindowListCreateDescriptionFromArray or
    CGWindowListCopyWindowInfo filtered by point) to obtain the window's owner PID,
    then use NSRunningApplication.runningApplicationWithProcessIdentifier or
    NSWorkspace to get the bundleIdentifier/localizedName for that PID and
    JSON.stringify that result instead of using
    NSWorkspace.sharedWorkspace.frontmostApplication. Ensure the code uses the _x/_y
    CGPoint (pt) you already construct.

In @packages/@ant/computer-use-swift/src/backends/linux.ts:

  • Around line 244-277: The code currently reuses the global SCREENSHOT_PATH
    which can return stale data or cause clobbering; change both captureExcluding
    and captureRegion to create a unique per-call temp file (instead of using
    SCREENSHOT_PATH) when invoking runAsync (e.g., include a UUID/timestamp in the
    filename), read that temp file into base64 as before, and ensure you delete the
    temp file in a finally block so it’s always cleaned up; update any references to
    SCREENSHOT_PATH inside captureExcluding and captureRegion to use the new
    per-call temp path and keep the existing error handling/return shapes.
  • Around line 96-98: The current findWindowDisplays stub returns
    displayIds: [0] for every bundle and must be replaced
    with real Linux logic: for each bundleId, enumerate windows (using X11/XCB or
    Wayland APIs), match windows to the application (e.g., WM_CLASS/NET_WM_PID or
    process name), get each window's geometry and map it to the monitor using
    XRandR/Monitor output or Wayland output geometry, build WindowDisplayInfo
    objects with the actual display IDs (or monitor indices) and return them; ensure
    the async signature is preserved, handle missing windows by returning an empty
    displayIds array, and keep error handling and type compatibility with
    WindowDisplayInfo.

In @packages/@ant/computer-use-swift/src/backends/win32.ts:

  • Around line 91-93: The current stub in findWindowDisplays always returns
    displayIds: [0], which misreports windows on secondary monitors; replace it with
    real Win32-based resolution: for each bundleId use EnumWindows to enumerate
    top-level windows, match windows to processes via GetWindowThreadProcessId,
    check the process executable/command to correlate with bundleId (or compare
    process PID mapping you already maintain), then for each matched window call
    MonitorFromWindow (or MonitorFromRect/MonitorFromPoint) and GetMonitorInfo to
    obtain a monitor identifier/index; collect unique monitor IDs per bundleId and
    return them as displayIds. Ensure findWindowDisplays handles multiple windows
    per app (de-duplicate display IDs), ignores invisible/minimized windows, and
    returns an empty array if no windows are found.

In @scripts/dev.ts:

  • Line 18: Remove "CHICAGO_MCP" from the DEFAULT_FEATURES array so
    DEFAULT_FEATURES only contains "BUDDY", "TRANSCRIPT_CLASSIFIER", "BRIDGE_MODE",
    and "AGENT_TRIGGERS_REMOTE"; then add a runtime check for
    process.env.FEATURE_CHICAGO_MCP === "1" and, if true, push "CHICAGO_MCP" onto
    the features list (e.g. in the same initialization logic that uses
    DEFAULT_FEATURES) so the flag is opt-in via FEATURE_CHICAGO_MCP=1.

In @src/utils/computerUse/executor.ts:

  • Around line 300-302: createCliExecutor currently lets win32/linux flow through
    the macOS bundle-ID path (getTerminalBundleId()/CLI_HOST_BUNDLE_ID), causing
    host identification mismatches; update createCliExecutor to branch by platform
    so that for 'darwin' it uses getTerminalBundleId()/CLI_HOST_BUNDLE_ID as before,
    but for 'win32' and 'linux' it assigns the new non-bundle host identifier used
    by the cross-platform input contract (and does not call getTerminalBundleId),
    and ensure callers that rely on host identity (prepareDisplay, previewHideSet,
    screenshot exclusion logic) check both macOS bundle IDs and the new
    Windows/Linux host identifier so the terminal is recognized correctly on all
    platforms.
  • Around line 71-79: The Windows clipboard write path currently embeds the
    payload into the PowerShell '-Command' argv (exposing it in process lists);
    instead, modify the write branch to call execFileNoThrow('powershell',
    ['-NoProfile', '-Command', 'Set-Clipboard -Value ([Console]::In.ReadToEnd())'],
    { useCwd: false, input: text }) (or equivalent) so the clipboard text is sent
    via stdin rather than an argv template; keep the same exit code check and error
    handling used in the read path (which calls execFileNoThrow with
    'Get-Clipboard') and reuse the same options (useCwd: false) to maintain
    consistent behavior.

Minor comments:
In @DEV-LOG.md:

  • Around line 25-33: The fenced code block containing the file tree starting
    with "packages/@ant/computer-use-{input,swift}/src/" should include a language
    tag to satisfy markdownlint MD040; change the opening fence from "```" to
    "```text" and leave the block contents unchanged (the tree lines
    including index.ts, types.ts and backends/*).

In @docs/features/computer-use.md:

  • Around line 178-195: The fenced code block in the docs (the triple-backtick
    block showing Phase 2/3/4) is unlabeled and triggers markdownlint; update the
    opening fence to include a language tag (e.g., change the opening "```" to "```text") so the block is treated as plain text and linting passes, leaving the
    block contents and closing "```" unchanged.

Nitpick comments:
In @src/utils/computerUse/gates.ts:

  • Around line 39-55: Remove the dead helper hasRequiredSubscription() and inline
    its result into getChicagoEnabled(): delete or stop calling
    hasRequiredSubscription() and replace the final return to directly use the
    true/entitlement expression (i.e., remove the hasRequiredSubscription() call and
    rely on readConfig().enabled plus the existing ant/monorepo check); update
    references to hasRequiredSubscription() in this file (if any) and keep
    isEnvTruthy(process.env.ALLOW_ANT_COMPUTER_USE_MCP) and readConfig().enabled as
    the gating conditions in getChicagoEnabled().

</details>


---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `dd7c9e64-4c3f-46ae-b741-78e2cb2e651a`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 465e9f01c69ee7b2326aabe0b18ce7d8f1f2148e and e3264a16919675b13d3c118cd7e0e639d0436e39.

</details>

<details>
<summary>📒 Files selected for processing (34)</summary>

* `DEV-LOG.md`
* `build.ts`
* `docs/features/computer-use.md`
* `packages/@ant/computer-use-input/src/backends/darwin.ts`
* `packages/@ant/computer-use-input/src/backends/linux.ts`
* `packages/@ant/computer-use-input/src/backends/win32.ts`
* `packages/@ant/computer-use-input/src/index.ts`
* `packages/@ant/computer-use-input/src/types.ts`
* `packages/@ant/computer-use-mcp/src/deniedApps.ts`
* `packages/@ant/computer-use-mcp/src/executor.ts`
* `packages/@ant/computer-use-mcp/src/imageResize.ts`
* `packages/@ant/computer-use-mcp/src/index.ts`
* `packages/@ant/computer-use-mcp/src/keyBlocklist.ts`
* `packages/@ant/computer-use-mcp/src/mcpServer.ts`
* `packages/@ant/computer-use-mcp/src/pixelCompare.ts`
* `packages/@ant/computer-use-mcp/src/sentinelApps.ts`
* `packages/@ant/computer-use-mcp/src/subGates.ts`
* `packages/@ant/computer-use-mcp/src/toolCalls.ts`
* `packages/@ant/computer-use-mcp/src/tools.ts`
* `packages/@ant/computer-use-mcp/src/types.ts`
* `packages/@ant/computer-use-swift/src/backends/darwin.ts`
* `packages/@ant/computer-use-swift/src/backends/linux.ts`
* `packages/@ant/computer-use-swift/src/backends/win32.ts`
* `packages/@ant/computer-use-swift/src/index.ts`
* `packages/@ant/computer-use-swift/src/types.ts`
* `scripts/dev.ts`
* `src/main.tsx`
* `src/utils/computerUse/common.ts`
* `src/utils/computerUse/drainRunLoop.ts`
* `src/utils/computerUse/escHotkey.ts`
* `src/utils/computerUse/executor.ts`
* `src/utils/computerUse/gates.ts`
* `src/utils/computerUse/hostAdapter.ts`
* `src/utils/computerUse/swiftLoader.ts`

</details>

<details>
<summary>💤 Files with no reviewable changes (1)</summary>

* src/utils/computerUse/swiftLoader.ts

</details>

</details>


Comment on lines +272 to +287
server.setRequestHandler(ListToolsRequestSchema, async () =>
adapter.isDisabled() ? { tools: [] } : { tools },
);

if (context) {
const dispatch = bindSessionContext(adapter, coordinateMode, context);
server.setRequestHandler(
CallToolRequestSchema,
async (request): Promise<CallToolResult> => {
const { screenshot: _s, telemetry: _t, ...result } = await dispatch(
request.params.name,
request.params.arguments ?? {},
);
return result;
},
);

⚠️ Potential issue | 🔴 Critical

Disabled servers still accept direct CallTool requests.

ListTools hides the surface when adapter.isDisabled() is true, but this handler still dispatches the call. Any MCP client that already knows the tool names can bypass the disable/subscription gate.

🛡️ Suggested fix
     server.setRequestHandler(
       CallToolRequestSchema,
       async (request): Promise<CallToolResult> => {
+        if (adapter.isDisabled()) {
+          return {
+            content: [{ type: "text", text: "Computer Use is currently disabled." }],
+            isError: true,
+          }
+        }
         const { screenshot: _s, telemetry: _t, ...result } = await dispatch(
           request.params.name,
           request.params.arguments ?? {},
         );
         return result;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
server.setRequestHandler(ListToolsRequestSchema, async () =>
adapter.isDisabled() ? { tools: [] } : { tools },
);
if (context) {
const dispatch = bindSessionContext(adapter, coordinateMode, context);
server.setRequestHandler(
CallToolRequestSchema,
async (request): Promise<CallToolResult> => {
if (adapter.isDisabled()) {
return {
content: [{ type: "text", text: "Computer Use is currently disabled." }],
isError: true,
}
}
const { screenshot: _s, telemetry: _t, ...result } = await dispatch(
request.params.name,
request.params.arguments ?? {},
);
return result;
},
);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `packages/@ant/computer-use-mcp/src/mcpServer.ts` around lines 272-287, the
CallTool handler currently calls dispatch regardless of adapter state, so
disabled adapters still execute tools; update the handler registered via
server.setRequestHandler(CallToolRequestSchema, ...) to first check
adapter.isDisabled() (the same gate used for ListTools) and short-circuit
with an error result (for example, isError: true with a "disabled" message)
when disabled; ensure this check runs inside the handler before dispatch is
called, so no tool is dispatched for a disabled adapter.

Comment on lines +102 to +106
if (!rect) return false;

const patch1 = crop(lastScreenshot.base64, rect);
const patch2 = crop(freshScreenshot.base64, rect);
if (!patch1 || !patch2) return false;

⚠️ Potential issue | 🔴 Critical

Internal crop failures currently block clicks instead of skipping validation.

comparePixelAtLocation() returns false when the rect or cropped patch is unavailable, and validateClickTarget() treats that as a real mismatch. That violates the documented “skip on internal error” contract and can abort valid clicks whenever crop/decode fails.

Also applies to: 147-164

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `packages/@ant/computer-use-mcp/src/pixelCompare.ts` around lines 102-106,
comparePixelAtLocation currently returns false when the rect or cropped patches
are missing (e.g., from crop failures), which makes validateClickTarget treat
internal errors as real mismatches; change comparePixelAtLocation to return a
distinct "skip" sentinel (null) instead of false when rect/crop/patches are
unavailable (see the checks around crop(lastScreenshot.base64, rect) and
crop(freshScreenshot.base64, rect)), and update the other identical block later
(lines ~147-164) to do the same so callers like validateClickTarget can detect
null and skip validation rather than aborting the click.
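
The tri-state contract described above can be sketched as follows. This is a minimal illustration, not the PR's code: `PixelComparison`, `CropFn`, `comparePatches`, and `shouldAllowClick` are hypothetical names, and the injected `crop` function merely stands in for the real `crop()` helper, whose signature is assumed here.

```typescript
// Hypothetical tri-state result: true = patches match, false = real mismatch,
// null = comparison could not be performed (internal error, skip validation).
type PixelComparison = boolean | null;

interface Rect { x: number; y: number; width: number; height: number }

// Stand-in for the real crop(); assumed to return null when decode/crop fails.
type CropFn = (base64: string, rect: Rect) => Uint8Array | null;

function comparePatches(
  lastBase64: string,
  freshBase64: string,
  rect: Rect | null,
  crop: CropFn,
): PixelComparison {
  if (!rect) return null; // no region to compare: skip, do not fail
  const before = crop(lastBase64, rect);
  const after = crop(freshBase64, rect);
  if (!before || !after) return null; // internal crop failure: skip
  if (before.length !== after.length) return false;
  return before.every((byte, i) => byte === after[i]);
}

// Only an explicit mismatch (false) blocks the click; null means "skip".
function shouldAllowClick(result: PixelComparison): boolean {
  return result !== false;
}
```

Keeping `null` distinct from `false` lets the caller decide what "could not compare" means, instead of folding internal failures into a mismatch.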

Comment on lines +160 to +181
async listInstalled() {
  try {
    const result = await osascript(`
      tell application "System Events"
        set appList to ""
        repeat with appFile in (every file of folder "Applications" of startup disk whose name ends with ".app")
          set appPath to POSIX path of (appFile as alias)
          set appName to name of appFile
          set appList to appList & appPath & "|" & appName & "\\n"
        end repeat
        return appList
      end tell
    `)
    return result.split('\n').filter(Boolean).map(line => {
      const [path, name] = line.split('|', 2)
      const displayName = (name ?? '').replace(/\.app$/, '')
      return {
        bundleId: `com.app.${displayName.toLowerCase().replace(/\s+/g, '-')}`,
        displayName,
        path: path ?? '',
      }
    })

⚠️ Potential issue | 🔴 Critical

Return real bundle identifiers from listInstalled().

This fabricates com.app.<name> values instead of reading each app's actual bundle identifier, while open() and listRunning() operate on the real identifier. Grants created from this list won't match running apps, and scanning only the top-level /Applications folder also misses many built-in apps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 160-181,
the listInstalled() implementation is fabricating bundleIds and only scans
top-level Applications; update it to enumerate real app bundles (include
/Applications, /System/Applications and subfolders) and for each app use a
system query (e.g., mdls or reading Contents/Info.plist) via the existing
osascript call or a child-process call to extract CFBundleIdentifier, then
return that identifier in the returned objects; ensure the mapping in
listInstalled() returns bundleId, displayName and path using the real
CFBundleIdentifier so it matches open() and listRunning().
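
One way to read the real identifier is sketched below, assuming the `mdls` route mentioned above. `Runner`, `readBundleId`, and `toInstalledApp` are hypothetical names; the runner is injected so the sketch stays testable, and in the backend it would presumably wrap a child-process call. `mdls -raw` prints `(null)` when the attribute is missing, which the sketch treats as "no identifier".

```typescript
interface InstalledApp { bundleId: string; displayName: string; path: string }

// Hypothetical command runner; on macOS it would wrap a child-process call.
type Runner = (cmd: string, args: string[]) => string;

// Reads kMDItemCFBundleIdentifier via mdls; returns null if unavailable.
function readBundleId(appPath: string, run: Runner): string | null {
  try {
    const out = run("mdls", ["-name", "kMDItemCFBundleIdentifier", "-raw", appPath]).trim();
    return out !== "" && out !== "(null)" ? out : null;
  } catch {
    return null; // mdls failed or path unreadable: caller skips this app
  }
}

// Builds an InstalledApp entry keyed by the real bundle identifier.
function toInstalledApp(appPath: string, run: Runner): InstalledApp | null {
  const bundleId = readBundleId(appPath, run);
  if (!bundleId) return null;
  const fileName = appPath.replace(/\/+$/, "").split("/").pop() ?? "";
  return { bundleId, displayName: fileName.replace(/\.app$/, ""), path: appPath };
}
```

Apps without a readable identifier are dropped rather than given a fabricated one, so every entry in the list can match what open() and listRunning() report.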

Comment on lines +124 to +157
async listInstalled(): Promise<InstalledApp[]> {
  try {
    // Read .desktop files from standard locations
    const dirs = ['/usr/share/applications', '/usr/local/share/applications', `${process.env.HOME}/.local/share/applications`]
    const apps: InstalledApp[] = []

    for (const dir of dirs) {
      let files: string
      try {
        files = run(['find', dir, '-name', '*.desktop', '-maxdepth', '1'])
      } catch { continue }

      for (const filepath of files.split('\n').filter(Boolean)) {
        try {
          const content = run(['cat', filepath])
          const nameMatch = content.match(/^Name=(.+)$/m)
          const execMatch = content.match(/^Exec=(.+)$/m)
          const noDisplay = content.match(/^NoDisplay=true$/m)
          if (noDisplay) continue

          const name = nameMatch?.[1] ?? ''
          const exec = execMatch?.[1] ?? ''
          if (!name) continue

          apps.push({
            bundleId: filepath.split('/').pop()?.replace('.desktop', '') ?? '',
            displayName: name,
            path: exec.split(/\s+/)[0] ?? '',
          })
        } catch { /* skip unreadable files */ }
      }
    }

    return apps.slice(0, 200)

⚠️ Potential issue | 🔴 Critical

Keep Linux app IDs consistent across discovery APIs.

listInstalled() emits the .desktop basename as bundleId, but listRunning() and appUnderPoint() emit an executable path or PID fallback. Grants created from the installed-app list won't match the running app, so frontmost-app gating and reopen flows will misfire.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `packages/@ant/computer-use-swift/src/backends/linux.ts` around lines 124-157,
listInstalled() currently sets bundleId to the .desktop filename, which is
inconsistent with listRunning() and appUnderPoint() that use the executable path
or PID fallback; change listInstalled() (in the InstalledApp creation) to derive
bundleId from the Exec entry (the same value you set into path:
exec.split(/\s+/)[0]) so bundleId matches running-app IDs, and only fall back to
the .desktop basename if Exec is missing or empty; keep displayName from Name
and preserve the existing NoDisplay filter.
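
A minimal sketch of that mapping is shown below. `parseDesktopEntry` is a hypothetical helper that takes the file content as a string. Many `Exec=` lines hold a bare command name rather than an absolute path, so an extra lookup step (not shown) may be needed before the value matches what listRunning() reports.

```typescript
interface InstalledApp { bundleId: string; displayName: string; path: string }

// Hypothetical pure parser for one .desktop file's content.
function parseDesktopEntry(filepath: string, content: string): InstalledApp | null {
  if (/^NoDisplay=true$/m.test(content)) return null; // keep the existing filter
  const name = content.match(/^Name=(.+)$/m)?.[1]?.trim() ?? "";
  if (!name) return null;

  // First token of Exec= is the executable; field codes like %u are dropped.
  const exec = content.match(/^Exec=(.+)$/m)?.[1] ?? "";
  const executable = exec.trim().split(/\s+/)[0] ?? "";

  // Fall back to the .desktop basename only when Exec is missing or empty.
  const basename = (filepath.split("/").pop() ?? "").replace(/\.desktop$/, "");
  return { bundleId: executable || basename, displayName: name, path: executable };
}
```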

Comment on lines +123 to +146
async listInstalled() {
  try {
    const raw = await psAsync(`
      $apps = @()
      $paths = @(
        'HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\*',
        'HKLM:\\SOFTWARE\\WOW6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\*',
        'HKCU:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\*'
      )
      foreach ($p in $paths) {
        Get-ItemProperty $p -ErrorAction SilentlyContinue | Where-Object { $_.DisplayName } | ForEach-Object {
          $apps += "$($_.DisplayName)|$($_.InstallLocation)|$($_.PSChildName)"
        }
      }
      $apps | Select-Object -Unique | Select-Object -First 200
    `)
    return raw.split('\n').filter(Boolean).map(line => {
      const [name, path, id] = line.split('|', 3)
      return {
        bundleId: id ?? name ?? '',
        displayName: name ?? '',
        path: path ?? '',
      }
    })

⚠️ Potential issue | 🔴 Critical

Use the same app identifier in listInstalled() as the runtime APIs.

This stores PSChildName as bundleId, while listRunning() and appUnderPoint() identify the same app by executable path. Grants created from the installed-app list will never match the running/frontmost app, and open() will be handed an identifier it can't launch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `packages/@ant/computer-use-swift/src/backends/win32.ts` around lines 123-146,
listInstalled() currently sets bundleId to PSChildName which doesn't match
how listRunning(), appUnderPoint(), and open() identify apps (by executable
path); update listInstalled() to use the install executable path as the bundleId
(i.e., set bundleId to the parsed InstallLocation/executable path instead of
PSChildName), ensure the mapped object uses path as the canonical identifier
(falling back to name if path is missing), and keep displayName populated from
DisplayName so grants and open() will match running/frontmost apps across
listRunning(), appUnderPoint(), and open().
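
The path-first mapping could look roughly like this. `normalizeWindowsPath` and `parseRegistryLine` are hypothetical helpers, and the normalization (backslashes, no trailing separator, lower case) is an assumption that only helps if listRunning() and appUnderPoint() normalize their identifiers the same way. Note that `InstallLocation` is usually a directory rather than an executable, so a further step to resolve the actual `.exe` may still be required.

```typescript
interface InstalledApp { bundleId: string; displayName: string; path: string }

// Hypothetical normalizer: Windows paths are case-insensitive, so compare a
// canonical form (backslashes, no trailing separator, lower case).
function normalizeWindowsPath(p: string): string {
  return p.trim().replace(/\//g, "\\").replace(/\\+$/, "").toLowerCase();
}

// Parses one "DisplayName|InstallLocation|PSChildName" line from the
// PowerShell output; trim() also drops a trailing "\r" from CRLF output.
function parseRegistryLine(line: string): InstalledApp | null {
  const [name = "", path = ""] = line.trim().split("|", 3);
  if (!name) return null;
  const canonical = normalizeWindowsPath(path);
  return {
    bundleId: canonical || name, // path is canonical; fall back to the name
    displayName: name,
    path: path.trim(),
  };
}
```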

unraid and others added 2 commits April 4, 2026 00:00
New Windows-native capabilities:
- windowCapture.ts: PrintWindow API for per-window screenshot (works on
  occluded/background windows)
- windowEnum.ts: EnumWindows for precise window enumeration with HWND
- uiAutomation.ts: IUIAutomation for UI tree reading, element clicking,
  text input, and coordinate-based element identification
- ocr.ts: Windows.Media.Ocr for screen text recognition (en-US + zh-CN)

Updated win32.ts backend to use EnumWindows for listRunning() and added
captureWindowTarget() for window-specific screenshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two root causes fixed:

1. swiftLoader.ts: require('@ant/computer-use-swift') returns a module
   with { ComputerUseAPI } class, not an instance. macOS native .node
   exports a plain object. Fixed by detecting class export and calling
   new ComputerUseAPI().

2. executor.ts resolvePrepareCapture: toolCalls.ts expects result to have
   { hidden: string[], displayId: number } fields. Our ComputerUseAPI
   returns { base64, width, height } only. Fixed by backfilling missing
   fields with defaults.

Verified: request_access → screenshot → left_click all work on Windows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
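
The two fixes described in this commit can be sketched as below. This is an illustration under stated assumptions, not the committed code: `resolveApi` and `withCaptureDefaults` are hypothetical names, and the default values (`[]` for `hidden`, `0` for `displayId`) are guesses, since the commit message only says "defaults".

```typescript
interface CaptureResult {
  base64: string;
  width: number;
  height: number;
  hidden: string[];
  displayId: number;
}

// What a cross-platform backend might return: the image fields only.
type PartialCapture =
  Pick<CaptureResult, "base64" | "width" | "height"> &
  Partial<Pick<CaptureResult, "hidden" | "displayId">>;

// If the module exports a ComputerUseAPI class, instantiate it; otherwise
// treat the module itself as the API object (the native .node case).
function resolveApi<T extends object>(mod: unknown): T {
  const maybeClass = (mod as { ComputerUseAPI?: unknown } | null)?.ComputerUseAPI;
  if (typeof maybeClass === "function") {
    return new (maybeClass as new () => T)();
  }
  return mod as T;
}

// Backfill the fields the caller expects; default values are assumptions.
function withCaptureDefaults(result: PartialCapture): CaptureResult {
  return {
    ...result,
    hidden: result.hidden ?? [],
    displayId: result.displayId ?? 0,
  };
}
```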
claude-code-best merged commit 86d2c8f into claude-code-best:main on Apr 3, 2026
1 check passed