
feat: enable Computer Use with Windows support#96

Closed
amDosion wants to merge 1 commit into claude-code-best:main from amDosion:feat/computer-use-windows


amDosion (Contributor) commented Apr 3, 2026

Summary

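  • Replace the @ant/computer-use-mcp stub with the full implementation (12 files, 6517 lines)
  • Refactor @ant/computer-use-input and @ant/computer-use-swift into a dispatcher + backends/ architecture
  • Add a Windows PowerShell backend (the reference project supports macOS only)
  • Add CHICAGO_MCP to the default build feature flags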

Architecture

packages/@ant/computer-use-{input,swift}/src/
├── index.ts              ← dispatcher (selects the backend by platform)
├── types.ts              ← shared interfaces
└── backends/
    ├── darwin.ts          ← macOS AppleScript (existing logic, split out unchanged)
    └── win32.ts           ← Windows PowerShell (new)
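The dispatcher layout above can be sketched as follows. This is a hypothetical illustration, not the PR's code: `InputBackend`, `createInputApi`, and the two-method interface are assumed names, and backends are passed in as factories so the sketch runs anywhere. According to the walkthrough's diagram, the real `index.ts` imports `./backends/darwin.js` or `./backends/win32.js` based on `process.platform` and falls back to an unsupported stub otherwise.

```typescript
// Hypothetical sketch of the platform dispatcher; names are illustrative, not the PR's API.

/** Shared contract every platform backend implements (illustrative subset of methods). */
interface InputBackend {
  moveMouse(x: number, y: number): Promise<void>;
  typeText(text: string): Promise<void>;
}

/** Facade exported to callers; it exists on every platform, supported or not. */
interface InputApi extends InputBackend {
  readonly isSupported: boolean;
}

/** Maps a process.platform value ("darwin", "win32", ...) to a backend factory. */
type BackendRegistry = Partial<Record<string, () => InputBackend>>;

function createInputApi(platform: string, registry: BackendRegistry): InputApi {
  const factory = registry[platform];
  const backend: InputBackend | null = factory ? factory() : null;

  // On unsupported platforms every operation rejects, so importing the module never throws.
  const unsupported = (op: string) => (): Promise<never> =>
    Promise.reject(new Error(`${op} is not supported on platform "${platform}"`));

  return {
    isSupported: backend !== null,
    moveMouse: backend ? (x, y) => backend.moveMouse(x, y) : unsupported("moveMouse"),
    typeText: backend ? (text) => backend.typeText(text) : unsupported("typeText"),
  };
}
```

Usage would look like `createInputApi(process.platform, { darwin: makeDarwin, win32: makeWin32 })`, where the two factories are hypothetical. Keeping the facade total (a rejecting stub rather than `undefined` on Linux) lets callers check `isSupported` once instead of guarding every call.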

Windows verification (x64)

  • isSupported: true
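  • Mouse movement / drawing a circle
  • Foreground window info (process name + path)
  • Dual-monitor detection (2560x1440 × 2)
  • Full-screen screenshot (2560x1440, 7.3 MB base64)
  • Running application list
  • bun run build succeeds (463 files)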
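A 2560x1440 capture is much larger than a model usually needs, and the walkthrough lists an `imageResize.ts` step for token-budget constraints. A minimal sketch of such a fit, assuming two hypothetical limits (a maximum long edge and a maximum pixel count; the PR's actual limits and algorithm are not shown on this page), scales by the tighter of the two and never upscales:

```typescript
// Hypothetical sketch: fit a screenshot into a size budget while preserving aspect ratio.
interface Size {
  width: number;
  height: number;
}

/**
 * Returns the largest size no bigger than `src` whose long edge is <= maxLongEdge
 * and whose area is <= maxPixels (both limits are assumptions of this sketch).
 */
function fitToBudget(src: Size, maxLongEdge: number, maxPixels: number): Size {
  if (src.width <= 0 || src.height <= 0) throw new RangeError("source size must be positive");
  const edgeScale = maxLongEdge / Math.max(src.width, src.height);
  const areaScale = Math.sqrt(maxPixels / (src.width * src.height));
  const scale = Math.min(1, edgeScale, areaScale); // capping at 1 means we never upscale
  return {
    // floor keeps the result inside both limits; max(1, ...) avoids zero-sized images
    width: Math.max(1, Math.floor(src.width * scale)),
    height: Math.max(1, Math.floor(src.height * scale)),
  };
}
```

For example, `fitToBudget({ width: 2560, height: 1440 }, 1280, 10_000_000)` returns `{ width: 1280, height: 720 }`: the long-edge limit (scale 0.5) is tighter than the area limit here.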

Test plan

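  • Windows: bun run dev starts without errors
  • macOS: the existing AppleScript backend is unaffected (backends/darwin.ts logic is unchanged)
  • When built without CHICAGO_MCP, Computer Use is still removed by dead-code elimination (DCE)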

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Enabled Computer Use functionality with cross-platform support for macOS and Windows.
    • Added Windows support for mouse control, keyboard input, and screenshot capabilities.
    • Implemented app permission system with configurable access tiers.
    • Added pixel-level click validation for improved interaction accuracy.
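The summary does not say how pixel-level click validation works; the walkthrough only describes `pixelCompare.ts` as using an injected JPEG crop function. One plausible approach, sketched here as an assumption rather than the PR's algorithm, is to crop the region around the click target from the screenshot the model saw and from a fresh capture, decode both to RGBA, and refuse the click if too much of the region changed. The tolerance and threshold values are illustrative.

```typescript
// Hypothetical sketch of a before/after crop comparison; thresholds are illustrative.

/** Decoded RGBA crop: 4 bytes per pixel, row-major. */
interface RgbaCrop {
  width: number;
  height: number;
  data: Uint8Array;
}

/** Fraction (0..1) of pixels whose R, G or B channel differs by more than `tolerance`. */
function changedFraction(a: RgbaCrop, b: RgbaCrop, tolerance = 16): number {
  if (a.width !== b.width || a.height !== b.height) throw new RangeError("crop sizes differ");
  const pixels = a.width * a.height;
  if (a.data.length < pixels * 4 || b.data.length < pixels * 4) throw new RangeError("buffer too small");
  if (pixels === 0) return 0;
  let changed = 0;
  for (let i = 0; i < pixels * 4; i += 4) {
    // Compare R, G, B only; the alpha byte (offset 3) is ignored.
    for (let c = 0; c < 3; c++) {
      if (Math.abs(a.data[i + c] - b.data[i + c]) > tolerance) {
        changed++;
        break;
      }
    }
  }
  return changed / pixels;
}

/** True when the target region still looks like the screenshot the model acted on. */
function isClickTargetStable(before: RgbaCrop, after: RgbaCrop, maxChanged = 0.05): boolean {
  return changedFraction(before, after) <= maxChanged;
}
```

The per-channel tolerance absorbs JPEG compression noise, so re-encoding the same screen does not count as a change.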

Phase 1: Replace @ant/computer-use-mcp stub with full implementation
(12 files, 6517 lines from reference project).

Phase 2-3: Refactor @ant/computer-use-input and @ant/computer-use-swift
from single-file to dispatcher + backends/ architecture:
- backends/darwin.ts — existing macOS AppleScript (unchanged logic)
- backends/win32.ts — new Windows PowerShell (SetCursorPos, SendInput,
  CopyFromScreen, GetForegroundWindow)
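To illustrate how a PowerShell backend can reach Win32 APIs such as `SetCursorPos`, here is a sketch that only builds the invocation and does not run it. `Add-Type -MemberDefinition` compiles a C# `DllImport` declaration on the fly, and `powershell.exe -EncodedCommand` (Base64 of UTF-16LE script text) avoids command-line quoting problems. The type name `CuSketch.Win32Mouse` and the helper names are hypothetical, and the PR's `backends/win32.ts` may structure this differently. Spawning a fresh `powershell.exe` per call pays the startup overhead that the feature doc discusses, which is why a long-lived process is worth considering.

```typescript
// Hypothetical sketch: build (not run) a powershell.exe call that moves the cursor via user32!SetCursorPos.

/** Base64 of the string's UTF-16LE bytes, the encoding powershell.exe -EncodedCommand expects. */
function toUtf16LeBase64(text: string): string {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one UTF-16 code unit -> low byte, then high byte
    bytes.push(unit & 0xff, unit >> 8);
  }
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const triple = (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    out += alphabet[(triple >> 18) & 63] + alphabet[(triple >> 12) & 63];
    out += has1 ? alphabet[(triple >> 6) & 63] : "=";
    out += has2 ? alphabet[triple & 63] : "=";
  }
  return out;
}

/** PowerShell script text: compile a P/Invoke stub for SetCursorPos, then call it. */
function buildSetCursorPosScript(x: number, y: number): string {
  if (!Number.isInteger(x) || !Number.isInteger(y)) throw new RangeError("coordinates must be integers");
  const signature = '[DllImport("user32.dll")] public static extern bool SetCursorPos(int X, int Y);';
  return [
    `Add-Type -MemberDefinition '${signature}' -Name Win32Mouse -Namespace CuSketch`,
    `[void][CuSketch.Win32Mouse]::SetCursorPos(${x}, ${y})`,
  ].join("; ");
}

/** Command plus argv, e.g. for child_process.spawn(command, args). */
function buildSetCursorPosInvocation(x: number, y: number): { command: string; args: string[] } {
  return {
    command: "powershell.exe",
    args: ["-NoProfile", "-NonInteractive", "-EncodedCommand", toUtf16LeBase64(buildSetCursorPosScript(x, y))],
  };
}
```

Validating that coordinates are integers before interpolating them into the script also keeps arbitrary text out of the PowerShell command.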

Add CHICAGO_MCP to default build features.

Verified on Windows x64: mouse control, dual-monitor detection,
full-screen screenshot, foreground app info, running process list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
coderabbitai bot commented Apr 3, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


This pull request implements cross-platform computer use functionality, replacing the MCP server stub with a full implementation, restructuring input and display backends into platform-agnostic dispatchers supporting both macOS and Windows, enabling the CHICAGO_MCP feature flag, and documenting the multi-phase architecture covering permission handling, input/output operations, and tool schemas.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Feature Flag Enablement**<br>`build.ts`, `scripts/dev.ts`, `DEV-LOG.md` | Updated default build features and dev-mode configuration to enable CHICAGO_MCP, and added a dated development log entry documenting the multi-phase restructuring work with Windows x64 validation results. |
| **Computer Use Documentation**<br>`docs/features/computer-use.md` | Introduced comprehensive feature documentation defining the 3-package architecture, a phased execution roadmap (4 phases), a per-phase file-change overview, and performance expectations tied to PowerShell startup overhead and long-lived process optimization. |
| **Input Backend Dispatch & Types**<br>`packages/@ant/computer-use-input/src/index.ts`, `types.ts`, `backends/darwin.ts`, `backends/win32.ts` | Restructured the monolithic macOS-only input API into a platform-dispatching architecture with a shared InputBackend interface; added a Darwin backend using AppleScript/JXA and a Windows backend using PowerShell with Win32 P/Invoke bindings for mouse/keyboard/clipboard operations and app info queries. |
| **Display Backend Dispatch & Types**<br>`packages/@ant/computer-use-swift/src/index.ts`, `types.ts`, `backends/darwin.ts`, `backends/win32.ts` | Refactored the monolithic macOS implementation into a platform-dispatching model with DisplayAPI, AppsAPI, and ScreenshotAPI interfaces; added a Darwin backend using AppleScript/JXA and a Windows backend using PowerShell with System.Drawing/System.Windows.Forms for display enumeration, app discovery, and screenshot capture. |
| **MCP Core Types & Contracts**<br>`packages/@ant/computer-use-mcp/src/executor.ts`, `types.ts` | Introduced the ComputerExecutor interface defining asynchronous control-flow entry points for display/app/screenshot operations, mouse/keyboard/drag/scroll/clipboard control, and state queries; expanded the type system with ComputerUseSessionContext, ComputerUseHostAdapter, permission tiering (CuAppPermTier), grant flags, and session-scoped callbacks for lock gating, permission dialogs, and screenshot persistence. |
| **MCP Policy & Validation**<br>`packages/@ant/computer-use-mcp/src/deniedApps.ts`, `sentinelApps.ts`, `keyBlocklist.ts`, `subGates.ts` | Added a multi-layer app denial policy (bundle-id and display-name substring matching), sentinel app escalation warnings (shell/filesystem/system-settings), system key combo blocklists for Darwin/Windows with normalization, and feature sub-gate configuration objects (ALL_SUB_GATES_ON, ALL_SUB_GATES_OFF). |
| **MCP Tool & Server Runtime**<br>`packages/@ant/computer-use-mcp/src/tools.ts`, `mcpServer.ts`, `toolCalls.ts` (inferred), `imageResize.ts`, `pixelCompare.ts`, `index.ts` | Implemented buildComputerUseTools generating parameterized MCP tool schemas with coordinate-mode-aware descriptions, createComputerUseMcpServer wiring the MCP lifecycle, bindSessionContext dispatching tool calls with lock gating and permission routing, an image resizing algorithm for token-budget constraints, and pixel-accurate click-target validation using an injected JPEG crop function. |

Sequence Diagrams

sequenceDiagram
    participant Client
    participant MCP Server
    participant Session Ctx
    participant Lock Gate
    participant Tool Handler
    participant Computer Executor
    participant Screenshot Cache

    Client->>MCP Server: CallTool(toolName, args)
    MCP Server->>Session Ctx: bindSessionContext dispatcher
    
    Session Ctx->>Lock Gate: checkCuLock()
    alt Lock Held
        Lock Gate-->>Session Ctx: { error: "cu_lock_held" }
        Session Ctx-->>Client: error response
    else Lock Available
        Lock Gate->>Lock Gate: acquireCuLock() if needed
        Session Ctx->>Tool Handler: handleToolCall(toolName, args, overrides)
        
        Tool Handler->>Computer Executor: Execute operation<br/>(screenshot/click/type/etc)
        Computer Executor-->>Tool Handler: result + ScreenshotResult
        
        alt Screenshot Returned
            Tool Handler->>Screenshot Cache: cache lastScreenshot
            Session Ctx->>Session Ctx: onScreenshotCaptured(dims)
        end
        
        Tool Handler-->>Session Ctx: CuCallToolResult
        Session Ctx->>Session Ctx: Strip internal fields
        Session Ctx-->>MCP Server: ToolResult
        MCP Server-->>Client: Tool response
        
        Session Ctx->>Lock Gate: Release lock in finally
    end
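The lock gating in the diagram above (check the lock, acquire it if needed, run the handler, release in `finally`) can be sketched as below. This is an in-process illustration with hypothetical names (`CuLock`, `withCuLock`); the real `checkCuLock`/`acquireCuLock` are session-scoped callbacks whose implementation is not shown here. One design choice is worth calling out: a call releases only a lock it acquired itself, otherwise a session that already held the lock would lose it after its next tool call.

```typescript
// Hypothetical sketch of the lock gating in the diagram above; names are illustrative.
type AcquireResult = "acquired" | "already-held" | "busy";
type GatedResult = { ok: true; value: unknown } | { ok: false; error: string };

/** Single-holder lock keyed by session id (in-process only). */
class CuLock {
  private holder: string | null = null;

  get heldBy(): string | null {
    return this.holder;
  }

  tryAcquire(sessionId: string): AcquireResult {
    if (this.holder === sessionId) return "already-held";
    if (this.holder !== null) return "busy";
    this.holder = sessionId;
    return "acquired";
  }

  release(sessionId: string): void {
    if (this.holder === sessionId) this.holder = null; // only the holder can release
  }
}

/** Run one tool call under the lock; release only a lock this call acquired. */
async function withCuLock(lock: CuLock, sessionId: string, run: () => Promise<unknown>): Promise<GatedResult> {
  const state = lock.tryAcquire(sessionId);
  if (state === "busy") return { ok: false, error: "cu_lock_held" };
  try {
    return { ok: true, value: await run() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  } finally {
    if (state === "acquired") lock.release(sessionId);
  }
}
```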
sequenceDiagram
    participant Model
    participant Tool Handler
    participant Permission Dialog
    participant App Grant Callback
    participant Computer Executor

    Model->>Tool Handler: request_access(bundleIds, tiers)
    Tool Handler->>Tool Handler: Check policy:<br/>isDenied? sentinel? denied_tier?
    
    alt Policy Denied
        Tool Handler-->>Model: { denied: [...] }
    else Needs User Input
        Tool Handler->>Permission Dialog: onPermissionRequest(request)
        Permission Dialog-->>Tool Handler: { granted, denied, flags }
        
        Tool Handler->>App Grant Callback: Merge with existing bundles
        App Grant Callback->>Tool Handler: Updated AppGrant[]
        
        Tool Handler-->>Model: permission_response
    else Already Granted
        Tool Handler-->>Model: { granted: [...] }
    end
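The policy step at the top of the `request_access` diagram can be sketched as a triage function. Everything here is an assumption for illustration: the tier names, the `Policy` shape, and the order of checks (deny list first, then existing grants, then sentinel warnings). The walkthrough only says that denial matches bundle ids and display-name substrings and that sentinel apps trigger escalation warnings, and this sketch does not model the diagram's per-app tier caps.

```typescript
// Hypothetical sketch of request_access triage; tiers, policy shape and check order are assumptions.
const TIER_RANK = { view: 0, click: 1, full: 2 } as const; // least to most capable
type Tier = keyof typeof TIER_RANK;

interface AppRequest {
  bundleId: string;
  displayName: string;
  tier: Tier;
}
interface AppGrant {
  bundleId: string;
  tier: Tier;
}
interface Policy {
  deniedBundleIds: readonly string[];
  deniedNameSubstrings: readonly string[]; // matched case-insensitively against displayName
  sentinelBundleIds: readonly string[]; // allowed, but the dialog should show an escalation warning
}
interface Triage {
  denied: AppRequest[];
  alreadyGranted: AppRequest[];
  needsUser: { request: AppRequest; warn: boolean }[];
}

function triageRequests(requests: readonly AppRequest[], existing: readonly AppGrant[], policy: Policy): Triage {
  const result: Triage = { denied: [], alreadyGranted: [], needsUser: [] };
  for (const req of requests) {
    const name = req.displayName.toLowerCase();
    const denied =
      policy.deniedBundleIds.includes(req.bundleId) ||
      policy.deniedNameSubstrings.some((s) => name.includes(s.toLowerCase()));
    if (denied) {
      result.denied.push(req);
      continue;
    }
    const grant = existing.find((g) => g.bundleId === req.bundleId);
    if (grant && TIER_RANK[grant.tier] >= TIER_RANK[req.tier]) {
      result.alreadyGranted.push(req); // an equal or higher tier is already granted
      continue;
    }
    result.needsUser.push({ request: req, warn: policy.sentinelBundleIds.includes(req.bundleId) });
  }
  return result;
}

/** Merge approved grants into existing ones, keeping the higher tier per bundle id. */
function mergeGrants(existing: readonly AppGrant[], approved: readonly AppGrant[]): AppGrant[] {
  const byId = new Map<string, AppGrant>();
  for (const g of existing.concat(approved)) {
    const prev = byId.get(g.bundleId);
    if (!prev || TIER_RANK[g.tier] > TIER_RANK[prev.tier]) byId.set(g.bundleId, g);
  }
  return Array.from(byId.values());
}
```

Only the `needsUser` bucket would reach the permission dialog; denied and already-granted apps can be answered without user input, matching the three branches of the diagram.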
sequenceDiagram
    participant Module Load
    participant Platform Check
    participant Darwin Backend
    participant Win32 Backend
    participant API Instance

    Module Load->>Platform Check: process.platform?
    
    alt darwin
        Platform Check->>Darwin Backend: import ./backends/darwin.js
        Darwin Backend-->>Module Load: InputBackend impl<br/>(AppleScript/JXA)
    else win32
        Platform Check->>Win32 Backend: import ./backends/win32.js
        Win32 Backend-->>Module Load: InputBackend impl<br/>(PowerShell/Win32 P/Invoke)
    else other
        Module Load-->>Module Load: backend = null
    end
    
    Module Load->>API Instance: Create API facade
    API Instance->>API Instance: Delegate to backend?<br/>or unsupported stub
    API Instance-->>Module Load: Ready for use

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Cross-platform hops from coast to coast,
Darwin clicks and Windows boasts,
PowerShell scripts and AppleScript dreams,
Building bridges through the seams.
From stub to server, locked and gated,
Computer Use—at last, vindicated! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: enable Computer Use with Windows support' clearly and concisely describes the primary change: enabling the Computer Use feature with cross-platform (specifically Windows) support. It directly summarizes the main objective of the PR. |


amDosion (Contributor, Author) commented Apr 3, 2026

Needs fixing: the fallback display size should be obtained dynamically, and `open` should launch the exe directly.
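One way to act on the first point, sketched under assumptions (the script text, the `DisplayInfo` shape, and the helper names are hypothetical and not the author's follow-up fix): ask Windows for the actual monitor bounds through `System.Windows.Forms.Screen`, parse the JSON, and use a constant only as a last resort when that query fails. Two details matter: `ConvertTo-Json` emits a bare object rather than an array when there is a single monitor, and `Bounds` may be reported in DPI-scaled units when the PowerShell process is not DPI-aware.

```typescript
// Hypothetical sketch: query monitor bounds instead of hard-coding a fallback size.

/** PowerShell script that prints every monitor as compact JSON. */
const DISPLAYS_SCRIPT = [
  "Add-Type -AssemblyName System.Windows.Forms",
  "[System.Windows.Forms.Screen]::AllScreens | ForEach-Object {",
  "  [pscustomobject]@{ Name = $_.DeviceName; Primary = $_.Primary; Width = $_.Bounds.Width; Height = $_.Bounds.Height }",
  "} | ConvertTo-Json -Compress",
].join("\n");

interface DisplayInfo {
  name: string;
  primary: boolean;
  width: number;
  height: number;
}

/** Parse the script's output; accepts a bare object (single monitor) as well as an array. */
function parseDisplays(json: string): DisplayInfo[] {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return [];
  }
  const items: unknown[] = Array.isArray(raw) ? raw : raw === null ? [] : [raw];
  const displays: DisplayInfo[] = [];
  for (const item of items) {
    if (typeof item !== "object" || item === null) continue;
    const rec = item as Record<string, unknown>;
    const width = rec["Width"];
    const height = rec["Height"];
    const name = rec["Name"];
    if (typeof width === "number" && typeof height === "number" && width > 0 && height > 0) {
      displays.push({ name: typeof name === "string" ? name : "", primary: rec["Primary"] === true, width, height });
    }
  }
  return displays;
}

/** Primary monitor size (else the first monitor); `fallback` is used only when the query yielded nothing. */
function primaryDisplaySize(json: string, fallback: { width: number; height: number }): { width: number; height: number } {
  const displays = parseDisplays(json);
  const chosen = displays.find((d) => d.primary) ?? displays[0];
  return chosen ? { width: chosen.width, height: chosen.height } : fallback;
}
```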

amDosion closed this Apr 3, 2026