The open-source alternative to OpenAI Codex's computer use. Control any Windows app (native desktop, web, or Electron) with a single AI skill. It works like OpenAI's playwright-interactive, but adds native Win32 desktop app support, which Codex lacks.
What GPT-5.4 does in the cloud, this skill does locally. OpenAI's GPT-5.4 introduced "computer use" — the ability to control computers via Playwright and mouse/keyboard commands. Their playwright-interactive Codex skill enables visual debugging of web and Electron apps. We go further: our dual-engine approach adds native Windows desktop automation (Win32 API) alongside Playwright, letting you control apps like WeChat, DingTalk, and QQ that browser-based solutions simply can't reach.
| Feature | OpenAI Codex | Claude Code + This Skill |
|---|---|---|
| Web/Electron apps | Playwright | Playwright |
| Native desktop apps | Not possible | Win32 API |
| Chat apps (WeChat, DingTalk, QQ) | Not possible | Full support |
| Visual feedback | js_repl screenshots | Screenshots + Claude Vision |
| QA workflow | Manual checklist | Automated QA checklist |
| Viewport testing | Manual | Device presets (iPhone, iPad, Pixel...) |
| DOM inspection | Via DevTools | Built-in inspector |
| Platform | Cloud only | Local (your machine) |
| Cost | Codex subscription | Free & open source |
```console
$ python scripts/desktop_control.py list-apps
Supported applications:
weixin WeChat process=Weixin mode=win32 search=Ctrl+F
wxwork WeCom (企业微信) process=WXWork mode=win32 search=Ctrl+F
dingtalk DingTalk (钉钉) process=DingTalk mode=win32 search=Ctrl+K
feishu Feishu/Lark (飞书) process=Feishu mode=win32 search=Ctrl+K
qq QQ process=QQ mode=win32 search=Ctrl+F
telegram Telegram process=Telegram mode=win32 search=Ctrl+K
slack Slack process=slack mode=win32 search=Ctrl+K
teams Microsoft Teams process=ms-teams mode=win32 search=Ctrl+E
$ python scripts/desktop_control.py find-window --app weixin
{"found": true, "process": "Weixin", "left": 856, "top": 185, "width": 462, "height": 640}
$ python scripts/desktop_control.py send-message --app weixin --contact "文件传输助手" --message "Hello from AI 🤖"
✓ Message sent to 文件传输助手 via WeChat
```

```
┌──────────────────────────────────────────────────────────┐
│                 Desktop Controller Skill                 │
├───────────────┬────────────────────┬─────────────────────┤
│ Win32 Engine  │ Playwright Engine  │ Visual Feedback     │
│ (Native Apps) │ (Web/Electron)     │ (Screenshot + AI)   │
├───────────────┼────────────────────┼─────────────────────┤
│ FindWindow    │ page.click()       │ Screen Capture      │
│ SendKeys      │ page.fill()        │ Window Capture      │
│ SetCursorPos  │ page.goto()        │ → Claude Vision     │
│ mouse_event   │ page.screenshot()  │ → State Verify      │
│ Clipboard     │ page.evaluate()    │ → Auto Retry        │
│ GetWindowRect │ DOM Inspection     │ → QA Checklist      │
└───────────────┴────────────────────┴─────────────────────┘
```
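The find-window output above reports left/top/width/height, while the Win32 GetWindowRect call fills a RECT with left/top/right/bottom edges. A minimal sketch of that conversion, assuming the window handle has already been located (for example by enumerating top-level windows and matching the process name, which is omitted here). rect_to_window_info and window_info are hypothetical names, not the skill's actual code:

```python
import ctypes
import json


class RECT(ctypes.Structure):
    # Mirrors the Win32 RECT struct: four LONG edges (c_long is 32-bit on Windows).
    _fields_ = [
        ("left", ctypes.c_long),
        ("top", ctypes.c_long),
        ("right", ctypes.c_long),
        ("bottom", ctypes.c_long),
    ]


def rect_to_window_info(process, left, top, right, bottom):
    """Convert edge coordinates into the left/top/width/height shape find-window prints."""
    return {
        "found": True,
        "process": process,
        "left": left,
        "top": top,
        "width": right - left,
        "height": bottom - top,
    }


def window_info(hwnd, process):
    """Windows only: read a window's screen rectangle with user32.GetWindowRect."""
    user32 = ctypes.windll.user32
    user32.GetWindowRect.argtypes = [ctypes.c_void_p, ctypes.POINTER(RECT)]
    rect = RECT()
    if not user32.GetWindowRect(hwnd, ctypes.byref(rect)):
        raise ctypes.WinError()
    return rect_to_window_info(process, rect.left, rect.top, rect.right, rect.bottom)


if __name__ == "__main__":
    # Edges matching the WeChat example: right = 856 + 462, bottom = 185 + 640.
    print(json.dumps(rect_to_window_info("Weixin", 856, 185, 1318, 825)))
```

With those edges, json.dumps produces exactly the JSON line that find-window printed for WeChat. Keeping the conversion in a pure function means it can be checked on any OS, while only window_info touches user32.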
Supported native desktop apps:

| App | Process | Search Key | Status |
|---|---|---|---|
| WeChat (微信) | Weixin | Ctrl+F | Tested & Verified |
| WeCom (企业微信) | WXWork | Ctrl+F | Ready |
| DingTalk (钉钉) | DingTalk | Ctrl+K | Ready |
| Feishu/Lark (飞书) | Feishu | Ctrl+K | Ready |
| QQ | QQ | Ctrl+F | Ready |
| Telegram | Telegram | Ctrl+K | Ready |
| Slack | slack | Ctrl+K | Ready |
| Microsoft Teams | ms-teams | Ctrl+E | Ready |
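The table lists search shortcuts in Ctrl+F form, while the registry entry in scripts/app_registry.py stores them in SendKeys notation ("^f"). A small converter, assuming the standard SendKeys modifier prefixes (^ for Ctrl, + for Shift, % for Alt); to_sendkeys is a hypothetical helper, not part of the project:

```python
# Modifier prefixes used by SendKeys notation (.NET SendKeys and WScript.Shell).
SENDKEYS_MODIFIERS = {"ctrl": "^", "shift": "+", "alt": "%"}


def to_sendkeys(shortcut):
    """Convert a shortcut such as 'Ctrl+F' into SendKeys notation such as '^f'."""
    *modifiers, key = [part.strip() for part in shortcut.split("+")]
    prefix = ""
    for modifier in modifiers:
        try:
            prefix += SENDKEYS_MODIFIERS[modifier.lower()]
        except KeyError:
            raise ValueError(f"unsupported modifier: {modifier!r}") from None
    # Only single letters/digits are handled; keys like Enter or F5 need {ENTER}/{F5}.
    if len(key) != 1 or not key.isalnum():
        raise ValueError(f"unsupported key: {key!r}")
    # Lowercase to match the registry's "^f" form.
    return prefix + key.lower()


if __name__ == "__main__":
    for app, shortcut in [("weixin", "Ctrl+F"), ("dingtalk", "Ctrl+K"), ("teams", "Ctrl+E")]:
        print(app, shortcut, "->", to_sendkeys(shortcut))
```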
Web and Electron apps: any website or Electron app, such as VS Code, Notion, Discord, and Figma.
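Just tell Claude Code what you want:

- "Send a WeChat message to 张三 saying 你好" (张三 is a placeholder name; 你好 means "hello")
- "给张三发钉钉消息说明天开会" (Send 张三 a DingTalk message saying there's a meeting tomorrow)
- "Take a screenshot of my DingTalk window"
- "Open https://example.com and click the login button"
- "Run QA checklist on http://localhost:3000"
- "Test my site on iPhone 14 viewport"
- "Inspect all buttons on this page"
- "帮我操控电脑自动发消息" (Control my computer and send messages automatically)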
Native desktop apps (Win32):

```bash
# Send a message via any chat app (张三 = placeholder name, 你好 = "hello")
python scripts/desktop_control.py send-message --app weixin --contact "张三" --message "你好"
# Screenshot an app window
python scripts/desktop_control.py screenshot --app weixin --output wechat.png
# Full screen screenshot
python scripts/desktop_control.py screenshot --output screen.png
# Click at coordinates
python scripts/desktop_control.py click --app weixin --x 500 --y 400
# Type text
python scripts/desktop_control.py type --app dingtalk --text "Hello World"
# Find a window
python scripts/desktop_control.py find-window --app feishu
# List all supported apps
python scripts/desktop_control.py list-apps
```

Web and Electron apps (Playwright):

```bash
# Screenshot a web page
python scripts/playwright_control.py web-screenshot --url "https://example.com" --output page.png
# Click an element by CSS selector
python scripts/playwright_control.py web-click --url "https://example.com" --selector "#login-btn"
# Fill a form field
python scripts/playwright_control.py web-fill --url "https://example.com" --selector "input[name=email]" --text "test@example.com"
# Inspect DOM elements
python scripts/playwright_control.py web-inspect --url "https://example.com" --selector "button"
# Evaluate JavaScript
python scripts/playwright_control.py web-eval --url "https://example.com" --js "return document.title"
# Test with mobile viewport
python scripts/playwright_control.py viewport --device "iPhone 14" --url "https://example.com" --output mobile.png
# Run automated QA checklist
python scripts/playwright_control.py qa-checklist --url "http://localhost:3000"
```

- Win32 Engine: controls native Windows apps that browser automation cannot reach
- Playwright Engine: full DOM access, CSS selectors, and JavaScript evaluation
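One way the two engines could be selected is sketched below: URLs go to Playwright, and named apps are looked up by key or alias and routed by their mode field. The registry contents, resolve_app, and choose_engine are hypothetical; only the field names mirror the scripts/app_registry.py example, and treating any mode other than win32 as Playwright is an assumption.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse

# Hypothetical registry entries; only the field names follow scripts/app_registry.py.
APP_REGISTRY: Dict[str, dict] = {
    "weixin": {"name": "WeChat", "aliases": ["wechat"], "process": "Weixin", "mode": "win32"},
    "dingtalk": {"name": "DingTalk", "aliases": [], "process": "DingTalk", "mode": "win32"},
    "my_app": {"name": "My App", "aliases": ["myapp"], "process": "MyApp", "mode": "win32"},
}


def resolve_app(name: str, registry: Dict[str, dict] = APP_REGISTRY) -> Tuple[str, dict]:
    """Find a registry entry by its key or any alias (case-insensitive)."""
    wanted = name.strip().lower()
    for key, entry in registry.items():
        aliases = [alias.lower() for alias in entry.get("aliases", [])]
        if wanted == key or wanted in aliases:
            return key, entry
    raise KeyError(f"unknown app: {name!r}")


def choose_engine(target: str, registry: Dict[str, dict] = APP_REGISTRY) -> str:
    """Route URLs to Playwright and registered native apps by their mode field."""
    if urlparse(target).scheme in ("http", "https", "file"):
        return "playwright"
    _, entry = resolve_app(target, registry)
    # Assumption: any mode other than "win32" is handled by the Playwright engine.
    return "win32" if entry.get("mode") == "win32" else "playwright"
```

Routing on the URL scheme first keeps web targets from ever hitting the Win32 lookup, so an unknown app name fails loudly with KeyError instead of silently falling back to a browser.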
```
Execute Action → Screenshot → Claude Analyzes → Verify Success → Next Action
       ↑                                               │
       └──────────── Retry if Failed ←─────────────────┘
```
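The loop above can be sketched as a generic retry wrapper. Here verify is any callable that inspects the screenshot; in the skill that role is played by Claude Vision, so run_with_visual_feedback, its callables, and the max_attempts default are placeholders, not the project's API:

```python
from typing import Any, Callable


def run_with_visual_feedback(
    action: Callable[[], None],
    capture: Callable[[], Any],
    verify: Callable[[Any], bool],
    max_attempts: int = 3,
) -> int:
    """Execute -> screenshot -> verify, re-running the action until verification passes.

    Returns the attempt number that succeeded; raises RuntimeError if none did.
    """
    for attempt in range(1, max_attempts + 1):
        action()                # e.g. click, type, or send a message
        screenshot = capture()  # e.g. bytes of a window capture
        if verify(screenshot):  # in the skill, Claude analyzes the screenshot
            return attempt
    raise RuntimeError(f"action not verified after {max_attempts} attempts")
```

Because a failed verification re-runs the action, verify should distinguish a partial success (for example, a message that was sent but has not rendered yet) from a real failure; otherwise a retry can send the same message twice.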
One command runs functional, visual, viewport, and performance checks:
- Page load verification
- Broken link detection
- Console error capture
- Desktop + mobile screenshots
- Horizontal overflow check
- Performance timing (DOMContentLoaded, Load)
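The overflow and timing checks can come from a single in-page evaluation. A minimal sketch, assuming the page has finished loading (loadEventEnd stays 0 until the load event completes); QA_METRICS_JS and summarize_qa are hypothetical names, and the JavaScript relies on the standard scrollWidth/clientWidth properties and the Navigation Timing entry. Console errors could be gathered separately, for example with a Playwright page.on("console", ...) listener that keeps messages whose type is "error".

```python
# Evaluated in the page (e.g. via Playwright's page.evaluate) after the load event.
QA_METRICS_JS = """() => {
  const root = document.documentElement;
  const nav = performance.getEntriesByType('navigation')[0];
  return {
    scrollWidth: root.scrollWidth,
    clientWidth: root.clientWidth,
    domContentLoaded: nav ? nav.domContentLoadedEventEnd - nav.startTime : null,
    load: nav ? nav.loadEventEnd - nav.startTime : null,
  };
}"""


def summarize_qa(metrics: dict, console_errors: list) -> dict:
    """Turn raw page metrics into pass/fail checklist entries."""
    # Content wider than the root element's visible width indicates horizontal overflow.
    overflow = metrics["scrollWidth"] > metrics["clientWidth"]
    return {
        "horizontal_overflow": overflow,
        "console_errors": len(console_errors),
        "dom_content_loaded_ms": metrics.get("domContentLoaded"),
        "load_ms": metrics.get("load"),
        "passed": not overflow and not console_errors,
    }
```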
Built-in viewport presets: iPhone 14, iPhone 14 Pro Max, iPad, Pixel 7, Desktop, Desktop HD, 4K
Chinese text is handled via Unicode code point arrays, which avoids encoding issues.
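A minimal sketch of the code-point approach, assuming the text is handed to PowerShell (for example to set the clipboard): the command line itself stays pure ASCII, and PowerShell rebuilds the string. powershell_literal is a hypothetical helper; [char]::ConvertFromUtf32 is used instead of a plain [char] cast so that characters above U+FFFF, such as the 🤖 emoji in the earlier example, survive.

```python
def to_code_points(text: str) -> list:
    """Return the Unicode code point of every character in text."""
    return [ord(ch) for ch in text]


def powershell_literal(text: str) -> str:
    """Build an ASCII-only PowerShell expression that rebuilds text from code points."""
    points = ",".join(str(cp) for cp in to_code_points(text))
    # ConvertFromUtf32 returns a string and also handles code points above U+FFFF.
    return f"(-join (@({points}) | ForEach-Object {{ [char]::ConvertFromUtf32($_) }}))"


if __name__ == "__main__":
    print(to_code_points("你好"))
    print(powershell_literal("你好"))
```

The resulting expression could then be passed to a cmdlet such as Set-Clipboard -Value, so no non-ASCII bytes ever cross the process boundary.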
The key discovery: after you search for a contact in a chat app, the message input area does NOT receive keyboard focus automatically. You must click it explicitly, using Win32 SetCursorPos + mouse_event. This one detail is the difference between working and broken automation.
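A sketch of that click, assuming the input_area ratios from the registry example are measured as fractions of the window size (x_ratio from the left edge, y_ratio_from_bottom up from the bottom edge; this interpretation is an assumption). input_area_point and click_at are hypothetical helpers; SetCursorPos and mouse_event are real user32 functions, though Microsoft documents mouse_event as superseded by SendInput.

```python
import ctypes
import time
from typing import Tuple

# Flags for user32.mouse_event (values from the Win32 API headers).
MOUSEEVENTF_LEFTDOWN = 0x0002
MOUSEEVENTF_LEFTUP = 0x0004


def input_area_point(left: int, top: int, width: int, height: int,
                     x_ratio: float, y_ratio_from_bottom: float) -> Tuple[int, int]:
    """Screen point inside the message box, from the window rect and registry ratios.

    Assumes x_ratio is measured from the left edge and y_ratio_from_bottom
    from the bottom edge, both as fractions of the window size.
    """
    x = left + width * x_ratio
    y = top + height - height * y_ratio_from_bottom
    return int(round(x)), int(round(y))


def click_at(x: int, y: int, settle: float = 0.05) -> None:
    """Windows only: move the cursor to (x, y) and send a left click."""
    user32 = ctypes.windll.user32
    if not user32.SetCursorPos(x, y):
        raise ctypes.WinError()
    time.sleep(settle)  # give the target window a moment to register the move
    user32.mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0)
    user32.mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, 0)


if __name__ == "__main__":
    # WeChat window from the find-window example, ratios from the registry example.
    print(input_area_point(856, 185, 462, 640, 0.65, 0.12))
```

With the WeChat window from the find-window example (left 856, top 185, 462 x 640) and the example ratios 0.65 and 0.12, the computed click point is (1156, 748). Deriving the point from ratios rather than fixed pixels keeps the click inside the input box when the window is moved or resized.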
The Windows clipboard can be locked by other processes. Always:
- Call Clipboard.Clear() before SetText()
- Retry up to 5 times with a 300 ms delay
- Pause 100 ms between Clear and Set
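The clipboard rules above translate into a small retry wrapper. In this sketch, clear and set_text are placeholders for whatever clipboard calls the scripts make (for example the .NET Clipboard methods via PowerShell), set_clipboard_text and ClipboardLockedError are hypothetical names, and catching OSError is an assumption about how a locked clipboard surfaces:

```python
import time
from typing import Callable


class ClipboardLockedError(RuntimeError):
    """Raised when the clipboard stays locked after every retry."""


def set_clipboard_text(
    text: str,
    clear: Callable[[], None],
    set_text: Callable[[str], None],
    attempts: int = 5,
    retry_delay: float = 0.3,
    settle_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Clear, pause, then set the clipboard, retrying while another process holds it.

    Returns the attempt number that succeeded.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            clear()                 # always clear before setting
            sleep(settle_delay)     # 100 ms pause between Clear and Set
            set_text(text)
            return attempt
        except OSError as exc:      # assumption: a locked clipboard raises OSError
            last_error = exc
            if attempt < attempts:
                sleep(retry_delay)  # 300 ms before the next attempt
    raise ClipboardLockedError(f"clipboard still locked after {attempts} attempts") from last_error
```

Injecting sleep keeps the delays real in production while letting a test record them instead of waiting.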
```bash
# Clone into the Claude Code skills directory
git clone https://github.com/24kchengYe/desktop-controller-skill ~/.claude/skills/desktop-controller
# For Playwright features (optional)
cd ~/.claude/skills/desktop-controller
npm install playwright
npx playwright install chromium
```

Requirements:

- Windows with PowerShell
- Python 3.8+
- Node.js 18+ (for Playwright features)
- Target apps running and logged in
Add new apps in scripts/app_registry.py:

```python
"my_app": {
    "name": "My App",
    "aliases": ["myapp"],
    "process": "MyApp",
    "mode": "win32",
    "search_key": "^f",  # Ctrl+F
    "input_area": {"x_ratio": 0.65, "y_ratio_from_bottom": 0.12},
    ...
}
```

Related projects:

- OpenAI Codex Skills: OpenAI's skill catalog, including playwright-interactive
- OpenAI Codex: OpenAI's coding agent (our skill brings similar computer-use capabilities to Claude Code)
- Playwright: the browser automation framework powering our web engine
- Claude Code: Anthropic's CLI coding agent
If this project helps you, please give it a star! It helps others discover it.
MIT License: free for personal and commercial use.