Give Droid the ability to see and control your browser.
Built on Factory's official browser skill with added page debugging, multi-tab support, and LLM-friendly DOM snapshots.
This skill lets you ask Droid to browse websites, debug pages, take screenshots, and interact with web apps - all through natural conversation.
New to Droid? Droid is an AI coding assistant that runs in your terminal. Install it at factory.ai.
Prerequisites: Node.js 18+, Google Chrome, Droid CLI
```bash
# Install the skill
git clone https://github.com/joshdayorg/droid-browser-skill.git /tmp/droid-browser-skill
mkdir -p ~/.factory/skills   # make sure the skills directory exists
cp -r /tmp/droid-browser-skill/browser ~/.factory/skills/browser
cd ~/.factory/skills/browser && npm install
rm -rf /tmp/droid-browser-skill
```

Try it:
- Start Droid in your terminal: `droid`
- Say: "Start the browser"
- Say: "Go to google.com"

That's it! Droid is now controlling Chrome.
Once the browser skill is installed, just talk naturally:
| You say... | Droid does... |
|---|---|
| "Start the browser" | Launches Chrome |
| "Go to example.com" | Navigates to the URL |
| "What's wrong with this page?" | Debugs console errors, JS exceptions, network failures |
| "Take a screenshot" | Captures the viewport |
| "Show me the page structure" | Returns LLM-friendly DOM snapshot |
| "Open a new tab and go to github.com" | Opens new tab, navigates |
| "Switch to the google tab" | Switches between tabs |
| "Click the login button" | Interacts with elements |
| "Fill in the email field with test@example.com" | Types into inputs |
| Script | Description |
|---|---|
| `start.js` | Launch Chrome with remote debugging on port 9222 |
| `nav.js` | Navigate to URLs, open new tabs |
| `eval.js` | Execute JavaScript in the browser |
| `screenshot.js` | Capture viewport screenshots |
| `pick.js` | Interactive visual element picker |
| `debug.js` | Full page debugging - console errors, JS exceptions, failed network requests |
| `tabs.js` | List, switch, and close browser tabs |
| `snapshot.js` | LLM-friendly DOM snapshot - accessibility tree with element refs |
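To make the `start.js` row more concrete, here is a minimal sketch of how a launcher could assemble Chrome's command-line flags. The `buildChromeArgs` helper, the profile directory path, and the `windowConfig` shape are assumptions for illustration. The flags themselves are standard Chrome switches, but the real `start.js` may build them differently.

```javascript
// Hypothetical helper: builds the flag list a launcher could pass to Chrome.
function buildChromeArgs({ port = 9222, profileDir, windowConfig }) {
  const { width, height, x, y } = windowConfig;
  return [
    `--remote-debugging-port=${port}`,  // exposes the DevTools Protocol on this port
    `--user-data-dir=${profileDir}`,    // separate profile directory for the debug session
    `--window-size=${width},${height}`, // outer window size in pixels
    `--window-position=${x},${y}`,      // top-left corner of the window on screen
  ];
}

const args = buildChromeArgs({
  profileDir: '/tmp/chrome-debug-profile', // assumed path, not taken from the skill
  windowConfig: { width: 1280, height: 1440, x: 1280, y: 0 },
});
console.log(args.join(' '));
```

A launcher would typically hand this array to `child_process.spawn` along with the path to the Chrome binary, which varies by operating system.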
Install for your user account (`~/.factory/skills`):

```bash
git clone https://github.com/joshdayorg/droid-browser-skill.git /tmp/droid-browser-skill
mkdir -p ~/.factory/skills
cp -r /tmp/droid-browser-skill/browser ~/.factory/skills/browser
cd ~/.factory/skills/browser && npm install
rm -rf /tmp/droid-browser-skill
```

Install into a single project and commit it (`.factory/skills`):

```bash
git clone https://github.com/joshdayorg/droid-browser-skill.git /tmp/droid-browser-skill
mkdir -p .factory/skills
cp -r /tmp/droid-browser-skill/browser .factory/skills/browser
# Run npm install in a subshell so the following git commands still run from the project root
(cd .factory/skills/browser && npm install)
rm -rf /tmp/droid-browser-skill
git add .factory/skills/browser
git commit -m "Add browser automation skill"
```

Edit `config.js` to customize for your monitor:
```js
// Default: Right half of 27" Apple Studio Display (2560x1440 effective)
export const VIEWPORT = { width: 1280, height: 1340 };
export const WINDOW = { width: 1280, height: 1440, x: 1280, y: 0 };
```

Full screen (any monitor):

```js
export const WINDOW = { width: 1920, height: 1080, x: 0, y: 0 };
```

Left half of screen:

```js
export const WINDOW = { width: 1280, height: 1440, x: 0, y: 0 };
```

Smaller window:

```js
export const WINDOW = { width: 1024, height: 768, x: 100, y: 100 };
```

```bash
# Fresh profile
.factory/skills/browser/start.js
# Use your existing Chrome profile (keeps logins, cookies)
.factory/skills/browser/start.js --profile
```

```bash
# Navigate current tab
.factory/skills/browser/nav.js https://example.com
# Open in new tab
.factory/skills/browser/nav.js https://example.com --new
```

```bash
# Get page title
.factory/skills/browser/eval.js "document.title"
# Count links
.factory/skills/browser/eval.js "document.querySelectorAll('a').length"
# Click a button
.factory/skills/browser/eval.js "document.querySelector('button').click()"
```

```bash
.factory/skills/browser/screenshot.js
# Output: ✓ Screenshot saved: /tmp/screenshot-2024-01-15T10-30-00.png
```

```bash
# Debug current page (15 second capture)
.factory/skills/browser/debug.js
# Debug specific URL
.factory/skills/browser/debug.js https://example.com
# Custom capture time (5 seconds)
.factory/skills/browser/debug.js https://example.com 5
```

Example output:
```
🔍 Debugging: https://example.com
Listening for 15 seconds...
════════════════════════════════════════════════════════════
📋 CONSOLE ERRORS & WARNINGS:
❌ [error] Failed to load resource: 404
💥 JAVASCRIPT ERRORS:
❌ TypeError: Cannot read property 'map' of undefined
    at App.js:42
🌐 FAILED NETWORK REQUESTS:
❌ GET /api/users → 401 Unauthorized
════════════════════════════════════════════════════════════
📊 SUMMARY: 1 console issues, 1 JS errors, 1 failed requests
```
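The three categories in that report map naturally onto events that a Puppeteer `Page` emits. The sketch below shows one way such a collector could be wired up. `attachCollectors` and `summarize` are hypothetical names, and the actual `debug.js` may gather and format its data differently.

```javascript
// Sketch: collect console issues, uncaught JS errors, and failed requests.
// `page` is expected to behave like a Puppeteer Page (an object with .on()).
function attachCollectors(page) {
  const report = { console: [], jsErrors: [], failedRequests: [] };

  // Console messages: keep only errors and warnings.
  // Both 'warn' and 'warning' are accepted because the name differs across Puppeteer versions.
  page.on('console', (msg) => {
    const type = msg.type();
    if (type === 'error' || type === 'warn' || type === 'warning') {
      report.console.push(`[${type}] ${msg.text()}`);
    }
  });

  // Uncaught exceptions thrown inside the page.
  page.on('pageerror', (err) => {
    report.jsErrors.push(err.message);
  });

  // Requests that never completed (DNS failure, connection reset, blocked, ...).
  page.on('requestfailed', (req) => {
    const failure = req.failure();
    const reason = failure ? failure.errorText : 'failed';
    report.failedRequests.push(`${req.method()} ${req.url()} → ${reason}`);
  });

  // Requests that completed with an HTTP error status.
  page.on('response', (res) => {
    if (res.status() >= 400) {
      report.failedRequests.push(`${res.request().method()} ${res.url()} → ${res.status()}`);
    }
  });

  return report;
}

// Builds a summary line in the same shape as the sample output.
function summarize(report) {
  return `SUMMARY: ${report.console.length} console issues, ` +
    `${report.jsErrors.length} JS errors, ` +
    `${report.failedRequests.length} failed requests`;
}
```

A script would call `attachCollectors(page)` before navigating, wait for the capture window to elapse, and then print the report.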
```bash
# List all tabs
.factory/skills/browser/tabs.js
# Switch to tab 2
.factory/skills/browser/tabs.js 2
# Switch by URL or title match
.factory/skills/browser/tabs.js google
.factory/skills/browser/tabs.js "My App"
# Close current tab
.factory/skills/browser/tabs.js close
```

```bash
.factory/skills/browser/pick.js "Select the login button"
# Click elements in browser, press Enter in terminal when done
# Returns: tag, id, classes, text, selector for each picked element
```

```bash
.factory/skills/browser/snapshot.js
```

Returns an accessibility tree optimized for AI understanding:

```
- button "SHOP" [ref=e8]
- link "Learn More" [ref=e13]
- textbox "Email" [ref=e35]
- heading "Products" [level=2]
```
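Because each snapshot line follows a regular `- role "name" [key=value]` shape, a script could parse it to look up a ref by role and name. The sketch below assumes the output looks exactly like the sample above. `parseSnapshotLine` and `findRef` are hypothetical helpers, and real snapshots may include nesting or attributes this parser does not handle.

```javascript
// Parse one snapshot line such as: - textbox "Email" [ref=e35]
// Returns null for lines that do not match the assumed shape.
function parseSnapshotLine(line) {
  const match = /^\s*-\s+(\w+)\s+"([^"]*)"(.*)$/.exec(line);
  if (!match) return null;
  const [, role, name, rest] = match;

  // Collect every [key=value] attribute that follows the quoted name.
  const attrs = {};
  for (const [, key, value] of rest.matchAll(/\[(\w+)=([^\]]+)\]/g)) {
    attrs[key] = value;
  }
  return { role, name, ...attrs };
}

// Find the ref of the first node with the given role and accessible name.
function findRef(snapshot, role, name) {
  const node = snapshot
    .split('\n')
    .map(parseSnapshotLine)
    .find((n) => n !== null && n.role === role && n.name === name);
  return node && node.ref ? node.ref : null;
}

const sample = [
  '- button "SHOP" [ref=e8]',
  '- link "Learn More" [ref=e13]',
  '- textbox "Email" [ref=e35]',
  '- heading "Products" [level=2]',
].join('\n');

console.log(findRef(sample, 'textbox', 'Email'));
```

Nodes without a ref, such as the heading in the sample, yield `null` from `findRef`.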
Interact with elements using their refs:
```bash
# Click a button
.factory/skills/browser/eval.js "document.querySelector('[data-ref=e8]').click()"
# Fill a text field
.factory/skills/browser/eval.js "document.querySelector('[data-ref=e35]').value = 'test@example.com'"
```

This skill uses Puppeteer Core to connect to Chrome via the Chrome DevTools Protocol (CDP). Unlike full Puppeteer, it doesn't bundle Chromium - it connects to your existing Chrome installation.
Key design decisions:
- `defaultViewport: null` - Viewport matches window size, no weird small content areas
- Stateful tab tracking - Scripts remember which tab is active
- 15-second debug default - Captures most page load issues without being too slow
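As an illustration of the first design decision, the sketch below shows how a script might attach to the already-running Chrome with `defaultViewport: null`. It assumes `puppeteer-core` is installed and Chrome is listening on port 9222. `connectOptions` and `firstTabTitle` are hypothetical helpers, not functions taken from the skill.

```javascript
// Options a script could pass to puppeteer.connect().
function connectOptions(port = 9222) {
  return {
    browserURL: `http://127.0.0.1:${port}`, // Chrome's remote debugging endpoint
    defaultViewport: null,                  // let the page fill the real window
  };
}

// Attach to the running Chrome and return the title of the first open tab.
async function firstTabTitle(port = 9222) {
  // Loaded lazily so the module is only required when a connection is made.
  const { default: puppeteer } = await import('puppeteer-core');
  const browser = await puppeteer.connect(connectOptions(port));
  try {
    const pages = await browser.pages();
    return pages.length > 0 ? await pages[0].title() : null;
  } finally {
    await browser.disconnect(); // detach without closing Chrome
  }
}
```

Calling `disconnect()` rather than `close()` leaves the browser running, so the next script can attach to the same session.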
Chrome won't start:
- Make sure Chrome is installed
- Check if port 9222 is already in use: `lsof -i :9222`

Scripts time out:
- Chrome may be unresponsive; restart with `start.js`
- Kill stuck Chrome: `pkill -f "chrome-debug"`

Viewport is wrong size:
- Edit `config.js` to match your monitor
- Restart Chrome after changing config
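For the viewport case, values for `config.js` can be derived from the monitor's effective resolution. The sketch below is one way to compute a half-screen layout. `halfScreenConfig` is a hypothetical helper, and the 100-pixel allowance for Chrome's own UI is an assumption inferred from the default config (1440 window height versus 1340 viewport height).

```javascript
// Assumption: Chrome's tab strip and toolbar take about 100 px,
// inferred from the default config (1440 - 1340).
const CHROME_UI_HEIGHT = 100;

// Compute VIEWPORT and WINDOW values for the left or right half of a screen.
function halfScreenConfig(screenWidth, screenHeight, side = 'right') {
  const width = Math.floor(screenWidth / 2);
  const x = side === 'right' ? screenWidth - width : 0;
  return {
    VIEWPORT: { width, height: screenHeight - CHROME_UI_HEIGHT },
    WINDOW: { width, height: screenHeight, x, y: 0 },
  };
}

// Reproduces the default config for a 2560x1440 effective resolution.
console.log(JSON.stringify(halfScreenConfig(2560, 1440, 'right')));
```

The printed values can be copied into the `VIEWPORT` and `WINDOW` exports in `config.js`.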
MIT