Browser Automation Skill for Factory Droids

Give Droid the ability to see and control your browser.

Built on Factory's official browser skill with added page debugging, multi-tab support, and LLM-friendly DOM snapshots.

What is this?

This skill lets you ask Droid to browse websites, debug pages, take screenshots, and interact with web apps - all through natural conversation.

New to Droid? Droid is an AI coding assistant that runs in your terminal. Install it at factory.ai.

Quick Start

Prerequisites: Node.js 18+, Google Chrome, Droid CLI

# Install the skill
git clone https://github.com/joshdayorg/droid-browser-skill.git /tmp/droid-browser-skill
cp -r /tmp/droid-browser-skill/browser ~/.factory/skills/browser
cd ~/.factory/skills/browser && npm install
rm -rf /tmp/droid-browser-skill

Try it:

1. Start Droid in your terminal: droid
2. Say: "Start the browser"
3. Say: "Go to google.com"

That's it! Droid is now controlling Chrome.

What Can You Ask Droid?

Once the browser skill is installed, just talk naturally:

| You say... | Droid does... |
| --- | --- |
| "Start the browser" | Launches Chrome |
| "Go to example.com" | Navigates to the URL |
| "What's wrong with this page?" | Reports console errors, JS exceptions, and network failures |
| "Take a screenshot" | Captures the viewport |
| "Show me the page structure" | Returns an LLM-friendly DOM snapshot |
| "Open a new tab and go to github.com" | Opens a new tab and navigates |
| "Switch to the google tab" | Switches between tabs |
| "Click the login button" | Interacts with elements |
| "Fill in the email field with test@example.com" | Types into inputs |

Scripts Reference

| Script | Description |
| --- | --- |
| `start.js` | Launch Chrome with remote debugging on port 9222 |
| `nav.js` | Navigate to URLs, open new tabs |
| `eval.js` | Execute JavaScript in the browser |
| `screenshot.js` | Capture viewport screenshots |
| `pick.js` | Interactive visual element picker |
| `debug.js` | Full page debugging: console errors, JS exceptions, failed network requests |
| `tabs.js` | List, switch, and close browser tabs |
| `snapshot.js` | LLM-friendly DOM snapshot: accessibility tree with element refs |

Installation Options

Option A: Personal Skills (follows you across projects)

git clone https://github.com/joshdayorg/droid-browser-skill.git /tmp/droid-browser-skill
cp -r /tmp/droid-browser-skill/browser ~/.factory/skills/browser
cd ~/.factory/skills/browser && npm install
rm -rf /tmp/droid-browser-skill

Option B: Project Skills (shared via git)

git clone https://github.com/joshdayorg/droid-browser-skill.git /tmp/droid-browser-skill
mkdir -p .factory/skills
cp -r /tmp/droid-browser-skill/browser .factory/skills/browser
cd .factory/skills/browser && npm install
rm -rf /tmp/droid-browser-skill
git add .factory/skills/browser
git commit -m "Add browser automation skill"

Configuration

Edit config.js to customize for your monitor:

// Default: Right half of 27" Apple Studio Display (2560x1440 effective)
export const VIEWPORT = { width: 1280, height: 1340 };
export const WINDOW = { width: 1280, height: 1440, x: 1280, y: 0 };

Common Configurations

Full screen (shown for a 1920x1080 monitor; set width and height to your own resolution):

export const WINDOW = { width: 1920, height: 1080, x: 0, y: 0 };

Left half of screen:

export const WINDOW = { width: 1280, height: 1440, x: 0, y: 0 };

Smaller window:

export const WINDOW = { width: 1024, height: 768, x: 100, y: 100 };
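
The half-screen values above follow directly from the screen's effective resolution. A minimal sketch of that arithmetic, assuming a hypothetical helper named halfScreenWindow that is not part of the skill:

```javascript
// Hypothetical helper, not part of the skill: derive a WINDOW object
// for the left or right half of a screen with the given effective size.
function halfScreenWindow(screenWidth, screenHeight, side = 'right') {
  const width = Math.floor(screenWidth / 2);
  return {
    width,
    height: screenHeight,
    // The right half starts where the left half ends.
    x: side === 'right' ? screenWidth - width : 0,
    y: 0,
  };
}

console.log(halfScreenWindow(2560, 1440, 'right'));
// { width: 1280, height: 1440, x: 1280, y: 0 }
console.log(halfScreenWindow(2560, 1440, 'left'));
// { width: 1280, height: 1440, x: 0, y: 0 }
```

The result can be pasted into config.js as the WINDOW value.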

Detailed Usage

The commands below use the project-install path from Option B. For a personal install (Option A), replace .factory/skills/browser with ~/.factory/skills/browser.

Start Chrome

# Fresh profile
.factory/skills/browser/start.js

# Use your existing Chrome profile (keeps logins, cookies)
.factory/skills/browser/start.js --profile

Navigate

# Navigate current tab
.factory/skills/browser/nav.js https://example.com

# Open in new tab
.factory/skills/browser/nav.js https://example.com --new

Execute JavaScript

# Get page title
.factory/skills/browser/eval.js "document.title"

# Count links
.factory/skills/browser/eval.js "document.querySelectorAll('a').length"

# Click a button
.factory/skills/browser/eval.js "document.querySelector('button').click()"

Take Screenshot

.factory/skills/browser/screenshot.js
# Output: ✓ Screenshot saved: /tmp/screenshot-2024-01-15T10-30-00.png
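
The saved file name embeds a timestamp with the colons replaced so that it is safe in a path. One way such a name could be built is sketched below; screenshotPath is a hypothetical function and the skill's own code may differ:

```javascript
// Hypothetical sketch: build a screenshot path with a filesystem-safe
// UTC timestamp (colons replaced, milliseconds and zone suffix dropped).
function screenshotPath(date = new Date(), dir = '/tmp') {
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, '-');
  return `${dir}/screenshot-${stamp}.png`;
}

console.log(screenshotPath(new Date('2024-01-15T10:30:00Z')));
// /tmp/screenshot-2024-01-15T10-30-00.png
```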

Debug Page (Console, JS Errors, Network)

# Debug current page (15 second capture)
.factory/skills/browser/debug.js

# Debug specific URL
.factory/skills/browser/debug.js https://example.com

# Custom capture time (5 seconds)
.factory/skills/browser/debug.js https://example.com 5

Example output:

🔍 Debugging: https://example.com
   Listening for 15 seconds...

════════════════════════════════════════════════════════════

📋 CONSOLE ERRORS & WARNINGS:
   ❌ [error] Failed to load resource: 404

💥 JAVASCRIPT ERRORS:
   ❌ TypeError: Cannot read property 'map' of undefined
      at App.js:42

🌐 FAILED NETWORK REQUESTS:
   ❌ GET /api/users → 401 Unauthorized

════════════════════════════════════════════════════════════
📊 SUMMARY: 1 console issues, 1 JS errors, 1 failed requests
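
The three sections of that report map onto events that a Puppeteer page emits. The sketch below shows how such a collector might be wired up; collectIssues is a hypothetical name, and debug.js itself may be structured differently:

```javascript
// Sketch: collect console issues, uncaught exceptions, and failed requests
// from a Puppeteer Page. The returned arrays fill in as events arrive.
function collectIssues(page) {
  const issues = { console: [], jsErrors: [], network: [] };

  page.on('console', (msg) => {
    const type = msg.type();
    // Accept both spellings of the warning type to be safe.
    if (type === 'error' || type === 'warn' || type === 'warning') {
      issues.console.push(`[${type}] ${msg.text()}`);
    }
  });

  // Uncaught exceptions thrown inside the page.
  page.on('pageerror', (err) => {
    issues.jsErrors.push(err.message);
  });

  // Requests that never completed (DNS failure, connection reset, ...).
  page.on('requestfailed', (req) => {
    const failure = req.failure();
    const reason = failure ? failure.errorText : 'failed';
    issues.network.push(`${req.method()} ${req.url()} → ${reason}`);
  });

  // Requests that completed with an HTTP error status.
  page.on('response', (res) => {
    if (res.status() >= 400) {
      issues.network.push(
        `${res.request().method()} ${res.url()} → ${res.status()}`
      );
    }
  });

  return issues;
}
```

A caller would attach the collector, navigate or wait for the capture window, and then print the three arrays.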

Manage Tabs

# List all tabs
.factory/skills/browser/tabs.js

# Switch to tab 2
.factory/skills/browser/tabs.js 2

# Switch by URL or title match
.factory/skills/browser/tabs.js google
.factory/skills/browser/tabs.js "My App"

# Close current tab
.factory/skills/browser/tabs.js close
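
Switching by number or by URL/title match comes down to a small lookup. The sketch below is illustrative only: findTabIndex is a hypothetical function, and treating a numeric argument as a 1-based tab number is an assumption about how tabs.js counts.

```javascript
// Hypothetical sketch of tab selection. `tabs` is an array of
// { url, title } objects; `query` is the command-line argument.
// Returns the array index of the matching tab, or -1 if none matches.
function findTabIndex(tabs, query) {
  // A purely numeric query is treated as a 1-based tab number (assumption).
  if (/^\d+$/.test(query)) {
    const index = Number(query) - 1;
    return index >= 0 && index < tabs.length ? index : -1;
  }
  // Otherwise match case-insensitively against the URL or the title.
  const needle = query.toLowerCase();
  return tabs.findIndex(
    (tab) =>
      tab.url.toLowerCase().includes(needle) ||
      tab.title.toLowerCase().includes(needle)
  );
}

const tabs = [
  { url: 'https://www.google.com/', title: 'Google' },
  { url: 'http://localhost:3000/', title: 'My App' },
];
console.log(findTabIndex(tabs, '2'));      // 1
console.log(findTabIndex(tabs, 'google')); // 0
console.log(findTabIndex(tabs, 'My App')); // 1
```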

Pick Elements

.factory/skills/browser/pick.js "Select the login button"
# Click elements in browser, press Enter in terminal when done
# Returns: tag, id, classes, text, selector for each picked element

DOM Snapshot (LLM-friendly)

.factory/skills/browser/snapshot.js

Returns an accessibility tree optimized for AI understanding:

- button "SHOP" [ref=e8]
- link "Learn More" [ref=e13]
- textbox "Email" [ref=e35]
- heading "Products" [level=2]

Interact with elements using their refs:

# Click a button
.factory/skills/browser/eval.js "document.querySelector('[data-ref=e8]').click()"

# Fill a text field
.factory/skills/browser/eval.js "document.querySelector('[data-ref=e35]').value = 'test@example.com'"
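
Assigning to value directly does not fire any events, so pages that react to input events may not notice the change. A hedged sketch of a helper that builds an expression which also dispatches input and change events follows; fillByRefExpression is a hypothetical name, and even this may not be enough for every framework:

```javascript
// Hypothetical helper: build a JavaScript expression that fills the element
// with the given snapshot ref and fires `input` and `change` events.
// The expression evaluates to true on success, false if the ref is missing.
function fillByRefExpression(ref, text) {
  // JSON.stringify produces safely quoted JavaScript string literals.
  const selector = JSON.stringify(`[data-ref="${ref}"]`);
  const value = JSON.stringify(text);
  return `(() => {
    const el = document.querySelector(${selector});
    if (!el) return false;
    el.value = ${value};
    el.dispatchEvent(new Event('input', { bubbles: true }));
    el.dispatchEvent(new Event('change', { bubbles: true }));
    return true;
  })()`;
}

console.log(fillByRefExpression('e35', 'test@example.com'));
```

The printed expression would be passed to eval.js as its argument. Because it contains both single and double quotes, take care with shell quoting.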

How It Works

This skill uses Puppeteer Core (the puppeteer-core package) to connect to Chrome via the Chrome DevTools Protocol (CDP). Unlike the full Puppeteer package, it does not download its own browser; it connects to your existing Chrome installation.

Key design decisions:

defaultViewport: null - The viewport follows the window size, so page content fills the whole window instead of a smaller fixed area
Stateful tab tracking - Scripts remember which tab is active
15-second debug default - Captures most page load issues without being too slow
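
To make the connection step concrete, the sketch below shows how a script might attach to the Chrome instance that start.js launches on port 9222. The function names connectOptions and connectToChrome are hypothetical, and the skill's own scripts may connect differently; puppeteer.connect with browserURL and defaultViewport is the standard puppeteer-core call.

```javascript
// Build the options for puppeteer-core's connect() call.
function connectOptions(port = 9222) {
  return {
    // Chrome's remote debugging endpoint on the local machine.
    browserURL: `http://127.0.0.1:${port}`,
    // null means "do not override": the viewport follows the window size.
    defaultViewport: null,
  };
}

// Attach to an already running Chrome. Requires puppeteer-core to be
// installed and Chrome to be listening on the given port.
async function connectToChrome(port = 9222) {
  const { default: puppeteer } = await import('puppeteer-core');
  return puppeteer.connect(connectOptions(port));
}
```

A script that connects this way should call browser.disconnect() when it finishes rather than browser.close(), so Chrome keeps running for the next command.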

Troubleshooting

Chrome won't start:

Make sure Chrome is installed
Check if port 9222 is already in use: lsof -i :9222
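
If something is listening on port 9222, it helps to know whether it is a debuggable Chrome. The sketch below queries the DevTools /json/version endpoint; chromeVersion is a hypothetical helper, not part of the skill, and it relies on the global fetch available in Node.js 18+:

```javascript
// Sketch: ask Chrome's DevTools HTTP endpoint for its version.
// Resolves to the browser string if Chrome answers, or null otherwise.
// `fetchImpl` is injectable for testing and defaults to the global fetch.
async function chromeVersion(port = 9222, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`http://127.0.0.1:${port}/json/version`);
    if (!res.ok) return null;
    const info = await res.json();
    return info.Browser ?? null;
  } catch {
    // Nothing is listening, or the listener is not a DevTools endpoint.
    return null;
  }
}

chromeVersion().then((version) => {
  console.log(version ? `Chrome is reachable: ${version}` : 'No debuggable Chrome on port 9222');
});
```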

Scripts timeout:

Chrome may be unresponsive; restart with start.js
Kill stuck Chrome: pkill -f "chrome-debug"

Viewport is wrong size:

Edit config.js to match your monitor
Restart Chrome after changing config

License

MIT
