Add Agent Browser integration documentation #175
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,149 @@ | ||
| --- | ||
| title: "Agent Browser" | ||
| --- | ||
|
|
||
| [Agent Browser](https://github.com/vercel-labs/agent-browser) is a headless browser automation CLI for AI agents built by Vercel. It ships as a fast Rust CLI with a Node.js fallback. By integrating it with Kernel, you can run Agent Browser automations on cloud-hosted browsers. | ||
|
|
||
| ## Using the native Kernel provider | ||
|
|
||
| Agent Browser has built-in support for Kernel as a cloud browser provider. This is the simplest way to use Kernel with Agent Browser. | ||
|
|
||
| ### Quick start | ||
|
|
||
| Use the `-p` flag to enable Kernel: | ||
|
|
||
| ```bash | ||
| export KERNEL_API_KEY="your-api-key" | ||
| agent-browser -p kernel open https://example.com | ||
| ``` | ||
|
|
||
| Or use environment variables for CI/scripts: | ||
|
|
||
| ```bash | ||
| export AGENT_BROWSER_PROVIDER=kernel | ||
| export KERNEL_API_KEY="your-api-key" | ||
| agent-browser open https://example.com | ||
| ``` | ||
|
|
||
| Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com). | ||
|
|
||
| ### Configuration options | ||
|
|
||
| Configure Kernel via environment variables: | ||
|
|
||
| | Variable | Description | Default | | ||
| |----------|-------------|---------| | ||
| | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` | | ||
| | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` | | ||
| | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` | | ||
| | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins | (none) | | ||
|
|
||
| ### Profile persistence | ||
|
|
||
| When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions. | ||
|
|
||
| ```bash | ||
| export KERNEL_API_KEY="your-api-key" | ||
| export KERNEL_PROFILE_NAME="my-profile" | ||
| agent-browser -p kernel open https://example.com | ||
| ``` | ||
|
|
||
| ## Connecting via CDP (alternative) | ||
|
Contributor

This section should only apply when you need full control of the Kernel browser session creation logic beyond what the agent-browser environment variables support. Also, the example should use the CLI throughout (not an SDK + CLI mix):

```bash
# Create a Kernel browser and extract the CDP URL
SESSION=$(kernel browsers create --stealth -o json)
CDP_URL=$(echo "$SESSION" | jq -r '.cdp_ws_url')
SESSION_ID=$(echo "$SESSION" | jq -r '.session_id')

# Connect agent-browser to the Kernel session
agent-browser connect "$CDP_URL"

# Run your automation
agent-browser open https://example.com
agent-browser snapshot

# Clean up
agent-browser close
kernel browsers delete "$SESSION_ID"
```

|
||
|
|
||
| If you need more control, you can connect Agent Browser to Kernel via CDP directly. | ||
|
|
||
| ### 1. Install the Kernel SDK | ||
|
|
||
| ```bash | ||
| npm install @onkernel/sdk | ||
| ``` | ||
|
|
||
| ### 2. Initialize Kernel and create a browser | ||
|
|
||
| Import the Kernel SDK and create a cloud browser session: | ||
|
|
||
| ```typescript | ||
| import Kernel from '@onkernel/sdk'; | ||
|
|
||
| const kernel = new Kernel(); | ||
|
|
||
| const kernelBrowser = await kernel.browsers.create({ stealth: true }); | ||
|
|
||
| console.log("Live view url: ", kernelBrowser.browser_live_view_url); | ||
| ``` | ||
|
|
||
| ### 3. Connect Agent Browser to Kernel | ||
|
|
||
| Use Agent Browser's `connect` command to connect to Kernel's CDP WebSocket URL: | ||
|
|
||
| ```bash | ||
| agent-browser connect <cdp_ws_url> | ||
| ``` | ||
|
|
||
| Here, `<cdp_ws_url>` is the `cdp_ws_url` value returned for your Kernel browser session (`kernelBrowser.cdp_ws_url` in the previous step). | ||
|
|
||
| ### 4. Use Agent Browser commands | ||
|
|
||
| Once connected, use Agent Browser's commands with the Kernel-powered browser: | ||
|
|
||
| ```bash | ||
| agent-browser open example.com | ||
| agent-browser snapshot # Get accessibility tree with refs | ||
| agent-browser click @e2 # Click by ref from snapshot | ||
| agent-browser fill @e3 "test@example.com" # Fill by ref | ||
| agent-browser get text @e1 # Get text by ref | ||
| agent-browser screenshot page.png | ||
| ``` | ||
|
|
||
| ### 5. Clean up | ||
|
|
||
| When you're done, close the browser and clean up the Kernel session: | ||
|
|
||
| ```bash | ||
| agent-browser close | ||
| ``` | ||
|
|
||
| ```typescript | ||
| await kernel.browsers.deleteByID(kernelBrowser.session_id); | ||
| ``` | ||
|
|
||
| ## Programmatic usage | ||
|
Contributor

Consider adding context for when you'd use this approach, e.g., "Use this approach if you want to use agent-browser as an alternative to Playwright within a Node.js or Python application while maintaining programmatic control over browser session lifecycle."

Also, the example has a bug.

TypeScript:

```typescript
import Kernel from '@onkernel/sdk';
import { execSync } from 'child_process';

const kernel = new Kernel();
const browser = await kernel.browsers.create({ stealth: true });
console.log("Live view url:", browser.browser_live_view_url);

try {
  execSync(`agent-browser connect "${browser.cdp_ws_url}"`, { stdio: 'inherit' });
  execSync('agent-browser open https://example.com', { stdio: 'inherit' });
  execSync('agent-browser snapshot', { stdio: 'inherit' });
  execSync('agent-browser close', { stdio: 'inherit' });
} finally {
  await kernel.browsers.deleteByID(browser.session_id);
}
```

Python:

```python
import subprocess
from kernel import Kernel

kernel = Kernel()
browser = kernel.browsers.create(stealth=True)
print(f"Live view url: {browser.browser_live_view_url}")

try:
    subprocess.run(["agent-browser", "connect", browser.cdp_ws_url], check=True)
    subprocess.run(["agent-browser", "open", "https://example.com"], check=True)
    subprocess.run(["agent-browser", "snapshot"], check=True)
    subprocess.run(["agent-browser", "close"], check=True)
finally:
    kernel.browsers.delete_by_id(browser.session_id)
```

|
||
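One way to sidestep shell quoting of the CDP URL entirely, sketched here as a possible variant of the suggestion above, is to pass arguments as an array with Node's `execFileSync` instead of building a command string. The helper name `runAgentBrowser` and the `Runner` type are hypothetical and not part of agent-browser or the Kernel SDK; only the `connect`, `open`, `snapshot`, and `close` subcommands come from this PR.

```typescript
import { execFileSync } from "node:child_process";

// One agent-browser invocation. Injectable so the flow can be exercised
// without the CLI installed (hypothetical type, for illustration only).
type Runner = (args: string[]) => void;

// Default runner: execFileSync takes the arguments as an array, so no shell
// ever parses the CDP URL and characters such as `&` or `?` need no quoting.
const runCli: Runner = (args) => {
  execFileSync("agent-browser", args, { stdio: "inherit" });
};

// Hypothetical helper: connect to the session, run each step in order, and
// always attempt `agent-browser close`, even if a step throws.
function runAgentBrowser(
  cdpWsUrl: string,
  steps: string[][],
  run: Runner = runCli,
): void {
  run(["connect", cdpWsUrl]);
  try {
    for (const step of steps) {
      run(step);
    }
  } finally {
    run(["close"]);
  }
}
```

A caller would still wrap this in its own `try`/`finally` so that `kernel.browsers.deleteByID(browser.session_id)` runs after `runAgentBrowser(browser.cdp_ws_url, [["open", "https://example.com"], ["snapshot"]])`, whether or not the automation succeeded.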
|
|
||
| You can also use Agent Browser programmatically with Kernel: | ||
|
|
||
| ```typescript | ||
| import Kernel from '@onkernel/sdk'; | ||
| import { execSync } from 'child_process'; | ||
|
|
||
| const kernel = new Kernel(); | ||
|
|
||
| // Create a Kernel browser session | ||
| const kernelBrowser = await kernel.browsers.create({ stealth: true }); | ||
|
|
||
| console.log("Live view url: ", kernelBrowser.browser_live_view_url); | ||
|
|
||
| // Connect Agent Browser to Kernel | ||
| execSync(`agent-browser connect ${kernelBrowser.cdp_ws_url}`); | ||
|
|
||
| // Run your automation | ||
| execSync('agent-browser open https://example.com'); | ||
| execSync('agent-browser snapshot'); | ||
|
|
||
| // Clean up | ||
| execSync('agent-browser close'); | ||
| await kernel.browsers.deleteByID(kernelBrowser.session_id); | ||
| ``` | ||
|
|
||
| ## Benefits of using Kernel with Agent Browser | ||
|
|
||
| - **No local browser management**: Run automations without installing or maintaining browsers locally | ||
| - **Scalability**: Launch multiple browser sessions in parallel | ||
| - **Stealth mode**: Built-in anti-detection features for web scraping | ||
| - **Session state**: Maintain browser state across runs via [Profiles](/browsers/profiles) | ||
| - **Live view**: Debug your automations with real-time browser viewing | ||
|
|
||
| ## Next steps | ||
|
|
||
| - Check out [live view](/browsers/live-view) for debugging your automations | ||
| - Learn about [stealth mode](/browsers/bot-detection/stealth) for avoiding detection | ||
| - Learn how to properly [terminate browser sessions](/browsers/termination) | ||
Nit: the "Or use environment variables for CI/scripts" section is unnecessary. Consider removing lines 20-27 and just documenting that `AGENT_BROWSER_PROVIDER=kernel` is an alternative to the `-p kernel` flag in the configuration table below.
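If `AGENT_BROWSER_PROVIDER` ends up documented alongside the `KERNEL_*` variables, a script that spawns `agent-browser` could assemble the whole environment in one place. The following is a minimal sketch under that assumption: the `KernelProviderOptions` interface and `kernelProviderEnv` function are hypothetical names, and only the environment variable names are taken from the documentation in this PR.

```typescript
// Hypothetical option names; only the environment variable names below
// come from the docs in this PR.
interface KernelProviderOptions {
  apiKey: string;
  headless?: boolean;
  stealth?: boolean;
  timeoutSeconds?: number;
  profileName?: string;
}

type Env = Record<string, string | undefined>;

// Returns a copy of `base` with the Kernel provider variables applied.
// Options that are not set are left out, so agent-browser falls back to
// whatever defaults it documents.
function kernelProviderEnv(options: KernelProviderOptions, base: Env = {}): Env {
  const env: Env = {
    ...base,
    AGENT_BROWSER_PROVIDER: "kernel", // alternative to passing `-p kernel`
    KERNEL_API_KEY: options.apiKey,
  };
  if (options.headless !== undefined) {
    env["KERNEL_HEADLESS"] = String(options.headless);
  }
  if (options.stealth !== undefined) {
    env["KERNEL_STEALTH"] = String(options.stealth);
  }
  if (options.timeoutSeconds !== undefined) {
    env["KERNEL_TIMEOUT_SECONDS"] = String(options.timeoutSeconds);
  }
  if (options.profileName !== undefined) {
    env["KERNEL_PROFILE_NAME"] = options.profileName;
  }
  return env;
}
```

The result can be passed as the `env` option of `child_process` functions such as `execFileSync` or `spawn`, typically with `process.env` as `base` so that `PATH` and similar variables are preserved.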