feat: add browser automation tool via agent-browser CLI #318
Danieldd28 wants to merge 1 commit into main
Integrate agent-browser CLI as a lightweight browser automation tool. Instead of embedding browser dependencies, this wraps the external agent-browser binary via exec.Command, keeping PicoClaw lean.

Changes:
- Add BrowserTool (pkg/tools/browser.go) wrapping agent-browser CLI
- Add BrowserConfig to config with enabled, session, headless, timeout, cdp_port
- Register browser tool conditionally in agent loop
- Add unit tests for argument building, command splitting, error handling

The tool accepts a single 'command' parameter and delegates to agent-browser. Default CDP port is 9222. Zero new Go dependencies - all stdlib imports.
Pull request overview
Integrates the external agent-browser CLI as an optional PicoClaw tool to enable browser automation without adding embedded browser dependencies.
Changes:
- Added a new browser tool that shells out to agent-browser with config-driven global flags.
- Introduced tools.browser configuration (enabled/session/headless/timeout/cdp_port) with defaults.
- Conditionally registered the browser tool in the agent tool registry when enabled.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/tools/browser.go | Implements BrowserTool wrapper around the agent-browser binary, including arg building and output handling. |
| pkg/tools/browser_test.go | Adds unit tests for tool metadata, parameter schema, and command/arg parsing helpers. |
| pkg/config/config.go | Adds BrowserConfig under ToolsConfig and wires defaults. |
| pkg/agent/loop.go | Registers the browser tool when cfg.Tools.Browser.Enabled is true. |
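
The configuration and conditional registration described in the table can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the Go field names, JSON tags, and the helper names `defaultBrowserConfig` and `loadBrowserConfig` are assumptions derived from the config keys, and the disabled-by-default choice is a guess.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BrowserConfig mirrors the tools.browser keys named in this PR.
// Field names and JSON tags are assumptions based on those keys.
type BrowserConfig struct {
	Enabled  bool   `json:"enabled"`
	Session  string `json:"session"`
	Headless bool   `json:"headless"`
	Timeout  int    `json:"timeout"` // seconds
	CDPPort  int    `json:"cdp_port"`
}

// defaultBrowserConfig returns hypothetical defaults: disabled (a guess),
// headless, 30 s timeout, CDP port 9222.
func defaultBrowserConfig() BrowserConfig {
	return BrowserConfig{Enabled: false, Headless: true, Timeout: 30, CDPPort: 9222}
}

// loadBrowserConfig overlays user JSON on the defaults, so keys the user
// omits keep their default values.
func loadBrowserConfig(raw []byte) (BrowserConfig, error) {
	cfg := defaultBrowserConfig()
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return BrowserConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := loadBrowserConfig([]byte(`{"enabled": true, "timeout": 60}`))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	if cfg.Enabled {
		// In the PR, this is the point where the browser tool would be
		// added to the agent's tool registry.
		fmt.Printf("registering browser tool: %+v\n", cfg)
	}
}
```

Starting from a populated struct before unmarshalling is one simple way to get defaults without a separate merge step.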

    type BrowserToolOptions struct {
        Session  string // Session name for isolation
        Headless bool   // Run in headless mode (default true)
        Timeout  int    // Command timeout in seconds (default 30)
        CDPPort  int    // Chrome DevTools Protocol port (default 9222)
    }

BrowserToolOptions says Headless has a default of true, but NewBrowserTool currently uses the bool zero-value (false) when opts.Headless isn’t explicitly set, which makes the tool run in headed mode by default (because buildArgs adds --headed when !t.headless). Either implement an explicit default-to-headless behavior (e.g., tri-state/pointer bool) or update the option comment/tests/docs so the default behavior is unambiguous and consistent.
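
One way to realize the tri-state option mentioned above is a pointer bool, so that "unset" can be distinguished from an explicit false. The sketch below is a hypothetical variant of the struct, not the PR's code; `resolveHeadless` is an illustrative helper name.

```go
package main

import "fmt"

// BrowserToolOptions is a hypothetical variant of the PR's struct:
// Headless is a *bool so nil (unset) differs from an explicit false.
type BrowserToolOptions struct {
	Session  string
	Headless *bool
	Timeout  int
	CDPPort  int
}

// resolveHeadless applies the documented default (true) only when the
// caller left the field unset.
func resolveHeadless(opts BrowserToolOptions) bool {
	if opts.Headless == nil {
		return true
	}
	return *opts.Headless
}

func main() {
	headed := false
	fmt.Println(resolveHeadless(BrowserToolOptions{}))                   // true: unset falls back to headless
	fmt.Println(resolveHeadless(BrowserToolOptions{Headless: &headed})) // false: explicit headed mode
}
```

The trade-off is a slightly clumsier call site (callers must take the address of a bool); the alternative the comment names, fixing the doc comment instead, needs no API change.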

        }
        if current.Len() > 0 {
            args = append(args, current.String())
        }

splitCommand currently drops empty quoted arguments. For example, fill @e3 "" will produce no argument for the empty string because the final append is gated on current.Len() > 0. This breaks commands where an empty string is a valid parameter; consider tracking whether an argument was quoted so empty quoted args are preserved, and add a unit test for this case.
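
A splitter that preserves empty quoted arguments might look like the following. This is a hypothetical rewrite, not the PR's `splitCommand`: it tracks whether a token has been started (`hasToken`) rather than checking the buffer length, and it does not handle backslash escapes.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCommand splits on spaces and tabs, honoring single and double quotes.
// hasToken records that an argument has started, so `""` yields an empty
// argument instead of being dropped.
func splitCommand(input string) []string {
	var args []string
	var current strings.Builder
	var quote rune    // 0 outside quotes, otherwise the opening quote character
	hasToken := false // true once the current argument has started

	for _, r := range input {
		switch {
		case quote != 0:
			if r == quote {
				quote = 0 // closing quote; the token stays open for adjacent text
			} else {
				current.WriteRune(r)
			}
		case r == '"' || r == '\'':
			quote = r
			hasToken = true // a quoted argument exists even if it is empty
		case r == ' ' || r == '\t':
			if hasToken {
				args = append(args, current.String())
				current.Reset()
				hasToken = false
			}
		default:
			current.WriteRune(r)
			hasToken = true
		}
	}
	if hasToken {
		args = append(args, current.String())
	}
	return args
}

func main() {
	fmt.Printf("%q\n", splitCommand(`fill @e3 ""`))            // ["fill" "@e3" ""]
	fmt.Printf("%q\n", splitCommand(`fill @e3 "hello world"`)) // ["fill" "@e3" "hello world"]
}
```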

        }

        // Build the full agent-browser command line
        cmdArgs := t.buildArgs(command)

Execute() only validates that the raw command string is non-empty, but buildArgs/splitCommand can still return an empty subcommand (e.g., command set to "" or just quotes). In that case this will invoke agent-browser with only global flags, which is likely to fail with a confusing error. Consider validating that the parsed cmdArgs has at least 1 token and returning a clear ErrorResult if not.
Suggested change:

    - cmdArgs := t.buildArgs(command)
    + cmdArgs := t.buildArgs(command)
    + if len(cmdArgs) == 0 {
    +     return ErrorResult("parsed command is empty; provide an agent-browser subcommand (e.g. 'open https://example.com')")
    + }

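
If buildArgs prepends global flags to the parsed tokens (as the PR description suggests, though that is an assumption here), a length check on its result may never see an empty slice. A variant that validates the subcommand tokens before any flags are added could look like this sketch; `parseSubcommand` is a hypothetical helper, and `strings.Fields` stands in for the PR's splitCommand.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseSubcommand tokenizes the raw command and rejects input that yields
// no tokens. strings.Fields is a stand-in for the PR's splitCommand.
func parseSubcommand(command string) ([]string, error) {
	tokens := strings.Fields(command)
	if len(tokens) == 0 {
		return nil, errors.New("parsed command is empty; provide an agent-browser subcommand (e.g. 'open https://example.com')")
	}
	return tokens, nil
}

// buildArgs prepends global flags only after the subcommand is known to be
// non-empty, so the emptiness check cannot be masked by the flags.
func buildArgs(globalFlags []string, command string) ([]string, error) {
	tokens, err := parseSubcommand(command)
	if err != nil {
		return nil, err
	}
	return append(append([]string{}, globalFlags...), tokens...), nil
}

func main() {
	if _, err := buildArgs([]string{"--json"}, "   "); err != nil {
		fmt.Println("rejected:", err)
	}
	args, _ := buildArgs([]string{"--json"}, "open https://example.com")
	fmt.Println(args)
}
```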

    }

    func (t *BrowserTool) Description() string {
        return `Automate a headless browser via agent-browser CLI. Pass the subcommand as 'command'.

Description() starts with "Automate a headless browser…", but the tool can run headed mode when configured (via --headed when Headless=false). Consider adjusting the wording so it doesn’t promise headless operation unconditionally.
Suggested change:

    - return `Automate a headless browser via agent-browser CLI. Pass the subcommand as 'command'.
    + return `Automate a browser (headless by default) via the agent-browser CLI. Pass the subcommand as 'command'.

@Zepan This PR addresses roadmap issue #293 (Autonomous Browser Operations; priority: high).

Note: PR #187 also targets browser automation but takes a different approach.

Recommendation: Review both #318 and #187 and pick one approach. The CLI subprocess model (this PR) is more consistent with PicoClaw's existing pattern (see the Codex CLI provider and the Claude CLI provider). Playwright-go would add significant binary size.
Summary
Integrate the agent-browser CLI as a lightweight browser automation tool for PicoClaw. This replaces the previous PR #308 (ActionBook approach) with a much leaner design that wraps an external CLI binary instead of embedding browser dependencies.
Design
Instead of pulling in heavy Go browser libraries (chromedp, etc.), this delegates all browser complexity to the external agent-browser binary via exec.Command. PicoClaw stays lean:
Changes
New files
- pkg/tools/browser.go - BrowserTool wrapping agent-browser CLI
- pkg/tools/browser_test.go - 11 unit tests
Modified files
- pkg/config/config.go - Add BrowserConfig (enabled, session, headless, timeout, cdp_port)
- pkg/agent/loop.go - Register browser tool conditionally
Configuration

    {
      "tools": {
        "browser": {
          "enabled": true,
          "headless": false,
          "timeout": 60,
          "cdp_port": 9222
        }
      }
    }

How it works
The tool exposes a single browser tool with a command parameter. The LLM constructs agent-browser subcommands directly:
- browser open https://example.com
- browser snapshot -i
- browser click @e2
- browser fill @e3 "text"
- browser close

Global flags (--cdp, --headed, --session, --json) are added automatically based on config.
Testing
Prerequisite
Requires agent-browser CLI installed:
npm install -g @anthropic/agent-browser
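
Since the tool depends on an external binary, a startup check can turn a missing install into a clear error. The following is a minimal sketch; `checkBinary` is a hypothetical helper and is not part of the PR.

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkBinary reports where the named executable lives on PATH, or returns
// an error that tells the user what is missing.
func checkBinary(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH; install it before enabling the browser tool: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := checkBinary("agent-browser")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using", path)
}
```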