feat: add ActionBook browser automation tools#308
feat: add ActionBook browser automation tools#308Danieldd28 wants to merge 6 commits intosipeed:mainfrom
Conversation
Integrate ActionBook CLI as three native browser automation tools: - browser_search: find action manuals for websites - browser_get: retrieve CSS selectors and page structure by ID - browser: execute browser commands (open, click, fill, text, etc.) Zero new dependencies — thin CLI wrapper using os/exec. Config-gated via tools.browser.enabled (default: true). 18 unit tests + 1 integration test, all passing. Closes: browser automation support for agent loop
d43cf98 to
62d2c5e
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds browser automation capabilities to PicoClaw agents through three new tools that wrap the ActionBook CLI: browser_search (find action manuals), browser_get (retrieve selectors and page structure), and browser (execute browser commands like open, click, fill, etc.). The implementation follows a thin CLI wrapper pattern similar to existing tools, adds configuration options for enabling/disabling the feature and controlling headless mode, and includes comprehensive unit tests with integration test support.
Changes:
- Added three browser automation tools wrapping the ActionBook CLI with action allowlisting for security
- Added BrowserConfig to configuration with Enabled and Headless options (both default to true)
- Registered tools in agent loop with config-gating for conditional enablement
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/tools/browser.go | Implements BrowserSearchTool, BrowserGetTool, and BrowserTool with CLI wrapping, action validation, and parameter handling |
| pkg/tools/browser_test.go | Comprehensive unit tests for all three tools covering validation, allowlisting, case sensitivity, and integration scenarios |
| pkg/config/config.go | Adds BrowserConfig struct with Enabled and Headless fields, including environment variable bindings and defaults |
| pkg/agent/loop.go | Registers browser tools in createToolRegistry() with config-gating using cfg.Tools.Browser.Enabled |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| case "cookies": | ||
| // cookies sub-actions via value: list, get <name>, set <name> <val>, delete <name>, clear | ||
| if val, ok := args["value"].(string); ok && val != "" { | ||
| cmdArgs = append(cmdArgs, strings.Fields(val)...) |
There was a problem hiding this comment.
Using strings.Fields on user input can enable command injection. The value parameter is split on whitespace and all resulting tokens are appended as separate command arguments. An attacker could craft a value like "list --some-flag" which would be interpreted as multiple arguments to the actionbook CLI. This could potentially allow execution of unintended commands or options. Consider parsing cookie sub-actions more explicitly (e.g., using a structured format like action:name:value) or validating/sanitizing the value before splitting.
| cmdArgs = append(cmdArgs, strings.Fields(val)...) | |
| fields := strings.Fields(val) | |
| if len(fields) == 0 { | |
| return ErrorResult("cookies action requires a sub-action, e.g. 'list', 'get <name>', 'set <name> <value>'") | |
| } | |
| sub := fields[0] | |
| switch sub { | |
| case "list", "clear": | |
| if len(fields) != 1 { | |
| return ErrorResult(fmt.Sprintf("cookies %s does not take additional arguments", sub)) | |
| } | |
| cmdArgs = append(cmdArgs, sub) | |
| case "get", "delete": | |
| if len(fields) != 2 { | |
| return ErrorResult(fmt.Sprintf("cookies %s requires exactly one cookie name argument", sub)) | |
| } | |
| name := fields[1] | |
| if name == "" || strings.HasPrefix(name, "-") { | |
| return ErrorResult("invalid cookie name for cookies " + sub) | |
| } | |
| cmdArgs = append(cmdArgs, sub, name) | |
| case "set": | |
| if len(fields) < 3 { | |
| return ErrorResult("cookies set requires a cookie name and value") | |
| } | |
| name := fields[1] | |
| if name == "" || strings.HasPrefix(name, "-") { | |
| return ErrorResult("invalid cookie name for cookies set") | |
| } | |
| // Allow spaces in the cookie value by joining remaining fields. | |
| value := strings.Join(fields[2:], " ") | |
| if value == "" || strings.HasPrefix(value, "-") { | |
| return ErrorResult("invalid cookie value for cookies set") | |
| } | |
| cmdArgs = append(cmdArgs, sub, name, value) | |
| default: | |
| return ErrorResult("unsupported cookies sub-action: " + sub) | |
| } |
| actionID, ok := args["action_id"].(string) | ||
| if !ok || actionID == "" { | ||
| return ErrorResult("action_id is required") | ||
| } | ||
|
|
||
| output, err := runActionbook(ctx, 30*time.Second, "get", actionID, "--json") |
There was a problem hiding this comment.
The action_id parameter is passed directly to the actionbook CLI without validation or sanitization. If an attacker can control this value (e.g., through a malicious browser_search result or by directly calling this tool), they could potentially inject additional command arguments. Consider validating that action_id matches an expected format (e.g., only contains alphanumeric characters, dots, colons, slashes, and hyphens) before passing it to the command.
| cmdArgs := []string{"search", query, "--json"} | ||
| if domain, ok := args["domain"].(string); ok && domain != "" { | ||
| cmdArgs = append(cmdArgs, "--domain", domain) | ||
| } |
There was a problem hiding this comment.
The query and domain parameters are passed directly to the actionbook CLI without validation or sanitization. While exec.CommandContext provides some protection against shell injection by passing arguments separately, malicious input could still cause unexpected behavior if the actionbook CLI doesn't handle special characters properly. Consider validating or sanitizing these inputs, especially if they could contain shell metacharacters, null bytes, or other potentially problematic content.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Closing this PR - replacing with a different approach |
Goal
Enable PicoClaw agents to perform browser automation (opening pages, clicking elements, filling forms, taking screenshots, extracting text) via native tool calls.
Changes
Three new tools wrapping the ActionBook CLI:
browser_searchactionbook searchbrowser_getactionbook getbrowseractionbook browser *Workflow:
browser_search->browser_get->browser(open -> interact -> close)New Files
pkg/tools/browser.go- Three tool structs implementing the Tool interfacepkg/tools/browser_test.go- 18 unit tests + 1 integration testModified Files
pkg/config/config.go- Added BrowserConfig struct (enabled, headless)pkg/agent/loop.go- Registered tools in createToolRegistry()Design Decisions
tools.browser.enabled(default: true),tools.browser.headless(default: true)Testing
All 18 unit tests pass. Integration test with live actionbook search passes.
Verified end-to-end with
picoclaw agent- the LLM successfully used the browser tool to open example.com, extract page text, and close the browser.Requirements
Requires
actionbookCLI (v0.7.3+) in PATH. If not found, tools return a clear error without crashing.