feat: add ActionBook browser automation tools #308

Closed
Danieldd28 wants to merge 6 commits into sipeed:main from Danieldd28:feat/actionbook-browser-tools

@Danieldd28
Collaborator

Goal

Enable PicoClaw agents to perform browser automation (opening pages, clicking elements, filling forms, taking screenshots, extracting text) via native tool calls.

Changes

Three new tools wrapping the ActionBook CLI:

| Tool | Purpose | CLI Command |
| --- | --- | --- |
| browser_search | Find action manuals for a website/task | actionbook search |
| browser_get | Get CSS selectors and page structure by ID | actionbook get |
| browser | Execute browser commands (open, click, fill, text, etc.) | actionbook browser * |

Workflow: browser_search -> browser_get -> browser (open -> interact -> close)

New Files

  • pkg/tools/browser.go - Three tool structs implementing the Tool interface
  • pkg/tools/browser_test.go - 18 unit tests + 1 integration test

Modified Files

  • pkg/config/config.go - Added BrowserConfig struct (enabled, headless)
  • pkg/agent/loop.go - Registered tools in createToolRegistry()
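
The config change and the gated registration can be pictured with a minimal sketch. This is not the code from the PR: the JSON tags, `LoadBrowserConfig`, and `toolNames` are hypothetical names chosen for illustration, and only the two fields and their `true` defaults come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BrowserConfig mirrors the two options named in the PR.
// The JSON tags are an assumption, not taken from the diff.
type BrowserConfig struct {
	Enabled  bool `json:"enabled"`
	Headless bool `json:"headless"`
}

// DefaultBrowserConfig returns the defaults stated in the PR description.
func DefaultBrowserConfig() BrowserConfig {
	return BrowserConfig{Enabled: true, Headless: true}
}

// LoadBrowserConfig (hypothetical) overlays raw JSON on the defaults,
// so keys that are omitted keep their default value.
func LoadBrowserConfig(raw []byte) (BrowserConfig, error) {
	cfg := DefaultBrowserConfig()
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return BrowserConfig{}, err
	}
	return cfg, nil
}

// toolNames (hypothetical) stands in for the gating done in
// createToolRegistry(): browser tools are listed only when enabled.
func toolNames(cfg BrowserConfig) []string {
	var names []string
	if cfg.Enabled {
		names = append(names, "browser_search", "browser_get", "browser")
	}
	return names
}

func main() {
	cfg, err := LoadBrowserConfig([]byte(`{"headless": false}`))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.Enabled, cfg.Headless, toolNames(cfg))
}
```

Starting from the defaults before unmarshalling is one way to keep "default: true" meaningful for boolean fields, since a missing key would otherwise decode to `false`.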

Design Decisions

  • Zero new dependencies - thin CLI wrapper using os/exec, same pattern as ExecTool
  • Action allowlist - only whitelisted browser actions are permitted (no arbitrary CLI injection)
  • Config-gated - tools.browser.enabled (default: true), tools.browser.headless (default: true)
  • Lean - ~310 lines, lighter than web.go (436 lines)
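
As a rough illustration of the first two design decisions, the sketch below combines an action allowlist with an `os/exec` call that never goes through a shell. The names `allowedActions`, `buildBrowserArgs`, and `runCLI` are hypothetical, and the action set is an illustrative subset inferred from the PR text (the exact action names, such as `screenshot`, are assumed), not the list from `browser.go`.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// allowedActions is an illustrative allowlist; the real list in the PR
// may differ. Lookups are case-sensitive.
var allowedActions = map[string]bool{
	"open": true, "click": true, "fill": true,
	"text": true, "screenshot": true, "close": true,
}

// buildBrowserArgs rejects any action outside the allowlist and
// otherwise returns the argument vector for the CLI.
func buildBrowserArgs(action string, rest ...string) ([]string, error) {
	if !allowedActions[action] {
		return nil, fmt.Errorf("browser action %q is not allowed", action)
	}
	return append([]string{"browser", action}, rest...), nil
}

// runCLI executes bin with args directly (no shell), bounded by timeout.
func runCLI(ctx context.Context, timeout time.Duration, bin string, args ...string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	out, err := exec.CommandContext(ctx, bin, args...).CombinedOutput()
	return string(out), err
}

func main() {
	args, err := buildBrowserArgs("open", "https://example.com")
	fmt.Println(args, err)

	_, err = buildBrowserArgs("eval", "document.cookie")
	fmt.Println(err)
}
```

With the CLI installed, the validated slice would be passed on as `runCLI(ctx, 30*time.Second, "actionbook", args...)`. Because each element is a separate argv entry, shell metacharacters in a URL or selector are not interpreted.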

Testing

All 18 unit tests pass. Integration test with live actionbook search passes.
Verified end-to-end with picoclaw agent - the LLM successfully used the browser tool to open example.com, extract page text, and close the browser.

Requirements

Requires actionbook CLI (v0.7.3+) in PATH. If not found, tools return a clear error without crashing.
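
One way to get the "clear error without crashing" behaviour is to resolve the binary with `exec.LookPath` before running anything. The helper name `checkCLI` is hypothetical, and this sketch does not check the v0.7.3+ version requirement.

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkCLI (hypothetical helper) resolves name on PATH and returns a
// descriptive error instead of letting a later exec call fail obscurely.
func checkCLI(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s CLI not found in PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := checkCLI("actionbook")
	if err != nil {
		fmt.Println("browser tools unavailable:", err)
		return
	}
	fmt.Println("using", path)
}
```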

Copilot AI review requested due to automatic review settings February 16, 2026 12:44
Integrate ActionBook CLI as three native browser automation tools:
- browser_search: find action manuals for websites
- browser_get: retrieve CSS selectors and page structure by ID
- browser: execute browser commands (open, click, fill, text, etc.)

Zero new dependencies — thin CLI wrapper using os/exec.
Config-gated via tools.browser.enabled (default: true).
18 unit tests + 1 integration test, all passing.

Closes: browser automation support for agent loop
@Danieldd28 force-pushed the feat/actionbook-browser-tools branch from d43cf98 to 62d2c5e on February 16, 2026 12:48

Copilot AI left a comment

Pull request overview

This PR adds browser automation capabilities to PicoClaw agents through three new tools that wrap the ActionBook CLI: browser_search (find action manuals), browser_get (retrieve selectors and page structure), and browser (execute browser commands like open, click, fill, etc.). The implementation follows a thin CLI wrapper pattern similar to existing tools, adds configuration options for enabling/disabling the feature and controlling headless mode, and includes comprehensive unit tests with integration test support.

Changes:

  • Added three browser automation tools wrapping the ActionBook CLI with action allowlisting for security
  • Added BrowserConfig to configuration with Enabled and Headless options (both default to true)
  • Registered tools in agent loop with config-gating for conditional enablement

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| pkg/tools/browser.go | Implements BrowserSearchTool, BrowserGetTool, and BrowserTool with CLI wrapping, action validation, and parameter handling |
| pkg/tools/browser_test.go | Comprehensive unit tests for all three tools covering validation, allowlisting, case sensitivity, and integration scenarios |
| pkg/config/config.go | Adds BrowserConfig struct with Enabled and Headless fields, including environment variable bindings and defaults |
| pkg/agent/loop.go | Registers browser tools in createToolRegistry() with config-gating using cfg.Tools.Browser.Enabled |

case "cookies":
    // cookies sub-actions via value: list, get <name>, set <name> <val>, delete <name>, clear
    if val, ok := args["value"].(string); ok && val != "" {
        cmdArgs = append(cmdArgs, strings.Fields(val)...)
Copilot AI Feb 16, 2026

Using strings.Fields on user input can enable command injection. The value parameter is split on whitespace and all resulting tokens are appended as separate command arguments. An attacker could craft a value like "list --some-flag" which would be interpreted as multiple arguments to the actionbook CLI. This could potentially allow execution of unintended commands or options. Consider parsing cookie sub-actions more explicitly (e.g., using a structured format like action:name:value) or validating/sanitizing the value before splitting.

Suggested change
-   cmdArgs = append(cmdArgs, strings.Fields(val)...)
+   fields := strings.Fields(val)
+   if len(fields) == 0 {
+       return ErrorResult("cookies action requires a sub-action, e.g. 'list', 'get <name>', 'set <name> <value>'")
+   }
+   sub := fields[0]
+   switch sub {
+   case "list", "clear":
+       if len(fields) != 1 {
+           return ErrorResult(fmt.Sprintf("cookies %s does not take additional arguments", sub))
+       }
+       cmdArgs = append(cmdArgs, sub)
+   case "get", "delete":
+       if len(fields) != 2 {
+           return ErrorResult(fmt.Sprintf("cookies %s requires exactly one cookie name argument", sub))
+       }
+       name := fields[1]
+       if name == "" || strings.HasPrefix(name, "-") {
+           return ErrorResult("invalid cookie name for cookies " + sub)
+       }
+       cmdArgs = append(cmdArgs, sub, name)
+   case "set":
+       if len(fields) < 3 {
+           return ErrorResult("cookies set requires a cookie name and value")
+       }
+       name := fields[1]
+       if name == "" || strings.HasPrefix(name, "-") {
+           return ErrorResult("invalid cookie name for cookies set")
+       }
+       // Allow spaces in the cookie value by joining remaining fields.
+       value := strings.Join(fields[2:], " ")
+       if value == "" || strings.HasPrefix(value, "-") {
+           return ErrorResult("invalid cookie value for cookies set")
+       }
+       cmdArgs = append(cmdArgs, sub, name, value)
+   default:
+       return ErrorResult("unsupported cookies sub-action: " + sub)
+   }

Comment on lines +161 to +166
actionID, ok := args["action_id"].(string)
if !ok || actionID == "" {
    return ErrorResult("action_id is required")
}

output, err := runActionbook(ctx, 30*time.Second, "get", actionID, "--json")
Copilot AI Feb 16, 2026

The action_id parameter is passed directly to the actionbook CLI without validation or sanitization. If an attacker can control this value (e.g., through a malicious browser_search result or by directly calling this tool), they could potentially inject additional command arguments. Consider validating that action_id matches an expected format (e.g., only contains alphanumeric characters, dots, colons, slashes, and hyphens) before passing it to the command.
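
A minimal sketch of the validation this comment describes, assuming the character set it lists (alphanumerics, dots, colons, slashes, hyphens). The actual ID format used by ActionBook is not shown in this PR, so the pattern, the length cap, and the name `validateActionID` are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// maxActionIDLen is an arbitrary cap chosen for this sketch.
const maxActionIDLen = 256

// actionIDPattern requires an alphanumeric first character, which also
// rules out values that look like CLI flags (e.g. "--help").
var actionIDPattern = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9.:/-]*$`)

// validateActionID returns nil only for IDs that match the assumed format.
func validateActionID(id string) error {
	switch {
	case id == "":
		return errors.New("action_id is required")
	case len(id) > maxActionIDLen:
		return fmt.Errorf("action_id exceeds %d bytes", maxActionIDLen)
	case !actionIDPattern.MatchString(id):
		return fmt.Errorf("action_id %q contains unsupported characters", id)
	}
	return nil
}

func main() {
	// The first ID is a made-up example of a plausible format.
	for _, id := range []string{"example.com:/login:default", "--help", "id; rm -rf /"} {
		fmt.Printf("%q -> %v\n", id, validateActionID(id))
	}
}
```

Such a check would run before the `runActionbook` call quoted above, so a rejected ID never reaches the CLI.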

Comment on lines +116 to +119
cmdArgs := []string{"search", query, "--json"}
if domain, ok := args["domain"].(string); ok && domain != "" {
    cmdArgs = append(cmdArgs, "--domain", domain)
}
Copilot AI Feb 16, 2026

The query and domain parameters are passed directly to the actionbook CLI without validation or sanitization. While exec.CommandContext provides some protection against shell injection by passing arguments separately, malicious input could still cause unexpected behavior if the actionbook CLI doesn't handle special characters properly. Consider validating or sanitizing these inputs, especially if they could contain shell metacharacters, null bytes, or other potentially problematic content.
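
For free-text inputs such as `query` and `domain`, a conservative check could reject values that begin with a hyphen (which the CLI might parse as a flag) and values containing control characters such as null bytes. The helper name `validateCLIArg` and the length limit are hypothetical; how the actionbook CLI itself treats such input is not documented in this PR.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// maxArgLen is an arbitrary limit chosen for this sketch.
const maxArgLen = 512

// validateCLIArg (hypothetical helper) checks a free-text value before it
// is appended to the argument vector.
func validateCLIArg(name, value string) error {
	if len(value) > maxArgLen {
		return fmt.Errorf("%s exceeds %d bytes", name, maxArgLen)
	}
	if strings.HasPrefix(value, "-") {
		return fmt.Errorf("%s must not start with '-'", name)
	}
	for _, r := range value {
		// IsControl covers NUL, newlines, tabs, and other C0/C1 controls.
		if unicode.IsControl(r) {
			return fmt.Errorf("%s contains a control character", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateCLIArg("query", "github login form"))
	fmt.Println(validateCLIArg("query", "--json"))
	fmt.Println(validateCLIArg("domain", "example.com\x00.evil"))
}
```

Shell metacharacters are deliberately left alone here: since the arguments are passed as separate argv entries, characters like `;` or `|` are ordinary text to the child process.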

Danieldd28 and others added 5 commits February 16, 2026 20:24
@Danieldd28
Collaborator Author

Closing this PR - replacing with a different approach
