Integrate Chrome DevTools Protocol CLI Tool for Browser Automation #7

@ballidev

Summary

This issue requests integrating chrome-ws, a command-line tool for the Chrome DevTools Protocol, into the browser container system as a lightweight alternative to Playwright for browser automation and testing.

Background

The Playwright MCP currently performs poorly on complex web pages: it returns YAML snapshots of 100 KB or more, which consume excessive context and make automation tasks impractical. The chrome-ws tool is a zero-dependency alternative that talks to Chrome directly over the DevTools Protocol.

Upstream Project

  • Repository: https://github.com/obra/superpowers-chrome
  • Tool Location: skills/browsing/ directory
  • Key Files:
    • chrome-ws - Main executable (bash script)
    • chrome-ws-lib.js - Core library
    • Documentation: SKILL.md, COMMANDLINE-USAGE.md, EXAMPLES.md

Technical Requirements

Dependencies

  • Node.js 16+ (already available)
  • Chrome with remote debugging port enabled
  • No npm packages required (zero-dependency design)

Wayland Compatibility

Successfully tested on Wayland with the following Chrome flags:

google-chrome \
  --remote-debugging-port=9222 \
  --no-sandbox \
  --enable-features=UseOzonePlatform \
  --ozone-platform=wayland \
  --user-data-dir=/path/to/profile \
  --no-first-run \
  --no-default-browser-check \
  --disable-default-apps \
  --disable-sync

Environment Variables

export DISPLAY=:0
export WAYLAND_DISPLAY=wayland-0
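
The flags and variables above can be combined into one small launch helper. The following is a minimal sketch, not part of upstream chrome-ws: the names `CHROME_BIN`, `CDP_PORT`, `CHROME_PROFILE`, and `run_chrome` are made up for illustration, and the profile path is the placeholder from the command above.

```shell
#!/bin/sh
# Sketch of a launch helper built from the flags and variables listed above.
# CHROME_BIN, CDP_PORT, CHROME_PROFILE and run_chrome are hypothetical names.

CHROME_BIN="${CHROME_BIN:-google-chrome}"
CDP_PORT="${CDP_PORT:-9222}"
CHROME_PROFILE="${CHROME_PROFILE:-/path/to/profile}"  # placeholder path

# Keep values the session already exports; otherwise use the tested ones.
export DISPLAY="${DISPLAY:-:0}"
export WAYLAND_DISPLAY="${WAYLAND_DISPLAY:-wayland-0}"

# run_chrome [PREFIX...]
# With no arguments this starts Chrome. With a prefix such as `echo`, the
# prefix receives the full command line, so the flags can be inspected
# without launching anything.
run_chrome() {
  "$@" "$CHROME_BIN" \
    --remote-debugging-port="$CDP_PORT" \
    --no-sandbox \
    --enable-features=UseOzonePlatform \
    --ozone-platform=wayland \
    --user-data-dir="$CHROME_PROFILE" \
    --no-first-run \
    --no-default-browser-check \
    --disable-default-apps \
    --disable-sync
}

# Preview only; nothing is launched here.
run_chrome echo
```

Calling `run_chrome` without arguments would start the browser in the foreground; appending `&` in the caller keeps the shell free for chrome-ws commands.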

Key Advantages

  1. Minimal Output: Returns clean text instead of massive YAML structures

    • Example.com extraction: "Example Domain" (14 bytes)
    • vs Playwright MCP: 100KB+ YAML snapshot
  2. Performance:

    • Complex page extraction: 0.05 seconds
    • Amazon homepage: 84 lines, 4KB output
  3. Simplicity:

    • Direct bash commands
    • No complex dependencies
    • Works with existing browser sessions
  4. Functionality:

    • Navigation and page loading
    • Content extraction (text, HTML, markdown)
    • Element interaction (click, type, select)
    • JavaScript evaluation
    • Cookie/authentication support
    • Multi-tab management
    • Screenshot capture
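
Working with existing browser sessions and multiple tabs depends on the HTTP endpoints that Chrome serves once the remote debugging port is open, such as `/json/list`. The sketch below is independent of chrome-ws and only illustrates how small the relevant output can be. The helper names `cdp_endpoint`, `cdp_tabs`, and `tab_titles` are hypothetical, and the `sed` filter assumes the JSON is pretty-printed with one field per line; it is a rough filter, not a JSON parser.

```shell
#!/bin/sh
# Sketch: list open tabs through Chrome's remote debugging HTTP endpoint.
# cdp_endpoint, cdp_tabs and tab_titles are hypothetical helper names.

CDP_PORT="${CDP_PORT:-9222}"

# cdp_endpoint NAME: print the URL of a /json/NAME endpoint.
cdp_endpoint() {
  printf 'http://127.0.0.1:%s/json/%s\n' "$CDP_PORT" "$1"
}

# cdp_tabs: fetch the raw JSON description of all open targets.
cdp_tabs() {
  curl -sf "$(cdp_endpoint list)"
}

# tab_titles: read /json/list output on stdin and print one title per line.
# Assumes one "title": "..." field per line; escaped quotes stay escaped.
tab_titles() {
  sed -n 's/^[[:space:]]*"title":[[:space:]]*"\(.*\)",*[[:space:]]*$/\1/p'
}

# Intended use against a running browser (not executed here):
#   cdp_tabs | tab_titles
```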

Proven Use Cases

Successfully demonstrated:

  • Extracting structured data from Hacker News (story titles)
  • Navigating between pages with visual confirmation
  • Clicking links and interacting with elements
  • Running JavaScript to extract specific DOM elements
  • Exporting pages as markdown

Implementation Options

Option 1: Install as-is

  • Clone upstream repository
  • Symlink chrome-ws and chrome-ws-lib.js into /usr/local/bin/
  • Add to container build process
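
As a rough sketch of Option 1, the symlink step could look like the following. The helper name `link_chrome_ws` and the clone location `/opt/superpowers-chrome` are assumptions for illustration. Whether chrome-ws still finds chrome-ws-lib.js when invoked through a symlink has not been verified here and would need checking against upstream.

```shell
#!/bin/sh
# link_chrome_ws SRC_DIR DEST_DIR  (hypothetical helper)
# Symlinks the two upstream files from SRC_DIR into DEST_DIR.
# SRC_DIR should be an absolute path so the links resolve from anywhere.
link_chrome_ws() {
  src="$1"
  dest="$2"
  for f in chrome-ws chrome-ws-lib.js; do
    if [ ! -f "$src/$f" ]; then
      echo "missing $src/$f" >&2
      return 1
    fi
    ln -sf "$src/$f" "$dest/$f" || return 1
  done
}

# Intended use in a container build (not executed here; the clone target
# /opt/superpowers-chrome is an assumed location):
#   git clone https://github.com/obra/superpowers-chrome /opt/superpowers-chrome
#   link_chrome_ws /opt/superpowers-chrome/skills/browsing /usr/local/bin
```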

Option 2: Fork and customize (Recommended)

  • Fork repository for our specific needs
  • Add Wayland-specific wrapper script
  • Pre-configure chrome launch flags for container environment
  • Add integration with existing test frameworks
  • Maintain as part of fedora-desktop tooling

Recommended Approach

Create a fork with the following customizations:

  1. Wrapper Script (/usr/local/bin/chrome-ws-wayland):

    • Auto-configures Wayland environment variables
    • Provides sensible defaults for Chrome flags
    • Handles profile management for ephemeral containers
  2. Helper Functions:

    • chrome-start - Launch Chrome with correct flags
    • chrome-stop - Clean shutdown
    • chrome-auth - Set authentication cookies via JavaScript
  3. Integration:

    • Add to browser container image
    • Document usage patterns for test automation
    • Provide examples for common scenarios
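
To make the proposal more concrete, here is a minimal sketch of the chrome-start and chrome-stop helpers with a throwaway profile per run. It is one possible shape, not an agreed design. The functions use underscores so they are valid in POSIX sh, and `STATE_DIR`, `chrome.pid`, and `profile.path` are hypothetical names.

```shell
#!/bin/sh
# Sketch of chrome-start / chrome-stop for ephemeral containers.
# STATE_DIR and the file names inside it are hypothetical.

CHROME_BIN="${CHROME_BIN:-google-chrome}"
CDP_PORT="${CDP_PORT:-9222}"
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}/chrome-ws-wayland}"

chrome_start() {
  export DISPLAY="${DISPLAY:-:0}"
  export WAYLAND_DISPLAY="${WAYLAND_DISPLAY:-wayland-0}"
  mkdir -p "$STATE_DIR" || return 1
  # Fresh profile per run, so no state carries over between containers.
  profile="$(mktemp -d "$STATE_DIR/profile.XXXXXX")" || return 1
  "$CHROME_BIN" \
    --remote-debugging-port="$CDP_PORT" \
    --no-sandbox \
    --enable-features=UseOzonePlatform \
    --ozone-platform=wayland \
    --user-data-dir="$profile" \
    --no-first-run \
    --no-default-browser-check \
    --disable-default-apps \
    --disable-sync >/dev/null 2>&1 &
  # Record what chrome_stop needs to clean up later.
  echo "$!" > "$STATE_DIR/chrome.pid"
  echo "$profile" > "$STATE_DIR/profile.path"
}

chrome_stop() {
  [ -f "$STATE_DIR/chrome.pid" ] || return 0
  pid="$(cat "$STATE_DIR/chrome.pid")"
  # SIGTERM asks Chrome to exit; wait only works if this shell started it.
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
  profile="$(cat "$STATE_DIR/profile.path" 2>/dev/null)"
  if [ -n "$profile" ] && [ -d "$profile" ]; then
    rm -rf "$profile"
  fi
  rm -f "$STATE_DIR/chrome.pid" "$STATE_DIR/profile.path"
}
```

Keeping the PID and profile path in files rather than shell variables lets a later test step stop the browser even if it runs in a different shell, at the cost of skipping the `wait`.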

Testing Confirmation

All features tested and working on Fedora/Wayland:

  • ✅ Chrome launches in headed mode with Wayland
  • ✅ Remote debugging port accessible
  • ✅ CLI commands work correctly
  • ✅ Content extraction produces minimal output
  • ✅ Interactive browser control functions properly
  • ✅ No popup dialogs when using suppression flags
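
The "remote debugging port accessible" check can be automated before any CLI command runs. A minimal sketch, assuming `curl` is available in the container and using a hypothetical helper name `wait_for_cdp`, polls the `/json/version` endpoint that Chrome serves on the debugging port:

```shell
#!/bin/sh
# wait_for_cdp [ATTEMPTS]  (hypothetical helper)
# Polls Chrome's debugging endpoint about once per second and returns 0 as
# soon as it answers, or 1 after ATTEMPTS failed tries (default 30).

CDP_PORT="${CDP_PORT:-9222}"

wait_for_cdp() {
  attempts="${1:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf --max-time 2 \
        "http://127.0.0.1:${CDP_PORT}/json/version" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between tries, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep 1
    i=$((i + 1))
  done
  return 1
}
```

A test script could call `wait_for_cdp 10 || exit 1` right after starting Chrome, which also bears on the question of automatic versus manual launch.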

Questions for Discussion

  1. Should we install upstream as-is or maintain a fork?
  2. Where should the tool be installed? (/usr/local/bin/, /opt/chrome-ws/, etc.)
  3. Should Chrome launch be automatic or manual in tests?
  4. Do we need additional wrapper scripts for common operations?
  5. Should this replace Playwright MCP entirely or complement it?
