Skip to content

feat: browser extension backport#100

Open
IRONICBo wants to merge 9 commits intomemovai:mainfrom
IRONICBo:feat/browser-extension-backport
Open

feat: browser extension backport#100
IRONICBo wants to merge 9 commits intomemovai:mainfrom
IRONICBo:feat/browser-extension-backport

Conversation

@IRONICBo
Copy link
Copy Markdown
Member

@IRONICBo IRONICBo commented Feb 26, 2026

Summary

  • add browser-use tool client and operator flow
  • introduce browser extension bridge and UI assets

Changes

  • add browser-use toolchain and RPC plumbing
  • add browser extension assets under extension/

Issues

This PR is cherry-picked from #38, and only contains the Chrome extension. cc @crispyberry

Summary by CodeRabbit

  • New Features

    • Preview browser extension for automated browsing (navigate, back/forward, click, fill, scroll) with an on-page plan overlay and a popup UI for connection/status.
    • Local Python simulator to run and validate browser automation loops with LLM-driven action planning.
  • Documentation

    • Added Browser Extension (Preview) setup and usage guides in English, Chinese, and Japanese.
    • Included example env/config guidance.
  • Chores

    • Updated ignore rules for extension artifacts.

Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
Signed-off-by: Asklv <boironic@gmail.com>
@IRONICBo IRONICBo requested a review from crispyberry February 26, 2026 16:45
@IRONICBo IRONICBo self-assigned this Feb 26, 2026
@IRONICBo IRONICBo added enhancement New feature or request ready to merge labels Feb 26, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

This PR adds a preview Chrome MV3 extension and a Python ESP32 simulation server enabling LLM-driven browser automation via a WebSocket bridge. It includes extension background/offscreen/service worker logic, content script DOM snapshotting and action execution, popup UI, offscreen WebSocket client, server-side BridgeState/RPC, and documentation updates.

Changes

Cohort / File(s) Summary
Configuration & Env
/.gitignore, extension/.env.example, extension/requirements.txt
Add ignore patterns for extension env and pycache, example env with OPENAI key/model, and Python deps (openai, websockets).
Top-level Documentation
README.md, README_CN.md, README_JA.md
Insert Browser Extension (Preview) documentation (installation, scope, workflow) in three locales; content duplicated in two places in each file.
Extension README
extension/README.md
New detailed README describing the experimental bridge, install/run steps, message protocol, supported actions, and troubleshooting notes.
Extension Manifest & Static
extension/manifest.json, extension/offscreen.html
Add MV3 manifest, permissions, content script registration, action popup; add minimal offscreen HTML for persistent bridge.
Background Service Worker
extension/background.js
New background/service_worker implementing offscreen management, WS integration, DOM snapshot fallbacks, action execution (navigate/back/forward/click/fill/scroll), planning state, message routing, and lifecycle hooks (many fallback and error-handling paths).
Offscreen Client
extension/offscreen.js
New offscreen WebSocket client with reconnect/heartbeat, command deduping/caching, debounced reconfiguration, and status persistence.
Content Script & Overlay
extension/content.js
New content script for DOM snapshotting, element discovery, Twitter-specific helpers, action execution with contenteditable and composer fallbacks, and a shadow-DOM plan overlay updated via extension messages.
Popup UI
extension/popup.html, extension/popup.css, extension/popup.js
Add popup UI and styles to show status, toggle listener, set WebSocket IP, and trigger reconnect; periodic refresh and message handlers.
Python ESP32 Simulator
extension/esp32_sim_server.py
New WebSocket server and BridgeState RPC system, Esp32Agent that queries OpenAI for JSON-constrained actions, goal execution loop with retries/backtracking, Twitter-specific checks, and CLI input loop.
Small additions
extension/.env.example (already listed), extension/requirements.txt (already listed)
Environment and dependency examples added to support the simulator.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as ESP32 Agent
    participant Bridge as BridgeState<br/>(WebSocket Server)
    participant Ext as Extension Background/Offscreen
    participant Content as Content Script
    participant Browser as Browser DOM
    participant LLM as OpenAI API

    Agent->>Bridge: rpc(get_dom_snapshot)
    Bridge->>Ext: send RPC request (via WS)
    Ext->>Content: GET_DOM_SNAPSHOT message
    Content->>Browser: Query DOM & elements
    Content-->>Ext: domSnapshot (command_result)
    Ext-->>Bridge: relay command_result
    Bridge-->>Agent: DOM snapshot response

    Agent->>LLM: _ask_llm(goal, dom_snapshot)
    LLM-->>Agent: action JSON (navigate/click/fill/scroll)

    Agent->>Bridge: rpc(execute_action {action})
    Bridge->>Ext: send RPC request
    Ext->>Content: EXECUTE_ACTION message
    Content->>Browser: Perform action
    Content-->>Ext: action result
    Ext-->>Bridge: command_result
    Bridge-->>Agent: action execution result

    Agent->>Bridge: notify(step status)
    Bridge->>Ext: agent_status broadcast
    Ext->>Content: PLAN_UPDATE
    Content->>Browser: Update overlay UI
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped a bridge from code to page,
Sent dom-snaps dancing, step by stage,
WebSocket whispers, LLM plans the play,
Clicks and fills in a gentle relay,
A tiny rabbit cheers — automation day!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: browser extension backport' directly describes the main change: adding a browser extension, which comprises the majority of the changeset.
Linked Issues check ✅ Passed The PR implements browser-use integration (#35) through a Chrome extension bridge with RPC plumbing, WebSocket communication, offscreen documents, and LLM-driven browser automation—fulfilling all coding requirements for tool invocation, browser bridge/control, and configuration.
Out of Scope Changes check ✅ Passed All changes are scoped to the extension/ directory plus documentation updates (README.md, README_CN.md, README_JA.md) and .gitignore, all directly supporting the browser extension backport objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@IRONICBo IRONICBo force-pushed the feat/browser-extension-backport branch from bcee638 to 537520c Compare February 26, 2026 16:48
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
extension/.env.example (1)

2-2: Use a non-secret-looking API key placeholder.

A neutral placeholder avoids secret-scanner false positives and reduces accidental misuse.

✏️ Suggested tweak
-OPENAI_API_KEY=sk-xxxx
+OPENAI_API_KEY=your_openai_api_key_here
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/.env.example` at line 2, Replace the secret-looking placeholder
value "sk-xxxx" for the OPENAI_API_KEY entry with a neutral, non-secret
placeholder (e.g., "YOUR_OPENAI_API_KEY" or "REPLACE_WITH_OPENAI_KEY") so the
environment example no longer resembles a real API key; update the
OPENAI_API_KEY line in the .env example accordingly.
extension/popup.html (1)

24-27: Associate the hint text with the IP input for accessibility.

Linking the hint improves screen-reader context and input guidance.

♿ Proposed change
-      <input id="wsIp" type="text" placeholder="127.0.0.1" />
+      <input id="wsIp" type="text" placeholder="127.0.0.1" aria-describedby="wsIpHint" />
       <button id="reconnect" class="primary">Reconnect</button>
 
-      <p class="hint">Fixed format: <code>ws://&lt;IP&gt;:8765/ws</code></p>
+      <p id="wsIpHint" class="hint">Fixed format: <code>ws://&lt;IP&gt;:8765/ws</code></p>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/popup.html` around lines 24 - 27, The hint paragraph isn't
associated with the IP input, hurting screen-reader context; add an id to the
hint element (e.g., "wsIpHint") and reference that id from the input with
aria-describedby="wsIpHint" on the element with id "wsIp" so assistive tech
reads the fixed-format hint when the input is focused; ensure the hint element
keeps class "hint" and the code sample remains inside it.
extension/requirements.txt (1)

1-2: Consider bounding major versions for runtime stability.

Using only >= can introduce breaking upgrades unexpectedly in fresh installs. Both openai and websockets have documented breaking changes at major version boundaries: openai 1.x→2.x changes the type signature of ResponseFunctionToolCallOutputItem.output, and websockets 12.x→13.x deprecates handler signatures, requires Python ≥3.8, and renames APIs. Prefer bounded ranges (or a lock file) for deterministic behavior.

♻️ Proposed change
-openai>=1.40.0
-websockets>=12.0
+openai>=1.40.0,<2.0.0
+websockets>=12.0,<13.0.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/requirements.txt` around lines 1 - 2, The requirements file allows
unbounded major upgrades for openai and websockets which can introduce breaking
changes; update the constraints for the two packages (openai and websockets) to
bound their major versions (for example change openai to allow 1.x only and
websockets to allow 12.x only) or switch to an exact lock file, ensuring
installs remain deterministic and avoid automatic upgrades to incompatible major
releases.
extension/esp32_sim_server.py (1)

58-84: RPC retry logic is robust but naming could be clearer.

The retry loop range(1, retries + 2) with retries=2 gives 3 total attempts, which is correct. However, the parameter name retries might be clearer as max_retries to indicate total retry count after the first attempt.

The use of asyncio.shield(fut) on line 77 correctly prevents the future from being cancelled during timeout, allowing potential late responses to still resolve the future.

Consider renaming for clarity
-  async def rpc(self, payload: Dict[str, Any], timeout: float = 15.0, retries: int = 2) -> Dict[str, Any]:
+  async def rpc(self, payload: Dict[str, Any], timeout: float = 15.0, max_retries: int = 2) -> Dict[str, Any]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/esp32_sim_server.py` around lines 58 - 84, The rpc method's retry
parameter name is confusing: rename the parameter retries to max_retries in the
async def rpc signature and update all uses inside rpc (including the retry loop
range(1, max_retries + 2), the attempt comparison that raises TimeoutError, and
any logging/diagnostic text referencing retries) so the intent (number of
retries after the first attempt) is clear; ensure PendingRequest handling
(self.pending, request_id) and the asyncio.shield(fut) usage remain unchanged.
extension/background.js (1)

292-297: Auto-creating a Google tab may be unexpected user behavior.

When all tabs are on restricted URLs (like chrome://), the code automatically creates a new tab to https://www.google.com. While this enables the automation to proceed, it could be surprising to users.

Consider documenting this behavior or showing a notification, though for a preview/experimental feature this is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/background.js` around lines 292 - 297, The code automatically
creates a visible Google tab via chrome.tabs.create (the created variable) when
all tabs are restricted; update the logic around chrome.tabs.create and
waitForTabComplete to avoid surprising the user by either (a) creating the tab
as inactive (set active:false) or (b) showing a user-facing notification via
chrome.notifications.create before creating the tab and only proceeding if the
user accepts; locate the chrome.tabs.create call and the waitForTabComplete
usage and implement one of these behaviors and add a short comment documenting
the intentional UX choice.
extension/manifest.json (1)

13-17: Remove unnecessary WebSocket host permissions.

In Chrome MV3, WebSocket connections from offscreen documents are controlled by the extension's Content Security Policy (connect-src directive), not by host_permissions. Since this extension doesn't use the chrome.webRequest API (which would require explicit host permissions for WebSocket interception), the ws://*/* and wss://*/* entries in host_permissions are unnecessary. Removing lines 15–16 follows the principle of least privilege.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/manifest.json` around lines 13 - 17, Remove the unnecessary
WebSocket host permissions from the manifest: delete the "ws://*/*" and
"wss://*/*" entries inside the host_permissions array in manifest.json and leave
only the required host(s) (e.g., "<all_urls>" if needed); verify any WebSocket
allowances are governed by your Content Security Policy's connect-src directive
rather than host_permissions and confirm you are not relying on
chrome.webRequest for WebSocket interception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@extension/popup.css`:
- Around line 27-32: The CSS rule for `#detail` uses the deprecated property
"word-break: break-word"; replace that declaration with the standards-based
"overflow-wrap: anywhere" in the `#detail` selector so long words wrap correctly;
update the `#detail` rule (selector "#detail") to remove "word-break: break-word"
and add "overflow-wrap: anywhere" instead.

---

Nitpick comments:
In `@extension/.env.example`:
- Line 2: Replace the secret-looking placeholder value "sk-xxxx" for the
OPENAI_API_KEY entry with a neutral, non-secret placeholder (e.g.,
"YOUR_OPENAI_API_KEY" or "REPLACE_WITH_OPENAI_KEY") so the environment example
no longer resembles a real API key; update the OPENAI_API_KEY line in the .env
example accordingly.

In `@extension/background.js`:
- Around line 292-297: The code automatically creates a visible Google tab via
chrome.tabs.create (the created variable) when all tabs are restricted; update
the logic around chrome.tabs.create and waitForTabComplete to avoid surprising
the user by either (a) creating the tab as inactive (set active:false) or (b)
showing a user-facing notification via chrome.notifications.create before
creating the tab and only proceeding if the user accepts; locate the
chrome.tabs.create call and the waitForTabComplete usage and implement one of
these behaviors and add a short comment documenting the intentional UX choice.

In `@extension/esp32_sim_server.py`:
- Around line 58-84: The rpc method's retry parameter name is confusing: rename
the parameter retries to max_retries in the async def rpc signature and update
all uses inside rpc (including the retry loop range(1, max_retries + 2), the
attempt comparison that raises TimeoutError, and any logging/diagnostic text
referencing retries) so the intent (number of retries after the first attempt)
is clear; ensure PendingRequest handling (self.pending, request_id) and the
asyncio.shield(fut) usage remain unchanged.

In `@extension/manifest.json`:
- Around line 13-17: Remove the unnecessary WebSocket host permissions from the
manifest: delete the "ws://*/*" and "wss://*/*" entries inside the
host_permissions array in manifest.json and leave only the required host(s)
(e.g., "<all_urls>" if needed); verify any WebSocket allowances are governed by
your Content Security Policy's connect-src directive rather than
host_permissions and confirm you are not relying on chrome.webRequest for
WebSocket interception.

In `@extension/popup.html`:
- Around line 24-27: The hint paragraph isn't associated with the IP input,
hurting screen-reader context; add an id to the hint element (e.g., "wsIpHint")
and reference that id from the input with aria-describedby="wsIpHint" on the
element with id "wsIp" so assistive tech reads the fixed-format hint when the
input is focused; ensure the hint element keeps class "hint" and the code sample
remains inside it.

In `@extension/requirements.txt`:
- Around line 1-2: The requirements file allows unbounded major upgrades for
openai and websockets which can introduce breaking changes; update the
constraints for the two packages (openai and websockets) to bound their major
versions (for example change openai to allow 1.x only and websockets to allow
12.x only) or switch to an exact lock file, ensuring installs remain
deterministic and avoid automatic upgrades to incompatible major releases.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9a9211 and bcee638.

⛔ Files ignored due to path filters (5)
  • extension/icons/icon128.png is excluded by !**/*.png
  • extension/icons/icon16.png is excluded by !**/*.png
  • extension/icons/icon32.png is excluded by !**/*.png
  • extension/icons/icon48.png is excluded by !**/*.png
  • extension/mimibroswer.png is excluded by !**/*.png
📒 Files selected for processing (16)
  • .gitignore
  • README.md
  • README_CN.md
  • README_JA.md
  • extension/.env.example
  • extension/README.md
  • extension/background.js
  • extension/content.js
  • extension/esp32_sim_server.py
  • extension/manifest.json
  • extension/offscreen.html
  • extension/offscreen.js
  • extension/popup.css
  • extension/popup.html
  • extension/popup.js
  • extension/requirements.txt

Comment on lines +27 to +32
#detail {
margin: 0 0 10px;
font-size: 12px;
color: #6c7a89;
word-break: break-word;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

word-break break-word deprecated CSS specification

💡 Result:

word-break: break-word is a legacy, deprecated keyword for the word-break property, kept only for compatibility. The CSS Text Module Level 3 spec defines it as behaving the same as word-break: normal plus overflow-wrap: anywhere, regardless of whatever overflow-wrap is otherwise set to. [1][2]

What to use instead (standards-based):

  • Prefer overflow-wrap: anywhere (closest match to the deprecated behavior). [2]
  • If you want the more traditional “only break long tokens if needed” behavior, use overflow-wrap: break-word. (This is different from anywhere in some layout/min-content sizing behavior.) [2]

Recommendation: don’t use word-break: break-word; use overflow-wrap: anywhere (or overflow-wrap: break-word, depending on desired behavior). [1][2]


🏁 Script executed:

#!/bin/bash
# Check the file content and verify the code at lines 27-32
if [ -f extension/popup.css ]; then
  echo "=== File content at lines 27-32 ==="
  sed -n '27,32p' extension/popup.css
  echo ""
  echo "=== Full file for context ==="
  cat -n extension/popup.css
else
  echo "File not found"
fi

Repository: memovai/mimiclaw

Length of output: 3076


Replace deprecated word-break: break-word with overflow-wrap: anywhere.

word-break: break-word is deprecated per CSS Text Module Level 3 and should be replaced with the standards-based overflow-wrap: anywhere, which provides the same wrapping behavior.

✅ Proposed fix
 `#detail` {
   margin: 0 0 10px;
   font-size: 12px;
   color: `#6c7a89`;
-  word-break: break-word;
+  overflow-wrap: anywhere;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#detail {
margin: 0 0 10px;
font-size: 12px;
color: #6c7a89;
word-break: break-word;
}
`#detail` {
margin: 0 0 10px;
font-size: 12px;
color: `#6c7a89`;
overflow-wrap: anywhere;
}
🧰 Tools
🪛 Stylelint (17.3.0)

[error] 31-31: Unexpected deprecated keyword "break-word" for property "word-break" (declaration-property-value-keyword-no-deprecated)

(declaration-property-value-keyword-no-deprecated)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/popup.css` around lines 27 - 32, The CSS rule for `#detail` uses the
deprecated property "word-break: break-word"; replace that declaration with the
standards-based "overflow-wrap: anywhere" in the `#detail` selector so long words
wrap correctly; update the `#detail` rule (selector "#detail") to remove
"word-break: break-word" and add "overflow-wrap: anywhere" instead.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
extension/popup.css (1)

27-32: ⚠️ Potential issue | 🟡 Minor

Replace deprecated word-break: break-word in #detail.

Line [31] still uses a deprecated keyword; switch to overflow-wrap: anywhere.

#!/bin/bash
rg -n 'word-break:\s*break-word|overflow-wrap:\s*anywhere' extension/popup.css
✅ Proposed fix
 `#detail` {
   margin: 0 0 10px;
   font-size: 12px;
   color: `#6c7a89`;
-  word-break: break-word;
+  overflow-wrap: anywhere;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/popup.css` around lines 27 - 32, In the `#detail` CSS rule replace
the deprecated "word-break: break-word" with the modern "overflow-wrap:
anywhere": locate the "#detail" selector in extension/popup.css, remove the
"word-break: break-word" line and add "overflow-wrap: anywhere" to ensure long
words break correctly across browsers.
🧹 Nitpick comments (1)
README_CN.md (1)

260-267: Consider relocating the browser extension section for better document flow.

Placing the "浏览器扩展(预览)" section after "其他功能" (Other Features) is unconventional, as "其他功能" typically serves as a catch-all section near the document's end. Consider one of these alternatives:

  • Option 1 (Recommended): Move this section to appear before "其他功能" (around line 249), making it a peer to other major feature sections.
  • Option 2: Convert it to a subsection (###) within "其他功能".
  • Option 3: Add a brief mention in the "其他功能" bullet list with a link to extension/README.md.

This would improve the logical flow of the documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README_CN.md` around lines 260 - 267, The "浏览器扩展(预览)" section is placed after
the "其他功能" section which breaks document flow; relocate or reframe it. Move the
entire "## 浏览器扩展(预览)" block (and its two bullets linking to extension/README.md
and notes) so it appears before the "其他功能" section (making it a peer feature
section), or alternatively convert it into a subsection "### 浏览器扩展(预览)" under
the existing "其他功能" header, or replace it with a single bullet entry linking to
extension/README.md within the "其他功能" list; update any adjacent headings to
preserve heading order and links.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@extension/background.js`:
- Around line 69-76: The fallback DOM snapshot currently uses el.value which
exposes live form data; update the mapping in the anonymous map (the one
producing objects with tag, selector, text, href, placeholder, contentEditable
and which calls selectorFor(el)) to stop reading el.value—use only el.innerText
(or el.textContent) for the text field (e.g., replace (el.innerText || el.value
|| "") with (el.innerText || "") ) so live input values are not captured,
keeping placeholder and contentEditable behaviors unchanged.

In `@extension/content.js`:
- Around line 488-507: The overlay timeline currently builds HTML with untrusted
values (s.action, s.reason, s.step) and assigns it to timeline.innerHTML,
enabling HTML injection; change the rendering to create DOM nodes and set their
textContent (or escape values) instead of interpolating into a
string—specifically replace the stepsHtml/loadingHtml string assembly and the
timeline.innerHTML assignment with code that iterates over steps, creates
elements for ".mb-step", ".mb-step-head" and ".mb-step-reason", sets
element.textContent using s.step/s.action/s.reason (falling back to "?" /
"unknown" / "(no reason)"), append those nodes to timeline, and only use
innerHTML for static, trusted markup if needed.

In `@extension/esp32_sim_server.py`:
- Around line 54-57: When clearing the extension socket in clear_extension, also
fail any in-flight RPC futures so they don't hang: locate the container of
outstanding RPCs (e.g., self.pending_rpcs / self.pending_futures /
self._pending_requests) and for each future set an exception (or cancel) with a
clear ConnectionError or RuntimeError before clearing the container; then set
self.extension_ws = None. Ensure you reference clear_extension and extension_ws
and clear the pending container after failing each future.
- Around line 455-463: Registration is currently unauthenticated: when handling
a incoming message with mtype == "register" and data.get("role") == "extension"
the code calls bridge.set_extension(websocket) and sets registered_once without
validating a token. Add an auth token check (e.g. compare data.get("token") to a
server-side secret/config) before calling bridge.set_extension; if the token is
missing or invalid, send a register_nack (or error) over websocket and close the
connection instead of registering. Update the registration branch that
references mtype, data, bridge.set_extension, registered_once, and the
websocket.send call to perform this validation and early reject unauthorized
attempts.
- Around line 71-83: The current retry loop resends the same payload on timeout,
risking duplicate non-idempotent browser actions (e.g.,
execute_action/click/fill/post). Modify the send/retry logic in the method that
uses self.extension_ws, payload, fut, request_id, retries and timeout so that
you only perform retries for idempotent RPC types: detect the RPC name from
payload (e.g., payload.get("method") or payload["method"]) and if it is a
non-idempotent action like "execute_action", "click", "fill", "post" then send
exactly once and on timeout raise TimeoutError immediately; otherwise keep the
existing retry loop and logging/attempt counting for idempotent methods. Ensure
fut is still awaited with asyncio.shield and that the error message still
includes request_id when raising TimeoutError.

In `@extension/manifest.json`:
- Around line 13-17: The manifest currently grants universal host access via
"host_permissions" and auto-injects on all pages (the "<all_urls>" entries and
content script "matches") — change this to require explicit opt-in: remove
"<all_urls>" from "host_permissions" and instead list only necessary default
domains (or leave empty) and add "<all_urls>" to "optional_permissions"; remove
or narrow the "matches" pattern in "content_scripts" so scripts are not
auto-injected on every site, and implement programmatic injection using
chrome.permissions.request (or browser.permissions.request) followed by
chrome.scripting.executeScript (or equivalent) after the user explicitly enables
the session/feature; update any background/extension UI flow to request the
optional permission when the user opts in and only then perform injection.

---

Duplicate comments:
In `@extension/popup.css`:
- Around line 27-32: In the `#detail` CSS rule replace the deprecated "word-break:
break-word" with the modern "overflow-wrap: anywhere": locate the "#detail"
selector in extension/popup.css, remove the "word-break: break-word" line and
add "overflow-wrap: anywhere" to ensure long words break correctly across
browsers.

---

Nitpick comments:
In `@README_CN.md`:
- Around line 260-267: The "浏览器扩展(预览)" section is placed after the "其他功能"
section which breaks document flow; relocate or reframe it. Move the entire "##
浏览器扩展(预览)" block (and its two bullets linking to extension/README.md and notes)
so it appears before the "其他功能" section (making it a peer feature section), or
alternatively convert it into a subsection "### 浏览器扩展(预览)" under the existing
"其他功能" header, or replace it with a single bullet entry linking to
extension/README.md within the "其他功能" list; update any adjacent headings to
preserve heading order and links.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcee638 and 537520c.

⛔ Files ignored due to path filters (5)
  • extension/icons/icon128.png is excluded by !**/*.png
  • extension/icons/icon16.png is excluded by !**/*.png
  • extension/icons/icon32.png is excluded by !**/*.png
  • extension/icons/icon48.png is excluded by !**/*.png
  • extension/mimibroswer.png is excluded by !**/*.png
📒 Files selected for processing (16)
  • .gitignore
  • README.md
  • README_CN.md
  • README_JA.md
  • extension/.env.example
  • extension/README.md
  • extension/background.js
  • extension/content.js
  • extension/esp32_sim_server.py
  • extension/manifest.json
  • extension/offscreen.html
  • extension/offscreen.js
  • extension/popup.css
  • extension/popup.html
  • extension/popup.js
  • extension/requirements.txt
✅ Files skipped from review due to trivial changes (1)
  • README_JA.md
🚧 Files skipped from review as they are similar to previous changes (7)
  • extension/offscreen.js
  • README.md
  • extension/popup.html
  • extension/popup.js
  • extension/offscreen.html
  • extension/requirements.txt
  • extension/README.md

Comment on lines +69 to +76
return Array.from(document.querySelectorAll(query)).slice(0, maxCount).map((el) => ({
tag: el.tagName.toLowerCase(),
selector: selectorFor(el),
text: (el.innerText || el.value || "").trim().slice(0, 120),
href: el.href || null,
placeholder: el.getAttribute("placeholder") || null,
contentEditable: Boolean(el.isContentEditable || el.getAttribute("contenteditable") === "true")
}));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fallback DOM snapshot also exposes live form values.

Line [72] includes el.value for inputs/textareas in the fallback snapshot path, so secrets can still leak even if primary content messaging fails.

🔒 Proposed fix (align fallback redaction with content script)
     return Array.from(document.querySelectorAll(query)).slice(0, maxCount).map((el) => ({
+      const tag = el.tagName.toLowerCase();
+      const type = (el.getAttribute("type") || "").toLowerCase();
+      const nameHint = `${el.getAttribute("name") || ""} ${el.id || ""}`.toLowerCase();
+      const isFormField = tag === "input" || tag === "textarea" || tag === "select";
+      const isSensitive =
+        type === "password" ||
+        /pass|pwd|token|otp|secret|ssn|card|cvv|email|phone/i.test(nameHint);
       tag: el.tagName.toLowerCase(),
       selector: selectorFor(el),
-      text: (el.innerText || el.value || "").trim().slice(0, 120),
+      text: isFormField || isSensitive ? "" : (el.innerText || "").trim().slice(0, 120),
       href: el.href || null,
       placeholder: el.getAttribute("placeholder") || null,
       contentEditable: Boolean(el.isContentEditable || el.getAttribute("contenteditable") === "true")
     }));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/background.js` around lines 69 - 76, The fallback DOM snapshot
currently uses el.value which exposes live form data; update the mapping in the
anonymous map (the one producing objects with tag, selector, text, href,
placeholder, contentEditable and which calls selectorFor(el)) to stop reading
el.value—use only el.innerText (or el.textContent) for the text field (e.g.,
replace (el.innerText || el.value || "") with (el.innerText || "") ) so live
input values are not captured, keeping placeholder and contentEditable behaviors
unchanged.

Comment on lines +45 to +55
function elementBrief(el) {
return {
tag: el.tagName.toLowerCase(),
selector: elementSelector(el),
text: (el.innerText || el.value || "").trim().slice(0, 120),
href: el.href || null,
ariaLabel: el.getAttribute("aria-label") || null,
role: el.getAttribute("role") || null,
placeholder: el.getAttribute("placeholder") || null,
contentEditable: Boolean(el.isContentEditable || el.getAttribute("contenteditable") === "true")
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

DOM snapshot currently leaks form values (including secrets).

Line [49] reads el.value for interactive elements selected from Line [62], so typed values (passwords, OTPs, emails, tokens) can be exfiltrated in snapshots.

🔒 Proposed fix (redact form field values)
 function elementBrief(el) {
+  const tag = el.tagName.toLowerCase();
+  const type = (el.getAttribute("type") || "").toLowerCase();
+  const nameHint = `${el.getAttribute("name") || ""} ${el.id || ""}`.toLowerCase();
+  const isFormField = tag === "input" || tag === "textarea" || tag === "select";
+  const isSensitive =
+    type === "password" ||
+    /pass|pwd|token|otp|secret|ssn|card|cvv|email|phone/i.test(nameHint);
   return {
-    tag: el.tagName.toLowerCase(),
+    tag,
     selector: elementSelector(el),
-    text: (el.innerText || el.value || "").trim().slice(0, 120),
+    text: isFormField || isSensitive ? "" : (el.innerText || "").trim().slice(0, 120),
     href: el.href || null,
     ariaLabel: el.getAttribute("aria-label") || null,
     role: el.getAttribute("role") || null,

Also applies to: 58-72

Comment on lines +488 to +507
const stepsHtml = steps
.map(
(s) => `
<div class="mb-step">
<div class="mb-step-head">Step ${s.step || "?"} · ${s.action || "unknown"}</div>
<div class="mb-step-reason">${s.reason || "(no reason)"}</div>
</div>
`
)
.join("");
const loadingHtml =
status === "running"
? `
<div class="mb-loading">
Waiting next step
<span class="mb-dots"><i></i><i></i><i></i></span>
</div>
`
: "";
timeline.innerHTML = stepsHtml + loadingHtml;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid HTML injection in overlay timeline rendering.

Line [507] writes model/page-derived fields into innerHTML (s.action, s.reason), which allows untrusted markup injection. Render with textContent (or escape before interpolation).

🛡️ Proposed fix (escape before interpolation)
   const timeline = root.getElementById("mb-timeline");
+  const esc = (v) =>
+    String(v ?? "").replace(/[&<>"']/g, (ch) => ({
+      "&": "&amp;",
+      "<": "&lt;",
+      ">": "&gt;",
+      '"': "&quot;",
+      "'": "&#39;"
+    }[ch]));
   const stepsHtml = steps
     .map(
       (s) => `
       <div class="mb-step">
-        <div class="mb-step-head">Step ${s.step || "?"} · ${s.action || "unknown"}</div>
-        <div class="mb-step-reason">${s.reason || "(no reason)"}</div>
+        <div class="mb-step-head">Step ${esc(s.step || "?")} · ${esc(s.action || "unknown")}</div>
+        <div class="mb-step-reason">${esc(s.reason || "(no reason)")}</div>
       </div>
     `
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/content.js` around lines 488 - 507, The overlay timeline currently
builds HTML with untrusted values (s.action, s.reason, s.step) and assigns it to
timeline.innerHTML, enabling HTML injection; change the rendering to create DOM
nodes and set their textContent (or escape values) instead of interpolating into
a string—specifically replace the stepsHtml/loadingHtml string assembly and the
timeline.innerHTML assignment with code that iterates over steps, creates
elements for ".mb-step", ".mb-step-head" and ".mb-step-reason", sets
element.textContent using s.step/s.action/s.reason (falling back to "?" /
"unknown" / "(no reason)"), append those nodes to timeline, and only use
innerHTML for static, trusted markup if needed.

Comment on lines +54 to +57
def clear_extension(self, ws: Any) -> None:
if self.extension_ws is ws:
self.extension_ws = None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail pending RPCs immediately on extension disconnect.

When the extension socket drops, in-flight futures stay pending until timeout. That delays recovery and amplifies cascading retries.

⚙️ Proposed fix
   def clear_extension(self, ws: Any) -> None:
     if self.extension_ws is ws:
       self.extension_ws = None
+      for pending in self.pending.values():
+        if not pending.future.done():
+          pending.future.set_exception(ConnectionError("extension_disconnected"))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def clear_extension(self, ws: Any) -> None:
if self.extension_ws is ws:
self.extension_ws = None
def clear_extension(self, ws: Any) -> None:
if self.extension_ws is ws:
self.extension_ws = None
for pending in self.pending.values():
if not pending.future.done():
pending.future.set_exception(ConnectionError("extension_disconnected"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/esp32_sim_server.py` around lines 54 - 57, When clearing the
extension socket in clear_extension, also fail any in-flight RPC futures so they
don't hang: locate the container of outstanding RPCs (e.g., self.pending_rpcs /
self.pending_futures / self._pending_requests) and for each future set an
exception (or cancel) with a clear ConnectionError or RuntimeError before
clearing the container; then set self.extension_ws = None. Ensure you reference
clear_extension and extension_ws and clear the pending container after failing
each future.

Comment on lines +71 to +83
for attempt in range(1, retries + 2):
if self.extension_ws is None:
raise RuntimeError("Extension disconnected while waiting for command result.")

await self.extension_ws.send(json.dumps(payload, ensure_ascii=False))
try:
result = await asyncio.wait_for(asyncio.shield(fut), timeout=timeout)
return result
except asyncio.TimeoutError:
if attempt >= retries + 1:
raise TimeoutError(f"RPC timeout request_id={request_id} after {attempt} attempts") from None
print(f"[server] rpc retry request_id={request_id} attempt={attempt + 1}")
finally:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

RPC timeout retries can duplicate non-idempotent browser actions.

Line [75] resends the same command on timeout. For execute_action (click/fill/post), this can trigger duplicate side effects.

🧭 Proposed fix (retry only idempotent RPC types)
   async def rpc(self, payload: Dict[str, Any], timeout: float = 15.0, retries: int = 2) -> Dict[str, Any]:
@@
-    try:
-      for attempt in range(1, retries + 2):
+    is_idempotent = payload.get("type") in {"get_dom_snapshot"}
+    max_attempts = (retries + 1) if is_idempotent else 1
+    try:
+      for attempt in range(1, max_attempts + 1):
@@
-        except asyncio.TimeoutError:
-          if attempt >= retries + 1:
+        except asyncio.TimeoutError:
+          if attempt >= max_attempts:
             raise TimeoutError(f"RPC timeout request_id={request_id} after {attempt} attempts") from None
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 73-73: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 78-78: Consider moving this statement to an else block

(TRY300)


[warning] 81-81: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/esp32_sim_server.py` around lines 71 - 83, The current retry loop
resends the same payload on timeout, risking duplicate non-idempotent browser
actions (e.g., execute_action/click/fill/post). Modify the send/retry logic in
the method that uses self.extension_ws, payload, fut, request_id, retries and
timeout so that you only perform retries for idempotent RPC types: detect the
RPC name from payload (e.g., payload.get("method") or payload["method"]) and if
it is a non-idempotent action like "execute_action", "click", "fill", "post"
then send exactly once and on timeout raise TimeoutError immediately; otherwise
keep the existing retry loop and logging/attempt counting for idempotent
methods. Ensure fut is still awaited with asyncio.shield and that the error
message still includes request_id when raising TimeoutError.

Comment on lines +455 to +463
if mtype == "register" and data.get("role") == "extension":
bridge.set_extension(websocket)
if not registered_once:
print("[server] extension registered")
registered_once = True
else:
print("[server] extension re-registered")
await websocket.send(json.dumps({"type": "register_ack", "ts": int(asyncio.get_running_loop().time() * 1000)}))
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

WebSocket extension registration is unauthenticated.

Any local process (or webpage reaching localhost websocket) can send {"type":"register","role":"extension"} and hijack bridge state. Add an auth token check during registration.

🔐 Proposed fix (shared secret on register)
 DEFAULT_MODEL = os.getenv("OPENAI_MODEL", "gpt-5-nano")
+BRIDGE_AUTH_TOKEN = os.getenv("BRIDGE_AUTH_TOKEN", "").strip()
@@
       if mtype == "register" and data.get("role") == "extension":
+        token = str(data.get("token", "")).strip()
+        if not BRIDGE_AUTH_TOKEN or token != BRIDGE_AUTH_TOKEN:
+          await websocket.close(code=1008, reason="unauthorized")
+          return
         bridge.set_extension(websocket)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/esp32_sim_server.py` around lines 455 - 463, Registration is
currently unauthenticated: when handling a incoming message with mtype ==
"register" and data.get("role") == "extension" the code calls
bridge.set_extension(websocket) and sets registered_once without validating a
token. Add an auth token check (e.g. compare data.get("token") to a server-side
secret/config) before calling bridge.set_extension; if the token is missing or
invalid, send a register_nack (or error) over websocket and close the connection
instead of registering. Update the registration branch that references mtype,
data, bridge.set_extension, registered_once, and the websocket.send call to
perform this validation and early reject unauthorized attempts.

Comment on lines +13 to +17
"host_permissions": [
"<all_urls>",
"ws://*/*",
"wss://*/*"
],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Default all-sites access/injection is too broad for first-run behavior.

Line [14] plus Line [39] gives universal page access and auto-injection, which materially increases sensitive-data exposure risk. Prefer opt-in host permissions and inject only after explicit user action/session enablement.

Also applies to: 37-42

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/manifest.json` around lines 13 - 17, The manifest currently grants
universal host access via "host_permissions" and auto-injects on all pages (the
"<all_urls>" entries and content script "matches") — change this to require
explicit opt-in: remove "<all_urls>" from "host_permissions" and instead list
only necessary default domains (or leave empty) and add "<all_urls>" to
"optional_permissions"; remove or narrow the "matches" pattern in
"content_scripts" so scripts are not auto-injected on every site, and implement
programmatic injection using chrome.permissions.request (or
browser.permissions.request) followed by chrome.scripting.executeScript (or
equivalent) after the user explicitly enables the session/feature; update any
background/extension UI flow to request the optional permission when the user
opts in and only then perform injection.

@IRONICBo IRONICBo added the P2 Priority: Tier 3 - Core functionality extensions label Feb 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request P2 Priority: Tier 3 - Core functionality extensions ready to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add browser-use integration

1 participant