Commit 980f0bb

Simplify confirm actions and improve stale snapshot errors; Removed js tool completely
2 parents a594073 + 6e80a55 commit 980f0bb

52 files changed (+1900, -858 lines changed)

AGENTS.md

Lines changed: 5 additions & 33 deletions
@@ -39,8 +39,7 @@ OpenBrowser/
 | Highlight tool | `server/agent/tools/highlight_tool.py` | HighlightTool for element discovery |
 | Element interaction | `server/agent/tools/element_interaction_tool.py` | ElementInteractionTool with 2PC flow |
 | Dialog tool | `server/agent/tools/dialog_tool.py` | DialogTool for dialog handling |
-| JavaScript tool | `server/agent/tools/javascript_tool.py` | JavaScriptTool for fallback execution |
-| ToolSet aggregator | `server/agent/tools/toolset.py` | OpenBrowserToolSet aggregates all 5 tools |
+| ToolSet aggregator | `server/agent/tools/toolset.py` | OpenBrowserToolSet aggregates all 4 tools |
 | Extension entry | `extension/src/background/index.ts` | Command handler, dialog processing |
 | Dialog manager | `extension/src/commands/dialog.ts` | CDP dialog events, cascading |
 | JavaScript execution | `extension/src/commands/javascript.ts` | CDP Runtime.evaluate, dialog race |
@@ -154,26 +153,15 @@ OpenBrowser uses Jinja2 templates for agent prompts, enabling dynamic content in
 ### Template Structure
 - **Location**: `server/agent/prompts/` directory
 - **Format**: `.j2` extension with Jinja2 syntax
-- **5 Tool Templates**: Each of the 5 focused tools has its own template:
+- **4 Tool Templates**: Each of the 4 focused tools has its own template:
   - `tab_tool.j2` - Tab management documentation
   - `highlight_tool.j2` - Element discovery with color coding
   - `element_interaction_tool.j2` - 2PC flow with orange confirmations
   - `dialog_tool.j2` - Dialog handling
-  - `javascript_tool.j2` - JavaScript fallback
-
-### Dynamic JavaScript Control
-The `javascript_execute` command can be disabled via environment variable:
-```bash
-export OPEN_BROWSER_DISABLE_JAVASCRIPT_EXECUTE=1
-```
-When disabled:
-- Template removes all `javascript_execute` references using `{% if not disable_javascript %}` conditionals
-- `OpenBrowserAction.type` description excludes `'javascript_execute'`
-- Command execution returns error if attempted

 ### Template Features
 - **Conditional rendering**: Use `{% if %}` blocks for configurable sections
-- **Variable injection**: Pass context variables like `disable_javascript` at render time
+- **Variable injection**: Pass context variables like model profile flags at render time
 - **Clean output**: `trim_blocks=True` and `lstrip_blocks=True` remove extra whitespace
 - **Caching**: Templates are cached after first load for performance
@@ -246,34 +234,18 @@ Elements are identified by a 6-character hash string:
 | `scroll_element` | Scroll by element ID | `{element_id: "m5k2p8", direction: "down"}` |
 | `keyboard_input` | Type into element | `{element_id: "j4n7q1", text: "hello"}` |

-### Tool Mapping (5-Tool Architecture)
-The visual interaction workflow is implemented across 5 focused tools:
+### Tool Mapping (4-Tool Architecture)
+The visual interaction workflow is implemented across 4 focused tools:

 | Tool | Commands | Purpose |
 |------|----------|---------|
 | `tab` | `tab init`, `tab open`, `tab close`, `tab switch`, `tab list`, `tab refresh`, `tab view`, `tab back`, `tab forward` | Session and tab management |
 | `highlight` | `highlight_elements` | Element discovery with blue overlays |
 | `element_interaction` | `click_element`, `confirm_click_element`, `hover_element`, `scroll_element`, `keyboard_input`, `confirm_keyboard_input`, `select_element` | Element interaction with 2PC only for click and keyboard input |
 | `dialog` | `handle_dialog` | Dialog handling (accept/dismiss) |
-| `javascript` | `javascript_execute` | JavaScript fallback execution |

 ## UNIQUE PATTERNS

-### JavaScript-First Automation (Fallback)
-For complex interactions not covered by visual commands:
-```javascript
-// Click by visible text (universal pattern)
-(() => {
-  const text = 'YOUR_TEXT';
-  const leaf = Array.from(document.querySelectorAll('*'))
-    .find(el => el.children.length === 0 && el.textContent.includes(text));
-  if (!leaf) return 'not found';
-  const target = leaf.closest('a, button, [role="button"]') || leaf;
-  target.click();
-  return 'clicked: ' + target.tagName;
-})()
-```
-
 ### Multi-Session Tab Isolation
 - `tab init <url>` creates managed session with tab group
 - `conversation_id` ties all commands to session
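
The "Template Features" bullets in the AGENTS.md diff above (conditional rendering, variable injection, `trim_blocks`/`lstrip_blocks`) can be illustrated with a minimal sketch. It assumes the third-party `jinja2` package is installed. The template text, the `tools.j2` name, and the `verbose_hints` flag are hypothetical stand-ins for the project's real templates and model profile flags, which this diff does not show.

```python
from jinja2 import DictLoader, Environment

# Hypothetical template; the real ones live in server/agent/prompts/*.j2.
TEMPLATE = """\
Available tools:
{% for tool in tools %}
- {{ tool }}
{% endfor %}
{% if verbose_hints %}
Confirm each click before it is committed.
{% endif %}
"""

# trim_blocks drops the newline after a block tag; lstrip_blocks strips
# leading whitespace before a block tag, so tags leave no blank lines behind.
env = Environment(
    loader=DictLoader({"tools.j2": TEMPLATE}),
    trim_blocks=True,
    lstrip_blocks=True,
)


def render_prompt(verbose_hints: bool) -> str:
    # Context variables are injected at render time (hypothetical flag name).
    template = env.get_template("tools.j2")
    return template.render(
        tools=["tab", "highlight", "element_interaction", "dialog"],
        verbose_hints=verbose_hints,
    )
```

Calling `render_prompt(False)` omits the conditional line entirely, while `render_prompt(True)` appends it. In both cases the block tags themselves contribute no blank lines to the output.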

eval/evaluate_browser_agent.py

Lines changed: 10 additions & 4 deletions
@@ -546,6 +546,7 @@ def cleanup_managed_tabs(self, conversation_id: str) -> bool:

         return all_closed

+
 class EvalServerClient:
     """Client for evaluation server tracking API"""
@@ -609,15 +610,17 @@ def start_openbrowser(self) -> bool:
                 return True

             root_dir = EVAL_DIR.parent
-            logger.error(f"""
+            logger.error(
+                f"""
 ❌ OpenBrowser server is not running!
 Please start the OpenBrowser server manually with:

 cd {root_dir}
 uv run local-chrome-server serve

 The server should start on port 8765 (REST API) and 8766 (WebSocket).
-""")
+"""
+            )
             return False

         except Exception as e:
@@ -634,7 +637,8 @@ def start_eval_server(self) -> bool:

             eval_dir = EVAL_DIR
             root_dir = EVAL_DIR.parent
-            logger.error(f"""
+            logger.error(
+                f"""
 ❌ Eval server is not running!
 Please start the eval server manually with:
@@ -646,7 +650,8 @@ def start_eval_server(self) -> bool:
 uv run python eval/server.py

 The server should start on port 16605.
-""")
+"""
+            )
             return False

         except Exception as e:
@@ -1397,6 +1402,7 @@ def _check_count_min_condition(

     def _event_matches_expected(self, event: Dict, expected: Dict) -> bool:
         """Check if a track event matches expected criteria"""
+
         def normalize_text(value: Any) -> str:
             return str(value or "").casefold()
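
The nested `normalize_text` helper in the hunk above coerces any value to a string and case-folds it, so matching ignores case and treats `None` like an empty string. A minimal standalone sketch of that behavior follows. `fields_match` is a hypothetical wrapper added only for illustration; the rest of `_event_matches_expected` is not visible in this diff.

```python
from typing import Any


def normalize_text(value: Any) -> str:
    # Same expression as in the diff: any falsy value (None, "", 0) becomes "".
    return str(value or "").casefold()


def fields_match(actual: Any, expected: Any) -> bool:
    # Hypothetical helper: case-insensitive equality after normalization.
    return normalize_text(actual) == normalize_text(expected)
```

Because the expression uses `value or ""`, a numeric `0` also normalizes to the empty string, which may matter if track events carry zero-valued fields. `casefold()` is more aggressive than `lower()`, so for example the German "ß" compares equal to "ss".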