Add drag-and-drop, set_slider, hover persistence, and scroll-clip fix by softpudding · Pull Request #58 · softpudding/OpenBrowser

softpudding · 2026-04-13T13:50:47Z

Summary

drag_and_drop_element: 2-phase commit interaction for dragging elements between containers (TaskFlow +10.5 score improvement)
set_slider: Universal slider control via slidable interaction hints (VidHub 15.0/15.0 all models)
Hover persistence: Maintains hover state for elements revealed by mouseover
Scroll-container clipping: isElementVisibleInScrollParent() filters out elements scrolled outside overflow:auto/scroll containers, preventing phantom highlight labels (fixes Drive Bulk Release flash regression)
API-stall retry: Auto-retries eval tests when the gap between API responses exceeds a 60s threshold
Dev reload: Vite watch mode with WebSocket auto-reload for Chrome extension development
qwen3.6-plus: Added to large model profile
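
The API-stall retry item above can be sketched roughly as follows. This is a minimal illustration, assuming the eval runner can collect a timestamp for each API response in an attempt; `has_api_stall`, `run_with_stall_retry`, and `max_retries` are hypothetical names and are not taken from evaluate_browser_agent.py. Only the 60s threshold comes from the summary.

```python
from typing import Callable, Sequence

STALL_THRESHOLD_S = 60.0  # threshold stated in the PR summary


def has_api_stall(event_times: Sequence[float],
                  threshold_s: float = STALL_THRESHOLD_S) -> bool:
    """Return True if any gap between consecutive timestamps exceeds the threshold."""
    return any(later - earlier > threshold_s
               for earlier, later in zip(event_times, event_times[1:]))


def run_with_stall_retry(run_test: Callable[[], Sequence[float]],
                         max_retries: int = 2) -> int:
    """Run one eval test, re-running it while a stall is detected.

    `run_test` is a hypothetical callable that performs one attempt and
    returns the timestamps (in seconds) of the API responses it observed.
    Returns the number of attempts used.
    """
    attempts = 0
    while True:
        attempts += 1
        times = run_test()
        # Stop on a clean attempt, or once the retry budget is spent.
        if not has_api_stall(times) or attempts > max_retries:
            return attempts
```

A gap of exactly 60 seconds does not count as a stall in this sketch, since the summary says the gap must exceed the threshold.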

Eval Results (84.76% vs 82.86% main baseline)

Model	Pass	Rate
qwen3.5-flash	27/35	77.1%
qwen3.5-plus	31/35	88.6%
qwen3.6-plus	31/35	88.6%
Total	89/105	84.76%

Test plan

Full 35-test eval pass for all 3 models
Drive Bulk Release Assets flash: FAIL 7.8 → PASS 10.0 after scroll-clip fix
TaskFlow Full Workflow: qwen3.5-plus and qwen3.6-plus both PASS 13.0 (was FAIL 2.5/3.0)
VidHub Comment slider: 15.0/15.0 all models
Extension dev build succeeds with watch mode
Pre-commit passes (black + prettier)
Pytest: 464 passed, 4 skipped
Extension tests: 191 passed, 0 fail

🤖 Generated with Claude Code

Implement end-to-end drag-and-drop support: element discovery via draggable/droppable element types and interaction hints, 2PC confirmation flow with container preview, and precise drop placement using relative_to/position. This addresses the 0% pass rate on taskflow_drag_and_edit eval tasks where the agent had no way to discover or execute DnD operations.

Key changes:
- Add draggable/droppable as element_types and interactionHints
- Detection heuristics: explicit attrs, cursor:grab, parent-of-draggable
- HighlightDropPreviewCommand: crops container, highlights inner elements
- confirm_drag_and_drop with relative_to/position for precise placement
- Fix drag script rAF hang on hidden tabs (setTimeout fallback)
- Fix post-drag occlusion false positive (DOM mutates during drag)
- Remove misleading offset_x/offset_y from LLM-facing tool schema
- Harden eval SSE streaming with retry and configurable timeouts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
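
The detection heuristics listed in this commit (explicit attrs, cursor:grab, parent-of-draggable) might look roughly like the sketch below. It is written in Python over a simplified stand-in for a DOM element. The extension's real detection code is not shown in this PR text, so `Node`, `is_draggable`, and `interaction_hint` are hypothetical. Mapping "parent-of-draggable" to the droppable hint is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Simplified stand-in for a DOM element (hypothetical model)."""
    attrs: Dict[str, str] = field(default_factory=dict)
    cursor: str = "auto"  # computed CSS `cursor` value
    children: List["Node"] = field(default_factory=list)


def is_draggable(node: Node) -> bool:
    """Explicit draggable="true" attribute, or a grab cursor."""
    return node.attrs.get("draggable") == "true" or node.cursor == "grab"


def interaction_hint(node: Node) -> Optional[str]:
    """Return "draggable", "droppable", or None for a node."""
    if is_draggable(node):
        return "draggable"
    # Assumption: a direct parent of a draggable element is a drop container.
    if any(is_draggable(child) for child in node.children):
        return "droppable"
    return None
```

For example, a card with `draggable="true"` would be hinted as draggable, and the column that contains it as droppable.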

…lider

Three features addressing eval failures and interaction gaps:

1. Hover persistence: store last-hovered element per conversation/tab and replay hover events before confirmation screenshots, so hover-revealed UI (video controls) stays visible during click confirmation.
2. Slidable interaction hint: detect slider-like elements across three tiers (native range, ARIA role=slider, custom progress bars via structural heuristics) and annotate them with a "slidable" hint. Ancestor walk ensures leaf elements inside slider containers inherit the hint.
3. Universal set_slider: extend from native-only to three paths: native range (write value), ARIA slider (position click via aria-valuemin/max), and generic custom sliders (percentage-based position click with ancestor walk to find full-width container).

Also fixes: ancestor opacity walk for visibility detection, large interactive region detection (video players), and slider/draggable exclusion to prevent conflicting hints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
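
The position-click paths for ARIA and custom sliders come down to mapping a target value onto a horizontal pixel coordinate. The sketch below shows that arithmetic under stated assumptions: `slider_click_x` and its parameters are hypothetical names, the track is assumed to be horizontal and linear, and the clamping behaviour is a choice made for this example, not something the commit message specifies.

```python
def slider_click_x(target: float, value_min: float, value_max: float,
                   track_left: float, track_width: float) -> float:
    """Map a target slider value to the x coordinate to click.

    value_min/value_max correspond to aria-valuemin/aria-valuemax for an
    ARIA slider, or 0 and 100 for a percentage-based custom slider.
    track_left/track_width describe the full-width slider container in pixels.
    """
    if value_max <= value_min:
        raise ValueError("value_max must be greater than value_min")
    # Clamp so the click never lands outside the track.
    clamped = min(max(target, value_min), value_max)
    fraction = (clamped - value_min) / (value_max - value_min)
    return track_left + fraction * track_width
```

With a track starting at x=200 that is 400 px wide and a 0 to 100 range, a target of 50 maps to x=400.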

Updates VISUAL_GROUNDING, INTERACTION_MODEL, and DISCOVERY_STRATEGY sections in the system prompt to cover the new DnD 2PC flow, slidable interaction hints, and droppable element discovery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds a Vite plugin that starts a WebSocket server (port 8767) during `npm run dev`. The extension's background script connects to it in dev builds and calls chrome.runtime.reload() on each rebuild, eliminating the need to manually reload on chrome://extensions. The reload code is tree-shaken out of production builds via a __DEV__ compile-time constant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The dynamic import via Vite's __vitePreload polyfill silently failed in MV3 service workers (polyfill references `document` which doesn't exist). Switch to static import and disable the modulepreload polyfill. Also change `npm run dev` from watch mode to one-shot build+reload. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The instruction said "board-packet" (hyphenated) but the mock thread subject uses "Board Packet" (space-separated), causing the agent to use a search term that doesn't match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…eval results

- Add isElementVisibleInScrollParent() to filter elements scrolled out of view inside overflow containers, preventing phantom labels from confusing the agent (fixes Drive Bulk Release Assets flash regression)
- Add API-stall detection and retry logic to evaluate_browser_agent.py so transient API timeouts trigger automatic re-queue
- Add qwen3.6-plus to the large model profile
- Update evaluation_report.json with merged results: 89/105 (84.76%); flash 27/35, plus 31/35, 3.6-plus 31/35

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
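
The scroll-container check is at heart a rectangle-overlap test between an element and the visible box of its scrolling ancestor. The sketch below illustrates that geometry in Python. It is not the extension's isElementVisibleInScrollParent() implementation: `Rect` and `is_visible_in_scroll_parent` are hypothetical, and treating any partial overlap as visible is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in viewport coordinates (hypothetical model)."""
    left: float
    top: float
    right: float
    bottom: float


def is_visible_in_scroll_parent(element: Rect, container: Rect) -> bool:
    """True if the element overlaps the container's visible box at all."""
    overlap_w = min(element.right, container.right) - max(element.left, container.left)
    overlap_h = min(element.bottom, container.bottom) - max(element.top, container.top)
    # A positive overlap on both axes means some part of the element shows.
    return overlap_w > 0 and overlap_h > 0
```

An element scrolled entirely below the container's bottom edge fails the test, so it would not receive a highlight label.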

The agent-sdk prompt changed from "Only click, keyboard_input, and select use the YELLOW stage" to "click, keyboard_input, and select return a YELLOW preview screenshot before execution." Update the test assertion to match. Also includes auto-formatting from black and prettier. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

softpudding and others added 9 commits April 12, 2026 19:50

Apply black and prettier formatting to all changed files

6661a8e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

softpudding merged commit bb93236 into main Apr 13, 2026
3 checks passed

softpudding merged 9 commits into main from fix/eval-t7-t1-drag-and-timeout
