Replay a user's routine with OpenBrowser by softpudding · Pull Request #54 · softpudding/OpenBrowser

softpudding · 2026-04-08T15:02:28Z

No description provided.

Introduce a first end-to-end recording workflow across the server, extension, and frontend. This adds a recording manager plus REST routes for creating, listing, stopping, and appending recording sessions, and wires a new recording_control command through the processor so the server can drive recorder state in the browser extension. Add extension-side recording support with a background recorder module, content-script event capture for trusted user interactions, and tab lifecycle tracking. Exclude the OpenBrowser UI itself from recording so the recorder only captures the target workflow instead of localhost:8765 control interactions. Add a dedicated recording panel in the frontend with a header-level Record entry point, live event polling, recording summaries, and event detail inspection. Fix the recording event list layout so long traces keep full-height cards inside an internally scrollable panel instead of collapsing into thin horizontal rows. Verification: pytest server/tests/unit/test_recording_routes.py server/tests/unit/test_api_uuid.py; npm run build (extension); node --check on the extracted frontend script.

Add a dedicated recording workflow that is separate from task chat and can launch recordings in an isolated browser window. The backend now supports recording launch modes, recording start/stop control, and test coverage for the new API behavior. Extend the extension recorder to track scoped tabs, browser-level navigation events, semantic container context for recorded elements, and keyframe screenshots for actionable events. Recording review UI now lives in a standalone panel, shows captured events and keyframe previews, and excludes the OpenBrowser app itself from recorded activity. Document the screenshot finding in AGENTS.md and codify the final recording rule: page_view is a lifecycle signal only and must not capture keyframes. Startup or refresh page_view screenshots were reproduced to shrink the live Chrome page into the top-left corner, while tab_ready remained safe for startup snapshots.

Rebuild the recording panel timeline around page-side event timestamps instead of raw persistence order so click events with keyframes no longer appear artificially late. Fold near-adjacent focus and ambient scroll events into the surrounding click card when they refer to the same element and tab, while still exposing the supporting events in the details panel. Also add the missing normalizeWhitespace helper used by the new timeline grouping logic.
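The ordering and folding rules described above can be sketched roughly as follows. This is a Python illustration only: the panel itself is frontend code, and the `Event` fields, the `FOLD_WINDOW_MS` threshold, and `build_timeline` are hypothetical names chosen for the example, not identifiers from this PR.

```python
from dataclasses import dataclass, field

# Hypothetical event shape; the field names are assumptions, not the PR's schema.
@dataclass
class Event:
    kind: str        # "click", "focus", "scroll", "page_view", ...
    tab_id: int
    element: str     # some stable identifier for the target element
    page_ts: float   # page-side timestamp in milliseconds
    supporting: list = field(default_factory=list)

FOLD_WINDOW_MS = 500  # assumed meaning of "near-adjacent"

def build_timeline(events):
    """Order by page-side time, then fold focus/scroll events into a nearby click."""
    ordered = sorted(events, key=lambda e: e.page_ts)
    clicks = [e for e in ordered if e.kind == "click"]
    timeline = []
    for ev in ordered:
        if ev.kind in ("focus", "scroll"):
            # Look for a click on the same element and tab within the window.
            host = next(
                (c for c in clicks
                 if c.tab_id == ev.tab_id
                 and c.element == ev.element
                 and abs(c.page_ts - ev.page_ts) <= FOLD_WINDOW_MS),
                None,
            )
            if host is not None:
                host.supporting.append(ev)  # still visible in the details view
                continue
        timeline.append(ev)
    return timeline
```

Sorting on the page-side timestamp rather than arrival order is what keeps a click whose keyframe upload finished late from drifting down the list.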

…ompilation pipeline

Restructure the recording panel into three distinct phases:
1. Record: live capture with event list and controls
2. Review: inspect trace, add intent note, continue to compile
3. Compile: interactive compiler agent session with streaming log, Q&A, and SOP output

Backend: implement compiler agent with SSE streaming via background thread + queue pattern, add trace_viewer/file/submit_workflow tools, compile and compile/answer endpoints, recording metadata update support, and intent note persistence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Frontend: Compress recording panel chrome (smaller header, stepper, toolbar) and event cards so 5-6 events fit in the list. Move intent note out of the right detail pane into a footer row so the EVENT DETAIL pane has full vertical space, and open the raw JSON details by default. Fix wheel scrolling on the EVENT DETAIL pane by switching the nested flex chain (.recording-phase-content → .recording-view → .recording-split-layout → .recording-detail-content) from flex: 1 1 auto to flex: 1 1 0 with explicit overflow: hidden, so the inner scroller is hard-bounded by its parent's track height.

Compiler agent: Persist conversation traces to ~/.openbrowser/compiler_traces/{recording_id}_{timestamp}.json on completion, error, or clarification, with long base64 strings truncated, and surface the trace path through SSE results so failures can be replayed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
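A minimal sketch of the "long base64 strings truncated" step, assuming the trace is a JSON-like structure of dicts, lists, and strings. The helper name `truncate_long_base64`, the 256-character cutoff, and the 32-character prefix are illustrative assumptions; the PR does not state the actual limits.

```python
import json
import re

MAX_B64_CHARS = 256  # assumed cutoff; the real threshold is not stated in the PR
# A run of base64-alphabet characters longer than the cutoff.
_B64_RUN = re.compile(r"[A-Za-z0-9+/=]{%d,}" % (MAX_B64_CHARS + 1))

def _shorten(match):
    run = match.group(0)
    return f"{run[:32]}...[truncated {len(run)} chars]"

def truncate_long_base64(value):
    """Recursively copy a JSON-like value, shortening long base64-looking runs."""
    if isinstance(value, dict):
        return {k: truncate_long_base64(v) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_long_base64(v) for v in value]
    if isinstance(value, str):
        return _B64_RUN.sub(_shorten, value)
    return value

def serialize_trace(trace):
    """Return the JSON text that would be written to the trace file."""
    return json.dumps(truncate_long_base64(trace), indent=2)
```

Truncating before serialization keeps screenshot payloads from dominating the trace file while leaving the surrounding message structure readable for replay.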

The new skill/claude/open-browser/ tree mirrors the codex flavor but is tuned for Claude Code's execution model:

- send_task.py no longer truncates [message:assistant], [thought], or [observation] lines, so the final agent answer always lands in the conversation log intact.
- The 17 KB SystemPromptEvent is collapsed to a single "[system_prompt] suppressed (N chars)" line by default; pass --show-system-prompt to opt back in.
- New --conversation-id flag lets follow-up turns reuse an existing browser session instead of always creating a fresh conversation.
- SKILL.md drops the "background + sleep + tail" guidance and points at Claude Code's native run_in_background Bash option, with foreground SSE streaming as the default.
- references/ refresh the script paths and add a "final assistant message looks cut off" troubleshooting entry plus the NO_PROXY="127.0.0.1,localhost" tip for proxied environments.

Verified end-to-end against a real example.com task: full assistant message arrives untruncated, system prompt is suppressed, and the log shrinks from hundreds of lines to ~8 for a one-step task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
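The logging behaviour in the first two bullets might look something like the sketch below. `format_event` and its parameters are hypothetical; only the bracketed labels and the "suppressed (N chars)" wording come from the description above, and the real send_task.py may be structured quite differently.

```python
def format_event(kind: str, text: str, show_system_prompt: bool = False) -> str:
    """Render one log line; nothing is truncated except the system prompt."""
    if kind == "system_prompt" and not show_system_prompt:
        # Collapse the large prompt to a single line unless the caller opts in.
        return f"[system_prompt] suppressed ({len(text)} chars)"
    return f"[{kind}] {text}"
```

Here `show_system_prompt` stands in for the --show-system-prompt flag; wiring it to a command-line parser is left out of the sketch.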

…e dumps in AGENTS.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

submit_workflow no longer flips the conversation to FINISHED. After it validates, the agent gets one more turn to send a plain-language wrap-up message and then the loop ends naturally. _collect_result detects this state via _detect_review_state (walks events for the latest successful submit + the most recent agent text after it) and returns status:"review", keeping the session alive so the user can either finalize or send revision feedback that triggers another submit cycle. Adds POST /recordings/{id}/compile/finalize wired to a new finalize_compiler_session helper for the approval path. The frontend gains a green review block (wrap-up summary, finalize button, revision textbox), auto-scrolls and flashes it when the SOP is drafted, and persists the draft on review events so navigating away doesn't lose work. max_iteration_per_run bumped to 80 to accommodate multi-round revisions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
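The review-state check described above can be approximated as follows. This is a sketch under assumptions: events are modelled as plain dicts with hypothetical `type`, `tool`, `ok`, and `text` keys, and the function returns the wrap-up text (or `None`) rather than whatever the real `_detect_review_state` returns.

```python
def detect_review_state(events):
    """Return the wrap-up text if the session is in review, else None.

    Walks the events for the latest successful submit_workflow observation,
    then picks the most recent agent text that came after it.
    """
    last_submit = None
    for index, ev in enumerate(events):
        if (ev.get("type") == "observation"
                and ev.get("tool") == "submit_workflow"
                and ev.get("ok")):
            last_submit = index
    if last_submit is None:
        return None  # nothing has been submitted successfully yet
    for ev in reversed(events[last_submit + 1:]):
        if ev.get("type") == "agent_message":
            return ev.get("text")
    return None  # submitted, but the wrap-up turn has not arrived


def collect_result(events):
    """Hypothetical caller: report "review" and keep the session alive."""
    summary = detect_review_state(events)
    if summary is not None:
        return {"status": "review", "summary": summary}
    return {"status": "running"}
```

Keying the state off the event log rather than a separate flag means a revision round needs no reset: a later successful submit simply becomes the new anchor.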

Adds a routines layer on top of the compile pipeline so finalized SOPs are named, persisted, and replayable from the Execute panel without re-recording.

- routine_manager + /routines CRUD API (SQLite-backed, validated via the newly extracted validate_sop_markdown helper)
- compile/finalize now requires a name and atomically creates a routine
- Compiler review block prompts for the routine name (suggested from the SOP goal) before the finalize button enables
- Execute panel gains a Saved Routines section, a routine card that can be staged in the input area, slash-command autocomplete in the task textarea, and a management modal for edit/rename/delete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Folds the inline routine chip list behind a single launcher button that opens a modal dialog, paginating ten routines at a time so long lists stay scannable instead of overflowing the toolbar. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Compiler agent was assuming OpenBrowser's select action matched options by visible label, so SOPs would name the human-readable text and the runtime would fail to find a match. Sharpens the compiler tool prompt and the select command description so SOPs always quote the literal option.value (with the visible label as a parenthetical cue), records both value and selectedText on <select> change events, and adds a value -> exact-text -> case-insensitive-contains fallback in the extension that returns the available option inventory on miss. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tions The OpenBrowserAction base class exposed conversation_id as a regular pydantic field, so it leaked into every tool's JSON schema. The LLM occasionally filled it (e.g. mistaking it for tab_id and passing 1737540392), and the executor then overwrote its real conversation_id from the action, sending bogus routing data to the Chrome extension server and getting back HTTP 400. Mark the field as SkipJsonSchema/exclude=True so it never appears in the tool schema, and remove the executor override that read action.conversation_id — the real id is set in __call__ from conversation._state.id. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e-recording

- Rename "SOP" → "Browser Routine" across docs, frontend, API, compiler agent, validators, and the routines table (with an on-startup column migration).
- Put `select_element` behind the YELLOW 2PC preview alongside click and keyboard_input. New `confirm_select` action; pending state now echoes the chosen `value` so the LLM can verify it against the rendered `<option>` list.
- Introduce a `mode="routine_replay"` conversation tag that flows from the API through session metadata into the system prompt and tool schemas. In replay mode, small models get a restricted highlight action that exposes `keywords` only for tokens copied verbatim from the active Routine step's `**Keywords:**` line. The compiler agent learns to emit those optional Keywords lines for stable testid-style identifiers, and the validator enforces the single bare-token rule. The highlight detector now recognises data-testid / data-test / data-cy / data-qa so those tokens can be surfaced.
- Frontend always opens a fresh routine-replay conversation when the user runs a saved routine, so the replay system prompt is in force.
- Add DELETE /recordings/{id} (refuses active recordings, closes any bound compiler session, drops events + session in one transaction) and a hover-revealed delete button on the recording history cards.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The submit_workflow tool description used to call **Keywords:** an optional line "only for 100% fixed elements", which contradicted the system prompt's "include whenever there's a clean candidate" stance and trained the compiler agent to write empty `**Keywords:**` boilerplate that the validator then rejected. Realign the tool description with the system prompt and explicitly instruct the agent to OMIT the line when no clean token exists. Repoint openhands-sdk/openhands-tools at the published agent-sdk commit 316612396c25e3c4396ce3282829b07399a5d30c (which adds visible-text words as a last-resort keyword candidate, matching the runtime matcher). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ef on TS

- Apply black/prettier reformatting across server and extension so the branch satisfies pre-commit (no behavioral change).
- Update test_base_classes::TestOpenBrowserAction to assert that conversation_id is internal-only: still settable from Python, but excluded from model_dump() and from the JSON schema exposed to the LLM. Matches the intent of 98cf819 where exclude=True was added alongside SkipJsonSchema.
- Disable core `no-undef` for TS files in extension/eslint.config.mjs so DOM type references like `RequestInit` (used as type-only casts in the recorder tests) don't get flagged. TypeScript already validates these.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

All four were flagged in a Codex review of the branch diff against main. None of the fixes change public schemas; all touched code paths are covered by new or updated tests.

1. [P1] Stop-on-disconnect no longer strands recordings as ACTIVE.
   Previously, if the browser websocket dropped during a recording, POST /recordings/{id}/stop returned 409 without transitioning state. Because DELETE and create_recording both refuse ACTIVE rows, the user was locked out of that browser until the DB was fixed manually. The stop handler now transitions the row to STOPPED locally with a stop_reason=browser_disconnected note, and does NOT dispatch a stop command the extension can't receive.

2. [P2] finalize_compiler_session now guards on review state AND re-validates the draft.
   _collect_result keeps sessions alive in both "asking" and "review" states, so a client could previously finalize while the agent was still asking a clarifying question and persist a half-formed routine. The new guard:
   - refuses to finalize unless _detect_review_state() returns true (i.e. a successful submit_workflow observation exists)
   - runs validate_routine_markdown() on the draft before teardown
   - leaves the session alive on validation failure so the user can send revision feedback via /compile/answer instead of being stranded
   This mirrors the validation the /routines create/update paths already run via _validate_or_raise.

3. [P2] /recordings/{id}/events now rejects writes once the row leaves ACTIVE.
   Keyframe capture in the extension runs async, so an /events POST started before /stop could land after /stop finished, letting a trace the user had already reviewed or compiled change underneath them. The handler now returns 409 with the current status when the session is no longer ACTIVE.

4. [P2] <select> resolution drops the substring fallback.
   The previous resolveOption() used a case-insensitive .includes() fallback as a third-choice match, which silently picked the first option whose label contained the requested token. On filters/screeners with overlapping labels (e.g. several "Market cap over ..." choices), this caused select_element to mutate page state against an arbitrary option without surfacing the ambiguity. Matching is now exact-only on option.value and trimmed option.text, restoring the intent of b18824c ("Teach compiler and runtime that <select> matches by option value"). The error path already reports the full inventory so callers can retry with the correct value.

Tests:
- test_recording_routes.py:
  - test_stop_recording_handles_disconnected_browser_by_stopping_locally replaces the old "rejects disconnected browser" test and verifies the new local-stop path, metadata note, and that no extension command is dispatched.
  - test_append_recording_event_rejects_non_active_recording verifies the new 409 on writes to a stopped session and that the trace is untouched.
- test_compiler_agent_finalize.py (new):
  - asking-state rejection (session stays alive)
  - invalid-markdown rejection (session stays alive)
  - happy-path finalize tears down the session and returns a completed routine doc populated from validate_routine_markdown's summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
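The exact-only matching from item 4 can be illustrated with a small sketch. The real resolveOption() lives in the extension and is not Python; `resolve_option`, the `(value, text)` tuple shape, and the result dict below are assumptions made for the example.

```python
def resolve_option(options, requested):
    """Resolve a <select> choice by exact match only.

    `options` is a list of (value, text) pairs standing in for the DOM's
    option elements. There is deliberately no substring or
    case-insensitive fallback.
    """
    # First choice: the literal option value.
    for value, _text in options:
        if value == requested:
            return {"ok": True, "value": value}
    # Second choice: the visible label with surrounding whitespace trimmed.
    for value, text in options:
        if text.strip() == requested:
            return {"ok": True, "value": value}
    # Miss: report the full inventory so the caller can retry with a valid value.
    return {
        "ok": False,
        "available": [{"value": v, "text": t.strip()} for v, t in options],
    }
```

With overlapping labels such as several "Market cap over ..." entries, a partial token now produces a miss with the inventory attached instead of silently selecting the first label that contains it.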

Three small fixes uncovered while running the qwen3.5 eval:

- Emit usage_metrics before complete in both SSE worker paths (server/agent/api.py, server/core/browser_executor_bundle.py). The streamer drains and breaks right after yielding complete, so anything queued after it can race and be dropped; that left the eval logging "no usage_metrics event received" and the frontend showing all-zero usage stats.
- Always render cost in RMB on the frontend; this project accounts in RMB across the board, so the USD/¥ branching was unnecessary.
- Reset .main-terminal scrollTop on advanced-mode toggle. The shell is position:absolute inside main-terminal, so a leftover scrollTop (carried over when overflow flips from auto to hidden) pushed the panel above the visible viewport.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
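The ordering constraint in the first fix follows from how a queue-backed streamer terminates. The sketch below uses only the standard library; `worker`, `stream`, `run`, and the event dicts are hypothetical stand-ins for the code in server/agent/api.py and server/core/browser_executor_bundle.py.

```python
import queue
import threading

def worker(events):
    """Background producer: usage_metrics must be queued before complete."""
    events.put({"type": "message", "data": "step done"})
    events.put({"type": "usage_metrics", "data": {"total_tokens": 123}})
    events.put({"type": "complete"})  # terminal event, always last

def stream(events):
    """Consumer: yield events until the terminal one, then stop reading."""
    while True:
        ev = events.get()
        yield ev
        if ev["type"] == "complete":
            break  # anything queued after this point is never delivered

def run():
    q = queue.Queue()
    producer = threading.Thread(target=worker, args=(q,))
    producer.start()
    received = [ev["type"] for ev in stream(q)]
    producer.join()
    return received
```

Because the consumer stops reading at `complete`, the producer has to treat it as a strict end marker: swapping the last two `put` calls would leave `usage_metrics` sitting unread in the queue.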

softpudding and others added 23 commits April 4, 2026 01:22

Improve recording review UI and cleanup flow

f3e82b1

Adjust recording keyframe capture policy

5cbf909

Refine recording review UX and annotate keyframes

5a60eaa

Use full-size screenshot capture and agent-only workflow drafts

617aaa9

Capture pre-action recording keyframes and compile semantic scrolls

ebf0cbd

Refine recording semantics and simplify review flow

757f525

Document server test commands, vendored SDK layout, and compiler trac…

ad5f984

…e dumps in AGENTS.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

softpudding merged commit 6f02ded into main Apr 9, 2026
4 checks passed

softpudding merged 23 commits into main from codex/recording-mode-foundation
