
Sync #2

Open
metehanozdev wants to merge 725 commits into emregucerr:main from browserbase:main

Conversation

@metehanozdev
Collaborator

why

what changed

test plan

tkattkat and others added 29 commits December 4, 2025 11:15
# why

adds support for using claude 4.5 opus with cua 

# what changed

added opus to model maps 

# test plan






<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds support for Anthropic Claude 4.5 Opus in CUA. Registers
anthropic/claude-opus-4-5-20251101 and maps it to the Anthropic
provider.

<sup>Written for commit 2e54c27.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
## 🤖 Installing Claude Code GitHub App

This PR adds a GitHub Actions workflow that enables Claude Code
integration in our repository.

### What is Claude Code?

[Claude Code](https://claude.com/claude-code) is an AI coding agent that
can help with:
- Bug fixes and improvements  
- Documentation updates
- Implementing new features
- Code reviews and suggestions
- Writing tests
- And more!

### How it works

Once this PR is merged, we'll be able to interact with Claude by
mentioning @claude in a pull request or issue comment.
Once the workflow is triggered, Claude will analyze the comment and
surrounding context, and execute on the request in a GitHub action.

### Important Notes

- **This workflow won't take effect until this PR is merged**
- **@claude mentions won't work until after the merge is complete**
- The workflow runs automatically whenever Claude is mentioned in PR or
issue comments
- Claude gets access to the entire PR or issue context including files,
diffs, and previous comments

### Security

- Our Anthropic API key is securely stored as a GitHub Actions secret
- Only users with write access to the repository can trigger the
workflow
- All Claude runs are stored in the GitHub Actions run history
- Claude's default tools are limited to reading/writing files and
interacting with our repo by creating comments, branches, and commits.
- We can add more allowed tools by adding them to the workflow file
like:

```
allowed_tools: Bash(npm install),Bash(npm run build),Bash(npm run lint),Bash(npm run test)
```

There's more information in the [Claude Code action
repo](https://github.com/anthropics/claude-code-action).

After merging this PR, let's try mentioning @claude in a comment on any
PR to get started!





<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds GitHub Actions to integrate Claude Code for automated PR reviews
and comment-triggered help. Enables @claude to review code and perform
tasks using repository context.

- **New Features**
- claude.yml: Runs when @claude is mentioned in issue/PR comments or
reviews, or in issue title/body; uses anthropics/claude-code-action@v1
with actions: read to access CI; requires secrets.ANTHROPIC_API_KEY.
- claude-code-review.yml: Auto-reviews PRs on open/sync for quality,
bugs, performance, security, and tests; posts feedback via gh; uses
claude.md for guidance; includes optional filters and limited allowed
tools.

- **Migration**
  - Add ANTHROPIC_API_KEY to repository secrets.
  - Merge this PR, then mention @claude in a PR or issue to trigger.
- Optional: adjust file path filters, author filters, or allowed tools.

<sup>Written for commit d7e4303.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Miguel <36487034+miguelg719@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Keeps `@\claude` support but drops the auto PR reviews by claude, we
already have plenty of auto-review feedback from greptile and cubic.

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Removed the Claude PR review GitHub Action to stop automatic reviews.
Keeps @claude support and reduces duplicate bot feedback, since greptile
and cubic already provide auto-review.

<sup>Written for commit 740d927.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why

After the transition to v3, the model handling for agent evals was not
updated to account for new model formats

# what changed

- added an isCua flag and two separate model maps, one for models that run
with CUA and one for models that run without it
- adjusted model handling to properly parse CUA models
- added a tag to distinguish whether a run uses CUA or not

# test plan
- tested evals for both CUA and non-CUA models


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Updated the agent evals CLI to support and correctly run both CUA and
non-CUA agent models in v3. Fixes agent model parsing and enables mixed
eval runs.

- **New Features**
- Split agent models into standard and CUA lists; added
getAgentModelEntries with a cua flag.
- Passed isCUA through EvalInput to initV3 and tasks; selects a safe
internal model for handlers when CUA.
- Improved provider lookup and error messages for CUA models using short
names; testcases now tag models as "cua" or "agent".
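
A minimal sketch of the split described above, assuming a simple entry shape. `getAgentModelEntries` and the `cua` flag are named in the summary; the list contents, the `AgentModelEntry` interface, and the `tagFor` helper are hypothetical and only illustrate the idea.

```typescript
// Assumed shape of one eval model entry; the real type may differ.
interface AgentModelEntry {
  modelName: string;
  cua: boolean;
}

// Illustrative lists only; the actual model maps live in the evals package.
const STANDARD_AGENT_MODELS: string[] = ["openai/gpt-4.1-mini"];
const CUA_AGENT_MODELS: string[] = ["anthropic/claude-opus-4-5-20251101"];

// Merge both lists into one array, marking which entries run in CUA mode.
function getAgentModelEntries(): AgentModelEntry[] {
  return [
    ...STANDARD_AGENT_MODELS.map((modelName) => ({ modelName, cua: false })),
    ...CUA_AGENT_MODELS.map((modelName) => ({ modelName, cua: true })),
  ];
}

// Hypothetical helper: derive the testcase tag from the flag.
function tagFor(entry: AgentModelEntry): "cua" | "agent" {
  return entry.cua ? "cua" : "agent";
}
```

Keeping the flag on the entry lets a single eval run mix both kinds of models without guessing the mode from the model name.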

<sup>Written for commit 13b906c.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why
- to clean up the actHandler before #1330 




<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Refactors actHandler to centralize LLM action parsing and execution,
reduce duplication, and improve metrics reporting. Behavior stays the
same, with clearer naming and more reliable two-step and fallback flows.

## Why:
- Reduce duplicated LLM calls and normalization logic.
- Improve readability and maintainability.
- Ensure consistent metrics and variable substitution.
- Make the self-heal/fallback path more robust.

## What:
- Renamed actFromObserveResult to takeDeterministicAction and updated
all call sites (ActCache, AgentCache, v3).
- Added getActionFromLLM for inference, metrics, normalization, and
variable substitution.
- Added recordActMetrics to centralize ACT metrics reporting.
- Extracted normalizeActInferenceElement and
substituteVariablesInArguments helpers.
- Simplified two-step act flow and fallback retry using shared helpers.
- Kept existing behavior (selector normalization, variable substitution,
retries).
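
The `%key%` substitution mentioned above could look roughly like the sketch below. The helper name comes from the summary, but the signature and the choice to leave unknown tokens untouched are assumptions, not confirmed behavior.

```typescript
// Replace %key% tokens in each argument with the matching variable value.
// Assumption: tokens without a matching variable are left as-is.
function substituteVariablesInArguments(
  args: string[],
  variables: Record<string, string>,
): string[] {
  return args.map((arg) =>
    arg.replace(/%([^%\s]+)%/g, (token: string, key: string) => {
      // Only accept own properties so inherited keys like "constructor" never match.
      const value = Object.prototype.hasOwnProperty.call(variables, key)
        ? variables[key]
        : undefined;
      return value ?? token;
    }),
  );
}
```

For example, `substituteVariablesInArguments(["%username%"], { username: "alice" })` yields `["alice"]`.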

## Test Plan:
- [ ] Run unit tests for actHandler to confirm no regressions.
- [ ] Verify single-step actions execute as before.
- [ ] Verify two-step flow triggers when LLM returns twoStep and
executes the second action.
- [ ] Confirm fallback self-heal path updates selector and retries
successfully.
- [ ] Check metrics are recorded once per inference call in both steps
and fallback.
- [ ] Validate variable substitution replaces %key% tokens in action
arguments.
- [ ] Exercise AgentCache and ActCache paths to ensure
takeDeterministicAction works end-to-end.
- [ ] Build passes and type checks for all renamed method references.

<sup>Written for commit 08d8454.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
great catch from @loic-carbonne
# why
- currently it is not possible to rerun a cached agent run with a
different prompt
- therefore, this docs example is misleading
# what changed
- removed misleading example

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Removed the incorrect docs example that suggested cached agent workflows
can be reused with different inputs. This aligns the deterministic agent
page with current behavior where each instruction generates a new cache
key, so runs cannot be rerun with a different prompt.

<sup>Written for commit 4908805.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: Loïc Carbonne <loic.carbonne.mail@gmail.com>
…1330)

# why
- async functions invoked by act, extract, and observe all continued to
run even after the timeout was reached
# what changed
- this PR introduces a time-remaining check that runs between
each major IO operation inside each of the handlers
- this ensures that user-defined timeouts are actually respected inside
act, extract, and observe
# test plan
- added tests to confirm that internal async functions do not continue
running after the timeout is reached



















<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes act, extract, and observe to truly honor the timeout parameter
with step-wise guards that abort early and return clear errors.
Deterministic actions now use the same guard path in v3.

- **Bug Fixes**
- Added createTimeoutGuard and specific ActTimeoutError,
ExtractTimeoutError, and ObserveTimeoutError (exported).
- Replaced Promise.race with per-step checks across snapshot capture,
LLM inference, action execution, and self-heal retries.
- Enforced per-step timeouts in ActHandler.takeDeterministicAction;
metrics unchanged.
- Wired v3 deterministic actions to pass a timeout guard; shadow DOM and
unsupported actions behavior unchanged.
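
A rough sketch of the per-step guard idea. `createTimeoutGuard` and `ActTimeoutError` are named in the summary, but the signature, the injectable clock, and the error message here are assumptions made for illustration.

```typescript
// Assumed error shape; the exported class may carry different fields.
class ActTimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`act() timed out after ${timeoutMs}ms`);
    this.name = "ActTimeoutError";
  }
}

type TimeoutGuard = () => void;

// Returns a function that throws once the deadline has passed.
// Handlers call it between major IO steps instead of racing one big promise.
function createTimeoutGuard(
  timeoutMs: number | undefined,
  makeError: (ms: number) => Error,
  now: () => number = Date.now, // injectable clock, added here for testability
): TimeoutGuard {
  if (timeoutMs === undefined || timeoutMs <= 0) return () => {};
  const deadline = now() + timeoutMs;
  return () => {
    if (now() >= deadline) throw makeError(timeoutMs);
  };
}
```

A handler would create the guard once and call it before snapshot capture, before LLM inference, and before executing the action. No later step starts after the deadline, whereas `Promise.race` alone leaves the losing work running.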

<sup>Written for commit d6bbfb8.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: miguel <miguelg71921@gmail.com>
Co-authored-by: Miguel <36487034+miguelg719@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
# why

our slack link expired 

# what changed

updated slack invite link 
# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Replaced the expired Slack invite link with a new working one. Updated
the core README and contributing docs so contributors can join the
community without broken links.

<sup>Written for commit 9f0b262.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why

Users don't know about the v2/v3 version toggle in the docs navigation.

# what changed

Added a banner at the top of the v3 docs pages to help users easily
discover Stagehand Python (v2).

# test plan
n/a




<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added a reusable banner to the top of all v3 docs pages to highlight the
Stagehand Python (v2) option. Improves discoverability of the v2/v3
toggle and reduces confusion.

- **New Features**
- Added V3Banner MDX snippet linking to “/v2/first-steps/introduction”.
- Imported and rendered the banner across v3 Basics, Best Practices,
Configuration, First Steps, Integrations, Migrations, and References
pages.
- Minor metadata/formatting updates in v2 docs (e.g., User Data
frontmatter) for consistency.

<sup>Written for commit 515a13d.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why
Anthropic agents in CUA mode are unable to issue key presses (not to be
confused with `type` actions)

# what changed
The Anthropic tool `computer_20250124` replies to key presses in this format:
```ts
{
  "action":"key",
  "text":"BackSpace"
}
```
This reply wasn't mapped to our internal action abstraction, `keypress`,
which accepts the parameter `keys`. Instead, the action was issued directly
in the Anthropic format. Updated `AnthropicCUAClient.ts` to account for this
and map it appropriately.
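
A small sketch of that mapping, assuming `keys` is an array of key names. The interfaces and the function name are hypothetical; only the `key`/`text` input and the `keypress`/`keys` output come from the description above.

```typescript
// Shape of the Anthropic tool reply shown above.
interface AnthropicKeyAction {
  action: "key";
  text: string;
}

// Assumed shape of the internal action; `keys` as string[] is a guess.
interface KeypressAction {
  type: "keypress";
  keys: string[];
}

// Translate the Anthropic format into the internal abstraction
// instead of forwarding the raw "key" action.
function mapAnthropicKeyAction(input: AnthropicKeyAction): KeypressAction {
  return { type: "keypress", keys: [input.text] };
}
```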

# test plan
- [x] Tested on sample eval






<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes key action mapping in Anthropic CUA so agents can send key presses
(e.g., Backspace) correctly instead of failing on the "key" action.

- **Bug Fixes**
- Map Anthropic "key" to internal "keypress" and pass keys from
input.text.
- Remove the old "key" path and Playwright key mapping to avoid
mismatches.

<sup>Written for commit b9716b9.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Sean McGuire <75873287+seanmcguire12@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
# why

The banner was hard-coded for light mode only

# what changed

<img width="706" height="316" alt="image"
src="https://github.com/user-attachments/assets/64fadf31-a96e-43ae-b435-7082db9b6a64"
/>

<img width="707" height="314" alt="image"
src="https://github.com/user-attachments/assets/515ab34a-f040-4574-89bf-7c2d621a63e6"
/>

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixed the V3 docs banner to support light mode while preserving dark
mode styling. Added light-theme border, background, and text colors with
dark: variants and aligned link hover states to improve readability.

<sup>Written for commit 14ab04f.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
…rve/extract, CLICK/HOVER/SCROLL, and CDP (#1283)

# why

Clarify where the execution flow goes when stagehand runs by showing
more detailed logs.

<img width="1443" height="529" alt="image"
src="https://github.com/user-attachments/assets/1c85f91e-de94-46c3-8226-fe42d4c3e338"
/>


# what changed

Adds a log line printed at the beginning and end of each layer's
execution:

1. 🅰 Agent TASK: top-level user intent, logged when agent.execute('<intent here>') is called (the initial entrypoint)
2. 🆂 Stagehand STEP: any call to .act(...), .extract(), or .observe()
3. 🆄 Understudy ACTION: any playwright or browser interaction API action dispatched, e.g. CLICK, HOVER, SCROLL, etc.
4. 🧠 LLM req/resp, 🅲 CDP CALL/Event: any LLM calls or CDP websocket msgs to/from the browser

Log lines are written to
`./.browserbase/sessions/{sessionId}/{agent,stagehand,understudy,cdp}.log`
at runtime, and can be followed in a single unified screen by doing:

`tail -f ./.browserbase/sessions/latest/*.log`

# test plan

Test by running:

```bash

# (make sure `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are both set in env too)
export BROWSERBASE_CONFIG_DIR=./.browserbase

nano packages/core/examples/flowLoggingJourney.ts  # paste in contents (it's just a basic test of the main apis)

pnpm tsx packages/core/examples/flowLoggingJourney.ts & 
tail -f ./.browserbase/sessions/latest/*
```

`flowLoggingJourney.ts`:
```typescript
import { Stagehand } from "../lib/v3";

async function run(): Promise<void> {
  const openaiKey = process.env.OPENAI_API_KEY;
  const anthropicKey = process.env.ANTHROPIC_API_KEY;

  if (!openaiKey || !anthropicKey) {
    throw new Error(
      "Set both OPENAI_API_KEY and ANTHROPIC_API_KEY before running this demo.",
    );
  }

  const stagehand = new Stagehand({
    env: "LOCAL",
    verbose: 2,
    model: { modelName: "openai/gpt-4.1-mini", apiKey: openaiKey },
    localBrowserLaunchOptions: {
      headless: true,
      args: ["--window-size=1280,720"],
    },
    disablePino: true,
  });

  try {
    await stagehand.init();

    const [page] = stagehand.context.pages();
    await page.goto("https://example.com/", { waitUntil: "load" });

    // Test standard agent path
    const agent = stagehand.agent({
      systemPrompt:
        "You are a QA assistant. Keep answers short and deterministic. Finish quickly.",
    });
    const agentResult = await agent.execute(
      "Glance at the Example Domain page and confirm that you see the hero text.",
    );
    console.log("Agent result:", agentResult);

    // Test CUA (Computer Use Agent) path
    await page.goto("https://example.com/", { waitUntil: "load" });
    const cuaAgent = stagehand.agent({
      cua: true,
      model: {
        modelName: "anthropic/claude-sonnet-4-5-20250929",
        apiKey: anthropicKey,
      },
    });
    const cuaResult = await cuaAgent.execute({
      instruction: "Click on the 'More information...' link on the page.",
      maxSteps: 3,
    });
    console.log("CUA Agent result:", cuaResult);

    const observations = await stagehand.observe("Find any links on the page");
    console.log("Observe result:", observations);

    if (observations.length > 0) {
      await stagehand.act(observations[0]);
    } else {
      await stagehand.act("click the link on the page");
    }

    const extraction = await stagehand.extract(
      "Summarize the current page title and contents in a single sentence",
    );
    console.log("Extraction result:", extraction);
  } finally {
    await stagehand.close({ force: true }).catch(() => {});
  }
}

run().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
```

EXPECTED OUTPUT:
```bash
2025-12-08 12:20:26.23300 ⤑ ⤑ [🆄 #694a GOTO] ▷ Page.goto({args:[https://example.com/,{waitUntil:load}]})
2025-12-08 12:20:26.23401 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏵ Page.navigate({url:https://example.com/})
2025-12-08 12:20:26.26402 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Page.frameStartedNavigating({frameId:8A6B…FE7B,u…rId:F41F…7B31,navigationType:differentDocument})
2025-12-08 12:20:26.26403 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Page.frameStartedLoading({frameId:8A6B…FE7B})
2025-12-08 12:20:26.57304 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏵ Page.setLifecycleEventsEnabled({enabled:true})
2025-12-08 12:20:26.57605 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Page.frameNavigated({frame:{id:8A6B…FE7B,loaderI…tIsolated,gatedAPIFeatures:[]},type:Navigation})
2025-12-08 12:20:26.57706 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Network.policyUpdated({})
2025-12-08 12:20:26.57807 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Runtime.consoleAPICalled({type:info,args:[{type:…ptId:5,url:",lineNumber:0,columnNumber:2837}]}})
2025-12-08 12:20:26.57908 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Page.domContentEventFired({timestamp:545864.312948})
2025-12-08 12:20:26.58009 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Page.loadEventFired({timestamp:545864.313355})
2025-12-08 12:20:26.58110 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏴ Page.frameStoppedLoading({frameId:8A6B…FE7B})
2025-12-08 12:20:26.58311 ⤑ ⤑ [🆄 #694a GOTO] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:document.readyState,contextId:2,returnByValue:true})
2025-12-08 12:20:26.58412 ⤑ ⤑ [🆄 #694a GOTO] ✓ GOTO completed in 0.35s
2025-12-08 12:20:26.58513 [🅰 #1d66] ▷ Agent.execute(Glance at the Example Domain page and confirm that you see the hero text.)
2025-12-08 12:20:26.59314 [🅰 #1d66] ⤑ [🧠 #21e1 LLM] gpt-4.1-mini ⏴ user: Glance at the Example Domain page and confirm that you see the hero text. +{10 tools}
2025-12-08 12:20:29.44715 [🅰 #1d66] ⤑ [🧠 #21e1 LLM] gpt-4.1-mini ↳ ꜛ688 ꜜ12 | tool call: ariaTree()
2025-12-08 12:20:29.44816 [🅰 #1d66] [🆂 #9ac4 EXTRACT] ▷ Stagehand.extract()
2025-12-08 12:20:29.45317 [🅰 #1d66] [🆂 #9ac4 EXTRACT] ⤑ [🅲 #FE7B CDP] ⏵ DOM.getDocument({depth:-1,pierce:true})
2025-12-08 12:20:29.46018 [🅰 #1d66] [🆂 #9ac4 EXTRACT] ⤑ [🅲 #FE7B CDP] ⏵ Accessibility.getFullAXTree({frameId:8A6B…FE7B})
2025-12-08 12:20:29.46419 [🅰 #1d66] [🆂 #9ac4 EXTRACT] ✓ EXTRACT completed in 0.02s
2025-12-08 12:20:29.46520 [🅰 #1d66] ⤑ [🧠 #03a1 LLM] gpt-4.1-mini ⏴ tool result: ariaTree(): Accessibility Tre…7] paragraph [0-18] link: Learn more +{10 tools}
2025-12-08 12:20:32.21321 [🅰 #1d66] ⤑ [🧠 #03a1 LLM] gpt-4.1-mini ↳ ꜛ806 ꜜ34 | tool call: close()
2025-12-08 12:20:32.21422 [🅰 #1d66] ✓ Agent.execute() DONE in 5.6s | 2 LLM calls ꜛ1494 ꜜ46 tokens | 6 CDP msgs
2025-12-08 12:20:32.21523 ⤑ ⤑ [🆄 #cb65 GOTO] ▷ Page.goto({args:[https://example.com/,{waitUntil:load}]})
2025-12-08 12:20:32.21524 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏵ Page.navigate({url:https://example.com/})
2025-12-08 12:20:32.25425 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Page.frameStartedNavigating({frameId:8A6B…FE7B,u…rId:2130…4BDE,navigationType:differentDocument})
2025-12-08 12:20:32.25426 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Page.frameStartedLoading({frameId:8A6B…FE7B})
2025-12-08 12:20:32.25727 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏵ Page.setLifecycleEventsEnabled({enabled:true})
2025-12-08 12:20:32.25828 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ DOM.scrollableFlagUpdated({nodeId:1,isScrollable:false})
2025-12-08 12:20:32.25929 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Page.frameNavigated({frame:{id:8A6B…FE7B,loaderI…tIsolated,gatedAPIFeatures:[]},type:Navigation})
2025-12-08 12:20:32.26030 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Network.policyUpdated({})
2025-12-08 12:20:32.26031 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ DOM.documentUpdated({})
2025-12-08 12:20:32.26032 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Runtime.consoleAPICalled({type:info,args:[{type:…ptId:5,url:",lineNumber:0,columnNumber:2837}]}})
2025-12-08 12:20:32.26133 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ DOM.documentUpdated({})
2025-12-08 12:20:32.26134 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Page.domContentEventFired({timestamp:545869.998129})
2025-12-08 12:20:32.26135 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Page.loadEventFired({timestamp:545869.998762})
2025-12-08 12:20:32.26136 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏴ Page.frameStoppedLoading({frameId:8A6B…FE7B})
2025-12-08 12:20:32.26237 ⤑ ⤑ [🆄 #cb65 GOTO] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:document.readyState,contextId:3,returnByValue:true})
2025-12-08 12:20:32.26338 ⤑ ⤑ [🆄 #cb65 GOTO] ✓ GOTO completed in 0.05s
2025-12-08 12:20:32.26339 [🅰 #c756] ▷ Agent.execute({instruction:Click on the More information... link on the page.,maxSteps:3})
2025-12-08 12:20:32.26440 [🅰 #c756] ⤑ ⤑ [🅲 #FE7B CDP] ⏵ Page.addScriptToEvaluateOnNewDocument({source:(() => …ue });\n setTimeout(install, 100);\n }\n })();})
2025-12-08 12:20:32.26441 [🅰 #c756] ⤑ ⤑ [🅲 #FE7B CDP] ⏴ Accessibility.loadComplete({root:{nodeId:23,ignored:f…ds:[24],backendDOMNodeId:23,frameId:8A6B…FE7B}})
2025-12-08 12:20:32.26542 [🅰 #c756] ⤑ ⤑ [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:({ w: window.innerWidth,…ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:32.26543 [🅰 #c756] ⤑ ⤑ [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() => {\n const ID = __… 100);\n }\n })();,includeCommandLineAPI:false})
2025-12-08 12:20:32.26744 [🅰 #c756] ⤑ [🧠 #2798 LLM] claude-sonnet-4-5-20250929 ⏴ Click on the More information... link on the page.
2025-12-08 12:20:36.15745 [🅰 #c756] ⤑ [🧠 #2798 LLM] claude-sonnet-4-5-20250929 ↳ ꜛ1875 ꜜ79 | Ill help you click on the More information... l tool_use:computer
2025-12-08 12:20:36.96146 [🅰 #c756] ⤑ [🆄 #f55d SCREENSHOT] ▷ Page.screenshot({args:[{fullPage:false}]})
2025-12-08 12:20:36.96447 [🅰 #c756] ⤑ [🆄 #f55d SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:36.96648 [🅰 #c756] ⤑ [🆄 #f55d SCREENSHOT] [🅲 #FE7B CDP] ⏵ Page.captureScreenshot({format:png,fromSurface:true,captureBeyondViewport:false})
2025-12-08 12:20:37.01149 [🅰 #c756] ⤑ [🆄 #f55d SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:37.01250 [🅰 #c756] ⤑ [🆄 #f55d SCREENSHOT] ✓ SCREENSHOT completed in 0.05s
2025-12-08 12:20:37.01251 [🅰 #c756] ⤑ [🆄 #cce8 SCREENSHOT] ▷ Page.screenshot({args:[{fullPage:false}]})
2025-12-08 12:20:37.01352 [🅰 #c756] ⤑ [🆄 #cce8 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:37.01453 [🅰 #c756] ⤑ [🆄 #cce8 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Page.captureScreenshot({format:png,fromSurface:true,captureBeyondViewport:false})
2025-12-08 12:20:37.04054 [🅰 #c756] ⤑ [🆄 #cce8 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:37.04155 [🅰 #c756] ⤑ [🆄 #cce8 SCREENSHOT] ✓ SCREENSHOT completed in 0.03s
2025-12-08 12:20:37.04156 [🅰 #c756] ⤑ [🧠 #ce80 LLM] claude-sonnet-4-5-20250929 ⏴ Current URL: https://example.com/ +{15.8kb image}
2025-12-08 12:20:44.82757 [🅰 #c756] ⤑ [🧠 #ce80 LLM] claude-sonnet-4-5-20250929 ↳ ꜛ3192 ꜜ192 | I can see a pag…ith Example Domain as the head tool_use:computer
2025-12-08 12:20:45.12958 [🅰 #c756] ⤑ [🆄 #f8c3 V3CUA.SCROLL] ▷ v3CUA.scroll({target:(644, 400),args:[{type:sc…scroll_amount:3,pageUrl:https://example.com/}]})
2025-12-08 12:20:45.12959 [🅰 #c756] ⤑ [🆄 #3fc9 SCROLL] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:typeof w…"undefined\"&&window.__v3Cursor.move(644, 400)})
2025-12-08 12:20:45.12960 [🅰 #c756] ⤑ [🆄 #3fc9 SCROLL] ▷ Page.scroll({args:[644,400,0,300]})
2025-12-08 12:20:45.13061 [🅰 #c756] ⤑ [🆄 #3fc9 SCROLL] [🅲 #FE7B CDP] ⏵ Input.dispatchMouseEvent({type:mouseMoved,x:644,y:400,button:none})
2025-12-08 12:20:45.13762 [🅰 #c756] ⤑ [🆄 #3fc9 SCROLL] [🅲 #FE7B CDP] ⏵ Input.dispatchMouseEvent({type:mouseW…el,x:644,y:400,button:none,deltaX:0,deltaY:300})
2025-12-08 12:20:45.14663 [🅰 #c756] ⤑ [🆄 #3fc9 SCROLL] ✓ SCROLL completed in 0.02s
2025-12-08 12:20:45.64764 [🅰 #c756] ⤑ [🆄 #ccb0 SCREENSHOT] ▷ Page.screenshot({args:[{fullPage:false}]})
2025-12-08 12:20:45.64965 [🅰 #c756] ⤑ [🆄 #ccb0 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:45.65266 [🅰 #c756] ⤑ [🆄 #ccb0 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Page.captureScreenshot({format:png,fromSurface:true,captureBeyondViewport:false})
2025-12-08 12:20:45.68567 [🅰 #c756] ⤑ [🆄 #ccb0 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:45.68668 [🅰 #c756] ⤑ [🆄 #ccb0 SCREENSHOT] ✓ SCREENSHOT completed in 0.04s
2025-12-08 12:20:45.68769 [🅰 #c756] ⤑ [🆄 #87f4 SCREENSHOT] ▷ Page.screenshot({args:[{fullPage:false}]})
2025-12-08 12:20:45.68770 [🅰 #c756] ⤑ [🆄 #87f4 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:45.68971 [🅰 #c756] ⤑ [🆄 #87f4 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Page.captureScreenshot({format:png,fromSurface:true,captureBeyondViewport:false})
2025-12-08 12:20:45.71372 [🅰 #c756] ⤑ [🆄 #87f4 SCREENSHOT] [🅲 #FE7B CDP] ⏵ Runtime.evaluate({expression:(() …ntextId:3,awaitPromise:true,returnByValue:true})
2025-12-08 12:20:45.71473 [🅰 #c756] ⤑ [🆄 #87f4 SCREENSHOT] ✓ SCREENSHOT completed in 0.03s
2025-12-08 12:20:45.71474 [🅰 #c756] ⤑ [🧠 #ed51 LLM] claude-sonnet-4-5-20250929 ⏴ Current URL: https://example.com/ +{15.8kb image}
```

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
Co-authored-by: Miguel <36487034+miguelg719@users.noreply.github.com>
# why
Stand up a Fastify Stagehand server we can reuse for thin-client SDKs
across multiple languages.

# what changed
created new fastify server
# test plan
- Start the Fastify server (pnpm --filter server dev or your usual
command).
- Local browser smoke: MODEL_API_KEY=... ./scripts/test_local_browser.sh
- Browserbase smoke: MODEL_API_KEY=... BROWSERBASE_API_KEY=...
BROWSERBASE_PROJECT_ID=... ./scripts/test_remote_browser.sh.
















<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds a new Fastify-based Stagehand API server exposing V3 browser
automation over REST with streaming responses and session management.
Supports both local Chrome and Browserbase, includes health/readiness
endpoints, and ships an OpenAPI spec.

- **New Features**
- New packages/server with REST routes: start, navigate, observe, act,
extract, agentExecute, end (streaming logs/results)
- In-memory LRU session store with TTL, lazy V3 init, and cleanup on end
  - Local and Browserbase browsers; credentials passed via headers
- Health (/healthz) and readiness (/readyz), metrics, and structured
request logging
  - OpenAPI v3 spec and README
  - Removed v2 code and DB dependency; auth currently disabled
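
To make the "LRU session store with TTL" item concrete, here is a minimal sketch built on `Map` insertion order. The class name, constructor arguments, and the choice not to refresh the TTL on reads are all assumptions; the server's actual store may behave differently.

```typescript
// Hypothetical in-memory store: evicts the least recently used entry when
// full and treats entries older than ttlMs as missing.
class SessionStore<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(id: string, value: T): void {
    // Re-inserting moves the key to the "most recent" end of the Map.
    this.entries.delete(id);
    this.entries.set(id, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get(id: string): T | undefined {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(id);
      return undefined;
    }
    // Refresh recency (but not the TTL) on read.
    this.entries.delete(id);
    this.entries.set(id, entry);
    return entry.value;
  }

  delete(id: string): boolean {
    return this.entries.delete(id);
  }
}
```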

- **Migration**
  - Run: pnpm --filter @browserbasehq/stagehand-server dev
- Required header: x-model-api-key; for Browserbase also x-bb-api-key
and x-bb-project-id

<sup>Written for commit ed1089b.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# Agent Abort Signal and Message Continuation

## Why

Enable users to cancel long-running agent tasks and continue
conversations across multiple `execute()` calls. Also ensures graceful
shutdown when `stagehand.close()` is called by automatically aborting
any running agent tasks.

## What Changed

### New Features (behind `experimental: true`)

#### Abort Signal Support

- Pass `signal` to `agent.execute()` to cancel execution mid-run
- Works with `AbortController` and `AbortSignal.timeout()`
- Throws `AgentAbortError` when aborted
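
The cancellation behavior can be pictured as a loop that checks the signal between steps. The sketch below is a stand-in, not the agent's implementation: `runSteps` is hypothetical, the steps are synchronous to keep the example short (real agent steps are async), and only the `AgentAbortError` name comes from this PR.

```typescript
// Local stand-in for the exported error class.
class AgentAbortError extends Error {
  constructor(reason?: string) {
    super(reason ?? "Agent execution was aborted");
    this.name = "AgentAbortError";
  }
}

// Hypothetical step loop: stop before the next step once the signal fires.
function runSteps(steps: Array<() => void>, signal?: AbortSignal): number {
  let completed = 0;
  for (const step of steps) {
    if (signal?.aborted) throw new AgentAbortError();
    step();
    completed += 1;
  }
  return completed;
}
```

A caller would create an `AbortController`, pass `controller.signal` along, and call `controller.abort()` from elsewhere (a UI button, a timer) to stop the run.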

#### Message Continuation

- `execute()` now returns `messages` in the result
- Pass previous messages to continue a conversation across calls

### New Utilities

| File | Purpose |
|------|---------|
| `combineAbortSignals.ts` | Merges multiple signals (uses native `AbortSignal.any()` on Node 20+, fallback for older) |
| `errorHandling.ts` | Consolidates abort detection logic; needed because `close()` may cause indirect errors (e.g., null context) that should still be treated as abort |
| `validateExperimentalFeatures.ts` | Single place for all experimental/CUA feature validation |
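
A sketch of how signals might be merged, under assumptions: the function name matches the file listed in the table, but the signature and the fallback are guesses. The fallback does not remove its listeners once the combined signal fires, which a production version would likely want to do.

```typescript
// Combine several signals into one that aborts as soon as any input aborts.
function combineAbortSignals(
  ...signals: Array<AbortSignal | undefined>
): AbortSignal {
  const active = signals.filter((s): s is AbortSignal => s !== undefined);

  // Prefer the native helper when the runtime provides it.
  const native = (
    AbortSignal as unknown as { any?: (s: AbortSignal[]) => AbortSignal }
  ).any;
  if (typeof native === "function") return native.call(AbortSignal, active);

  // Fallback for older runtimes: forward the first abort to a new controller.
  const controller = new AbortController();
  for (const signal of active) {
    if (signal.aborted) {
      controller.abort(signal.reason);
      break;
    }
    signal.addEventListener("abort", () => controller.abort(signal.reason), {
      once: true,
    });
  }
  return controller.signal;
}
```

This is the shape that lets `stagehand.close()` abort a run through an internal controller while still honoring a signal supplied by the user.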

### CUA Limitations

Abort signal and message continuation are not supported in CUA mode
(throws `StagehandInvalidArgumentError`). This matches the existing
streaming limitation.

### Tests Added

- `agent-abort-signal.spec.ts` (7 tests)
- `agent-message-continuation.spec.ts` (4 tests)
- `agent-experimental-validation.spec.ts` (17 tests)

























<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds agent abort support and conversation continuation. You can cancel
long runs, auto-abort on close, and carry messages across execute()
calls. Feature is gated behind experimental: true and has clear CUA
limitations.

- **New Features**
- Abort signal for execute() and stream() with AbortController and
AbortSignal.timeout; throws AgentAbortError; stagehand.close()
auto-aborts via an internal controller combined with any user signal.
- Message continuation: execute() returns messages and accepts previous
messages on the next call; tool calls and results are included.

- **Refactors**
- Centralized experimental/CUA validation via
validateExperimentalFeatures: CUA disallows streaming, abort signal, and
message continuation; experimental required for integrations, tools,
streaming, callbacks, signal, and messages.
- Public API updates: re-export ModelMessage; Agent types include
messages and signal; AgentAbortError exported for consistent abort
typing.

<sup>Written for commit 5276e41.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Nick Sweeting <github@sweeting.me>
# why
The click count in CDP's
[Input.dispatchMouseEvent](https://chromedevtools.github.io/devtools-protocol/tot/Input/#method-dispatchMouseEvent)
does **not** issue multiple click events; it is mainly kept for tracking.
Individual `mousePressed`/`mouseReleased` events must be sent for each click.

# what changed
Added a for loop for the `clickCount` number provided in both
`locator.click()` and `page.click()`. Also built redundancy around
`AnthropicCUAClient` double_click coordinate parsing.
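
The loop can be sketched as a pure function that lists the CDP events to send. `buildClickEvents` is a hypothetical helper, and incrementing the per-event `clickCount` (1 for the first click, 2 for the second) is an assumption about how the loop sets that field.

```typescript
// Parameters for one Input.dispatchMouseEvent call (subset of the CDP fields).
interface MouseEventParams {
  type: "mousePressed" | "mouseReleased";
  x: number;
  y: number;
  button: "left";
  clickCount: number;
}

// Build one pressed/released pair per click, in dispatch order.
function buildClickEvents(
  x: number,
  y: number,
  clickCount: number = 1,
): MouseEventParams[] {
  const events: MouseEventParams[] = [];
  for (let i = 1; i <= clickCount; i++) {
    events.push({ type: "mousePressed", x, y, button: "left", clickCount: i });
    events.push({ type: "mouseReleased", x, y, button: "left", clickCount: i });
  }
  return events;
}
```

Each returned entry would then be passed to `Input.dispatchMouseEvent` in order, so a double click produces four events rather than one pair with `clickCount: 2`.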

# test plan
- [x] tested on https://doubleclicktest.com/
- [x] added evals site and unit tests on `click-count.spec.ts`






<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes multiple-click behavior by dispatching individual
mousePressed/mouseReleased events per click and normalizes Anthropic CUA
doubleClick coordinates. Double-clicks and multi-clicks now work
reliably via CDP and CUA.

- **Bug Fixes**
- locator.click and page.click now loop over clickCount, sending
pressed/released pairs for each click.
- AnthropicCUAClient parses doubleClick consistently and falls back to
coordinate arrays when x/y are missing.
- Added tests for single, double, and triple clicks for locator.click
and page.click.

<sup>Written for commit 26b784d.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why

- Google CUA agent was crashing with `Cannot read properties of
undefined (reading 'parts')`
- This can happen when the model's response is blocked due to safety
filters, rate limiting, or other API-level issues

# what changed

- Added a null check for `candidate.content` and
`candidate.content.parts` in `GoogleCUAClient.processResponse()`
- When content is missing, the agent now gracefully returns with the
finishReason logged for debugging
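
The guard described above might look something like the sketch below. The `Candidate`, `Part`, and `ProcessedResponse` shapes are simplified assumptions for illustration, not the real Gemini SDK or Stagehand types, and `processCandidate` is a hypothetical stand-in for the relevant part of `processResponse()`.

```typescript
// Simplified local shapes (assumptions, not the real SDK types).
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}
interface Candidate {
  content?: { parts?: Part[] };
  finishReason?: string;
}
interface ProcessedResponse {
  actions: Part[];
  message: string;
  completed: boolean;
}

function processCandidate(
  candidate: Candidate,
  log: (msg: string) => void = (msg) => console.warn(msg),
): ProcessedResponse {
  const parts = candidate.content?.parts;
  if (!parts) {
    // Blocked or empty response: log why and return a safe, completed result.
    log(`Response has no content parts (finishReason: ${candidate.finishReason ?? "unknown"})`);
    return { actions: [], message: "", completed: true };
  }
  const actions = parts.filter((part) => part.functionCall !== undefined);
  const message = parts.map((part) => part.text ?? "").join("");
  return { actions, message, completed: actions.length === 0 };
}
```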



<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixed crash in the Google CUA agent when Gemini returns an empty or
blocked response. We now guard against missing content, log the
finishReason, and return a safe, completed response with no actions or
function calls.

<sup>Written for commit 5309757.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why

a ci test failed because the timeout was hit on 1 of 3 ci runs

unsure if this will fail again, but increasing the timeout to prevent it
in the future

# what changed

increased timeout from 10s to 20s


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Increased the test timeout from 10s to 20s in agent-abort-signal.spec to
reduce CI flakiness and avoid false timeouts on slower runs.

<sup>Written for commit 8d3c418.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why

These dev dependencies don't belong here. Some are no longer used, some
should go into their respective packages

# what changed

Moved dev dependencies to respective packages and removed unused ones

# test plan










<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Moved dev dependencies from the workspace root into the packages that
use them and removed unused ones to cut install bloat.

- **Dependencies**
- Removed unused devDependencies from the root; moved required ones into
packages/core and packages/evals.
- Added missing dev deps to packages/core (@types/adm-zip, @types/node,
@types/ws, adm-zip, chalk, esbuild) and packages/evals (braintrust,
chalk, string-comparison).
  - Cleaned pnpm-lock.yaml (large reduction in entries).

<sup>Written for commit c6f6221.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why
- writing base64 screenshots to disk is unnecessary: screenshots do not
get replayed, so there is no sense in writing them to disk
# what changed
- added a `pruneAgentResult()` fn which prunes the screenshot entry
before it is written to disk
# test plan
- existing tests & evals should suffice for this one

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Stop writing base64 screenshots to the agent cache to reduce disk usage
and keep cache entries lean. Screenshots aren’t replayed, so pruning
them has no impact on behavior.

- **Refactors**
- Added pruneAgentResult to remove screenshot base64 blobs from actions
before persisting.
- Prunes only the cached copy; the live AgentResult returned to callers
is unchanged.
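
A minimal sketch of the pruning idea, assuming screenshot actions carry their payload under a `base64` key (the real field name and the full `AgentResult` shape may differ):

```typescript
// Hypothetical shapes; the real AgentResult has more fields.
interface AgentAction {
  type: string;
  [key: string]: unknown;
}
interface AgentResult {
  success: boolean;
  actions: AgentAction[];
}

// Returns a copy suitable for caching; the input object is not mutated.
function pruneAgentResult(result: AgentResult): AgentResult {
  return {
    ...result,
    actions: result.actions.map((action) => {
      if (action.type !== "screenshot") return action;
      const copy: AgentAction = { ...action };
      // Assumption: the base64 payload lives under a `base64` key.
      delete copy["base64"];
      return copy;
    }),
  };
}
```

Because the function builds new objects for the pruned actions, the live result handed back to the caller keeps its screenshots while only the cached copy is slimmed down.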

<sup>Written for commit 625f982.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why
- `extract()` was missing from `stagehand.history()`
- addresses #1357 
# what changed
- added a call to `addToHistory()` after `extract()` finishes
# test plan






<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Include extract() in stagehand.history() so extract actions and results
are tracked with instruction, selector, timeout, and schema details.
Fixes missing history entries for extract and addresses #1357.

<sup>Written for commit 84f95db.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/stagehand@3.0.6

### Patch Changes

- [#1388](#1388)
[`605ed6b`](605ed6b)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix multiple
click event dispatches on CDP and Anthropic CUA handling (double clicks)

- [#1400](#1400)
[`34e7e5b`](34e7e5b)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - don't write
base64 encoded screenshots to disk when caching agent actions

- [#1345](#1345)
[`943d2d7`](943d2d7)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
aborting / stopping an agent run & continuing an agent run using
messages from prior runs

- [#1334](#1334)
[`0e95cd2`](0e95cd2)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
google vertex provider

- [#1410](#1410)
[`d4237e4`](d4237e4)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
include extract in stagehand.history()

- [#1315](#1315)
[`86975e7`](86975e7)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add streaming support
to agent through stream:true in the agent config

- [#1304](#1304)
[`d5e119b`](d5e119b)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for
Microsoft's Fara-7B

- [#1346](#1346)
[`4e051b2`](4e051b2)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: don't
attach to targets twice

- [#1327](#1327)
[`6b5a3c9`](6b5a3c9)
Thanks [@miguelg719](https://github.com/miguelg719)! - Informed error
parsing from api

- [#1335](#1335)
[`bb85ad9`](bb85ad9)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add support
for page.addInitScript()

- [#1331](#1331)
[`88d28cc`](88d28cc)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
page.evaluate() now works with scripts injected via
context.addInitScript()

- [#1316](#1316)
[`45bcef0`](45bcef0)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
callbacks in stagehand agent

- [#1374](#1374)
[`6aa9d45`](6aa9d45)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix key action
mapping in Anthropic CUA

- [#1330](#1330)
[`d382084`](d382084)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: make
act, extract, and observe respect user defined timeout param

- [#1336](#1336)
[`1df08cc`](1df08cc)
Thanks [@tkattkat](https://github.com/tkattkat)! - Patch agent on api

- [#1358](#1358)
[`2b56600`](2b56600)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for 4.5
opus in cua agent

## @browserbasehq/stagehand-evals@1.1.5

### Patch Changes

- [#1364](#1364)
[`ca0630e`](ca0630e)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update model handling
in agent evals cli

- Updated dependencies
\[[`605ed6b`](605ed6b),
[`34e7e5b`](34e7e5b),
[`943d2d7`](943d2d7),
[`0e95cd2`](0e95cd2),
[`d4237e4`](d4237e4),
[`86975e7`](86975e7),
[`d5e119b`](d5e119b),
[`4e051b2`](4e051b2),
[`6b5a3c9`](6b5a3c9),
[`bb85ad9`](bb85ad9),
[`88d28cc`](88d28cc),
[`45bcef0`](45bcef0),
[`6aa9d45`](6aa9d45),
[`d382084`](d382084),
[`1df08cc`](1df08cc),
[`2b56600`](2b56600)]:
    -   @browserbasehq/stagehand@3.0.6

## @browserbasehq/stagehand-server@3.0.6

### Patch Changes

- Updated dependencies
\[[`605ed6b`](605ed6b),
[`34e7e5b`](34e7e5b),
[`943d2d7`](943d2d7),
[`0e95cd2`](0e95cd2),
[`d4237e4`](d4237e4),
[`86975e7`](86975e7),
[`d5e119b`](d5e119b),
[`4e051b2`](4e051b2),
[`6b5a3c9`](6b5a3c9),
[`bb85ad9`](bb85ad9),
[`88d28cc`](88d28cc),
[`45bcef0`](45bcef0),
[`6aa9d45`](6aa9d45),
[`d382084`](d382084),
[`1df08cc`](1df08cc),
[`2b56600`](2b56600)]:
    -   @browserbasehq/stagehand@3.0.6

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
# why

update agent docs to reflect new features 

# what changed

- docs on abort signal 
- docs on message continuation
- docs on streaming 
- docs on callbacks





<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Updated Agent docs to cover new experimental capabilities—streaming,
callbacks, abort signals, and message continuation—and clarified what’s
supported for Computer Use Agents vs non-CUA. This helps build real-time
UIs, control execution, and maintain conversation state.

- **New Features**
  - Added CUA vs non-CUA feature matrix.
- Documented streaming mode (`stream: true`), `textStream`/`fullStream`,
and `AgentStreamResult`.
- Added lifecycle callbacks for non-streaming and streaming, with
examples.
- Added `AbortSignal` usage, timeout patterns, and streaming abort
behavior.
  - Added message continuation via `messages` in `execute` options.
- Updated references: `AgentConfig.stream`, `messages`, `signal`,
`callbacks`, response fields (e.g., `messages`, `timestamp`), and new
error types.

- **Migration**
- Set `experimental: true` to use these features; they are not supported
with CUA.
- Enable `stream: true` for streaming and streaming callbacks; using
streaming-only callbacks without streaming will throw.
- Pass previous result `messages` to continue conversations; use
`AbortController.signal` to cancel runs.
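
The gating rules above can be pictured with a small validation sketch. The option shape and the `checkAgentFeatures` name are hypothetical, and the error messages are invented for illustration; only the rules themselves (CUA rejects streaming, abort signals, and message continuation; the other features need `experimental: true`) come from the descriptions in this changelog.

```typescript
// Hypothetical option shape for illustration only.
interface FeatureOptions {
  isCua: boolean;
  experimental: boolean;
  stream?: boolean;
  signal?: AbortSignal;
  messages?: unknown[];
  callbacks?: object;
}

function checkAgentFeatures(opts: FeatureOptions): void {
  const cuaBlocked: string[] = [];
  if (opts.stream) cuaBlocked.push("streaming");
  if (opts.signal) cuaBlocked.push("abort signal");
  if (opts.messages && opts.messages.length > 0) cuaBlocked.push("message continuation");

  if (opts.isCua && cuaBlocked.length > 0) {
    throw new Error(`Not supported with CUA: ${cuaBlocked.join(", ")}`);
  }

  const needsExperimental = [...cuaBlocked];
  if (opts.callbacks) needsExperimental.push("callbacks");
  if (!opts.experimental && needsExperimental.length > 0) {
    throw new Error(`Requires experimental: true for: ${needsExperimental.join(", ")}`);
  }
}
```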

<sup>Written for commit 3b58bf9.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why
- `act`, `extract`, & `observe` fail, and stagehand logs
`AI_LoadAPIKeyError` if a user attempts to use a google LLM, and has
`GOOGLE_API_KEY` in their `.env` instead of
`GOOGLE_GENERATIVE_AI_API_KEY` or `GEMINI_API_KEY`
# what changed
- this PR widens the accepted env vars for google models to accept
`GOOGLE_API_KEY`
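
Conceptually, the widened lookup behaves like the sketch below. `resolveGoogleApiKey` is a hypothetical helper, and the precedence order among the three variables is an assumption; the PR only states that `GOOGLE_API_KEY` is now accepted too.

```typescript
// Assumption: this lookup order is illustrative; the real precedence may differ.
const GOOGLE_KEY_ENV_VARS = [
  "GOOGLE_GENERATIVE_AI_API_KEY",
  "GEMINI_API_KEY",
  "GOOGLE_API_KEY",
] as const;

// Returns the first non-empty key found in the given environment map.
function resolveGoogleApiKey(env: Record<string, string | undefined>): string | undefined {
  for (const name of GOOGLE_KEY_ENV_VARS) {
    const value = env[name];
    if (value) return value;
  }
  return undefined;
}
```

In a Node process the environment map would typically be `process.env`.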




<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Allow GOOGLE_API_KEY for Google models by expanding the env var lookup.
Fixes key-loading failures in act, extract, and observe when users set
GOOGLE_API_KEY in .env.

<sup>Written for commit 4984318.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
…1409)

# why
- update `act` reference to use `"provider/model-name"` formatting

---------

Co-authored-by: Sean McGuire <seanmcguire1@outlook.com>
# why

We didn't have a link to our Discord

# what changed

<img width="289" height="245" alt="image"
src="https://github.com/user-attachments/assets/d1e12f96-db02-4982-806f-fc45d6bb42fb"
/>

# test plan

n/a


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added a Discord link across the docs (global anchors, navbar, footer) so
users can quickly join the community. Also added a GitHub anchor and
removed the outdated “Stagehand by Browserbase” link.

<sup>Written for commit fb5d591.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why

- when transitioning to v3, we did not use the latest version of
screenshot collector
- screenshot collector currently fails due to not having page.on and
page.off support for the load and domcontentloaded events.

# what changed

- added latest version of screenshot collector 

# test plan

- ran evals in cli with additional logging to also verify everything is
working as expected
























<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Updated the evals CLI screenshot collector to the latest version, adding
image-diff filtering and a V3 event bus that emits agent screenshots.
This reduces duplicate screenshots and stabilizes capture on v3 pages
where navigation events are disabled.

- **New Features**
  - Skip similar screenshots using MSE/SSIM thresholds with sharp.
- Event bus integration: agents emit screenshots; collector can ingest
them.
- Non-blocking initial/final captures and safer interval capture with
error handling.
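
To illustrate the image-diff filtering, here is a minimal mean-squared-error check over raw pixel buffers. It covers only the MSE half (SSIM is omitted), the threshold value is a made-up placeholder, and the function names are hypothetical; in practice the buffers could come from decoding both screenshots to raw pixels at the same dimensions, for example with sharp.

```typescript
// Mean squared error between two equally sized raw pixel buffers.
function meanSquaredError(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("buffers must have the same length");
  if (a.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum / a.length;
}

// Hypothetical threshold: screenshots closer than this are treated as duplicates.
function isNearDuplicate(prev: Uint8Array, next: Uint8Array, threshold: number = 30): boolean {
  return meanSquaredError(prev, next) < threshold;
}
```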

- **Dependencies**
  - Added sharp ^0.34.5 for image processing (evals and core).
  - Patch bump via changeset for @browserbasehq/stagehand-evals.

<sup>Written for commit f4e90f8.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: miguel <miguelg71921@gmail.com>
Co-authored-by: Miguel <36487034+miguelg719@users.noreply.github.com>
# why

we need more evals for agent 

# what changed

- Added 19 new evals composed primarily of "hard" level tasks from
public datasets such as onlineMind2web
- Updated evals to import agent from agent, rather than v3Agent, as it
was an incorrect import causing tasks to fail

# test plan

ran evals 



<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added 18 new hard-level agent evals and fixed the agent import to use
the correct agent, improving coverage and stability of browser tasks.

- **New Features**
- Added evals for diverse sites (Amazon cart, KFC order, Redfin rentals,
Flipkart filters, WebMD tools, Trustpilot, Uniqlo, Alibaba, NVIDIA
drivers, OED search, Radiotimes, TheGamer, Trailhead, etc.).
- Integrated ScreenshotCollector in new evals to capture journeys for
better automated evaluation.
- Updated evals.config.json to register all new tasks under the agent
category.

- **Bug Fixes**
- Replaced v3Agent with agent across existing evals to prevent task
failures.
- Standardized agent.execute usage and evaluation flow to improve
reliability.

<sup>Written for commit b947d97.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
# why

We had `page.click(x, y)` for coordinate-based clicking but no
equivalent for hovering. Also, the agent will need hover abilities

# what changed

- Added `page.hover(x, y, options?)` to dispatch mouse move events at
coordinates

# test plan

Added `page-hover.spec.ts` with 6 tests covering:
- Mouseover event triggers
- Hover doesn't click
- `returnXpath` option
- CSS `:hover` pseudo-class activation
- Multiple sequential hovers






<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds page.hover(x, y, options?) for coordinate-based hovering. Enables
mouseover and CSS :hover without clicking, with an option to return the
hovered element’s XPath.

- **New Features**
  - Dispatches mouseMoved at absolute page coordinates via CDP.
  - Supports options.returnXpath to return the element XPath.
- Moves cursor without triggering click; activates mouseover and :hover
states.

<sup>Written for commit 5b3b39f.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
seanmcguire12 and others added 30 commits February 25, 2026 19:38
# why
- this function was from legacy stagehand which only operated on one
page
- presently, it was only being used to produce a log which:
  - at best, misinformed users on whether the page had actually navigated,
    and,
  - at worst, resulted in a noisy error log
- the error log would happen if `clickElement()` triggered page closure.
this means that the frame.evaluate() to get the URL would attempt
`.evaluate()` on a frame that no longer existed
# what changed
- removed `handlePossibleNavigation()`
# test plan
- existing tests are fine here

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Removed the legacy handlePossibleNavigation() that tried to detect
navigation by URL and produced misleading logs. This also prevents
errors when clicks close the page and evaluate runs on a non-existent
frame, reducing log noise.

<sup>Written for commit 1d2c3d6.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1761">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# what changed
- added documentation for the `context.setExtraHTTPHeaders()` function


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Add v3 docs for context.setExtraHTTPHeaders(), including API,
context-wide behavior (applies to all pages, replaces not merges, clear
via {}), examples, and error docs. Also updates the V3Context interface
to include this method; addresses Linear STG-1414.
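
The "replaces, not merges" behavior can be modeled with a tiny in-memory sketch. `HeaderStore` is a hypothetical class used only to show the semantics; it does not talk to a browser.

```typescript
// Hypothetical model of context-wide extra headers.
class HeaderStore {
  private headers: Record<string, string> = {};

  // Replaces the whole header set; passing {} clears it.
  setExtraHTTPHeaders(next: Record<string, string>): void {
    this.headers = { ...next };
  }

  current(): Record<string, string> {
    return { ...this.headers };
  }
}
```

Setting `{ "X-A": "1" }` and then `{ "X-B": "2" }` leaves only `X-B` in place, so callers who want both headers need to pass both in one call.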

<sup>Written for commit c6f64ee.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1762">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
Fixes CE-731

## Summary
- Remove Claude 3.5 Sonnet (`claude-3-5-sonnet-latest`, `-20241022`, `-20240620`) and Claude 3.7 Sonnet (`claude-3-7-sonnet-latest`, `-20250219`) from all supported model lists
- These models are **retired** by Anthropic — API calls to them will fail
- Replace with `claude-sonnet-4-20250514` across evals, CI, docs, and examples

## What changed (27 files)
- **Core types**: Removed from `model.ts` type union, `agent.ts` CUA models list
- **Provider mappings**: Removed from `LLMProvider.ts`, `AgentProvider.ts`, server `utils.ts`, server `model.ts`
- **Evals/CI**: Updated `taskConfig.ts`, `initV3.ts`, `ci.yml`, `.env.example` to use `claude-sonnet-4-20250514`
- **Tests**: Updated `model-deprecation.test.ts` and `llm-and-agents.test.ts` (513/513 pass)
- **Docs**: Updated all v2 and v3 documentation references (11 `.mdx` files)
- **Other**: Issue template, MCP example

## Context
Per [Anthropic's model deprecations page](https://docs.anthropic.com/en/docs/resources/model-deprecations):
| Model | Retired |
|-------|---------|
| `claude-3-5-sonnet-20240620` | Oct 28, 2025 |
| `claude-3-5-sonnet-20241022` | Oct 28, 2025 |
| `claude-3-7-sonnet-20250219` | Feb 19, 2026 |
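
For callers that still have one of the retired names in their own configuration, a migration shim along these lines could help. `migrateModelName` is a hypothetical helper and not part of Stagehand; the model names are the ones listed in this PR.

```typescript
// Retired model names listed in this PR.
const RETIRED_MODELS: ReadonlySet<string> = new Set([
  "claude-3-5-sonnet-latest",
  "claude-3-5-sonnet-20241022",
  "claude-3-5-sonnet-20240620",
  "claude-3-7-sonnet-latest",
  "claude-3-7-sonnet-20250219",
]);
const REPLACEMENT_MODEL = "claude-sonnet-4-20250514";

// Hypothetical helper: handles both bare names and "provider/model-name" ids.
function migrateModelName(model: string): string {
  const slash = model.indexOf("/");
  const provider = slash === -1 ? "" : model.slice(0, slash + 1);
  const name = slash === -1 ? model : model.slice(slash + 1);
  return RETIRED_MODELS.has(name) ? provider + REPLACEMENT_MODEL : model;
}
```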

## Test plan
- [x] All 513 unit tests pass (`pnpm exec turbo run test:core`)
- [x] `grep` confirms zero remaining references outside CHANGELOG.md (historical)
- [ ] Verify CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary
- expose `headers` on `GoogleVertexProviderSettings` in Stagehand public model types
- add a public API type test proving model configs with headers are accepted for google/openai/anthropic
- add a patch changeset

## Context
Runtime already forwards provider options to `createVertex()`, but TypeScript rejected `headers` in model config. This aligns public types with runtime behavior.

## Validation
- `pnpm -C packages/core run typecheck`
- `pnpm -C packages/core run build:esm`
- `pnpm -C packages/core run test:core -- packages/core/dist/esm/tests/unit/public-api/llm-and-agents.test.js`
- `pnpm -C packages/core run test:core -- packages/core/dist/esm/tests/unit/llm-provider.test.js`

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Expose the headers field on GoogleVertexProviderSettings in the public model config types so custom provider headers (e.g., X-Goog-Priority) are accepted without TypeScript errors. Updated the public API type test to cover Vertex headers and align the model config check with the public API style, keeping types consistent with runtime behavior.

<sup>Written for commit bf4907d. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1764">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…l Cache sections (#1770)

## Summary

Restructures the caching best practices docs page into two clear
sections:

### Changes
- **Removed** the disclaimer/note about server-side caching only working
with `env: "BROWSERBASE"` — this is now naturally conveyed in the
section description
- **Renamed** "Server-side Caching" → **"Browserbase Cache"** with a
clear description of what it is (managed, server-side, automatic,
zero-config)
- **Renamed** "Local Caching" → **"Local Cache"** with a clear
description of what it is (file-based, works everywhere, portable)
- **Added** use-case bullets to the Local Cache section explaining when
to reach for it (agent workflows, CI/CD, local dev, cross-machine
sharing)
- **Preserved** all existing code snippets, configuration examples, and
best practices

### What stays the same
- All code examples (disabling on constructor, disabling per call,
inspecting cache status, act/agent caching, cache directory
organization)
- The limitations section for Browserbase Cache
- The best practices accordion (descriptive dirs, clearing cache,
committing for CI/CD)
- The blog link for deeper technical details

Only modifies `packages/docs/v3/best-practices/caching.mdx`.

Linear: https://linear.app/browserbase/issue/STG-1482
…ction time (#1719)

# why

Init script injection sometimes raced with Debugger.resume(), causing
frames to load without their init scripts having run. This led
to flaky init script tests, which were legitimately catching the issue.

-
https://github.com/browserbase/stagehand/actions/runs/22233062982/job/64336364420?pr=1580

<img width="1613" height="987" alt="image"
src="https://github.com/user-attachments/assets/e836cd65-ed3b-41c8-8f8e-152fd70f30f4"
/>


# what changed

- queues calls on page load to run before we resume
- catches oopifs and lazy frames and click-triggered popups the same way
playwright does
- removes flaky timeout/retry based prior approach


https://deepwiki.com/search/how-does-playwright-guarantee_8cf2339b-c060-4cfc-bc62-f3baaf57b229?mode=deep

# test plan





<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes the init‑script race by guaranteeing pre‑resume setup and
correcting popup attach order. Init scripts now run reliably in same‑
and cross‑process popups, OOPIF iframes, and across reloads; race tests
verify addScript is sent before resume per session.

- **Bug Fixes**
- Enforce pre‑resume ordering: per‑session dispatch waiters ensure
Page/Runtime enables, Target.setAutoAttach(waitForDebuggerOnStart),
Network.enable/setExtraHTTPHeaders, and
Page.addScriptToEvaluateOnNewDocument(runImmediately) are sent before
Runtime.runIfWaitingForDebugger; resume only after dispatch; log
ordering issues only for top‑level pages.
- Stabilize attach and evaluation: fix popup attach ordering; fan out
Target.* events to root listeners; retry Runtime.evaluate once on stale
context ids; pre‑register the piercer script before resume and
lazy‑install if needed.
- Harden lifecycle: convert detach errors to PageNotFoundError and
propagate; treat Page.enable/lifecycle acks as best‑effort; never drop
top‑level Page.create due to local timeouts.
- Expand tests and deflake: add delayed‑CDP‑send popup/iframe race repro
with real URLs; assert addScript precedes resume per session; cover
in‑process and cross‑process popups, window.open, OOPIF iframes, and
reload persistence; update detach expectations and timeouts.
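
The ordering guarantee can be illustrated by building the per-session command list up front so that the resume call is always last. This is a simplified sketch: `buildPreResumeCommands` is a hypothetical helper, it leaves out the Network setup, and a real implementation would await each CDP send rather than return a list.

```typescript
interface CdpCommand {
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical helper: setup commands first, resume strictly last.
function buildPreResumeCommands(initScripts: string[]): CdpCommand[] {
  const setup: CdpCommand[] = [
    { method: "Page.enable" },
    { method: "Runtime.enable" },
    {
      method: "Target.setAutoAttach",
      params: { autoAttach: true, waitForDebuggerOnStart: true, flatten: true },
    },
    ...initScripts.map((source) => ({
      method: "Page.addScriptToEvaluateOnNewDocument",
      params: { source, runImmediately: true },
    })),
  ];
  // The paused target only resumes after everything above has been dispatched.
  return [...setup, { method: "Runtime.runIfWaitingForDebugger" }];
}
```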

<sup>Written for commit 6f464d3.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1719">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why
- to allow for setting HTTP headers at the page level
# what changed
- added new function `page.setExtraHTTPHeaders()` , which sets HTTP
headers for the CDP session of the page, and all of its child sessions
(eg, iframes)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds page.setExtraHTTPHeaders() to set per-page HTTP headers on all
requests from the page and its iframes. Applies to pipeline sessions
immediately and replays on newly adopted child sessions. Addresses
STG-1316.
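
The apply-then-replay behavior might be modeled as below. `Session` and `PageHeaderManager` are hypothetical stand-ins, and `send` is synchronous here only to keep the sketch short; `Network.setExtraHTTPHeaders` is the underlying CDP method.

```typescript
// Hypothetical minimal session interface (real CDP sends are asynchronous).
interface Session {
  send(method: string, params: Record<string, unknown>): void;
}

class PageHeaderManager {
  private headers: Record<string, string> | null = null;
  private sessions: Session[];

  constructor(mainSession: Session) {
    this.sessions = [mainSession];
  }

  // Applies the headers to every session the page currently owns.
  setExtraHTTPHeaders(headers: Record<string, string>): void {
    this.headers = { ...headers };
    for (const session of this.sessions) {
      session.send("Network.setExtraHTTPHeaders", { headers: this.headers });
    }
  }

  // Replays the stored headers on a child session (e.g. an iframe) adopted later.
  adoptChildSession(child: Session): void {
    this.sessions.push(child);
    if (this.headers) {
      child.send("Network.setExtraHTTPHeaders", { headers: this.headers });
    }
  }
}
```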

<sup>Written for commit cf677c2.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1774">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
)

## Summary
- Adds support for CDP (Chrome DevTools Protocol) extra HTTP headers
when connecting to browser sessions
- Passes `extraHTTPHeaders` from the Stagehand config through to the CDP
connection layer
- Warns when `cdpHeaders` is provided without `cdpUrl`
- Includes integration test for the new functionality

Related: #1737

Co-authored-by: aditya-silna <aditya@silnahealth.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: aditya-silna <aditya@silnahealth.com>
# why
After the build migration, `pnpm build:cli` was no longer linking or
preserving overridden configs

# what changed
- Added bin field in `package.json` to enable npm linking
- Implemented smart config merging in the build script that updates
tasks/benchmarks from source while preserving user-customized defaults
- Added auto-linking via npm link --force at the end of the build
process with graceful fallback, for whenever users run `pnpm build:cli`
- Set `serverCache: false` in initV3 for consistent eval behavior on API
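
The config merging described above could be sketched like this. The `EvalsConfig` shape and the `mergeCliConfig` name are assumptions for illustration; the actual build script may track different fields.

```typescript
// Hypothetical config shape for illustration only.
interface EvalsConfig {
  tasks: unknown[];
  benchmarks: Record<string, unknown>;
  defaults: Record<string, unknown>;
}

// Tasks and benchmarks always come from source; user-customized defaults win.
function mergeCliConfig(source: EvalsConfig, existing?: Partial<EvalsConfig>): EvalsConfig {
  return {
    tasks: source.tasks,
    benchmarks: source.benchmarks,
    defaults: { ...source.defaults, ...(existing?.defaults ?? {}) },
  };
}
```

With this split, rebuilding picks up newly added tasks while a locally changed default (for example a preferred model) survives the rebuild.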

# test plan

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
## Summary
- Server integration tests, evals, and Stainless preview builds require repo secrets that GitHub doesn't expose to fork PRs
- These jobs were running and failing with missing env var errors on every fork PR
- Add the same fork guard (`head.repo.full_name == github.repository`) that e2e tests already use

### Jobs fixed:
- `server-integration-tests` in `ci.yml`
- `run-evals` in `ci.yml`
- `preview` in `stainless.yml`

## Test plan
- [ ] Verify existing (non-fork) PRs still run all CI jobs
- [ ] Verify fork PRs skip the guarded jobs gracefully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Skip CI jobs that require repo secrets on fork PRs to prevent missing env errors. These jobs now run only when the PR comes from this repo.

- **Bug Fixes**
  - Guarded server integration tests in ci.yml.
  - Guarded eval runs in ci.yml.
  - Guarded Stainless preview builds in stainless.yml.

<sup>Written for commit f71de8d. Summary will update on new commits. <a href="https://cubic.dev/pr/browserbase/stagehand/pull/1780">Review in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
PSA potential hackers: don't get excited, we don't have any real secrets
in CI worth stealing, and our CI does not autodeploy anything to prod.
All important secrets and CD processes are kept in our closed-source
repos.

# why

# what changed

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Add a gating workflow that blocks CI until a maintainer approves running
secrets on forked PRs. CI now triggers from that gate, resolves labels
and path filters under workflow_run, removes same-repo guards so
integration/e2e/evals run on approved forks, and checks out the PR
commit consistently across jobs.

<sup>Written for commit c682847.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1782">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…ed" (#1786)

Reverts #1782

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Reverts the approval-based CI for external contributors. CI now runs on
pull_request and blocks secrets for forked PRs by skipping integration,
E2E, and eval jobs.

- **Refactors**
  - Removed the “Ensure Contributor Is Trusted to Run CI” workflow.
  - Switched CI trigger to pull_request; removed workflow_run logic.
  - Read labels from github.event.pull_request; removed API calls.
  - Simplified checkouts; dropped explicit head_sha refs.
  - Updated concurrency group to use github.ref.
  - Ignored docs-only changes in CI.

<sup>Written for commit d6ace82.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1786">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
Reverts #1780

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Reverts the change that skipped CI on forked PRs. Integration tests,
evals, and the Stainless preview now run for all PRs by removing the
head-repo equality checks in ci.yml and stainless.yml.

<sup>Written for commit 18480e8.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1787">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

cdpHeaders is already plumbed through packages/server correctly, it was
just missing from the spec.

- packages/core/lib/v3/types/public/api.ts:15 defines cdpHeaders on
LocalBrowserLaunchOptionsSchema.
- packages/server/src/routes/v1/sessions/start.ts:192 forwards
browser.launchOptions with a spread into localBrowserLaunchOptions, so
cdpHeaders is preserved.
- packages/server/src/lib/InMemorySessionStore.ts:240 passes
localBrowserLaunchOptions straight into new V3(...).
- packages/core/lib/v3/v3.ts:750 passes lbo.cdpHeaders into
V3Context.create(...).
- packages/core/lib/v3/understudy/context.ts:167 finally uses it in
CdpConnection.connect(wsUrl, { headers: opts?.cdpHeaders }).

# what changed

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added the missing `cdpHeaders` field to the v3 server OpenAPI spec so
clients can send custom Chrome DevTools Protocol headers. This aligns
the spec with server launch options and prevents client
codegen/validation errors.

<sup>Written for commit 39ee737.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1797">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
…and server-v4 dirs (#1796)

# Follow-up Tasks

- [ ] Update stainless SDK custom code for all languages to pull new
`stagehand-server-v3-darwin-x64` binary names (`-v3-` added)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Split the Stagehand API into `packages/server-v3` and
`packages/server-v4`, each with its own builds, tests, SEA binaries, and
release workflows. Delivers STG-1536 and lets us keep v3 stable while
iterating on v4; CI/test discovery and OpenAPI artifacts are versioned.

- **Refactors**
- Renamed the original server to `packages/server-v3`
(`@browserbasehq/stagehand-server-v3`); updated docs and runtime path
helpers (now synced across core/docs/evals and both servers), ESLint
globs/ignores, scripts/Turbo filters, tests, and Stainless to read
`packages/server-v3/openapi.v3.yaml`; v3 SEA binaries use
`stagehand-server-v3-*`.
- Added `packages/server-v4` (`@browserbasehq/stagehand-server-v4`) with
`/v4/**` routes, SSE streaming via `x-stream-response`, LRU/TTL
in-memory session store, health/readiness, logging/metrics,
`openapi.v4.yaml` + generator, SEA tooling, and v4 integration tests.
- CI: path filters, test discovery, and artifacts cover both versions;
added `stagehand-server-v4-release.yml` and
`stagehand-server-v4-sea-build.yml`; renamed v3 workflows; artifacts
include `packages/server-v3/**` and `packages/server-v4/**` dists and
OAS.
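
For readers unfamiliar with the LRU/TTL pattern mentioned for the v4 session store, a generic sketch follows. It is not the server's implementation: `LruTtlStore`, its constructor arguments, and the injectable clock are all assumptions made to keep the example self-contained and testable.

```typescript
// Generic LRU + TTL store built on Map insertion order.
class LruTtlStore<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```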

- **Migration**
- Replace `packages/server/**` refs with `packages/server-v3/**` or
`packages/server-v4/**`.
- Use new package filters and binary names:
`@browserbasehq/stagehand-server-v3` /
`@browserbasehq/stagehand-server-v4`; `stagehand-server-v3-*` /
`stagehand-server-v4-*`.
- Update OpenAPI consumers to `packages/server-v3/openapi.v3.yaml` or
`packages/server-v4/openapi.v4.yaml`.

<sup>Written for commit 2b9114c.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1796">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
## Summary
- Adds the `@browserbasehq/browse-cli` package (`packages/cli`) to the
stagehand monorepo, open-sourcing browser automation for AI agents
- CLI provides stateful browser control via a daemon architecture —
navigation, clicking, typing, screenshots, accessibility snapshots,
multi-tab, network capture, and env switching (local/remote)
- Uses `@browserbasehq/stagehand` as a workspace dependency (bundled
into the CLI binary via tsup)
- Includes full test suite and documentation

## Changes
- `packages/cli/` — all CLI source code, config, tests, and docs
- `pnpm-workspace.yaml` — added `packages/cli` to workspace
- `.github/workflows/ci.yml` — added CLI path filters and build artifact
uploads
- `.changeset/open-source-browse-cli.md` — changeset for initial release
- `pnpm-lock.yaml` — updated lockfile

## Test plan
- [x] CLI builds successfully (`pnpm --filter @browserbasehq/browse-cli
run build`)
- [x] Full monorepo build passes (`turbo run build` — 9/9 tasks)
- [x] `browse --help` and `browse --version` output correctly
- [x] `browse status` returns valid JSON
- [x] Lint passes clean (`pnpm --filter @browserbasehq/browse-cli run
lint`)
- [x] Source verified identical to stagent-cli (only import path
changed)
- [x] Empirically tested Browserbase credential requirements match core
- [ ] Run `pnpm --filter @browserbasehq/browse-cli run test` (requires
Chrome/browser environment)

## Known issues (pre-existing from stagent-cli, not introduced by this PR)
- Network capture `response.json` always writes `status: 0` — response
metadata from `responseReceived` CDP event is not persisted to
`loadingFinished` handler
- Ref-based `click` command silently ignores
`--button`/`--count`/`--force` flags (coordinate-based `click_xy`
handles them correctly)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g CI (#1801)

# why

# what changed

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Corrects the changeset package reference from
`@browserbasehq/stagehand-server` to
`@browserbasehq/stagehand-server-v3` to unblock CI and ensure the
correct package receives the patch release.

<sup>Written for commit 177bc48.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1801">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
## Summary
- `browse env` showed stale "local" mode after `browse env remote`
- Root cause: `.mode` file was only written during lazy browser init
(`ensureBrowserInitialized`), not at daemon startup. Between daemon
start and first command, `readCurrentMode()` returned `null` and fell
back to hardcoded `"local"`
- Write `.mode` eagerly in `runDaemon()` at startup so it's immediately
available
- Fall back to `desiredMode` instead of `"local"` in the `env` display
handler as a safety net
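
The fallback change described above can be sketched as follows. `resolveDisplayedMode` is a hypothetical helper; it assumes `readCurrentMode()` yields `null` until `.mode` has been written, as stated in the root cause.

```typescript
type Mode = "local" | "remote";

// Hypothetical helper: `currentMode` stands in for the result of readCurrentMode(),
// which is null until the daemon has written the `.mode` file.
function resolveDisplayedMode(currentMode: Mode | null, desiredMode: Mode): Mode {
  // Before the fix, the fallback here was the hardcoded "local".
  return currentMode ?? desiredMode;
}
```

With this shape, `browse env` would report the requested mode even in the window before `.mode` exists.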

## Test plan
- [x] Reproduced bug: `browse env remote` → `browse env` showed
`"mode":"local"`
- [x] Verified fix: `browse env remote` → `browse env` now shows
`"mode":"remote"`
- [x] `mode.test.ts` passes (3/3)


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes `browse env` showing stale "local" after `browse env remote`
(STG-1547). The daemon now writes `.mode` at startup, the display falls
back to `desiredMode` until mode is written, and a patch changeset is
added for `@browserbasehq/browse-cli`.

<sup>Written for commit 9661d92.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1806">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Stacked on #1800
- Only `BROWSERBASE_API_KEY` is required for remote mode in the CLI
- `BROWSERBASE_PROJECT_ID` is still passed through if set, but no longer
checked

## Changes
- `packages/cli/src/index.ts` — `hasBrowserbaseCredentials()` only
checks for API key
- `packages/cli/tests/mode.test.ts` — Updated test to match new error
message
- `packages/cli/README.md` — Updated docs to reflect optional project ID
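
A minimal sketch of the relaxed check, under the assumption that the environment is passed in explicitly (the real `hasBrowserbaseCredentials()` in `packages/cli/src/index.ts` may read it differently). `detectMode` is a hypothetical name for the autodetection step.

```typescript
type Env = Record<string, string | undefined>;

// Only the API key is required; BROWSERBASE_PROJECT_ID is no longer checked.
function hasBrowserbaseCredentials(env: Env): boolean {
  return Boolean(env["BROWSERBASE_API_KEY"]);
}

// Hypothetical autodetection: remote when the API key is set, otherwise local.
function detectMode(env: Env): "local" | "remote" {
  return hasBrowserbaseCredentials(env) ? "remote" : "local";
}
```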

## Test plan
- [x] Existing mode test updated
- [x] Manual: `browse env remote` with only `BROWSERBASE_API_KEY` set

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Make `BROWSERBASE_PROJECT_ID` optional in the CLI for remote mode, so
only `BROWSERBASE_API_KEY` is required. The project ID is still
forwarded when provided.

- **Bug Fixes**
- Updated remote mode check and error message to only require
`BROWSERBASE_API_KEY`.
- Autodetection now defaults to `remote` when the API key is set;
otherwise `local`.
  - Updated tests and `@browserbasehq/browse-cli` README to match.

<sup>Written for commit 99eb186.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1803">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r PRs to run CI with secrets (#1794)

# why

- External contributor PRs currently fail CI because they can't run with
secrets
- We don't want to allow them to run with secrets until a team member
"claims" them and reviews for any secrets exfiltration / sketchy code
- Once claimed, we want to run the full CI suite with secrets

# what changed

# test plan

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds two GitHub Actions that let maintainers claim external contributor
PRs by mirroring the approved head SHA to a maintainer-owned branch so
full CI can run with secrets. Claims come from an approving review by a
team member with write access on the latest commit and are
auto-invalidated on new commits (Linear STG-1518).

- **New Features**
- Detects forked PRs and posts claim instructions; manages labels:
`external-contributor`, `external-contributor:awaiting-approval`,
`external-contributor:mirrored`, `external-contributor:stale`,
`external-contributor:completed`.
- On approving review of the latest commit, verifies reviewer
permission, mirrors that exact SHA to
`external-contributor-pr-<PR#>-<12sha>`, and creates/reopens a “[Claimed
#X]” PR assigned to the approver.
- Closes and links the original PR with marker comments; keeps
labels/status in sync on both PRs.
- Auto-closes the mirror when new commits land on the external PR and
comments with next steps; if the mirror closes without merge, reopens
and relabels the original PR; if the external PR is reopened with the
same approved SHA while the mirror is open, it is closed again to keep
discussion on the mirror.
- Implemented via `external-contributor-pr-approval-handoff.yml`
(captures approved reviews, uploads artifact) and
`external-contributor-pr.yml` (consumes artifact, performs mirroring);
uses `actions/github-script@v7`, `actions/create-github-app-token@v1`,
`actions/checkout@v4`, `actions/download-artifact@v4`,
`actions/upload-artifact@v4`; concurrency scoped per PR/workflow run.
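
The mirror branch naming scheme above can be sketched like this. It assumes `<12sha>` means the first 12 characters of the approved head SHA; `mirrorBranchName` is a hypothetical helper, not a function from the workflow.

```typescript
// Hypothetical helper; assumes <12sha> is the first 12 characters of the head SHA.
function mirrorBranchName(prNumber: number, headSha: string): string {
  return `external-contributor-pr-${prNumber}-${headSha.slice(0, 12)}`;
}
```

Because the SHA is part of the branch name, a new commit on the external PR maps to a different branch, which fits the rule that claims are invalidated by new commits.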

- **Migration**
- Create a GitHub App with `contents:write`, `pull_requests:write`, and
`issues:write`; add `EXTERNAL_CONTRIBUTOR_PR_APP_ID` and
`EXTERNAL_CONTRIBUTOR_PR_APP_PRIVATE_KEY` secrets.
- To claim: submit an approving review on the latest commit of a forked
PR. If new commits are pushed, approve again to re-claim and rerun CI.

<sup>Written for commit 4875e99.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1794">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

bug in previous approach

# what changed

# test plan

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Fixes the external PR approval flow by switching to the correct
`GITHUB_TOKEN`, stabilizing the mirror/refresh behavior, and ignoring
third‑party bot comments when parsing claim markers. Also improves the
`claude` workflow to build the repo before edits and allow rerunning
failed jobs.

- **Bug Fixes**
- Use `GITHUB_TOKEN` for branch pushes and API calls; remove the GitHub
App token path.
  - Enable `persist-credentials: true` during checkout to allow pushes.
- Keep the mirrored PR open and mark it stale when new commits land on
the external PR; relabel both PRs consistently.
- Auto-handle reopen/close transitions across external and mirrored PRs.
- Ignore comments from non-managed bots (e.g., Greptile, Cubic); only
parse claim markers from `github-actions[bot]` to avoid false triggers.
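
A sketch of the comment filtering described in the last bullet. The `IssueComment` shape is a simplified stand-in for the GitHub API response, and the marker string passed in is hypothetical.

```typescript
// Simplified stand-in for a GitHub issue comment.
interface IssueComment {
  user: { login: string };
  body: string;
}

const MANAGED_BOT = "github-actions[bot]";

// Keep only comments written by the managed bot that contain the claim marker.
function claimMarkerComments(comments: IssueComment[], marker: string): IssueComment[] {
  return comments.filter(
    (comment) => comment.user.login === MANAGED_BOT && comment.body.includes(marker),
  );
}
```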

- **Refactors**
- Inline a small JS lib (`ECPR_LIB`) to manage labels, comments,
lifecycle, and claims; jobs run in clear phases (external lifecycle →
claim prep → branch refresh → claim finalize).
- Refresh internal branches by rebasing onto the approved external ref;
report conflicts cleanly for manual follow-up.
- Improve `claude.yml`: upgrade to `actions/checkout@v6`, set `actions:
write`, run `pnpm`/`turbo` build via `setup-node-pnpm-turbo`, enable
`track_progress`, and use an explicit tool allowlist for
`anthropics/claude-code-action@v1`.

<sup>Written for commit a46b159.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1812">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# Why

OpenAI organizations with Zero Data Retention (ZDR) reject stored
responses from the Responses API (`store: true` is the default when the
AI SDK auto-selects it). This causes agent runs to fail.

# What Changed

- Set `openai: { store: false }` in `providerOptions` across
`generateText` / `streamText` calls: `v3AgentHandler.ts` (execute +
stream) and `handleDoneToolCall.ts`.
- Simplified the existing Gemini `providerOptions` — removed the
conditional `modelId.includes("gemini-3")` check and always pass
`google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" }` since non-Google
providers ignore it.
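
For reference, the combined options described above would look roughly like the following. `buildAgentProviderOptions` and the `AgentProviderOptions` type are hypothetical; only the two option values come from this change.

```typescript
// Hypothetical type for the subset of provider options touched by this change.
interface AgentProviderOptions {
  openai: { store: boolean };
  google: { mediaResolution: string };
}

function buildAgentProviderOptions(): AgentProviderOptions {
  return {
    // Do not store responses, so ZDR organizations accept the request.
    openai: { store: false },
    // Always sent; per the description, non-Google providers ignore it.
    google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" },
  };
}
```

The resulting object would be passed as `providerOptions` to the `generateText` / `streamText` calls.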

# Test Plan

- [ ] Run agent in mode with an OpenAI model to confirm no breaking
changes


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Defaulted agent calls to OpenAI to not store responses, preventing
failures for Zero Data Retention orgs. Also simplified Gemini options by
always sending high media resolution.

- **Bug Fixes**
- Set `providerOptions.openai.store` to `false` for agent `generateText`
and `streamText` calls in `v3AgentHandler` (execute + stream) and
`handleDoneToolCall`, avoiding Responses API rejections in ZDR orgs.

- **Refactors**
- Always pass `google: { mediaResolution: "MEDIA_RESOLUTION_HIGH" }` in
`providerOptions`; non-Google providers ignore it. Added a changeset for
a patch release of `@browserbasehq/stagehand`.

<sup>Written for commit a01d8c0.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1814">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
## Summary
- Adds `--context-id <id>` and `--persist` flags to `browse open` so
agents can load/persist browser state (cookies, localStorage, etc.)
across Browserbase sessions using Contexts
- Validates edge cases: `--persist` requires `--context-id`,
`--context-id` requires remote mode, context change triggers daemon
restart

## Usage
```bash
# Load a context (read-only — state not saved back)
browse open https://app.com --context-id ctx_abc123

# Load and persist changes back on session end
browse open https://app.com --context-id ctx_abc123 --persist
```

## How it works
1. `browse open --context-id` writes context config to
`/tmp/browse-{session}.context`
2. The daemon reads this file during browser initialization and passes
it through as `browserbaseSessionCreateParams.browserSettings.context`
3. If a second `browse open` is called with a different context ID, the
daemon is restarted (context is baked into the BB session at creation
time)

Context config uses a temp file (same pattern as `.mode`) because it's
needed at Browserbase session creation time, before the daemon's command
socket is up.
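
The validation and restart rules above can be sketched as follows. All names, the error messages, and the `{ id, persist }` shape of the context config are assumptions for illustration, not the code in `packages/cli`.

```typescript
interface OpenFlags {
  contextId?: string;
  persist?: boolean;
}

// Assumed shape of the config handed to browserSettings.context.
interface ContextConfig {
  id: string;
  persist: boolean;
}

// Validate the flag combinations described in the summary.
function resolveContextConfig(flags: OpenFlags, mode: "local" | "remote"): ContextConfig | null {
  if (flags.persist && !flags.contextId) {
    throw new Error("--persist requires --context-id");
  }
  if (flags.contextId && mode !== "remote") {
    throw new Error("--context-id requires remote mode");
  }
  return flags.contextId ? { id: flags.contextId, persist: flags.persist ?? false } : null;
}

// A different context ID forces a restart, since the context is fixed at session creation.
// Other cases (for example, dropping the context entirely) are left out of this sketch.
function needsDaemonRestart(running: ContextConfig | null, requested: ContextConfig | null): boolean {
  return requested !== null && running?.id !== requested.id;
}
```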

## Test plan
- [x] `browse open https://example.com --context-id <known-id>
--persist` on remote mode — verify session created with context in BB
dashboard
- [x] `browse stop` then reopen with same context — verify state
persists
- [x] Verify context mismatch triggers daemon restart (open with context
A, then open with context B)
- [x] Same context, second open — verify no unnecessary restart
- [x] `browse open https://example.com --context-id <id>` on local mode
— verify clear error
- [x] `browse open https://example.com --persist` without `--context-id`
— verify clear error
- [x] Plain `browse open` (no context flags) — verify no regression
- [x] `cleanupStaleFiles` removes `.context` file on shutdown
- [x] Stale `.context` file from crashed daemon is cleared on next
`browse open` without `--context-id`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# why

Running `pnpm format` reformats files that are unrelated to the current
changes, which adds noise to diffs.

# what changed

Formatted the previously unformatted files in the `cli` package.

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Standardized Prettier/ESLint formatting in `packages/cli` so `pnpm
format` runs are stable and don’t touch unrelated files. No functional
changes.

- **Refactors**
- Applied Prettier across `packages/cli/src` and tests (line breaks,
parens, quotes).
- Tidied lint/Prettier config formatting (`eslint.config.mjs`,
`.prettierrc` newline).
  - Adjusted test imports and one assertion to match formatter.

<sup>Written for commit 31570db.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1819">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why

Allow users to pass custom headers in their LLM calls

# what changed

Add headers to the model.ts types 

# test plan


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds `headers` support to `ClientOptions` so clients can send custom
HTTP headers with every provider request. Useful for auth tokens or
routing hints without changing global config.

- **New Features**
- Added `headers?: Record<string, string>` to `ClientOptions` in
`packages/core/lib/v3/types/public/model.ts`; headers are sent with each
request.
  - No breaking changes; default behavior is unchanged.
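
A minimal sketch of how the new option might be used. The `ClientOptions` interface here is a trimmed local stand-in (the real type lives in `packages/core/lib/v3/types/public/model.ts`), and `mergeHeaders` plus the header names are hypothetical.

```typescript
// Trimmed local stand-in for the real ClientOptions type.
interface ClientOptions {
  apiKey?: string;
  headers?: Record<string, string>;
}

// Hypothetical helper: custom headers are layered over a client's defaults.
function mergeHeaders(
  defaults: Record<string, string>,
  options: ClientOptions,
): Record<string, string> {
  return { ...defaults, ...(options.headers ?? {}) };
}

const options: ClientOptions = {
  headers: { "x-routing-hint": "eu-west" }, // example header name, not a real convention
};
const requestHeaders = mergeHeaders({ "content-type": "application/json" }, options);
```

Omitting `headers` leaves the defaults untouched, which matches the "no breaking changes" note above.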

<sup>Written for commit 424dc1a.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/1817">Review in
cubic</a></sup>

<!-- End of auto-generated description by cubic. -->
# why
Sync the Stagehand MCP docs with the Browserbase MCP docs for STG-1576.

# what changed
Copied the refreshed Browserbase MCP introduction and setup pages into
`packages/docs/v3/integrations/mcp`.

# test plan
`pnpm exec prettier --check packages/docs/docs.json
packages/docs/v3/integrations/mcp/introduction.mdx
packages/docs/v3/integrations/mcp/setup.mdx`; `pnpm --dir packages/docs
exec mint broken-links` (unrelated existing failures only); `pnpm lint`
fails in `packages/core` on an existing ESLint rule config issue.

---------

Co-authored-by: ci-test <ci-test@example.com>