fix(bridge): fix master sentinel timeout + enforce friend directory isolation #36

Merged
A-Souhei merged 2 commits into main from
fix/bridge-timeout-and-directory-enforcement
Mar 12, 2026

@A-Souhei
Owner

Summary

Fixes two related Bridge Mode issues:

1. Master Sentinel times out waiting for friend tasks

Root cause: pollTaskResult used a hard 5-minute wall-clock deadline (timeoutMs = 300_000). A friend Sentinel running a long task (hours) would always be abandoned after 5 minutes even if fully alive.

Fix:

  • New signature: pollTaskResult(taskID, nodeID?, signal?)
  • When nodeID is provided (bridge dispatch), the loop runs indefinitely — it only stops when:
    • Task result is found in Redis ✅
    • Friend node's heartbeat goes stale (>60 s) — friend is truly dead 💀
    • Caller's AbortSignal fires (master session cancelled) 🛑
    • Bridge session ends (s.bridgeID becomes null) 🔴
  • ctx.abort is passed directly as signal, so cancelling the master session stops the background poll loop (no more Promise.race leak)
  • Redis blips are caught in try/catch and retried instead of aborting the poll
  • getContext limit raised 50 → 200 so results deep in the list are still found
  • Added AbortSignal.timeout(10_000) on the initial /bridge/dispatch-task handshake fetch
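
A minimal sketch of the liveness-based loop described in the list above. This is not the code from packages/opencode/src/bridge/index.ts: `PollDeps`, `pollUntilDone`, and the stop-reason strings are hypothetical names introduced for illustration, and the Redis lookup and heartbeat check are injected as plain functions so the example stands alone.

```typescript
// Hypothetical dependencies; the real code reads these from Redis and bridge state.
interface PollDeps {
  findResult(taskID: string): Promise<string | null> // task_result lookup; may throw on a Redis blip
  isFriendAlive(nodeID: string): Promise<boolean> // heartbeat newer than the staleness window
  bridgeActive(): boolean // false once the bridge session has ended
  intervalMs: number // delay between iterations
}

type PollOutcome =
  | { kind: "result"; value: string }
  | { kind: "stopped"; reason: "aborted" | "bridge-ended" | "friend-dead" }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function pollUntilDone(
  taskID: string,
  nodeID: string,
  deps: PollDeps,
  signal?: AbortSignal,
): Promise<PollOutcome> {
  // No wall-clock deadline: the loop ends only on one of the four stop conditions.
  while (true) {
    if (signal?.aborted) return { kind: "stopped", reason: "aborted" }
    if (!deps.bridgeActive()) return { kind: "stopped", reason: "bridge-ended" }
    try {
      const value = await deps.findResult(taskID)
      if (value !== null) return { kind: "result", value }
      if (!(await deps.isFriendAlive(nodeID))) return { kind: "stopped", reason: "friend-dead" }
    } catch {
      // Transient store error: swallow it and retry on the next iteration.
    }
    await sleep(deps.intervalMs)
  }
}
```

In this sketch an abort is noticed at the top of the next iteration, so cancellation latency is at most one poll interval.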

2. Friend Alice attempts to read master's working directory

Root cause: The FRIEND system prompt had a loophole ("stay within your directory unless explicitly told otherwise") and Alice's external_directory permission was "ask" — which hangs indefinitely on a headless input-locked friend instead of hard-failing.

Fix:

  • Session created by dispatch-task now includes permission: [{ permission: "external_directory", pattern: "*", action: "deny" }] — same hard deny that sentinel/scout subagents already had
  • FRIEND system prompt rewritten with three explicit rules: all ops must stay within Instance.directory, NEVER access external directories even if the task prompt asks, and the master's directory is on a different machine entirely
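
To illustrate why a hard deny matters on a headless node, here is a small sketch of rule resolution. Only the rule shape `{ permission, pattern, action }` comes from this PR; `resolveAction`, `toRegExp`, the "last matching rule wins" order, and the default of "ask" are assumptions made for the example and may not match the real permission layer.

```typescript
type Action = "allow" | "deny" | "ask"

interface Rule {
  permission: string
  pattern: string
  action: Action
}

// Convert a pattern with "*" wildcards into an anchored regular expression.
function toRegExp(pattern: string): RegExp {
  const escaped = pattern.split("*").map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
  return new RegExp("^" + escaped.join(".*") + "$")
}

// Assumed resolution order: the last matching rule wins; with no match, fall back to "ask".
function resolveAction(rules: Rule[], permission: string, target: string): Action {
  let action: Action = "ask"
  for (const rule of rules) {
    if (rule.permission === permission && toRegExp(rule.pattern).test(target)) action = rule.action
  }
  return action
}

// The rule attached to bridge-dispatched friend sessions in this PR.
const friendRules: Rule[] = [{ permission: "external_directory", pattern: "*", action: "deny" }]

console.log(resolveAction(friendRules, "external_directory", "/home/master/project")) // deny
console.log(resolveAction([], "external_directory", "/home/master/project")) // ask
```

With "ask", a headless friend has nobody to answer the prompt, so the tool call never completes; with "deny", it fails immediately and the session can continue.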

Files changed

  • packages/opencode/src/bridge/index.ts: pollTaskResult rewrite
  • packages/opencode/src/tool/task.ts: bridge dispatch improvements
  • packages/opencode/src/agent/agent.ts: FRIEND prompt hardening
  • packages/opencode/src/server/routes/bridge.ts: permission deny on dispatch

…solation

- pollTaskResult now waits indefinitely while friend heartbeat is alive
  (replaces hard 5-min deadline with liveness-based loop keyed on nodeID)
- Pass ctx.abort signal into pollTaskResult to properly cancel background
  poll loop when master session is aborted (no more Promise.race leak)
- Add AbortSignal.timeout(10s) on initial /bridge/dispatch-task handshake
- Wrap Redis calls in try/catch so transient errors don't abort the poll
- Check s.bridgeID nullability at each iteration to handle Bridge.leave()
- Increase getContext limit 50→200 to avoid missing task_result entries
- Guard res.json() and check data.success before entering poll loop
- Hard-deny external_directory for bridge-dispatched friend sessions at
  the permission layer (was "ask" which hangs on headless input-locked node)
- Harden FRIEND system prompt: remove loophole, add explicit prohibition
  against accessing any path outside the friend's own Instance.directory
Contributor

Copilot AI left a comment

Pull request overview

This PR tightens Bridge “task dispatch” behavior between master and friend nodes by adding dispatch timeouts, changing how the master waits for friend task results, and hardening friend sessions/prompts around filesystem boundaries.

Changes:

  • Add a 10s timeout to bridge task dispatch requests and simplify dispatch response handling.
  • Update Bridge.pollTaskResult to support abort signals and (optionally) friend liveness checks while polling.
  • On friend nodes, create dispatched-task sessions with an external_directory deny rule and strengthen FRIEND-mode system guidance about staying within Instance.directory.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/opencode/src/tool/task.ts | Adds dispatch timeout + changes parsing/validation and result polling for bridge-dispatched tasks. |
| packages/opencode/src/server/routes/bridge.ts | Creates friend task sessions with external_directory denied by default. |
| packages/opencode/src/bridge/index.ts | Refactors task result polling to accept node liveness + abort signals and adjusts polling behavior. |
| packages/opencode/src/agent/agent.ts | Tightens FRIEND-mode rules in the generated bridge settings prompt. |

Comment on lines +85 to +89
@@ -86,22 +86,18 @@ export const TaskTool = Tool.define("task", async (ctx) => {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-bridge-id": bid },
  body: JSON.stringify({ taskID, prompt, description: params.description }),
  signal: AbortSignal.timeout(10_000),

Copilot AI Mar 12, 2026

The dispatch fetch is only gated by AbortSignal.timeout(10_000) and does not respect ctx.abort. If the tool call is cancelled, the request will continue until the timeout elapses. Consider combining signals (e.g., AbortSignal.any([ctx.abort, AbortSignal.timeout(...)])) so cancellation stops the network request immediately while still enforcing a max dispatch timeout.
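
A short sketch of the signal combination suggested here. `dispatchSignal` is a hypothetical helper name; `AbortSignal.any` and `AbortSignal.timeout` are standard APIs, but `AbortSignal.any` requires a fairly recent runtime (Node 20.3 or later, or a current browser).

```typescript
// Combine caller cancellation with a hard upper bound on the request duration.
function dispatchSignal(callerAbort: AbortSignal, timeoutMs: number): AbortSignal {
  return AbortSignal.any([callerAbort, AbortSignal.timeout(timeoutMs)])
}

const controller = new AbortController()
const signal = dispatchSignal(controller.signal, 10_000)

console.log(signal.aborted) // false
// Passing `signal` to fetch(...) would cancel the request on whichever source fires first.
controller.abort()
console.log(signal.aborted) // true
```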

Comment on lines +92 to +96
const data = await res.json().catch(() => null)
if (!data?.success) {
  return {
    title: params.description,
    metadata: {} as { [key: string]: any },

Copilot AI Mar 12, 2026

data is parsed as any and the code only checks data?.success, but later interpolates data.sessionID into the output/metadata. If the response JSON is malformed or missing sessionID, this will produce undefined in user-visible output and metadata. Suggest validating the response shape (e.g., ensure success === true and typeof sessionID === "string") before using it.
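
One way to express the suggested shape check is a type guard. The `DispatchResponse` interface and `isDispatchResponse` name are assumptions for illustration; only the `success` and `sessionID` fields are taken from the comment above.

```typescript
// Assumed minimal shape of a successful dispatch response.
interface DispatchResponse {
  success: true
  sessionID: string
}

// Narrow an unknown JSON payload to DispatchResponse before any field is used.
function isDispatchResponse(data: unknown): data is DispatchResponse {
  if (typeof data !== "object" || data === null) return false
  const record = data as Record<string, unknown>
  return record.success === true && typeof record.sessionID === "string"
}

console.log(isDispatchResponse({ success: true, sessionID: "abc" })) // true
console.log(isDispatchResponse({ success: true })) // false
console.log(isDispatchResponse(null)) // false
```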

Comment on lines +727 to +731
if (nodeID !== undefined) {
  while (true) {
    if (signal?.aborted || !s.bridgeID || !s.pubClient) return null
    try {
      const entries = await getContext(s.bridgeID, 200)

Copilot AI Mar 12, 2026

When nodeID is provided, pollTaskResult enters an unbounded while (true) loop with no deadline. If the friend stays "alive" but never posts a task_result (or posts a non-matching one), the caller can hang indefinitely unless the provided signal is aborted. Consider restoring a timeout (like the previous 5 min default) and/or supporting an explicit timeout parameter for the nodeID polling path.

Owner Author

The "unbounded" loop is intentional and bounded by two independent termination conditions that don't require a wall-clock deadline:

  1. signal?.aborted: ctx.abort is passed as the signal from task.ts. This is the same abort signal that governs the entire tool call lifetime; when the user cancels, the session ends, or the parent agent is killed, the signal fires and the loop exits immediately.

  2. Friend liveness: every iteration calls Bridge.getNodes() and checks whether nodeID is still listed among live nodes (heartbeat within the last 60 s). If the friend crashes, loses network, or is shut down, its heartbeat stops and the loop exits within one poll interval (~2 s).
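
As a rough illustration of the liveness condition, the check could look like the sketch below. The `NodeInfo` shape, the `lastHeartbeat` field, and `isNodeAlive` are hypothetical; the actual return type of Bridge.getNodes() is not shown in this PR. Only the 60 s staleness window comes from the description.

```typescript
// Hypothetical node record; lastHeartbeat is assumed to be epoch milliseconds.
interface NodeInfo {
  id: string
  lastHeartbeat: number
}

const STALE_AFTER_MS = 60_000

// A node counts as alive if it is listed and its heartbeat is no older than the window.
function isNodeAlive(nodes: NodeInfo[], nodeID: string, now: number = Date.now()): boolean {
  const node = nodes.find((n) => n.id === nodeID)
  return node !== undefined && now - node.lastHeartbeat <= STALE_AFTER_MS
}

const now = 1_000_000
const nodes: NodeInfo[] = [
  { id: "friend-a", lastHeartbeat: now - 5_000 },
  { id: "friend-b", lastHeartbeat: now - 61_000 },
]
console.log(isNodeAlive(nodes, "friend-a", now)) // true
console.log(isNodeAlive(nodes, "friend-b", now)) // false
console.log(isNodeAlive(nodes, "friend-c", now)) // false
```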

The previous 5-minute deadline was the wrong fix: it caused master to give up on legitimate long-running friend tasks (e.g. a friend doing a multi-file refactor). The correct invariant is "keep waiting as long as the friend is provably alive", which is exactly what the liveness check enforces.

If you'd like an additional safety net for the case where a friend stays alive but never posts a result (e.g. a hung Alice session), that could be added as a separate configurable timeout parameter — but it's not required for correctness here, and the session-level abort covers the user-facing cancellation path.

Comment on lines +366 to +369
const session = await Session.create({
  title: description,
  permission: [{ permission: "external_directory", pattern: "*", action: "deny" }],
})

Copilot AI Mar 12, 2026

The new session permission only denies external_directory. That blocks access outside the project boundary, but it does not enforce the stricter constraint described in the updated FRIEND prompt ("all file operations must stay within Instance.directory"). Many tools (e.g. read/list) will allow any path inside Instance.worktree without triggering external_directory. If the intention is to confine the friend to its working directory, consider adding explicit read/edit/glob/grep/list/bash rules that deny outside Instance.directory (and allow within it), or introducing a dedicated permission check for "outside cwd".
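
A dedicated "outside cwd" check of the kind suggested here might be sketched as follows. `isInside` is a hypothetical helper, not an existing function in the codebase, and the sketch compares paths lexically only: it does not resolve symlinks, which a real confinement check would likely need to do.

```typescript
import * as path from "node:path"

// True when `target` resolves to `root` itself or to a path beneath it.
function isInside(root: string, target: string): boolean {
  const base = path.resolve(root)
  const rel = path.relative(base, path.resolve(base, target))
  if (rel === "") return true
  // A relative path that climbs out starts with ".." as a whole segment;
  // an absolute result (e.g. another drive on Windows) is also outside.
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel)
}

console.log(isInside("/srv/friend", "src/index.ts")) // true
console.log(isInside("/srv/friend", "../master/secrets")) // false
console.log(isInside("/srv/friend", "/srv/friendly/file")) // false
```

Comparing whole path segments rather than string prefixes avoids treating a sibling such as /srv/friendly as part of /srv/friend.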

…ctory confinement

- task.ts: combine ctx.abort + 10s timeout via AbortSignal.any() for dispatch fetch
- task.ts: validate response shape (success === true, sessionID is string) before use
- bridge.ts: expand session permissions to deny all tools outside Instance.directory
  (read, edit, glob, grep, bash — not just external_directory)
@A-Souhei A-Souhei merged commit 59505aa into main Mar 12, 2026
2 checks passed
@A-Souhei A-Souhei deleted the fix/bridge-timeout-and-directory-enforcement branch March 12, 2026 15:31
2 participants