From 51e8ecef5f05bd50a6f39e058802353ed207969c Mon Sep 17 00:00:00 2001 From: Srinivas Pendela Date: Fri, 27 Feb 2026 12:09:15 +0100 Subject: [PATCH 01/23] docs: reframe vision to ssh-first and detail two-track roadmap --- README.md | 232 ++++---------- docs/TODO.md | 522 ++++++++++++++----------------- docs/brand/commandrelay-logo.svg | 7 +- 3 files changed, 303 insertions(+), 458 deletions(-) diff --git a/README.md b/README.md index d560ade..e61f8e4 100644 --- a/README.md +++ b/README.md @@ -4,108 +4,78 @@ CommandRelay logo

-CommandRelay is a secure, bi-directional terminal control gateway for long-running coding sessions. +CommandRelay is a production-oriented gateway for remote terminal control of long-running coding sessions. -It lets you monitor and control home-machine terminal sessions (tmux and ghostty/cmux runtime) from remote clients, with replay, guarded input, and auditability. +## SSH-First Architecture -## Why This Exists +CommandRelay is oriented around an SSH-first operating model: -Long AI coding sessions run for hours while you are away from your main machine. +1. Run the gateway on the machine that owns the terminal runtime. +2. Keep runtime state in `tmux` so sessions survive client disconnects. +3. Reach the gateway from remote clients over an SSH transport path (for example, tunnel + WebSocket) while CommandRelay enforces terminal control policy. -CommandRelay gives you one control surface to: +`ghostty` is an optional local terminal UI. It is not the control backend for remote lane ownership, replay, or policy enforcement. -1. See active terminal sessions. -2. Reattach after disconnects. -3. Send commands safely when needed. -4. Keep read-only mode as default. +## Control Plane (Bi-Directional) -## What You Get +Web and native clients use the same v1 envelope and core flow: -1. WebSocket event protocol with strict envelope validation. -2. Replay-aware terminal output streaming (`streamSeq` + `attach(lastSeq)`). -3. Guarded input flow: `enable_input` -> `input` -> `disable_input`. -4. Kill switch and lane-ownership controls. -5. Runtime backend multiplexer (`tmux`, `cmux`) with backend-aware pane/session routing. -6. Proxy package ecosystem for reusable outbound proxy behavior. +1. `list_sessions` +2. `attach(paneId, lastSeq)` +3. `output` stream with ordered `streamSeq` replay behavior +4. guarded input: `enable_input` -> `input` -> `disable_input` -## Architecture +Safety controls: -### Runtime Topology +1. read-only by default +2. explicit input enable per client +3. global input kill switch +4. pane writer-lane ownership arbitration +5. message/input rate limits and max input bytes +6. audit logging for auth/input/policy events -```text -Remote Client (web/iOS/android/macos) - | - | WS (/ws) - v -+-----------------------------------+ -| CommandRelay Gateway (Node/TS) | -| - Auth / policy / limits | -| - Replay + output stream engine | -| - Input lane arbitration | -+----------------+------------------+ - | - | Runtime multiplexer - v - +-------------------+-------------------+ - | | -+----+------------------+ +-----------+-----------+ -| tmux backend adapter | | cmux backend adapter | -| pane ids: %1, %2 ... | | pane ids: surface-* | -+-----------------------+ +-----------------------+ -``` - -### Event Flow (Condensed) +## Architecture Snapshot ```text -Client -> hello/auth -> list_sessions -> attach(paneId,lastSeq) -Server -> session_list -> output(snapshot/delta, streamSeq) -Client -> enable_input -> input -> disable_input -Server -> ack/error + policy_update +Remote Web / Native Client + | + | SSH transport path + WS (/ws) + v ++----------------------------------------+ +| CommandRelay Gateway (Node/TS) | +| - auth + policy | +| - replay + stream sequencing | +| - input lane arbitration | ++----------------------+-----------------+ + | + v + +------------------+ + | tmux runtime | + | panes: %1, %2... | + +------------------+ + +Local optional UI: Ghostty (operator convenience only) ``` -### Safety State Model +## UX Model (Operator View) ```text -DISCONNECTED - -> AUTHENTICATED_READ_ONLY - -> STREAMING_READ_ONLY - -> STREAMING_INPUT_ENABLED (explicit only) - -> READ_ONLY (disable_input / kill switch / disconnect) -``` - -## Runtime Backends (tmux + ghostty/cmux) - -Configure runtime backends with: - -```bash -COMMANDRELAY_RUNTIME_BACKENDS=tmux -# or -COMMANDRELAY_RUNTIME_BACKENDS=tmux,cmux -``` - -Optional cmux command override: - -```bash -COMMANDRELAY_CMUX_COMMAND=/opt/homebrew/bin/cmux ++------------------------------------------------------+ +| Session Tab: "backend-api" | +| +--------------------------+ +----------------------+ | +| | Pane %1 (read-only) | | Pane %2 (writer) | | +| | replay + live output | | input explicitly on | | +| +--------------------------+ +----------------------+ | +| Notifications: lane conflict, input enabled, | +| kill switch active, reconnect + replay complete | ++------------------------------------------------------+ ``` -Notes: - -1. Default backend set is `tmux`. -2. In multi-backend mode, pane IDs are backend-namespaced (`tmux:%1`, `cmux:surface-1`). -3. Startup logs availability per backend. -4. Startup fails only when all configured backends are unavailable in non-tmux-only mode. - -## Security Model +## Proxy Packages: Parallel Track -CommandRelay is secure-by-default: +The proxy packages (`@commandrelay/*`, `@termina/proxy-*`) are a parallel product track for outbound HTTP/proxy reuse. -1. Read-only mode on connect. -2. Explicit input enable required. -3. Global kill switch available. -4. Per-client input rate limits and max payload limits. -5. Pane ownership arbitration to prevent silent concurrent writers. -6. Audit logging support for auth/input/policy events. +They are not mandatory for the core terminal-control path (`list/attach/replay/input`) and should be treated as adjacent infrastructure, not a prerequisite for SSH + tmux operation. ## Quick Start @@ -115,100 +85,34 @@ npm run check npm start ``` -Default endpoints: +Default local endpoints: -1. Health: `GET http://127.0.0.1:8787/health` -2. Web app (if enabled): `http://127.0.0.1:8787/app/` -3. WebSocket: `ws://127.0.0.1:8787/ws` +1. `GET http://127.0.0.1:8787/health` +2. `http://127.0.0.1:8787/app/` (when static app hosting is enabled) +3. `ws://127.0.0.1:8787/ws` -## Core Environment Variables +## Core Configuration | Variable | Purpose | | --- | --- | | `COMMANDRELAY_AUTH_TOKEN` | Token auth for non-loopback binds | -| `COMMANDRELAY_RUNTIME_BACKENDS` | Runtime backend list (`tmux,cmux`) | -| `COMMANDRELAY_CMUX_COMMAND` | cmux executable/path override | +| `COMMANDRELAY_RUNTIME_BACKENDS` | Runtime backends (`tmux` default, optional `tmux,cmux`) | +| `COMMANDRELAY_CMUX_COMMAND` | Optional `cmux` command/path override | | `COMMANDRELAY_INPUT_KILL_SWITCH` | Global input disable switch | -| `COMMANDRELAY_ALLOW_INPUT_OVERRIDE` | Allow explicit pane ownership takeover | -| `COMMANDRELAY_MAX_INPUT_BYTES` | Max input payload bytes | +| `COMMANDRELAY_ALLOW_INPUT_OVERRIDE` | Allow/deny forced lane takeover | +| `COMMANDRELAY_MAX_INPUT_BYTES` | Max input payload size | | `COMMANDRELAY_MAX_MSG_PER_MIN` | Per-client message rate limit | | `COMMANDRELAY_MAX_INPUT_PER_MIN` | Per-client input rate limit | -| `COMMANDRELAY_STRICT_PROTOCOL_PARSING` | Strict envelope parsing toggle | -| `COMMANDRELAY_APP_STATIC_ENABLED` | Enable/disable static web app hosting | -| `COMMANDRELAY_APP_STATIC_DIR` | Static app root | +| `COMMANDRELAY_STRICT_PROTOCOL_PARSING` | Strict v1 envelope parsing | | `COMMANDRELAY_AUDIT_LOG` | Audit JSONL path | -| `HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY` | Outbound proxy settings | - -## Protocol and Behavior - -Primary protocol docs: - -1. `docs/protocol-v1.md` (normative contract) -2. `docs/protocol.md` (operator-facing summary) -3. `docs/security.md` (threat model + controls) - -`list_sessions` behavior in multi-backend mode: - -1. `payload.panes[]` include backend-aware pane ids. -2. `payload.sessions[]` are grouped by `(backendId, sessionName)` to avoid cross-backend session-name collisions. - -## Validation - -Use these for repeatable validation (not date-bound): - -```bash -npm run check:root -npm run test:root -npm run ci:all -``` - -Targeted protocol/runtime checks: - -```bash -node --import tsx --test src/protocol.conformance.test.ts -node --import tsx --test src/server/ws-contract-matrix.test.ts -node --import tsx --test src/server/bridge-server.policy.test.ts -``` - -## Project Structure - -```text -src/ - bridge/ replay + delta streaming engine - server/ ws/http gateway, policies, contract tests - runtime/ runtime mux + cmux adapter - tmux/ tmux adapter - net/ proxy routing and agent factory adapters -packages/ - cli-proxy/ - proxy-core/ - proxy-agent/ - proxy-fetch/ - proxy-http-client/ - proxy-undici/ -docs/ - protocol, security, operations, roadmap, proxy ecosystem -apps/ - ios/, android/, web/ -``` - -## Documentation Map - -1. `docs/README.md` - full docs index -2. `docs/getting-started.md` - setup and runbook -3. `docs/operations.md` - operations and runtime handling -4. `docs/roadmap-native.md` - iOS/Android/macos/web rollout -5. `docs/proxy-ecosystem-roadmap.md` - proxy package expansion + discovery/use strategy - -## Project Status - -The core TypeScript gateway is implemented, tested, and production-oriented for the tmux/cmux runtime path. -Active work continues on: +## Docs -1. Native client parity and UX hardening. -2. Multi-runtime and control-lane reliability. -3. Externalized `@commandrelay` / `@termina` proxy package line, with P1 (`@termina/proxy-undici`, `@termina/cli-proxy`, `@termina/proxy-fetch`) implemented and validated. +1. `docs/protocol-v1.md` - normative wire contract +2. `docs/security.md` - controls and threat notes +3. `docs/operations.md` - deployment and operator runbook +4. `docs/roadmap-native.md` - web/native parity roadmap +5. `docs/proxy-ecosystem-roadmap.md` - proxy package track ## License diff --git a/docs/TODO.md b/docs/TODO.md index c5a06fa..93b68f3 100644 --- a/docs/TODO.md +++ b/docs/TODO.md @@ -1,297 +1,233 @@ -# CommandRelay Execution TODO (Native-First) +# CommandRelay Execution TODO (SSH-First + Proxy Hardening) -Last reviewed: 2026-02-26 -Owner scope: iOS first, Android second, web fallback last. - -## Gateway Runtime Baseline - -- [x] TypeScript gateway runtime on Node.js `>=22` (`tsx` entrypoint and `tsc --noEmit` checks). -- [x] WebSocket transport baseline via `ws`. -- [x] Proxy agent package baseline via `http-proxy-agent`, `https-proxy-agent`, `socks-proxy-agent`, and `pac-proxy-agent`. - -## Priority Order - -1. iOS Swift app (primary delivery target). -2. Android app (feature-parity follow-up). -3. Web fallback (minimum viable control surface). - -## Current Milestones - -## Controlled-Input Status Snapshot - -- [x] Gateway controlled-input runtime is implemented and test-covered (`enable_input`, `input`, `disable_input`, kill switch enforcement). -- [x] iOS controlled-input baseline is implemented (`enable_input`, `input`, `disable_input` wiring + UX safety gate); Mac runtime validation is pending. - -## M0 - Gateway and Mobile Contract Baseline (Target: 2026-03-13) - -- [x] Freeze mobile event contract (`auth`, `list_sessions`, `attach`, `output`, `input`, `ack`, `error`). -- [x] Define replay/ordering guarantees (`streamSeq`, reconnect with `lastSeq`). -- [x] Finalize read-only-by-default and explicit input-enable policy. -- [x] Add API conformance checks for protocol envelope and event types. -- [x] Publish v1 contract doc for native clients. -- [x] Add CI Node 22 gate for root/package typecheck + TAP test artifacts. -- [ ] Exit criteria met: -- [ ] iOS can consume mocked gateway events without schema drift for 7 days. -- [x] Command input path can be disabled globally and per-session. - -## M1 - iOS Alpha (Read-Only Streaming) (Target: 2026-04-03) - -- [x] Create Swift app shell (auth, session list, pane viewer). -- [x] Implement WebSocket connection + reconnect with backoff. -- [x] Implement pane attach, output render, replay resume. -- [ ] Add accessibility baseline (VoiceOver labels, dynamic type, focus order). -- [ ] Add telemetry for connect latency, reconnect count, stream lag. -- [ ] Exit criteria met: -- [ ] 30-minute stable stream under flaky network simulation. -- [ ] Crash-free rate >= 99% on TestFlight alpha cohort. - -## M2 - iOS Beta (Controlled Input) (Target: 2026-04-24) - -- [x] Implement explicit `enable_input` UX with clear risk gate. -- [x] Implement input send/ack path with timeout/error handling. -- [x] Add safeguards: per-command length limits, rate limit feedback, kill switch handling (`input_too_large` + `input_rate_limited` payload metadata, 2026-02-26). -- [ ] Add audit event surfacing for sent commands. -- [ ] Exit criteria met: -- [ ] Read-only mode remains default on every reconnect. -- [ ] Input commands are fully auditable by pane and timestamp. - -## M3 - iOS GA (Target: 2026-05-15) - -- [ ] Complete reliability pass (background/foreground, idle resume, token refresh). -- [ ] Complete App Store readiness (privacy manifest, permission copy, support docs). -- [ ] Produce on-call runbook for gateway + mobile incidents. -- [ ] Exit criteria met: -- [ ] 14 days beta with no Sev-1 mobile-to-gateway regression. -- [ ] Median command round-trip latency <= 250ms on Tailscale path. - -## M4 - Android Alpha/Beta (Target: 2026-06-12) - -- [ ] Port protocol client and session UX in Kotlin (read-only first). -- [ ] Add controlled input flow matching iOS safety model. -- [ ] Validate device/network matrix and background limits. -- [ ] Exit criteria met: -- [ ] Functional parity with iOS core flows (`list`, `attach`, `replay`, guarded `input`). - -## M5 - Web Fallback (Last) (Target: 2026-07-03) - -- [ ] Build minimal responsive web console for emergency access. -- [ ] Support auth, session list, pane attach, read-only stream. -- [ ] Add guarded input behind explicit enable flow. -- [ ] Keep web control lane on the same v1 envelope/event set as native clients (no web-only protocol fork). -- [ ] Implement lane conflict UX: block send on `input_lane_conflict`, show owner context, require explicit takeover action. -- [ ] Add takeover path using `override=true`/`takeOwnership=true` with clear operator confirmation. -- [ ] Add multi-tab tests for single-writer lane ownership, detach/disconnect release, and takeover behavior. -- [ ] Exit criteria met: -- [ ] Works on modern mobile browsers as fallback only. -- [ ] iOS + web lane handoff scenarios pass shared fixture suite without schema drift. - -## M6 - macOS Menu Bar + iOS/Web Parity Follow-Through (Target: 2026-07-24) - -- [x] Define macOS menu bar scope: quick connect, session pick, read-only attach, explicit input arm/disarm (`docs/macos-menu-bar-control-lane-spec.md`, completed 2026-02-25). -- [x] Specify menu bar lane-state indicators (`read-only`, `input-enabled`, `lane-conflict`, `kill-switch-blocked`) (`docs/macos-menu-bar-control-lane-spec.md`, completed 2026-02-25). -- [x] Reuse the same gateway client contract/events used by iOS/web (`hello`, `policy_update`, `ack`, `error`) in spec mapping (`docs/macos-menu-bar-control-lane-spec.md`, completed 2026-02-25). -- [ ] Build parity matrix covering iOS/web/menu bar for connect/auth/list/attach/replay/enable/disable/input/conflict/takeover (baseline iOS/web matrix: `docs/control-lane-parity-checklist.md`). -- [ ] Add remaining cross-client fixture cases: - - [x] iOS writer -> web takeover (`src/server/bridge-server.policy.test.ts`, completed 2026-02-25) - - [x] web writer -> iOS takeover (`src/server/bridge-server.policy.test.ts`, completed 2026-02-25) - - menu bar observer -> iOS writer handoff - - menu bar writer -> web takeover -- [ ] Exit criteria met: -- [ ] Menu bar flow can attach read-only and complete guarded input handoff without protocol drift. -- [ ] iOS/web/menu bar parity checklist is fully green in weekly checkpoint artifact. - -## Dependencies - -- [x] Stable gateway protocol and auth policy. -- [ ] Tailscale network path for low-friction private connectivity. -- [ ] Test environments: tmux session fixtures + replay test data. -- [ ] Apple/Google developer accounts and release pipelines. -- [ ] Observability stack (logs, metrics, crash reporting). - -## Top Risks and Mitigations - -- [ ] Risk: protocol churn blocks native velocity. -- [ ] Mitigation: lock v1 schema in M0; version all event changes. -- [ ] Risk: accidental destructive remote commands. -- [ ] Mitigation: default read-only, explicit enable input, kill switch, audit trail. -- [ ] Risk: mobile reconnect instability on poor networks. -- [ ] Mitigation: replay buffer tests, chaos simulation, backoff tuning. -- [ ] Risk: app store review delays. -- [ ] Mitigation: submit early TestFlight/Internal tracks and stage approvals. - -## Immediate Next Actions (Rolling) - -- [x] Create `v1` protocol contract tests in gateway repo. -- [x] Build iOS spike for WebSocket connect/list/attach/output, then extend with controlled-input baseline. -- [x] Define iOS screen map and navigation for three core flows. -- [x] Decide telemetry schema (connect time, replay time, input ack latency). -- [x] Implement weekly checkpoint workflow artifacts (`scripts/checkpoints/generate-weekly-checkpoint.sh` + template) and document tracking in `docs/roadmap-native.md`. -- [x] Wire the existing proxy stack into auth/pairing/telemetry outbound clients and add integration tests for `HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY`. -- [x] Draft macOS menu bar control-lane spec and state diagram (`docs/macos-menu-bar-control-lane-spec.md`, completed 2026-02-25). -- [x] Author iOS/web parity checklist for control-lane flows and map each item to an automated/manual test (`docs/control-lane-parity-checklist.md`, completed 2026-02-25). -- [x] Add two gateway fixture scenarios for lane conflict + explicit takeover (iOS writer -> web takeover, web writer -> iOS takeover) (`src/server/bridge-server.policy.test.ts`, completed 2026-02-25). -- [x] Document distilled capsule-to-brief wiring (`npm run capsule:build --` -> `npm run capsule:brief --`) in operations/docs/skill guidance (`docs/operations.md`, `docs/README.md`, `skills/termina-orchestrator/SKILL.md`, completed 2026-02-26). -- [x] Document distilled capsule-to-dispatch wiring (`npm run capsule:build --` -> `npm run capsule:brief --` -> `npm run capsule:dispatch --`) in operations/docs/skill guidance (`docs/operations.md`, `docs/README.md`, `skills/termina-orchestrator/SKILL.md`, completed 2026-02-26). - -## Weekly Cross-Platform Checkpoint Runbook - -- [x] Workflow script: `scripts/checkpoints/generate-weekly-checkpoint.sh` -- [x] Template: `scripts/checkpoints/templates/weekly-cross-platform-checkpoint.md` -- [x] Weekly command: - `scripts/checkpoints/generate-weekly-checkpoint.sh --date YYYY-MM-DD --facilitator "Owner Name"` -- [x] Weekly artifact to track in git: - `scripts/checkpoints/runs/YYYY-MM-DD-weekly-cross-platform-checkpoint.md` -- [ ] Post-sync tracking rule: - checkpoint is complete only after sign-off boxes are checked and milestone decisions are mirrored in `docs/roadmap-native.md` + `docs/TODO.md`. - -## Mac Validation Acceptance Checklist - -- [ ] Node.js runtime is `v22.x` on Mac validation machine. -- [ ] `npm run check` passes. -- [ ] `node --import tsx --test src/protocol.conformance.test.ts` passes. -- [ ] `node --import tsx --test src/server/ws-contract-matrix.test.ts src/server/bridge-server.policy.test.ts src/server/input-policy.test.ts` passes. -- [ ] `node --import tsx --test src/server/startup-validation.test.ts` passes. -- [ ] `node --import tsx --test src/control-plane/control-plane-client.test.ts src/net/proxy-agent-factory.test.ts src/net/proxy-router.test.ts` passes. -- [ ] `cd apps/ios/M0ProtocolMockClient && swift test` passes. -- [ ] `npm run test:ci:all` passes on Mac. -- [ ] Live smoke (kill switch off) passes: `COMMANDRELAY_INPUT_KILL_SWITCH=off npm run start` + `npm run bench:input -- --iterations 5`. -- [ ] Live smoke (kill switch on) blocks input: `COMMANDRELAY_INPUT_KILL_SWITCH=on npm run start` + `npm run bench:input -- --iterations 3` fails with input-disabled behavior. -- [ ] Nightly evidence captured: TAP artifacts + swift test log + short smoke summary. - -## Home Pickup TODO - -- [ ] Copy MCP template and set absolute paths: `cp mcp.example.json .mcp.json` then edit paths. -- [ ] Verify Chitragupta MCP starts with the workaround command from `docs/operations.md`. -- [ ] Run full gateway test pack: `npm run check && npm test && npm run test:ci:all`. -- [ ] Run replay-focused suites: - - `node --import tsx --test src/bridge/bridge-engine.replay.test.ts` - - `node --import tsx --test src/server/bridge-server.replay.e2e.test.ts` -- [ ] Run iOS transport tests: - - `cd apps/ios/M0ProtocolMockClient && swift test --filter M0WebSocketTransportClientTests` - - `swift test` -- [ ] Run Android parity module tests (requires Gradle wrapper or local Gradle): - - `cd apps/android/M0ProtocolMockClient && ./gradlew test` (or `gradle test`) -- [ ] Run tmux fixture harness smoke: - - `scripts/tmux-fixtures/create-fixture.sh --session fixture_smoke --panes 2` - - `scripts/tmux-fixtures/emit-fixture-output.sh --session fixture_smoke --profile replay --cycles 5` - - `scripts/tmux-fixtures/teardown-fixture.sh --session fixture_smoke` -- [ ] Run perf smoke benchmarks: - - `npm run bench:connect -- --iterations 20` - - `npm run bench:list -- --iterations 20` - - `npm run bench:input -- --iterations 20` -- [ ] If all green, open/update PR notes with: - - replay coverage results - - iOS/Android local test results - - perf summary (`p50/p95/p99`) - -## Home-Mac Continuation Checklist - -### Session Bootstrap (single owner) - -- [ ] Confirm Node.js `v22.x` and clean install: `node -v && npm ci`. -- [ ] Confirm local MCP wiring and Chitragupta launch command from `docs/operations.md`. -- [ ] Start evidence log for this run (tests, perf, publish dry-run outputs). - -### Parallel Track A: tmux Engine Follow-up - -- [ ] Run replay/ordering suites: - - `node --import tsx --test src/bridge/bridge-engine.replay.test.ts` - - `node --import tsx --test src/server/bridge-server.replay.e2e.test.ts` -- [ ] Run tmux fixture harness: - - `scripts/tmux-fixtures/create-fixture.sh --session fixture_smoke --panes 2` - - `scripts/tmux-fixtures/emit-fixture-output.sh --session fixture_smoke --profile replay --cycles 5` - - `scripts/tmux-fixtures/teardown-fixture.sh --session fixture_smoke` -- [ ] Run perf smoke (`connect`, `list`, `input`) with `--iterations 20`; record `p50/p95/p99`. -- [ ] Mark tmux track complete only when replay + fixture + perf evidence is captured. - -### Parallel Track B: iOS Follow-up - -- [ ] `cd apps/ios/M0ProtocolMockClient && swift test --filter M0WebSocketTransportClientTests`. -- [ ] `cd apps/ios/M0ProtocolMockClient && swift test`. -- [ ] Validate controlled-input safety behavior against gateway kill-switch on/off runs. -- [ ] Capture iOS evidence summary (pass/fail, flaky tests, retry count). - -### Merge Gate (both tracks) - -- [ ] Run full aggregate check: `npm run check && npm test && npm run test:ci:all`. -- [ ] Update `scripts/checkpoints/runs/2026-02-25-weekly-cross-platform-checkpoint.md` with outcomes. -- [ ] Update proxy release gate status in `docs/release/proxy-publish.md`. - -## Proxy Package Release Gates (for internal v0.1 prep) - -- [x] Gate 1: version readiness confirmed for each `@commandrelay/proxy-*` package (`@commandrelay/proxy-core@0.1.0`, `@commandrelay/proxy-agent@0.1.0`, `@commandrelay/proxy-http-client@0.1.0`). -- [ ] Gate 2: root/package `check`, `build`, `test` all green on Mac run. - - Batch evidence (2026-02-25): TAP green in current environment (`root 14/14`, `proxy-core 1/1`, `proxy-agent 2/2`, `proxy-http-client 1/1`). -- [ ] Gate 3: publish workflow dry-run green with expected package selector and `dist_tag`. - - Home Mac action: run `Publish Proxy Packages` with `mode=dry-run`, `package_selector=@commandrelay/proxy-*`, `dist_tag=latest`. -- [ ] Gate 4: `NPM_TOKEN` + `npm-publish` environment policy verified. - - Home Mac action: verify `NPM_TOKEN` secret, `npm-publish` reviewers, and default-branch restrictions. -- [ ] Gate 5: release notes/changelog draft reviewed before any publish-mode trigger. - - Home Mac action: append dry-run URL + artifact summary + go/no-go note. - -## Internal v0.1 Tag Plan (proposal only; do not create tags yet) - -- [ ] Step 1: complete tmux + iOS follow-up tracks and evidence capture. -- [ ] Step 2: run proxy publish dry-run gate review and resolve blockers. -- [ ] Step 3: freeze internal v0.1 candidate scope and finalize release notes draft. -- [ ] Step 4: run final go/no-go check (tests, perf, release gates, checkpoint sign-off). -- [ ] Step 5: if all gates stay green, prepare internal `v0.1` tag request in PR/release notes (no tag creation in this step). - -## Proxy Ecosystem Expansion Backlog - -Reference roadmap: `docs/proxy-ecosystem-roadmap.md`. - -- [x] Harden current package line for external use (`@commandrelay/proxy-core`, `@commandrelay/proxy-agent`, `@commandrelay/proxy-http-client`) with reusable docs/assets/examples. -- [ ] Publish/validate adapter ecosystem package plan and naming contract (`@termina/proxy-*`). -- [ ] P1 package wave (active): - - [x] `@termina/cli-proxy` (`packages/cli-proxy`, diagnostics CLI + JSON/human modes + tests/docs/assets completed on 2026-02-26) - - [x] `@termina/proxy-undici` (`packages/proxy-undici`, check/build/test + docs/assets/examples complete on 2026-02-26) - - [x] `@termina/proxy-fetch` (`packages/proxy-fetch`, fetch adapter + JSON/timeout/size guards + tests/docs/assets completed on 2026-02-26) -- [x] P1 exit criteria: `@termina/cli-proxy` + `@termina/proxy-fetch` both pass check/build/test and include README + NOTES + SVG branding assets. -- [ ] P2 package wave: +Last reviewed: 2026-02-27 +Primary strategy: SSH-first transport, tmux-first runtime, remote state owned by host. + +## Vision Reset (SSH-First) + +- CommandRelay is a host-adjacent remote operations product, not a mobile-first product. +- SSH is the default control/data transport for production use; WebSocket remains for compatibility and controlled environments. +- `tmux` on the remote host is the source of truth for live session state. +- Host runtime owns replay/order state (`streamSeq`, `lastSeq`, lane ownership, audit trail); clients only render and request actions. +- iOS, Android, web, and macOS menu bar are thin clients over one contract; no client-specific protocol forks. +- Safety posture remains strict: read-only default, explicit input enable, lane conflict controls, global kill switch, and auditable input history. + +## Two-Track Plan (Run in Parallel) + +- Track A ships the SSH-first CommandRelay product baseline. +- Track B hardens and productizes the `proxy-*` ecosystem used by CommandRelay and external consumers. +- Merge/release gate: no Track A GA candidate without Track B publish/process hardening at release-ready status. + +## Track A: SSH-First CommandRelay Product + +### A1) Transport + +- [ ] Finalize SSH transport contract for connect/auth/list/attach/replay/input/ack/error with explicit reconnect semantics. +- [ ] Specify host identity + trust model (host key verification mode, fingerprint surfacing, rotation handling). +- [ ] Lock protocol compatibility matrix for SSH transport and existing WebSocket transport. +- [ ] Add transport conformance tests covering: + - happy path attach + replay resume + - reconnect with `lastSeq` + - lane conflict + explicit takeover + - kill-switch enforcement on active lane + +### A2) Runtime (Host-State Ownership) + +- [ ] Make host runtime authoritative for session metadata, lane owner, replay offsets, and capability flags. +- [ ] Validate tmux fixture harness for deterministic replay and multi-pane ordering. +- [ ] Add startup validation profile for remote host environments (Node runtime, tmux availability, permissions, env policy). +- [ ] Ensure runtime failure modes are explicit and recoverable (auth reject, transport drop, tmux session loss, stale lane owner). + +### A3) UX (Thin Clients) + +- [ ] Keep one interaction model across iOS/web/macOS menu bar: + - read-only attach by default + - explicit input arm/disarm + - lane conflict message with owner context + - explicit takeover confirmation +- [ ] Complete parity checklist for connect/auth/list/attach/replay/enable/disable/input/conflict/takeover. +- [ ] Finish accessibility baseline in active clients (labels, focus order, dynamic type, keyboard paths where applicable). + +### A4) Safety + +- [x] Controlled-input baseline exists (`enable_input`, `input`, `disable_input`, kill switch, size/rate limit payload metadata). +- [ ] Add host-side input audit log record (actor, pane, command hash/preview policy, timestamp, result). +- [ ] Add policy tests for default read-only on reconnect, lane lease expiry, and takeover audit event. +- [ ] Add operator safety runbook for kill-switch and lane lockout incidents. + +### A5) Observability + +- [ ] Define metrics contract and dashboards: + - connect latency + - replay lag + - reconnect count + - input ack latency + - lane conflict frequency + - kill-switch blocks +- [ ] Add structured logs for lifecycle events (connect, attach, replay resume, input enabled/disabled, takeover, policy reject). +- [ ] Set minimum evidence pack for weekly checkpoint artifacts. + +### A6) Release Criteria (Track A) + +- [ ] 7-day stability window with no Sev-1 SSH transport regressions in checkpoint evidence. +- [ ] 30-minute flaky-network stream test passes with replay correctness. +- [ ] Controlled input remains opt-in on every reconnect path. +- [ ] Full parity checklist is green across active clients. +- [ ] On-call incident/runbook document is complete and reviewed. + +## Track B: `proxy-*` Ecosystem Hardening and Productization + +### B1) Current Package Line Hardening + +- [x] Baseline line exists and is in active use: + - `@commandrelay/proxy-core` + - `@commandrelay/proxy-agent` + - `@commandrelay/proxy-http-client` +- [ ] Complete API stability review and public surface lock for v0.1. +- [ ] Add negative tests for malformed proxy URLs, auth variants, NO_PROXY edge cases, PAC failures, and fallback behavior. +- [ ] Add interoperability matrix validation (Node fetch/undici/http(s), env var permutations, proxy chaining expectations). + +### B2) Productization Readiness + +- [ ] Complete docs pack per package: README usage matrix, NOTES, migration/compat notes, troubleshooting. +- [ ] Add runnable examples with expected output snapshots. +- [ ] Ensure CI gates are explicit and reproducible (`check`, `build`, `test`) at root and per package. +- [ ] Confirm publish workflow dry-run path with selector and dist-tag policy. +- [ ] Validate npm publish governance (`NPM_TOKEN`, `npm-publish` environment reviewers, branch protections). + +### B3) Parallel Ecosystem Wave + +- [x] P1 completed: + - `@termina/cli-proxy` + - `@termina/proxy-undici` + - `@termina/proxy-fetch` +- [ ] P2 hardening wave (parallelizable): - `@termina/proxy-axios` - `@termina/proxy-got` - `@termina/proxy-runtime` -- [ ] P3 exploration: - - `@termina/proxy-ssh` (`ssh-proxy`) feasibility and threat model. -- [ ] External compatibility checks against ecosystem dependencies/counterparts: - - `agent-base` - - `data-uri-to-buffer` - - `degenerator` - - `get-uri` - - `http-proxy-agent` - - `https-proxy-agent` - - `pac-proxy-agent` - - `pac-resolver` - - `proxy-agent` - - `proxy` - - `socks-proxy-agent` - -## Research-Backed Next Wave (Home Pickup) - -Reference notes: `docs/research-next-opportunities.md`. - -- [ ] Cross-platform command safety contract: - - shared `input` timeout/retry semantics for iOS, Android, macOS, and web fallback - - shared telemetry keys for `enable_input` -> `input` -> `ack/error` - - deterministic kill-switch and lane-conflict behavior across clients -- [ ] Multi-session UX + handoff model: - - session switch rules while preserving read-only default - - explicit takeover UX with owner visibility and confirmation - - per-pane activity/audit indicators in native clients -- [ ] Reliability + SLO matrix: - - reconnect success target, command RTT target, replay catch-up target - - failover behavior when current writer disconnects mid-command - - weekly checkpoint artifact includes SLO trend deltas -- [ ] Proxy family hardening gates (pre external publish): - - mandatory benchmark budgets (latency, throughput, memory/socket growth) - - dependency/license/SBOM/vulnerability gate in release flow - - interoperability matrix for `fetch`, `undici`, `axios`, `got`, and CLI adapters -- [ ] Advanced transport exploration (feature-flagged): - - evaluate QUIC/WebTransport lane for degraded-network resilience - - compare against current WebSocket lane with controlled benchmark harness - - adopt only if reliability and operability improve without security regression -- [ ] Remote-control trust model upgrades: - - short-lived pairing via QR + signed challenge response - - step-up confirmation for risky command classes - - immutable command audit stream export for incident review +- [ ] P3 exploration gate: + - `@termina/proxy-ssh` feasibility + threat model (explicit go/no-go decision doc) + +### B4) Release Criteria (Track B) + +- [ ] Gate 1: version and changelog readiness confirmed for all release candidates. +- [ ] Gate 2: `check/build/test` green on designated Mac validation environment. +- [ ] Gate 3: publish dry-run green with expected selector + dist-tag. +- [ ] Gate 4: release notes and rollback notes approved before publish-mode is allowed. +- [ ] Gate 5: support/troubleshooting docs linked from package READMEs. + +## Next 2-4 Week Milestones (Execution-Ready) + +### Milestone W1 (2026-03-02 to 2026-03-08) + +- Track A goals: + - freeze SSH transport contract + compatibility matrix + - complete host-state authority spec for lane/replay ownership + - add first-pass SSH conformance tests +- Track B goals: + - finish proxy API surface audit for v0.1 candidates + - close negative-test gaps for proxy env parsing/fallback +- Acceptance criteria: + - [ ] SSH contract/spec document merged and referenced by tests. + - [ ] At least one automated suite exercises SSH reconnect with `lastSeq` replay. + - [ ] Proxy test report includes malformed URL + NO_PROXY + PAC failure cases with expected results. + +### Milestone W2 (2026-03-09 to 2026-03-15) + +- Track A goals: + - implement host audit log events and lane/takeover policy assertions + - complete tmux fixture replay ordering validation + - advance parity checklist coverage +- Track B goals: + - complete docs/examples pack for `@commandrelay/proxy-*` + - execute publish workflow dry-run and archive artifacts +- Acceptance criteria: + - [ ] Audit log records are emitted for enable/disable/input/takeover flows. + - [ ] Replay ordering suite passes under fixture harness without manual intervention. + - [ ] Dry-run artifacts contain selected package set, dist-tag, and no publish-policy blockers. + +### Milestone W3 (2026-03-16 to 2026-03-22) + +- Track A goals: + - finalize observability metrics + dashboard baseline + - run 30-minute flaky-network soak for stream/replay behavior + - close critical UX parity gaps (conflict + takeover messaging) +- Track B goals: + - complete P2 package scaffolds and core adapter conformance tests + - resolve docs/troubleshooting gaps found in dry-run review +- Acceptance criteria: + - [ ] Weekly checkpoint includes metrics export and soak summary. + - [ ] No Sev-1/Sev-2 unresolved bugs in lane safety path. + - [ ] P2 packages have passing base check/build/test and minimal docs skeleton. + +### Milestone W4 (2026-03-23 to 2026-03-29) + +- Track A goals: + - run release-candidate gate review for SSH-first baseline + - publish incident/runbook docs for transport and safety operations +- Track B goals: + - run final pre-release gate review for proxy line + - prepare release notes + rollback notes for v0.1 internal release decision +- Acceptance criteria: + - [ ] Track A release criteria checklist is fully green or has explicit blocker list with owners. + - [ ] Track B gates 1-5 reviewed with evidence links and go/no-go status. + - [ ] Combined checkpoint artifact documents cross-track dependencies and release decision. + +## Prioritized Immediate Actions (Do Next) + +### P0 (Today -> next 48h) + +- [ ] Convert this TODO into owned tickets with single owners and explicit file scope. +- [ ] Land SSH transport contract doc + test plan references. +- [ ] Start host-state authority implementation plan (lane owner + replay offsets + audit schema). +- [ ] Execute proxy negative-test expansion for malformed env/config inputs. + +### P1 (This week) + +- [ ] Run and archive core validation suites: + - `npm run check` + - `npm test` + - `npm run test:ci:all` + - `node --import tsx --test src/server/ws-contract-matrix.test.ts src/server/bridge-server.policy.test.ts src/server/input-policy.test.ts` + - `node --import tsx --test src/control-plane/control-plane-client.test.ts src/net/proxy-agent-factory.test.ts src/net/proxy-router.test.ts` +- [ ] Update weekly checkpoint artifact and mirror milestone decisions into roadmap docs. +- [ ] Run publish dry-run for `@commandrelay/proxy-*` and capture artifact links. + +### P2 (Next 2 weeks) + +- [ ] Complete remaining parity matrix items including menu bar handoff cases. +- [ ] Complete observability dashboard baseline and alert thresholds. +- [ ] Finalize release-notes template for combined SSH-first + proxy hardening increment. + +## Retained Context (Still Relevant) + +### Completed Baselines + +- [x] TypeScript gateway runtime on Node.js `>=22` (`tsx` entrypoint and `tsc --noEmit` checks). +- [x] WebSocket baseline via `ws` remains available as compatibility transport. +- [x] Proxy agent baseline via `http-proxy-agent`, `https-proxy-agent`, `socks-proxy-agent`, `pac-proxy-agent`. +- [x] Controlled-input runtime baseline is implemented and test-covered. +- [x] iOS controlled-input baseline exists with safety gate wiring. +- [x] Weekly checkpoint workflow script/template exists. +- [x] Distilled capsule build/brief/dispatch wiring is documented. + +### Open Dependencies and Risks + +- [ ] Stable private network path for low-friction host connectivity (Tailscale or equivalent). +- [ ] Test environments: tmux fixtures + replay data kept fresh. +- [ ] Observability stack completion (logs, metrics, crash reporting). +- [ ] Risk: transport/protocol churn slows clients. +- [ ] Mitigation: versioned contract + conformance suite as release gate. +- [ ] Risk: accidental destructive commands. +- [ ] Mitigation: read-only default, explicit input enable, kill switch, audit trail. +- [ ] Risk: reconnect instability under poor networks. +- [ ] Mitigation: replay chaos tests + soak checkpoints + backoff tuning. + +## Key References + +- `docs/macos-menu-bar-control-lane-spec.md` +- `docs/control-lane-parity-checklist.md` +- `docs/proxy-ecosystem-roadmap.md` +- `docs/release/proxy-publish.md` +- `scripts/checkpoints/generate-weekly-checkpoint.sh` +- `scripts/checkpoints/templates/weekly-cross-platform-checkpoint.md` diff --git a/docs/brand/commandrelay-logo.svg b/docs/brand/commandrelay-logo.svg index b27ca46..f8595e7 100644 --- a/docs/brand/commandrelay-logo.svg +++ b/docs/brand/commandrelay-logo.svg @@ -1,5 +1,5 @@ - CommandRelay Termina Logo + CommandRelay Terminal Logo Terminal prompt inside a rounded badge with bidirectional relay arrows. @@ -10,6 +10,11 @@ + + + + + From 3229695b5cd5faeed149011b7efefa0ad49f54c3 Mon Sep 17 00:00:00 2001 From: Srinivas Pendela Date: Fri, 27 Feb 2026 12:23:04 +0100 Subject: [PATCH 02/23] feat(ssh): add transport contract, tunnel runbook, and state machine --- docs/README.md | 4 +- docs/adr/ADR-001-ssh-first-transport.md | 44 +++++ docs/operations.md | 19 ++ docs/ssh-transport-contract.md | 52 ++++++ scripts/ssh/README.md | 69 +++++++ scripts/ssh/open-tunnel.sh | 232 ++++++++++++++++++++++++ src/ssh/session-state-machine.test.ts | 138 ++++++++++++++ src/ssh/session-state-machine.ts | 213 ++++++++++++++++++++++ 8 files changed, 770 insertions(+), 1 deletion(-) create mode 100644 docs/adr/ADR-001-ssh-first-transport.md create mode 100644 docs/ssh-transport-contract.md create mode 100644 scripts/ssh/README.md create mode 100644 scripts/ssh/open-tunnel.sh create mode 100644 src/ssh/session-state-machine.test.ts create mode 100644 src/ssh/session-state-machine.ts diff --git a/docs/README.md b/docs/README.md index 3541e12..f9f5309 100644 --- a/docs/README.md +++ b/docs/README.md @@ -17,7 +17,7 @@ This folder contains project documentation for users, contributors, and operator ## Runtime Snapshot 1. Gateway runtime: TypeScript on Node.js `>=22` (`tsx` execution, `tsc --noEmit` checks). -2. Gateway transport package: `ws`. +2. SSH-first transport direction with WebSocket compatibility path (see `ssh-transport-contract.md` and ADR-001). 3. Outbound proxy package set: `http-proxy-agent`, `https-proxy-agent`, `socks-proxy-agent`, `pac-proxy-agent`. 4. Client ecosystem direction: iOS (Swift) first, Android (Kotlin) second, web fallback last. @@ -62,3 +62,5 @@ Local MCP note: 13. [research-next-opportunities.md](research-next-opportunities.md) 14. [TODO.md](TODO.md) 15. [release/proxy-publish.md](release/proxy-publish.md) +16. [ssh-transport-contract.md](ssh-transport-contract.md) +17. [adr/ADR-001-ssh-first-transport.md](adr/ADR-001-ssh-first-transport.md) diff --git a/docs/adr/ADR-001-ssh-first-transport.md b/docs/adr/ADR-001-ssh-first-transport.md new file mode 100644 index 0000000..3e2026d --- /dev/null +++ b/docs/adr/ADR-001-ssh-first-transport.md @@ -0,0 +1,44 @@ +# ADR-001: SSH-First Transport + +## Status +Accepted + +## Date +2026-02-27 + +## Context +- CommandRelay needs resilient remote control across disconnects and client restarts. +- Session durability depends on `tmux` reattach/replay semantics. +- A WebSocket-first default increases exposed network surface and deployment variance. +- SSH is already standard in operator environments with mature auth and host trust controls. + +## Decision +- Set SSH as the default transport for CommandRelay. +- Keep `tmux` as the persistence layer, independent of transport lifecycle. +- Keep WebSocket available as explicit opt-in fallback for constrained environments. + +## Rationale +- SSH reuses proven security controls (keys, host verification, access policy). +- Default exposure is narrower than opening a WebSocket endpoint by default. +- SSH attach/detach behavior aligns with `tmux` session continuity goals. +- Operators can use existing tooling and runbooks instead of introducing new transport infrastructure. + +## Tradeoffs +- SSH provisioning and key management are required. +- Browser-only clients need a bridge or gateway for SSH workflows. +- Some environments may see slower reconnect UX than long-lived WebSocket connections. + +## Consequences +- Product UX must treat SSH auth and host trust failures as first-class paths. +- Test coverage must include host-key checks, auth failures, reconnect, and replay. +- Docs and operational playbooks must position WebSocket as exception flow, not baseline. + +## Alternatives considered +- WebSocket-first default: rejected due to larger exposed surface and proxy/TLS drift risk. +- Custom transport protocol: rejected due to long-term maintenance burden. +- Dual equal default (SSH + WebSocket): rejected to avoid ambiguous guidance and split testing focus. + +## Rollout/rollback triggers +- Roll out once SSH path is feature-complete for attach, replay, input guardrails, and failure handling. +- Keep WebSocket fallback during migration for controlled cases. +- Trigger rollback review if SSH path shows sustained reliability or security regressions that cannot be mitigated operationally. diff --git a/docs/operations.md b/docs/operations.md index 01c6b53..a2d0e8e 100644 --- a/docs/operations.md +++ b/docs/operations.md @@ -32,6 +32,25 @@ Startup availability behavior: 3. Startup fails only when every configured backend is unavailable and runtime mode is not tmux-only. 4. tmux-only startup behavior is unchanged. +## SSH-First Tunnel Runbook + +Use the local helper at [`scripts/ssh/open-tunnel.sh`](../scripts/ssh/open-tunnel.sh) to reach a remote CommandRelay instance over SSH before exposing any direct network listener. + +Quick start (macOS/Linux): + +```bash +cd /mnt/c/sriinnu/personal/Kaala-brahma/terminal +./scripts/ssh/open-tunnel.sh --target +``` + +Tunnel defaults: + +1. Local endpoint: `127.0.0.1:8787` +2. Remote endpoint: `127.0.0.1:8787` +3. Local URLs: `http://127.0.0.1:8787` and `ws://127.0.0.1:8787/ws` + +For extra examples and option details, see [`scripts/ssh/README.md`](../scripts/ssh/README.md). + ## Web App Runtime Surface and Checks Implemented gateway routes: diff --git a/docs/ssh-transport-contract.md b/docs/ssh-transport-contract.md new file mode 100644 index 0000000..c4bccc8 --- /dev/null +++ b/docs/ssh-transport-contract.md @@ -0,0 +1,52 @@ +## SSH Transport Contract (CommandRelay) + +Status: Draft baseline for SSH-first track. + +This contract defines runtime and client behavior when CommandRelay is operated over SSH transport. + +## Transport assumptions +1. SSH is the only transport from bridge to backend; tmux sockets are never exposed directly. +2. Connection targets are resolved from server-side backend profiles (`host`, `port`, `user`, auth mode). +3. SSH execution is non-interactive and bounded by connection and command timeouts. +4. Transport carries terminal stream data and control messages only; file transfer is out of scope. +5. Host key verification policy is explicit per environment, with strict checking in production. + +## Session and runtime model (tmux persistence) +1. tmux is long-lived and survives bridge/client disconnects. +2. Bridge attaches and detaches from existing tmux sessions without terminating the shell process tree. +3. Session identity is stable across reconnects and unique per backend runtime. +4. Pane attachment is explicit; one client attachment context maps to one active pane target. +5. Runtime tracks output sequence state to support bounded replay after reconnect. + +## Safety controls +1. Default mode is read-only; write input requires explicit per-client enablement. +2. Auth is mandatory before non-auth operations when token mode is enabled. +3. Input ownership is exclusive per pane; conflicting writes are rejected unless override policy allows takeover. +4. Input path enforces max payload size and per-client rate limits before dispatch. +5. Audit logs capture auth results, attachment changes, write-lane toggles, and input attempts. +6. Production transport must not run SSH with verbose (`-v`) logging. +7. Termination paths must prefer graceful session close; avoid `kill -9` as normal control flow. + +## Failure and reconnect behavior +1. Failures are classified as transport, runtime, or policy errors and surfaced distinctly. +2. Transport reconnect uses bounded exponential backoff with jitter and retry limits. +3. Reattach requests include `lastSeq` so runtime can replay buffered output deterministically. +4. When full replay is unavailable, runtime sends a fresh snapshot and marks the sequence gap. +5. Reconnect never re-enables write mode automatically; clients must re-request write access. +6. Repeated reconnect exhaustion marks backend unavailable until explicit retry. + +## Client UX states +1. `connecting`: transport/auth bootstrap is in progress; controls are disabled. +2. `read_only`: output stream active, write lane disabled. +3. `write_enabled`: client owns pane write lane and input is accepted. +4. `conflict`: another client owns write lane; takeover is policy-gated. +5. `reconnecting`: transport dropped; replay/reattach pending. +6. `degraded`: attached but replay continuity is partial or unavailable. +7. `disconnected`: no active transport; explicit reconnect required. + +## Non-goals +1. Multi-writer collaborative command entry on the same pane. +2. Infinite or lossless replay guarantees beyond bounded in-memory history. +3. Built-in SSH key lifecycle management or secret vault behavior. +4. Shell intent parsing, semantic key translation, or command policy inference. +5. Zero-downtime guarantees across backend host restarts or tmux daemon crashes. diff --git a/scripts/ssh/README.md b/scripts/ssh/README.md new file mode 100644 index 0000000..893df39 --- /dev/null +++ b/scripts/ssh/README.md @@ -0,0 +1,69 @@ +# SSH Tunnel Runbook (CommandRelay) + +Use [`open-tunnel.sh`](./open-tunnel.sh) to open a local SSH tunnel to a remote CommandRelay instance without exposing remote ports publicly. + +## What this does + +1. Forwards local `127.0.0.1:` to remote `127.0.0.1:` through SSH. +2. Validates required arguments and port ranges before opening the tunnel. +3. Fails early when local port is already in use. +4. Prints only operational details (no auth tokens or secret values). + +## Requirements (macOS/Linux) + +1. `ssh` installed and available in `PATH`. +2. SSH access to the target host. +3. CommandRelay running on the remote host (default assumed on `127.0.0.1:8787`). + +## Quick start + +From repo root: + +```bash +./scripts/ssh/open-tunnel.sh --target +``` + +Then connect local clients to: + +1. `http://127.0.0.1:8787` +2. `ws://127.0.0.1:8787/ws` + +## Examples + +macOS/Linux default tunnel: + +```bash +./scripts/ssh/open-tunnel.sh --target dev@relay-host +``` + +Use a different local port: + +```bash +./scripts/ssh/open-tunnel.sh --target dev@relay-host --local-port 9878 +``` + +Use a specific SSH key: + +```bash +./scripts/ssh/open-tunnel.sh --target relay-prod --identity ~/.ssh/id_ed25519 +``` + +Through bastion/proxy jump: + +```bash +./scripts/ssh/open-tunnel.sh \ + --target relay-prod \ + --ssh-option ProxyJump=bastion-user@bastion-host +``` + +Preview command without opening the tunnel: + +```bash +./scripts/ssh/open-tunnel.sh --target dev@relay-host --dry-run +``` + +## Help + +```bash +./scripts/ssh/open-tunnel.sh --help +``` diff --git a/scripts/ssh/open-tunnel.sh b/scripts/ssh/open-tunnel.sh new file mode 100644 index 0000000..0c2f21c --- /dev/null +++ b/scripts/ssh/open-tunnel.sh @@ -0,0 +1,232 @@ +#!/usr/bin/env bash +set -euo pipefail + +readonly SCRIPT_NAME="$(basename "$0")" + +LOCAL_HOST="127.0.0.1" +LOCAL_PORT="8787" +REMOTE_HOST="127.0.0.1" +REMOTE_PORT="8787" +SSH_PORT="22" +SSH_TARGET="" +IDENTITY_FILE="" +DRY_RUN="false" +SSH_OPTIONS=() + +print_help() { + cat <<'EOF' +Open an SSH local tunnel for CommandRelay HTTP/WebSocket access. + +Usage: + open-tunnel.sh --target [options] + +Required: + -t, --target SSH target (user@host or SSH config host alias) + +Options: + --local-host Local bind host (default: 127.0.0.1) + -l, --local-port Local bind port (default: 8787) + --remote-host Remote forward host (default: 127.0.0.1) + -r, --remote-port Remote forward port (default: 8787) + -p, --ssh-port SSH server port (default: 22) + -i, --identity SSH identity key path + --ssh-option