termio: add zmx backend for persistent terminal sessions#2
termio: add zmx backend for persistent terminal sessions#2mdsakalu wants to merge 6 commits intomanaflow-ai:mainfrom
Conversation
Implement a new termio backend that connects to a zmx daemon session over a Unix domain socket instead of spawning a direct PTY subprocess. This enables terminal session persistence across app restarts. New files: - src/termio/Zmx.zig: Full zmx backend with IPC protocol, read thread, socket buffer, session creation, and all required termio methods Modified files: - src/termio/backend.zig: Add zmx variant to Kind, Config, Backend, ThreadData unions with all 8 dispatch methods - src/termio.zig: Register Zmx import - src/config/Config.zig: Add zmx-session and zmx-create config options - src/Surface.zig: Backend selection with exec fallback when zmx binary missing; fix childExitedAbnormally for zmx sessions - src/apprt/embedded.zig: Add zmx_session/zmx_create/zmx_mode to Surface.Options; auto-generate session names for inherited zmx mode - include/ghostty.h: Sync C API struct with new zmx fields Design: zmx mode is opt-in, missing zmx binary falls back to exec with warning, zmx mode inherits across splits/tabs with auto-generated session names, threadExit sends Detach (not Kill) to preserve sessions.
a1ce49a to
4eb974f
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new zmx termio backend and integrates it across config, public API, surface selection, threading/I/O, and messaging; implements IPC framing, socket/session management, read thread and write queuing, and exposes zmx lifecycle and diagnostics to the surface and termio abstractions. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Surface as Surface/Term
participant Backend as termio.Zmx
participant Daemon as zmx Daemon
participant RThread as ReadThread
Surface->>Backend: threadEnter(td, session)
Backend->>Backend: resolve socket dir, find zmx binary
Backend->>Backend: createSession / connect socket
Backend->>Daemon: connect Unix socket
Backend->>Daemon: send Init, send Resize
Backend->>RThread: spawn ReadThread
Surface->>Backend: queueWrite(data)
Backend->>Daemon: send Input frames
Daemon->>RThread: socket Output data
RThread->>Surface: dispatch termio.Output
Surface->>Backend: threadExit
Backend->>Daemon: send Detach / close
RThread->>Backend: shutdown and join
sequenceDiagram
autonumber
participant IO as Surface IO
participant Backend as termio.Zmx
participant Daemon as zmx Daemon
IO->>Backend: resize(grid, screen)
Backend->>Backend: update rows/cols
Backend->>Daemon: send Resize IPC
IO->>Backend: childExitedAbnormally(exit_code, runtime_ms)
Backend->>IO: render disconnection message (includes session info)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Surface.zig (1)
632-698:⚠️ Potential issue | 🟠 MajorRestore cleanup before
Termio.inittakes ownership.This block now only
errdefers the mailbox. If anything fails after creatingtermio.Zmx/termio.Manual/termio.Execor the derived termio config, those resources leak on the init error path.♻️ Possible fix
- const io_backend: termio.Backend = if (zmx_backend) |zmx| + var io_backend: termio.Backend = if (zmx_backend) |zmx| .{ .zmx = zmx } else if (use_manual_io) manual_backend: { var io_manual = try termio.Manual.init(alloc, .{ .write_cb = manual_write_cb, .write_userdata = manual_write_userdata, @@ _ = &io_exec; break :exec_backend .{ .exec = io_exec }; }; + errdefer io_backend.deinit(); var io_mailbox = try termio.Mailbox.initSPSC(alloc); errdefer io_mailbox.deinit(alloc); + + var io_config = try termio.Termio.DerivedConfig.init(alloc, config); + errdefer io_config.deinit(); try termio.Termio.init(&self.io, alloc, .{ .size = size, .full_config = config, - .config = try termio.Termio.DerivedConfig.init(alloc, config), + .config = io_config, .backend = io_backend, .mailbox = io_mailbox, .renderer_state = &self.renderer_state, .renderer_wakeup = render_thread.wakeup, .renderer_mailbox = render_thread.mailbox,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Surface.zig` around lines 632 - 698, The code creates termio backends (termio.Zmx.init, termio.Manual.init, termio.Exec.init) and DerivedConfig via termio.Termio.DerivedConfig.init but only errdefer's the mailbox, so if termio.Termio.init fails those backend objects and the derived config will leak; add cleanup errdefer guards for any created backend (zmx_backend, io_manual, io_exec) and for the DerivedConfig result, then clear/reset those guards (or set the owned values to neutral/transfer ownership) immediately after successful Termio.init so Termio takes responsibility; locate the backend construction paths around io_backend and the DerivedConfig init used in Termio.init and ensure corresponding deinit/cleanup calls are invoked on error paths.
🧹 Nitpick comments (1)
src/config/Config.zig (1)
1541-1545: Add a smoke test for the new public config keys.These are new user-facing config entries, but there isn't a focused
Config.loadItertest here for the key names plus the default/override path. A tiny parser test would catch typo or wiring regressions early.Example test
test "parse zmx config" { const testing = std.testing; { var cfg = try Config.default(testing.allocator); defer cfg.deinit(); var it: TestIterator = .{ .data = &.{ "--zmx-session=session-1", } }; try cfg.loadIter(testing.allocator, &it); try cfg.finalize(); try testing.expectEqualStrings("session-1", cfg.@"zmx-session".?); try testing.expect(cfg.@"zmx-create"); } { var cfg = try Config.default(testing.allocator); defer cfg.deinit(); var it: TestIterator = .{ .data = &.{ "--zmx-session=session-1", "--zmx-create=false", } }; try cfg.loadIter(testing.allocator, &it); try cfg.finalize(); try testing.expectEqualStrings("session-1", cfg.@"zmx-session".?); try testing.expect(!cfg.@"zmx-create"); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/Config.zig` around lines 1541 - 1545, Add a focused unit test that exercises Config.loadIter and Config.finalize to verify the new public keys @"zmx-session" and @"zmx-create" parse and default correctly: create Config.default(allocator) instances, feed a TestIterator with "--zmx-session=session-1" to ensure cfg.@"zmx-session" == "session-1" and cfg.@"zmx-create" defaults to true, then a second case with "--zmx-session=session-1" plus "--zmx-create=false" to ensure cfg.@"zmx-create" becomes false; deinit the configs and use testing.expect/testing.expectEqualStrings to assert values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/apprt/embedded.zig`:
- Around line 486-487: The default value for zmx_create in the surface config
causes ghostty_surface_config_new() to opt callers into creating a zmx session
even when they only pass zmx_session and diverges from C zero-init; change the
default for the field zmx_create to false in the config definition so
zero-initialized instances behave correctly and callers that explicitly want
creation must set config.zmx_create = true (see the zmx_create field and
ghostty_surface_config_new and zmx_session usage to locate the change).
- Around line 601-615: When opts.zmx_session is non-null but empty we must treat
it as unset so zmx_mode can auto-generate a session; change the branching around
opts.zmx_session/opts.zmx_mode so that after calling std.mem.sliceTo on
opts.zmx_session you only assign config.@"zmx-session" and config.@"zmx-create"
when session.len > 0, and if session.len == 0 and opts.zmx_mode is true fall
through to the existing auto-generate logic (reuse std.crypto.random.int,
config.arenaAlloc, and std.fmt.allocPrint to create the "cmux-{x}" session and
set config.@"zmx-session" and config.@"zmx-create" = true). Ensure no early
return or else-block prevents the zmx_mode branch when the provided session
string is empty.
In `@src/config/Config.zig`:
- Around line 1537-1541: Update the doc comment for the configuration field
@"zmx-session" (the doc block immediately above the declaration @"zmx-session":
?[]const u8 = null) to clarify that this is not an unconditional switch away
from spawning a shell: retain the existing description of connecting to a zmx
daemon and session persistence, and add a sentence stating that if the named zmx
session or daemon is unavailable the code will fall back to using exec (spawning
a local shell) and emit a warning to the user; ensure the wording is concise and
suitable for generated manual output.
In `@src/termio/Zmx.zig`:
- Around line 623-642: handleDisconnect currently synthesizes a .child_exited
with exit_code = 0 which causes Surface.childExited to treat unexpected zmx
disconnects as clean exits; instead add a dedicated mailbox variant (e.g.,
.child_disconnected or a .child_exited variant with an explicit .abnormal flag)
to the surface_mailbox message union, update disconnectMetadata/handleDisconnect
to push that new variant including runtime_ms and a reason sentinel, and
implement a new Surface.childDisconnected (or branch in Surface.childExited that
checks the new flag) to surface the abnormal disconnect path; update the other
similar call sites (around the other block noted at 727-736) to push the new
variant as well.
- Around line 229-250: The code currently probes the filesystem with
std.fs.accessAbsolute(socket_path) when self.create_if_missing is true, which
can succeed for a stale socket file; instead, after creating or seeing the path,
attempt a real TCP connect to verify readiness and only treat the socket as
ready on successful posix.connect; update the logic in the block around
create_if_missing (referencing create_if_missing, accessAbsolute, createSession,
posix.socket, posix.connect) to: 1) if accessAbsolute fails with FileNotFound
call createSession(socket_path) and then loop with a short backoff trying
posix.connect on the socket until it succeeds or a timeout/attempt limit is
reached, 2) if accessAbsolute succeeds, try posix.connect immediately and only
proceed when connect succeeds (otherwise retry similarly), and 3) remove the
blocking waitpid(pid, 0) readiness dependency and replace it with the
connect-based readiness check; apply the same connect-based readiness change to
the other similar block around lines 698-715.
- Around line 756-763: The test named "socket path resolution with ZMX_DIR"
never sets ZMX_DIR so it doesn't validate the precedence branch; update the code
so resolveSocketDir can be exercised deterministically: either refactor
resolveSocketDir to accept an environment-provider parameter (e.g., a function
or an env map) or add an optional zmxDir override argument, update all call
sites to pass the real env-provider, and then in the test call the new
resolveSocketDir variant with a synthetic environment that contains ZMX_DIR (or
set/unset the process env around the test and restore it). Reference
resolveSocketDir and the test "socket path resolution with ZMX_DIR" when making
the changes.
---
Outside diff comments:
In `@src/Surface.zig`:
- Around line 632-698: The code creates termio backends (termio.Zmx.init,
termio.Manual.init, termio.Exec.init) and DerivedConfig via
termio.Termio.DerivedConfig.init but only errdefer's the mailbox, so if
termio.Termio.init fails those backend objects and the derived config will leak;
add cleanup errdefer guards for any created backend (zmx_backend, io_manual,
io_exec) and for the DerivedConfig result, then clear/reset those guards (or set
the owned values to neutral/transfer ownership) immediately after successful
Termio.init so Termio takes responsibility; locate the backend construction
paths around io_backend and the DerivedConfig init used in Termio.init and
ensure corresponding deinit/cleanup calls are invoked on error paths.
---
Nitpick comments:
In `@src/config/Config.zig`:
- Around line 1541-1545: Add a focused unit test that exercises Config.loadIter
and Config.finalize to verify the new public keys @"zmx-session" and
@"zmx-create" parse and default correctly: create Config.default(allocator)
instances, feed a TestIterator with "--zmx-session=session-1" to ensure
cfg.@"zmx-session" == "session-1" and cfg.@"zmx-create" defaults to true, then a
second case with "--zmx-session=session-1" plus "--zmx-create=false" to ensure
cfg.@"zmx-create" becomes false; deinit the configs and use
testing.expect/testing.expectEqualStrings to assert values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6f5c2d42-be8e-441f-ac18-00f33ebfd76a
📒 Files selected for processing (7)
include/ghostty.hsrc/Surface.zigsrc/apprt/embedded.zigsrc/config/Config.zigsrc/termio.zigsrc/termio/Zmx.zigsrc/termio/backend.zig
| /// Connect to a zmx daemon session instead of spawning a new shell process. | ||
| /// When set, the surface connects to the named zmx session over a Unix | ||
| /// domain socket. The zmx daemon owns the PTY and persists independently | ||
| /// of the surface, enabling session persistence across restarts. | ||
| @"zmx-session": ?[]const u8 = null, |
There was a problem hiding this comment.
Document the exec fallback here.
zmx-session is not an unconditional switch away from spawning a shell: the PR behavior falls back to exec with a warning when zmx is unavailable. Since these comments feed the generated manual, the current wording overpromises behavior.
Suggested wording
-/// Connect to a zmx daemon session instead of spawning a new shell process.
-/// When set, the surface connects to the named zmx session over a Unix
+/// Prefer connecting to a zmx daemon session instead of spawning a new shell
+/// process.
+/// When set, the surface connects to the named zmx session over a Unix
/// domain socket. The zmx daemon owns the PTY and persists independently
/// of the surface, enabling session persistence across restarts.
+/// If zmx is unavailable, Ghostty falls back to the exec backend and logs
+/// a warning.
@"zmx-session": ?[]const u8 = null,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Connect to a zmx daemon session instead of spawning a new shell process. | |
| /// When set, the surface connects to the named zmx session over a Unix | |
| /// domain socket. The zmx daemon owns the PTY and persists independently | |
| /// of the surface, enabling session persistence across restarts. | |
| @"zmx-session": ?[]const u8 = null, | |
| /// Prefer connecting to a zmx daemon session instead of spawning a new shell | |
| /// process. | |
| /// When set, the surface connects to the named zmx session over a Unix | |
| /// domain socket. The zmx daemon owns the PTY and persists independently | |
| /// of the surface, enabling session persistence across restarts. | |
| /// If zmx is unavailable, Ghostty falls back to the exec backend and logs | |
| /// a warning. | |
| @"zmx-session": ?[]const u8 = null, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config/Config.zig` around lines 1537 - 1541, Update the doc comment for
the configuration field @"zmx-session" (the doc block immediately above the
declaration @"zmx-session": ?[]const u8 = null) to clarify that this is not an
unconditional switch away from spawning a shell: retain the existing description
of connecting to a zmx daemon and session persistence, and add a sentence
stating that if the named zmx session or daemon is unavailable the code will
fall back to using exec (spawning a local shell) and emit a warning to the user;
ensure the wording is concise and suitable for generated manual output.
src/termio/Zmx.zig
Outdated
| // Session creation if needed | ||
| if (self.create_if_missing) { | ||
| // Check if socket already exists | ||
| std.fs.accessAbsolute(socket_path, .{}) catch |err| switch (err) { | ||
| // Socket doesn't exist — create the session | ||
| error.FileNotFound => try self.createSession(socket_path), | ||
|
|
||
| // Other errors are real problems (permissions, bad path, etc.) | ||
| else => return err, | ||
| }; | ||
| } | ||
|
|
||
| // Connect to Unix domain socket | ||
| const sock = try posix.socket( | ||
| posix.AF.UNIX, | ||
| posix.SOCK.STREAM | posix.SOCK.CLOEXEC, | ||
| 0, | ||
| ); | ||
| errdefer posix.close(sock); | ||
|
|
||
| const addr = try std.net.Address.initUnix(socket_path); | ||
| try posix.connect(sock, &addr.any, addr.getOsSockLen()); |
There was a problem hiding this comment.
Probe the socket, not just the path, when zmx-create is enabled.
Both the pre-create fast path and the readiness loop use accessAbsolute(socket_path) as the success signal. A stale socket file, or one that exists before the daemon starts accepting connections, will pass those checks and then fail at the later connect(). The blocking waitpid(pid, 0) after the path appears also makes readiness depend on launcher lifetime again. Retry/return based on a successful connect() instead.
Also applies to: 698-715
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/termio/Zmx.zig` around lines 229 - 250, The code currently probes the
filesystem with std.fs.accessAbsolute(socket_path) when self.create_if_missing
is true, which can succeed for a stale socket file; instead, after creating or
seeing the path, attempt a real TCP connect to verify readiness and only treat
the socket as ready on successful posix.connect; update the logic in the block
around create_if_missing (referencing create_if_missing, accessAbsolute,
createSession, posix.socket, posix.connect) to: 1) if accessAbsolute fails with
FileNotFound call createSession(socket_path) and then loop with a short backoff
trying posix.connect on the socket until it succeeds or a timeout/attempt limit
is reached, 2) if accessAbsolute succeeds, try posix.connect immediately and
only proceed when connect succeeds (otherwise retry similarly), and 3) remove
the blocking waitpid(pid, 0) readiness dependency and replace it with the
connect-based readiness check; apply the same connect-based readiness change to
the other similar block around lines 698-715.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
src/apprt/embedded.zig (1)
486-487:⚠️ Potential issue | 🟡 MinorDefault
zmx_createtofalsefor safer opt-in semantics.With
zmx_create: bool = true, callers who only setzmx_sessionare implicitly opted into session creation. This diverges from C zero-initialization behavior and could cause unexpected session creation. Consider defaulting tofalseso callers must explicitly enable creation.Suggested fix
- zmx_create: bool = true, + zmx_create: bool = false,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/apprt/embedded.zig` around lines 486 - 487, Change the default for the zmx_create field from true to false so callers are not implicitly opted into session creation; locate the zmx_create: bool = true initializer in the struct or options block in embedded.zig and set it to false (i.e., zmx_create: bool = false) so callers must explicitly enable session creation when using zmx_session.src/termio/Zmx.zig (1)
724-740:⚠️ Potential issue | 🟠 MajorReturn as soon as the socket is connectable.
Lines 735-739 still block on
waitpid(pid, 0)aftersocketReady(socket_path)succeeds. That couples attach readiness back to the lifetime ofzmx run <session>, so a launcher that stays alive a bit longer can still hang surface init even though the socket is already usable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/termio/Zmx.zig` around lines 724 - 740, The code currently does a blocking posix.waitpid(pid, 0) after socketReady(socket_path) succeeds, which can hang startup; replace that blocking wait with a non-blocking reap (posix.waitpid(pid, std.c.W.NOHANG)) or simply skip the blocking reap and return immediately when socketReady succeeds, ensuring reaped is set only if the non-blocking wait reports the child exited. Update the logic around reaped, waitpid, socketReady, and pid so the function returns as soon as socketReady(socket_path) is true and only uses WNOHANG to detect/reap the child without blocking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Surface.zig`:
- Around line 1325-1341: childDisconnected currently falls back to the generic
self.childExitedAbnormally(info) which triggers the terminal fallback UI;
instead invoke the zmx-specific disconnect renderer implemented in
src/termio/Zmx.zig. Replace the call to childExitedAbnormally(info) with a call
into the Zmx disconnect handler (for example, call the Zmx module/method exposed
on your termio or zmx object such as self.termio.zmx.childDisconnected(info) or
Zmx.childDisconnected(self.termio.zmx, info)), preserving the existing error
logging pattern (catch |err| log.err(...)). Ensure you reference the Zmx
disconnect symbol from Zmx.zig and remove the generic childExitedAbnormally
branch for zmx backends.
In `@src/termio/Zmx.zig`:
- Around line 121-133: SocketBuffer.next currently treats any hdr.len as
trustable and returns null for both "need more bytes" and "invalid/oversized
frame", causing indefinite buffering; add a sane maximum frame size constant
(e.g., MAX_IPC_FRAME) and, inside fn next (SocketBuffer.next) after computing
total = hdr_size + hdr.len, check if total > MAX_IPC_FRAME (or hdr.len >
MAX_IPC_FRAME - hdr_size) and if so treat the frame as invalid: perform a hard
disconnect/cleanup on the offending socket buffer (call the existing
disconnect/close path on the SocketBuffer/owner) and return null or an explicit
error if the surrounding API supports it; ensure you reference IpcHeader/hdr.len
and update any tests to expect a disconnect on oversized frames.
- Around line 54-73: ipcSend and writeAll perform direct posix.write calls which
can partial-write or hit EWOULDBLOCK and bypass queued frames; replace all
direct uses (calls from resize(), threadExit(), and the functions themselves) so
outbound IPC is enqueued via the serialized writer path (use queueWrite) instead
of calling posix.write directly. Update ipcSend to build the IpcHeader and then
enqueue the header bytes and payload together (or enqueue a single contiguous
buffer) through queueWrite, and remove/stop using writeAll for socket writes;
ensure error propagation matches the existing API and that queueWrite handles
WouldBlock/retry semantics so framing and ordering are preserved.
- Around line 177-180: The current early check always returns error.ZmxNotFound
when findZmxBinary() fails, preventing socket probing and breaking attach-only
use; change the logic to only require the zmx binary when create_if_missing is
true (i.e., if (create_if_missing && !findZmxBinary()) return
error.ZmxNotFound), so that when create_if_missing is false the code still
proceeds to probe the Unix socket and allow attaching to an existing session
(ensure this change is applied where findZmxBinary() and create_if_missing are
used in this module).
---
Duplicate comments:
In `@src/apprt/embedded.zig`:
- Around line 486-487: Change the default for the zmx_create field from true to
false so callers are not implicitly opted into session creation; locate the
zmx_create: bool = true initializer in the struct or options block in
embedded.zig and set it to false (i.e., zmx_create: bool = false) so callers
must explicitly enable session creation when using zmx_session.
In `@src/termio/Zmx.zig`:
- Around line 724-740: The code currently does a blocking posix.waitpid(pid, 0)
after socketReady(socket_path) succeeds, which can hang startup; replace that
blocking wait with a non-blocking reap (posix.waitpid(pid, std.c.W.NOHANG)) or
simply skip the blocking reap and return immediately when socketReady succeeds,
ensuring reaped is set only if the non-blocking wait reports the child exited.
Update the logic around reaped, waitpid, socketReady, and pid so the function
returns as soon as socketReady(socket_path) is true and only uses WNOHANG to
detect/reap the child without blocking.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 04c952e0-05da-4bb1-a2ab-4eb0a39bb402
📒 Files selected for processing (5)
src/Surface.zigsrc/apprt/embedded.zigsrc/apprt/surface.zigsrc/config/Config.zigsrc/termio/Zmx.zig
🚧 Files skipped from review as they are similar to previous changes (1)
- src/config/Config.zig
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/termio/Zmx.zig (2)
54-73:⚠️ Potential issue | 🔴 CriticalSerialize all outbound IPC through the queued writer path.
After Line 533 puts the shared socket into non-blocking mode,
ipcSend()/writeAll()can fail mid-frame withWouldBlock, and they also bypass earlier queuedInputwrites. That can corrupt framing or reorderResize/Detachrelative to user input.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/termio/Zmx.zig` around lines 54 - 73, ipcSend and writeAll perform direct blocking writes and can fail with WouldBlock after the socket is set non-blocking, bypassing the existing queued writer and breaking framing/order; change ipcSend to build the full frame (header_bytes + data) and enqueue that frame into the existing queued writer path instead of calling writeAll/posix.write directly so the writer code (which already handles partial writes and WouldBlock) serializes IPC frames with other Output/Input writes; remove or stop using writeAll for IPC sends and reference ipcSend and writeAll in your change so reviewers can verify the enqueueing.
121-133:⚠️ Potential issue | 🟠 MajorReject oversized IPC frames instead of buffering forever.
SocketBuffer.next()trustshdr.lencompletely. A stale or misbound socket can advertise an arbitrarily large payload, and this loop will keep reallocating until OOM because "need more bytes" and "invalid frame size" are both treated asnull.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/termio/Zmx.zig` around lines 121 - 133, SocketBuffer.next currently trusts hdr.len and keeps returning null for both "need more bytes" and "invalid huge length", allowing a malicious/stale socket to cause unbounded buffering; add an explicit validation after reading IpcHeader to reject absurd payload sizes: check hdr.len against a fixed sensible MAX_IPC_PAYLOAD constant and, if hdr.len > MAX_IPC_PAYLOAD (or hdr_size + hdr.len would overflow/self.buf bounds), treat it as a fatal/invalid frame (return an error variant or drop/reset the buffer and advance head past the header to resynchronize) instead of returning null; update SocketBuffer.next, referencing IpcHeader and hdr.len, to distinguish "need more bytes" (null) from "invalid frame size" (error/clear) so the loop cannot allocate forever.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/termio/Zmx.zig`:
- Around line 293-311: threadExit currently nulls self.socket_fd without closing
it, leaking the socket; after joining zmx_td.read_thread and before setting
self.socket_fd = null, call posix.close on zmx_td.socket_fd (handle errors
similarly to other posix calls) and then set the field to null. Apply the same
pattern in ThreadData.deinit (and any cleanup in Zmx.deinit): explicitly close
zmx_td.socket_fd (or the ThreadData's socket_fd) via posix.close with
appropriate error handling before clearing the file descriptor field.
---
Duplicate comments:
In `@src/termio/Zmx.zig`:
- Around line 54-73: ipcSend and writeAll perform direct blocking writes and can
fail with WouldBlock after the socket is set non-blocking, bypassing the
existing queued writer and breaking framing/order; change ipcSend to build the
full frame (header_bytes + data) and enqueue that frame into the existing queued
writer path instead of calling writeAll/posix.write directly so the writer code
(which already handles partial writes and WouldBlock) serializes IPC frames with
other Output/Input writes; remove or stop using writeAll for IPC sends and
reference ipcSend and writeAll in your change so reviewers can verify the
enqueueing.
- Around line 121-133: SocketBuffer.next currently trusts hdr.len and keeps
returning null for both "need more bytes" and "invalid huge length", allowing a
malicious/stale socket to cause unbounded buffering; add an explicit validation
after reading IpcHeader to reject absurd payload sizes: check hdr.len against a
fixed sensible MAX_IPC_PAYLOAD constant and, if hdr.len > MAX_IPC_PAYLOAD (or
hdr_size + hdr.len would overflow/self.buf bounds), treat it as a fatal/invalid
frame (return an error variant or drop/reset the buffer and advance head past
the header to resynchronize) instead of returning null; update
SocketBuffer.next, referencing IpcHeader and hdr.len, to distinguish "need more
bytes" (null) from "invalid frame size" (error/clear) so the loop cannot
allocate forever.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d6dcf67a-ae18-4071-b181-43faaffa7bf4
📒 Files selected for processing (3)
src/Surface.zigsrc/apprt/embedded.zigsrc/termio/Zmx.zig
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/termio/Zmx.zig (3)
878-883: Test assertion can be simplified.The current assertion checks
addr != error.NameTooLong, but sinceinitUnixonly returns that one error, a simpler approach is to usetrywhich will fail the test on any error. This also makes the intent clearer.Suggested change
// A reasonable path should succeed const short_path = try std.fmt.allocPrint(alloc, "/tmp/zmx/test-session", .{}); defer alloc.free(short_path); - const addr = std.net.Address.initUnix(short_path); - try std.testing.expect(addr != error.NameTooLong); + _ = try std.net.Address.initUnix(short_path);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/termio/Zmx.zig` around lines 878 - 883, The test currently allocates short_path and calls std.net.Address.initUnix into addr then asserts addr != error.NameTooLong using std.testing.expect; simplify by removing the explicit comparison and directly using try on std.net.Address.initUnix so any error (including NameTooLong) will fail the test—update the call site where const addr = std.net.Address.initUnix(short_path) and the subsequent std.testing.expect to instead use try to propagate/fail on error and keep the alloc/free behavior for short_path.
776-779: Consider logging chdir failures.The
chdircall silently ignores errors. If the working directory is invalid, the session will start in an unexpected directory. At minimum, log a warning to aid debugging.Suggested change
// Child: set working directory if provided if (self.working_directory) |wd| { - std.posix.chdir(wd) catch {}; + std.posix.chdir(wd) catch |err| { + // Log to stderr since log infrastructure may not be available in forked child + std.debug.print("zmx: failed to chdir to {s}: {}\n", .{ wd, err }); + }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/termio/Zmx.zig` around lines 776 - 779, The chdir call currently swallows errors (std.posix.chdir(wd) catch {}), so change it to capture the error and log a warning including the target path and error (e.g., replace the silent catch with handling that logs via std.debug.warn or the module's logger). Update the block that checks self.working_directory to call std.posix.chdir(wd) and on failure log a warning mentioning wd and the error value (or errno) so failures to change directory are visible for debugging.
85-85: Misleading error for zero-byte write.A
write()returning 0 bytes is unusual (POSIX only allows this when count is 0). If this branch executes, it likely indicates a connection issue rather than disk quota. Consider using a more appropriate error or removing this branch if it's dead code.Suggested change
- if (n == 0) return error.DiskQuota; + if (n == 0) return error.BrokenPipe;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/termio/Zmx.zig` at line 85, The check returning error.DiskQuota when a write() returns n == 0 is misleading; update the branch in Zmx.zig to either remove the n == 0 case (if unreachable) or return a more appropriate error like error.ConnectionClosed or error.WriteWouldBlock depending on the surrounding I/O semantics; specifically, replace the `if (n == 0) return error.DiskQuota;` handling (which references `n` and `error.DiskQuota`) with a clearer return (or remove it) so zero-byte returns from write() are not misreported as a disk quota failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/termio/Zmx.zig`:
- Around line 878-883: The test currently allocates short_path and calls
std.net.Address.initUnix into addr then asserts addr != error.NameTooLong using
std.testing.expect; simplify by removing the explicit comparison and directly
using try on std.net.Address.initUnix so any error (including NameTooLong) will
fail the test—update the call site where const addr =
std.net.Address.initUnix(short_path) and the subsequent std.testing.expect to
instead use try to propagate/fail on error and keep the alloc/free behavior for
short_path.
- Around line 776-779: The chdir call currently swallows errors
(std.posix.chdir(wd) catch {}), so change it to capture the error and log a
warning including the target path and error (e.g., replace the silent catch with
handling that logs via std.debug.warn or the module's logger). Update the block
that checks self.working_directory to call std.posix.chdir(wd) and on failure
log a warning mentioning wd and the error value (or errno) so failures to change
directory are visible for debugging.
- Line 85: The check returning error.DiskQuota when a write() returns n == 0 is
misleading; update the branch in Zmx.zig to either remove the n == 0 case (if
unreachable) or return a more appropriate error like error.ConnectionClosed or
error.WriteWouldBlock depending on the surrounding I/O semantics; specifically,
replace the `if (n == 0) return error.DiskQuota;` handling (which references `n`
and `error.DiskQuota`) with a clearer return (or remove it) so zero-byte returns
from write() are not misreported as a disk quota failure.
There was a problem hiding this comment.
1 issue found across 8 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/apprt/embedded.zig">
<violation number="1" location="src/apprt/embedded.zig:487">
P2: `zmx_create` defaults to false for embedded surfaces, but the config default is true. Since `Surface.init` always applies `opts.zmx_create` when `zmx_session` is set, embedded callers that only set `zmx_session` will disable auto-create and fail to start missing sessions. Default this to true (or otherwise preserve the config default).</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Matches the config-level default. Without this, embedded callers that only set zmx_session get create_if_missing=false, failing to start missing sessions.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/apprt/embedded.zig`:
- Around line 482-492: The Surface.Options docs are misleading because init
treats an empty string ("") for zmx_session as "unset" (like null) and
zmx_mode=true will auto-generate a session in that case; update the comment on
zmx_session/zmx_mode (and optionally zmx_create) to explicitly state that an
empty string is treated as unset and that when zmx_mode is true and zmx_session
is null or the empty string, a session name will be auto-generated (whereas a
non-empty string attaches directly to that named session); reference
Surface.Options, the zmx_session, zmx_mode and zmx_create fields and the init
behavior in the text so callers understand the empty-string contract.
| /// zmx session name. When non-null, the surface connects to a zmx | ||
| /// daemon session instead of spawning a new shell process. | ||
| zmx_session: ?[*:0]const u8 = null, | ||
|
|
||
| /// Whether to create the zmx session if it doesn't exist. | ||
| zmx_create: bool = true, | ||
|
|
||
| /// Whether this surface should use zmx mode. When true and | ||
| /// zmx_session is null, a session name is auto-generated. | ||
| /// This is how zmx mode propagates across splits/tabs. | ||
| zmx_mode: bool = false, |
There was a problem hiding this comment.
Clarify the empty-string contract in Surface.Options.
init treats zmx_session="" as "clear/unset", and zmx_mode=true will still auto-generate a session from that state. The field docs still describe any non-null zmx_session as a direct attach and only mention auto-generation for null, so the public API contract is now a little misleading.
📝 Suggested doc update
- /// zmx session name. When non-null, the surface connects to a zmx
- /// daemon session instead of spawning a new shell process.
+ /// zmx session name. A non-empty value attaches to that zmx daemon
+ /// session instead of spawning a new shell process. An explicit empty
+ /// string clears any inherited session.
- /// Whether this surface should use zmx mode. When true and
- /// zmx_session is null, a session name is auto-generated.
+ /// Whether this surface should use zmx mode. When true and
+ /// zmx_session is null or empty, a session name is auto-generated.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// zmx session name. When non-null, the surface connects to a zmx | |
| /// daemon session instead of spawning a new shell process. | |
| zmx_session: ?[*:0]const u8 = null, | |
| /// Whether to create the zmx session if it doesn't exist. | |
| zmx_create: bool = true, | |
| /// Whether this surface should use zmx mode. When true and | |
| /// zmx_session is null, a session name is auto-generated. | |
| /// This is how zmx mode propagates across splits/tabs. | |
| zmx_mode: bool = false, | |
| /// zmx session name. A non-empty value attaches to that zmx daemon | |
| /// session instead of spawning a new shell process. An explicit empty | |
| /// string clears any inherited session. | |
| zmx_session: ?[*:0]const u8 = null, | |
| /// Whether to create the zmx session if it doesn't exist. | |
| zmx_create: bool = true, | |
| /// Whether this surface should use zmx mode. When true and | |
| /// zmx_session is null or empty, a session name is auto-generated. | |
| /// This is how zmx mode propagates across splits/tabs. | |
| zmx_mode: bool = false, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/apprt/embedded.zig` around lines 482 - 492, The Surface.Options docs are
misleading because init treats an empty string ("") for zmx_session as "unset"
(like null) and zmx_mode=true will auto-generate a session in that case; update
the comment on zmx_session/zmx_mode (and optionally zmx_create) to explicitly
state that an empty string is treated as unset and that when zmx_mode is true
and zmx_session is null or the empty string, a session name will be
auto-generated (whereas a non-empty string attaches directly to that named
session); reference Surface.Options, the zmx_session, zmx_mode and zmx_create
fields and the init behavior in the text so callers understand the empty-string
contract.
Summary
This PR adds a new
zmxtermio backend to Ghostty and wires it through config + embedded APIs so surfaces can attach to persistent zmx sessions instead of always spawning a direct PTY subprocess.Why
cmux persistence work (
manaflow-ai/cmux#143,manaflow-ai/cmux#480) needs terminal processes to outlive the app. The existingexecbackend ties process lifetime to the Ghostty surface, so restart always loses shell state. zmx provides a daemon-owned PTY/session model that can survive app restarts and reconnect cleanly.What Changed
1) New zmx backend in termio
src/termio/Zmx.zig.zmxvariant tosrc/termio/backend.zigunions/dispatch.src/termio.zig.Backend behavior:
zmx run <session>) when configured..child_exitedmetadata so surface UX remains coherent.2) Surface/backend selection and fallback
src/Surface.zigto choose zmx backend whenzmx-sessionis set.execwith warning (no crash/no hard failure).3) Config + embedded API surface
src/config/Config.zig:zmx-sessionzmx-createsrc/apprt/embedded.zigand synced C header:zmx_sessionzmx_createzmx_modeinclude/ghostty.hto keep C API struct in sync.zmx_modeis used for split/tab inheritance in embedded mode: new surfaces created from a zmx-backed surface stay in zmx mode, each with a fresh session.4) Build compatibility in this branch
uucodefallback vendoring path and compatibility wiring (deps-uucode, unicode build updates) to keep fork builds stable in environments where fetching unicode dependency sources is constrained.Design Rationale
exec, so behavior stays isolated and maintainable.exec.Validation
Zmx.zigincludes targeted tests for IPC framing/buffer behavior and disconnect metadata.Related
manaflow-ai/cmux#480manaflow-ai/cmux#143Summary by CodeRabbit
New Features
Behavior
Tests