diff --git a/.gitignore b/.gitignore index ea8c4bf..127321f 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,4 @@ /target +/.venv +__pycache__/ +.pytest_cache/ diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 27fca05..0000000 --- a/CLAUDE.md +++ /dev/null @@ -1,41 +0,0 @@ -# Heimdall — Project Rules - -## What is it - -PTY session supervisor. Forks a child process in a pty, manages its lifecycle via process groups, and exposes a Unix socket for multi-client IPC with binary framing. - -## Quick Commands - -```bash -cargo build # dev build -cargo test # run tests -cargo clippy -- -W clippy::all # lint -cargo build --release # optimised binary at target/release/hm -``` - -## Architecture - -- **Binary name:** `hm` -- **Config:** `./heimdall.toml` (CWD first), then `~/.config/heimdall/heimdall.toml`, or `--config ` -- **Socket dir:** `~/.local/share/heimdall/sessions/` (default) -- **Binary framing:** `[type: u8][len: u32 BE][payload]` — 5-byte overhead per frame -- **Process lifecycle:** fork before tokio, setsid for new process group, kill(-pgid) for cleanup -- **State classifier:** pluggable via config — `claude` (full state machine), `simple` (idle/active), `none` -- **Scrollback:** ring buffer with configurable max bytes, replayed to late-joining subscribers - -## Module Map - -- `main.rs` — CLI (clap), subcommands: run, attach, status, ls, kill -- `config.rs` — TOML config loading with serde -- `pty.rs` — fork/exec, pre-exec seam, process group signals -- `socket.rs` — Unix socket server, per-client handler, subscribe mode -- `protocol.rs` — binary framing, pack/unpack helpers -- `broadcast.rs` — output fan-out, scrollback ring buffer -- `classify/` — StateClassifier trait + implementations (claude, simple, none) - -## Conventions - -- Rust 2024 edition -- clippy clean with `-W clippy::all` -- Single-threaded tokio runtime (fork safety) -- All signal handling via process groups (kill -pgid), not individual PIDs diff --git a/CONTRIBUTING.md 
b/CONTRIBUTING.md new file mode 100644 index 0000000..452a6db --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,125 @@ +# Contributing to Heimdall + +## Prerequisites + +- Rust stable toolchain (rustup recommended) +- [just](https://github.com/casey/just) command runner +- [uv](https://docs.astral.sh/uv/) (for Python attach tests) +- Optional: `cargo-llvm-cov` for coverage reports + +Run `just doctor` to verify your environment. + +## Setup + +```bash +git clone https://github.com/nazq/heimdall.git +cd heimdall +cargo build +uv sync # install Python test deps (pexpect, pytest) +just doctor +``` + +## Just targets + +| Target | Description | +|--------------------|--------------------------------------------------| +| `just check` | Run all quality checks (clippy + fmt + tests) | +| `just test` | Run unit and Rust integration tests | +| `just test-attach` | Run Python attach tests (requires `uv sync`) | +| `just test-all` | Full test suite (Rust + Python) | +| `just fmt` | Format code | +| `just fmt-check` | Check formatting without modifying files | +| `just clippy` | Lint with clippy | +| `just build` | Debug build | +| `just release` | Release build | +| `just install` | Build release and install `hm` to `~/.local/bin` | +| `just cov` | Generate coverage report (requires cargo-llvm-cov)| + +## Running locally + +```bash +# Start a supervised session +just run my-session bash + +# Attach to it from another terminal +just attach my-session + +# List running sessions +just ls + +# Check session status +just status my-session + +# Kill a session +just kill my-session +``` + +## Test expectations + +All PRs must pass `just check` (clippy + format check + full test suite). + +### Testing philosophy + +Tests exist to prove the system works, not to prove the code compiles. Every +test must satisfy three criteria: + +1. **Setup is correct** — the test creates the right preconditions and waits + for them (e.g. socket appears before connecting). +2. 
**The operation runs** — the test actually exercises the code path it claims + to test, not a happy-path shortcut. +3. **All invariants are asserted** — don't assert one field when the response + has five. If a STATUS_RESP has pid, idle_ms, alive, state, and state_ms, + assert the ones that have known-good values. Skipping fields hides bugs. + +### Wire-level protocol tests + +The protocol is documented in [`docs/protocol.md`](docs/protocol.md). Protocol +tests come in two flavours, and both are required: + +- **Round-trip tests** — pack through `pack_*`, parse through `read_frame`, + verify fields match. These catch regressions but have a blind spot: if pack + and parse have the same bug (e.g. both swap two fields), the test passes + while the wire format is silently wrong. +- **Golden byte tests** — assert that a known input produces an exact byte + sequence. These pin the wire format to the documented spec and catch + symmetric pack/parse bugs that round-trip tests cannot. + +When adding a new frame type or modifying a payload layout, add both. + +### Integration tests + +- **Rust** (`tests/integration.rs`) — spawn real `hm` processes, connect over + Unix sockets, send/receive protocol frames, assert responses byte-by-byte. + These test the supervisor end-to-end without mocks. +- **Python** (`tests/test_attach.py`) — use pexpect over real PTYs to verify + the terminal UX: alt screen, status bar, detach, signal forwarding, resize. + Run via `just test-attach` (requires `uv sync`). + +Both suites use temp directories for socket isolation and clean up processes +in fixtures/teardown. + +## Commit style + +Conventional commits. One logical change per commit. + +``` +feat: add scrollback size config option +fix: handle SIGCHLD race on rapid child exit +deps: bump nix to 0.29 +ci: add aarch64-linux to release matrix +docs: clarify process group signaling in ARCH.md +``` + +## Code style + +- `cargo fmt` -- all code must be formatted. 
+- `cargo clippy -- -W clippy::all` -- no warnings allowed. + +## Documentation + +See `docs/` for detailed documentation: + +- [Architecture](docs/ARCH.md) -- design principles, process lifecycle, data flow +- [Protocol](docs/protocol.md) -- wire format and message types +- [Classifiers](docs/classifiers.md) -- state detection and custom classifiers +- [Configuration](docs/configuration.md) -- all config options diff --git a/Cargo.lock b/Cargo.lock index 800c2ec..d6c289d 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -159,6 +159,17 @@ version = "2.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be" +[[package]] +name = "filetime" +version = "0.2.27" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f98844151eee8917efc50bd9e8318cb963ae8b297431495d3f758616ea5c57db" +dependencies = [ + "cfg-if", + "libc", + "libredox", +] + [[package]] name = "foldhash" version = "0.1.5" @@ -206,6 +217,7 @@ dependencies = [ "anyhow", "bytes", "clap", + "filetime", "nix", "serde", "tempfile", @@ -263,6 +275,18 @@ version = "0.2.183" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b5b646652bf6661599e1da8901b3b9522896f01e736bad5f723fe7a3a27f899d" +[[package]] +name = "libredox" +version = "0.1.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1744e39d1d6a9948f4f388969627434e31128196de472883b39f148769bfe30a" +dependencies = [ + "bitflags", + "libc", + "plain", + "redox_syscall", +] + [[package]] name = "linux-raw-sys" version = "0.12.1" @@ -340,6 +364,12 @@ version = "0.2.17" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd" +[[package]] +name = "plain" +version = "0.2.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b4596b6d070b27117e987119b4dac604f3c58cfb0b191112e24771b2faeac1a6" + 
[[package]] name = "prettyplease" version = "0.2.37" @@ -374,6 +404,15 @@ version = "6.0.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f8dcc9c7d52a811697d2151c701e0d08956f92b0e24136cf4cf27b57a6a0d9bf" +[[package]] +name = "redox_syscall" +version = "0.7.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6ce70a74e890531977d37e532c34d45e9055d2409ed08ddba14529471ed0be16" +dependencies = [ + "bitflags", +] + [[package]] name = "regex-automata" version = "0.4.14" diff --git a/Cargo.toml b/Cargo.toml index d67b5e8..496908e 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -25,6 +25,7 @@ serde = { version = "1", features = ["derive"] } [dev-dependencies] tempfile = "3" +filetime = "0.2" [profile.release] strip = true diff --git a/README.md b/README.md index 279f063..42a2e52 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,8 @@ Fork. Watch. Control. From anywhere. +*Named for the Norse guardian who watches over Bifrost — Heimdall sees all, hears all, and nothing escapes on his watch.* + [![CI](https://github.com/nazq/heimdall/actions/workflows/ci.yml/badge.svg)](https://github.com/nazq/heimdall/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/nazq/heimdall/graph/badge.svg)](https://codecov.io/gh/nazq/heimdall) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) @@ -35,7 +37,7 @@ tools, different jobs. 
|---|---|---|---|---| | Terminal multiplexer (splits, tabs) | Yes | Yes | Yes | No | | PTY supervision (fork, own, reap) | Side effect | Side effect | Side effect | Core purpose | -| Process group kill (`kill -pgid`) | No | No | No | Yes | +| Process group kill (`kill -pgid`) | No | No | No | Yes (default, configurable) | | Multi-client attach (concurrent) | One at a time | One at a time | One at a time | Unlimited | | Binary socket protocol (5-byte frames) | No | No | No | Yes | | Scrollback replay for late joiners | Per-pane buffer | Per-window | Per-pane | Ring buffer, streamed on subscribe | @@ -44,7 +46,7 @@ tools, different jobs. | Pre-exec seam (env, workdir, future: cgroups) | Limited | Limited | No | Full control of fork/exec boundary | | Config per project | `.tmux.conf` | `.screenrc` | `config.kdl` | `./heimdall.toml` | | Zero dependencies at runtime | Needs server | Needs server | Needs server | Single static binary | -| Grandchild cleanup on kill | No | No | No | Yes (`setsid` + `-pgid`) | +| Grandchild cleanup on kill | No | No | No | Yes (default; set `kill_process_group = false` to disable) | Heimdall doesn't replace tmux — it replaces the part of tmux you were misusing as a process supervisor. @@ -118,13 +120,24 @@ hm kill my-session # SIGTERM to entire process group, SIGKILL after 5s ### Configure (optional) -Drop a `heimdall.toml` in your project directory, or at -`~/.config/heimdall/heimdall.toml` for global defaults: +Heimdall resolves configuration using a waterfall — the first file found wins: + +1. **`--config `** — explicit path passed on the command line +2. **`./heimdall.toml`** — in the current working directory (project-local) +3. **`~/.config/heimdall/heimdall.toml`** — global user defaults +4. 
**Built-in defaults** — sensible values if no file is found ```toml -classifier = "claude" # "claude", "simple", or "none" -idle_threshold_ms = 3000 scrollback_bytes = 65536 +kill_process_group = true # set to false to only signal the direct child + +# Classifier as a string (uses defaults): +classifier = "simple" + +# Or with custom parameters: +# [classifier.claude] +# idle_threshold_ms = 3000 +# debounce_ms = 200 [[env]] name = "MY_API_KEY" @@ -135,18 +148,16 @@ See [`heimdall.example.toml`](heimdall.example.toml) for all options. ## How it works -``` -hm run --id foo -- bash -│ -├─ fork() before async runtime (single-threaded safety) -│ ├─ child: setsid → new process group, pty slave → stdio, exec -│ └─ parent: own master fd, write PID file -├─ tokio event loop (single-threaded) -│ ├─ pty read → scrollback + broadcast to subscribers -│ ├─ SIGCHLD → reap, broadcast EXIT -│ └─ SIGTERM → kill(-pgid), reap, broadcast EXIT -└─ cleanup: remove socket + PID, exit with child's code -``` +Heimdall acts as a middleman between you and the process you want to supervise. Think of it like a bodyguard that starts your program, keeps it alive, lets visitors talk to it, and handles the cleanup when it's done. + +Here's what happens when you run `hm run --id foo -- bash`: + +- **Launches as its own session leader.** The supervisor calls `setsid` to become the leader of a new process session. This means even if you close the terminal window that started it, the supervised process keeps running. You can always reattach later with `hm attach`. +- **Starts your command inside a virtual terminal.** Your program thinks it's running in a normal terminal, so interactive tools (editors, TUIs, colored output) all work as expected. +- **Owns the entire process tree.** The supervised command and everything it spawns belong to one process group. When you `hm kill`, the signal reaches every descendant — no orphaned grandchildren left behind. 
This is the default behavior; set `kill_process_group = false` in your config if you want only the direct child to receive signals. +- **Opens a Unix socket for clients.** Any number of terminals can attach simultaneously to watch output, send input, or query status. Late joiners get the scrollback buffer replayed so they don't miss anything. +- **Sets a session ID environment variable.** The child process (and everything it spawns) inherits `HEIMDALL_SESSION_ID=foo`. Scripts and hooks can read this to know which supervised session they belong to. +- **Cleans up on exit.** When the supervised process ends, Heimdall reaps it, removes the socket and PID file, and exits with the child's exit code. Clients connect via Unix socket at `~/.local/share/heimdall/sessions/.sock`. The binary framing protocol is 5 bytes overhead per message — trivial to diff --git a/docs/ARCH.md b/docs/ARCH.md index e3f238e..478f04a 100644 --- a/docs/ARCH.md +++ b/docs/ARCH.md @@ -11,28 +11,38 @@ session concurrently. child. Multiple sessions means multiple `hm` processes. No daemon, no multiplexer state to corrupt. - **Fork before async.** The child is forked before the Tokio runtime starts. - This satisfies the single-threaded requirement for safe `fork()` and keeps + This satisfies the single-threaded requirement for safe + [`fork()`](https://docs.rs/nix/latest/nix/unistd/fn.fork.html) and keeps the async runtime free of pre-fork state. - **Process groups, not PIDs.** The child calls `setsid()` making its PID the PGID. All signals (`SIGTERM`, `SIGKILL`) target `-pgid`, killing the entire - tree — grandchildren included. + tree — grandchildren included. This behavior is configurable via + `kill_process_group` in `heimdall.toml` (default: `true`). When disabled, + signals target only the direct child PID. 
- **Clients are just socket connections.** The built-in `attach`, `status`, - `ls`, and `kill` subcommands are thin clients over the same Unix socket + `ls`, and `kill` subcommands are thin clients over the same + [Unix domain socket](https://man7.org/linux/man-pages/man7/unix.7.html) protocol. Any program that speaks the binary framing protocol is a first-class client. ## Process lifecycle +When you run `hm run --id foo -- bash`, the supervisor walks through these +steps in order. Understanding the sequence matters because each step depends +on the previous one, and the fork/exec boundary is where the child's +environment is permanently set. + ``` hm run --id foo -- bash │ ├─ parse CLI args ├─ load config (heimdall.toml) -├─ check PID file (abort if session already running) +├─ flock PID file (abort if locked or PIDs alive) +├─ write supervisor PID (line 1) ├─ openpty() — allocate master/slave pair ├─ fork() │ ├─ [child] setsid, dup2 slave → stdio, set env, chdir, execvp -│ └─ [parent] close slave fd, write PID file +│ └─ [parent] close slave fd, write child PID (line 2) ├─ start tokio runtime (single-threaded) ├─ bind Unix socket ├─ event loop: @@ -42,15 +52,152 @@ hm run --id foo -- bash └─ cleanup: remove socket + PID file, exit with child's code ``` +### What each step does + +1. **Parse CLI args** — Clap extracts the session ID, the command to run, and + any flags. Nothing interesting happens here. + +2. **Load config** — Reads the `--config` path if one is given, otherwise + `./heimdall.toml` (CWD), then `~/.config/heimdall/heimdall.toml`; the first + file found wins. Config controls socket directory, scrollback size, + classifier selection (with per-classifier parameters), process group + kill behaviour, and extra environment variables. CLI flags override + config file values. + +3. **Lock and check PID file** — Each session writes a two-line PID file + (`\n`) protected by an `flock`. If the lock + is held by another process, the supervisor aborts with diagnostics + (holder's PID, uptime, command line from `/proc`). 
If the lock is + available but the file contains PIDs that are still alive, the + supervisor aborts to prevent two supervisors fighting over the same + socket and process group. Stale PID files (dead processes) are + overwritten. + +4. **openpty()** — Allocates a pseudo-terminal pair: a *master* fd and a + *slave* fd. The master side is what the supervisor reads/writes. The slave + side becomes the child's terminal — its stdin, stdout, and stderr all + point to it, so the child thinks it's running in a real terminal. + +5. **fork()** — Splits the process into parent and child. This happens + *before* the async runtime starts, because `fork()` in a multi-threaded + process is unsafe (locks held by other threads become permanently locked + in the child). + +6. **[child] setsid** — The child calls `setsid()` to create a new session + and process group. This detaches it from the parent's terminal and makes + the child's PID the process group leader. All processes spawned by the + child inherit this group, so the supervisor can signal the entire tree + at once with `kill(-pgid, signal)`. + +7. **[child] dup2 slave to stdio** — The child uses `dup2()` to replace its + stdin (fd 0), stdout (fd 1), and stderr (fd 2) with the slave fd. After + this, anything the child prints goes through the PTY, and anything the + supervisor writes to the master fd appears as the child's input. + +8. **[child] set env, chdir, execvp** — Environment variables are set + (including the session ID), the working directory is changed if + configured, and `execvp()` replaces the child process image with the + requested command. From this point, the child IS the command (e.g. bash). + +9. **[parent] close slave fd, write PID file** — The parent doesn't need the + slave side (only the child uses it). Closing it ensures the PTY sends EOF + properly when the child exits. The supervisor PID is written to line 1 + of the PID file before fork; the child PID is appended to line 2 after + fork. 
Both are protected by the flock acquired at step 3. + +10. **Start Tokio runtime** — A single-threaded async runtime starts. It's + single-threaded because the fork already happened — there's no need for + multiple OS threads, and it keeps the supervisor lightweight. + +11. **Bind Unix socket** — The socket is created at + `/.sock`. Clients connect here to subscribe to + output, send input, or query status. + +12. **Event loop** — The supervisor multiplexes three concerns: reading PTY + output (and fanning it out to subscribers), catching `SIGCHLD` (child + exited), and catching `SIGTERM` (someone asked the supervisor to stop). + +13. **Cleanup** — After the child exits (or the supervisor is told to stop), + the socket and PID file are removed, and the supervisor exits with the + child's exit code. + +## Run modes and the two-process model + +`hm run` has two modes that determine the process architecture: + +### `hm run --id foo -- bash` (default: launch and attach) + +This spawns **two independent processes**: + +1. The original process re-execs itself as `hm run --detach --id foo -- bash`, + which starts the supervisor in the background. +2. Once the supervisor's socket appears, the original process becomes a pure + **attach client** — connecting to the socket, setting up raw terminal mode, + and running the terminal passthrough loop. + +``` +hm run --id foo -- bash +│ +├─ spawn "hm run --detach --id foo -- bash" (background) +│ └─ supervisor process (owns pty, binds socket, event loop) +│ +├─ poll for socket to appear +│ +└─ attach client (raw mode, status bar, select loop) + └─ connects to supervisor via Unix socket +``` + +The supervisor calls `setsid()` before starting, placing itself in a new +process session. This is the key isolation boundary: the supervisor and its +child belong to a different session from the user's terminal. 
+ +If you close the terminal window (which sends `SIGHUP` to all processes in +the terminal's session), only the attach client dies. The supervisor and its +child keep running. You can reattach later with `hm attach foo`. + +### `hm run --detach --id foo -- bash` (headless) + +A single process: the supervisor runs in the foreground with no terminal UI. +Used when something else manages the lifecycle — a web dashboard, systemd, +CI, or scripts that interact via the socket API. + +### `hm attach foo` (reconnect) + +Connects to an already-running supervisor. Identical to the attach phase of +the default mode — same terminal passthrough, same status bar, same signals. +Multiple clients can attach to the same session simultaneously. + +This is the same basic mechanism that `tmux` and `screen` use, but without +the multiplexer complexity. Each session is one supervisor process, and +clients are just socket connections. + +## Session ID environment variable + +The supervised child process (and all its descendants) receive an environment +variable containing the session ID. By default this variable is named +`HEIMDALL_SESSION_ID`, but the name is configurable in `heimdall.toml`. + +This serves as the identity mechanism for the entire process tree. Hooks, +plugins, and child processes can read this variable to determine which +Heimdall session they belong to. For example, a post-exit hook script can +use `$HEIMDALL_SESSION_ID` to report results to the correct session, or a +long-running child can use it for logging and metrics attribution. 
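
Here is a Rust sketch of the hook use case above. It relies only on what this section describes: the variable name defaults to `HEIMDALL_SESSION_ID` and can be changed with `session_env_var`, so the sketch takes the name as a parameter. The function names `session_id` and `tag_line` are hypothetical.

```rust
use std::env;

/// Default variable name. heimdall.toml can rename it via `session_env_var`,
/// so callers pass the name in rather than relying on this constant alone.
const DEFAULT_SESSION_VAR: &str = "HEIMDALL_SESSION_ID";

/// Returns the session ID if this process runs inside a heimdall session.
/// An empty value is treated the same as an unset one.
fn session_id(var_name: &str) -> Option<String> {
    env::var(var_name).ok().filter(|id| !id.is_empty())
}

/// Prefix a log line with the owning session, for log/metrics attribution.
fn tag_line(session: Option<&str>, msg: &str) -> String {
    format!("[{}] {}", session.unwrap_or("unsupervised"), msg)
}

fn main() {
    let id = session_id(DEFAULT_SESSION_VAR);
    println!("{}", tag_line(id.as_deref(), "hook started"));
}
```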
+ ## Module map ``` src/ -├── main.rs CLI (clap), subcommands, supervisor event loop +├── main.rs Entry point: CLI parse, config merge, dispatch +├── cli.rs Clap structs, RunArgs, SessionParams, config merge +├── supervisor.rs Fork child, PID lock, event loop, cleanup guard +├── attach.rs launch_and_attach, terminal passthrough, signal handling +├── commands.rs Status, list, kill subcommands +├── terminal.rs ANSI consts, status bar rendering, termios guard ├── config.rs TOML config loading + resolution ├── pty.rs fork/exec, pre-exec seam, process group signals ├── socket.rs Unix socket server, per-client handler ├── protocol.rs Binary framing: pack/unpack/read/write +├── pidfile.rs PID file abstraction: read, write, liveness checks +├── util.rs Shared helpers: session_socket, with_runtime ├── broadcast.rs Output fan-out, scrollback ring buffer └── classify/ ├── mod.rs StateClassifier trait + factory @@ -72,8 +219,8 @@ src/ └────┬─────┘ │ read loop ┌────┴─────┐ - │broadcast │──→ state classifier - │ │──→ scrollback ring buffer + │broadcast │──> state classifier + │ │──> scrollback ring buffer └────┬─────┘ │ tokio broadcast channel ┌──────────┼──────────┐ @@ -94,7 +241,7 @@ src/ ## Pre-exec seam -The gap between `fork()` and `execvp()` is the most powerful boundary in the +The gap between `fork()` and `execvp()` is the key boundary in the supervisor. Everything set here is inherited by the child and its entire process tree. diff --git a/docs/classifiers.md b/docs/classifiers.md index 12b09b5..b0ef023 100644 --- a/docs/classifiers.md +++ b/docs/classifiers.md @@ -43,9 +43,18 @@ classifier produced it. The `none` classifier always reports Idle. ## Built-in classifiers -### `claude` (default) +### `simple` (default) -Full state machine tuned for Claude Code's terminal output patterns. +Binary idle/active classifier. Reports: +- **Idle** when silence exceeds the threshold. +- **Active** when there's been recent output. + +No pattern analysis. 
This is the recommended default for general use — it +works with any program and has negligible overhead. + +### `claude` + +Specialized state machine tuned for Claude Code's terminal output patterns. Uses a sliding window of the last 20 output events. For each new event it: @@ -61,15 +70,6 @@ Uses a sliding window of the last 20 output events. For each new event it: State transitions are **debounced** (`debounce_ms`, default 200ms) to prevent rapid flickering. Idle transitions are instant since silence is unambiguous. -### `simple` - -Binary idle/active classifier. Reports: -- **Idle** when silence exceeds the threshold. -- **Active** when there's been recent output. - -No pattern analysis. Useful for processes where you only care whether -something is happening or not. - ### `none` Null classifier. Always reports Idle. Use when you only need pty supervision, @@ -77,13 +77,49 @@ scrollback, and socket IPC — no state inference. ## Configuration -Set the classifier in `heimdall.toml`: +Set the classifier in `heimdall.toml`. Two forms are supported: + +**String shorthand** (all built-in defaults for the classifier): ```toml -classifier = "claude" # or "simple" or "none" +classifier = "simple" # or "claude" or "none" (default: "simple") +``` + +**Table form** (per-classifier parameters): + +```toml +[classifier.simple] +idle_threshold_ms = 3000 + +# or + +[classifier.claude] idle_threshold_ms = 3000 debounce_ms = 200 + +# or + +[classifier.none] +``` + +Each classifier carries its own parameters: + +| Parameter | Classifiers | Default | Description | +|---|---|---|---| +| `idle_threshold_ms` | simple, claude | 3000 | Silence duration (ms) before transitioning to Idle | +| `debounce_ms` | claude | 200 | Minimum time (ms) a non-idle state must persist before it's committed | + +The `none` classifier has no parameters — it always reports Idle. 
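
To build intuition, the `simple` classifier's rule ("Idle when silence exceeds the threshold") can be sketched in a few lines of Rust. This is a hypothetical stand-in. It is not the `StateClassifier` trait from `classify/mod.rs`, whose signature isn't shown here, and reporting Idle before any output has been seen is an assumption. Passing the current time in explicitly keeps the logic deterministic and easy to test.

```rust
use std::time::{Duration, Instant};

/// States the `simple` classifier can report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Active,
}

/// Hypothetical stand-in for the built-in `simple` classifier.
struct SimpleClassifier {
    idle_threshold: Duration,
    last_output: Option<Instant>,
}

impl SimpleClassifier {
    fn new(idle_threshold_ms: u64) -> Self {
        Self {
            idle_threshold: Duration::from_millis(idle_threshold_ms),
            last_output: None,
        }
    }

    /// Record that the pty produced output at `now`.
    fn on_output(&mut self, now: Instant) {
        self.last_output = Some(now);
    }

    /// Active while silence is within the threshold; Idle once it exceeds
    /// it, or if no output has been seen yet (an assumption of this sketch).
    fn state(&self, now: Instant) -> State {
        match self.last_output {
            Some(t) if now.saturating_duration_since(t) <= self.idle_threshold => State::Active,
            _ => State::Idle,
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut classifier = SimpleClassifier::new(3000); // documented default idle_threshold_ms
    classifier.on_output(start);
    println!("{:?}", classifier.state(start + Duration::from_millis(500))); // Active
    println!("{:?}", classifier.state(start + Duration::from_millis(3500))); // Idle
}
```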
+ +### CLI overrides + +All classifier parameters can be overridden on the command line: + +```bash +hm run --id foo --classifier claude --idle-threshold-ms 5000 --debounce-ms 100 -- bash ``` -The `idle_threshold_ms` and `debounce_ms` values are passed to whichever -classifier is selected. The `none` classifier ignores both. +When `--classifier` is given, a fresh classifier is created with the specified +(or default) parameters. When only `--idle-threshold-ms` or `--debounce-ms` +are given without `--classifier`, they override the corresponding parameter on +whatever classifier the config file selected. diff --git a/docs/configuration.md b/docs/configuration.md index ef9fcd3..d46e83f 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -18,6 +18,16 @@ Heimdall resolves config in this order: The first file found wins. Files are not merged — if `./heimdall.toml` exists, the global config is not read. +## Precedence + +Within a single run, configuration follows a three-tier precedence: + +**CLI flags > config file > built-in defaults** + +For example, `hm run --idle-threshold-ms 5000` overrides whatever +`idle_threshold_ms` is set in the config file, which itself overrides the +built-in default of 3000. + ## All options ```toml @@ -34,28 +44,68 @@ scrollback_bytes = 65536 # Default: HEIMDALL_SESSION_ID session_env_var = "HEIMDALL_SESSION_ID" -# State classifier: "claude", "simple", or "none". -# See docs/classifiers.md for details. -# Default: "claude" -classifier = "claude" +# Signal the entire process group on kill/shutdown. +# When true (default), SIGTERM/SIGKILL reach all grandchild processes. +# Set to false to only signal the direct child. +kill_process_group = true + +# Log file path for the supervisor. +# Default: /.log. Set to /dev/null to disable. +# log_file = "/var/log/heimdall/session.log" -# Milliseconds of silence before the classifier transitions to Idle. 
-# Default: 3000 -idle_threshold_ms = 3000 +# Log level for the supervisor (trace, debug, info, warn, error). +# RUST_LOG env var overrides this. +# Default: info +log_level = "info" -# Milliseconds a candidate state must persist before the classifier commits -# to the transition. Prevents rapid flickering on ambiguous output. -# Default: 200 -debounce_ms = 200 +# Detach key byte for the attach client. +# Default: 28 (0x1C, Ctrl-\). Set to 0 to disable. +# detach_key = 28 + +# State classifier — string shorthand (built-in defaults): +classifier = "simple" # or "claude" or "none" + +# Or table form with per-classifier parameters: +# [classifier.simple] +# idle_threshold_ms = 3000 + +# [classifier.claude] +# idle_threshold_ms = 3000 +# debounce_ms = 200 + +# [classifier.none] # Extra environment variables injected into the child process. -# These are set in the pre-exec seam after fork(), so they're inherited by -# the child's entire process tree. +# Set in the pre-exec seam after fork(), inherited by the child's entire tree. [[env]] name = "MY_VAR" value = "my_value" ``` +## Classifier parameters + +Each classifier carries its own parameters. See +[classifiers.md](classifiers.md) for full details. 
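
Here is a minimal Rust sketch of how the two config forms and the CLI override rules from [classifiers.md](classifiers.md) could fit together. It follows those rules: `--classifier` starts from fresh defaults, and the parameter flags then override whichever classifier was chosen. All names (`Classifier`, `CliClassifierFlags`, `resolve`) are hypothetical. The defaults of 3000 ms for the idle threshold and 200 ms for debounce are the documented ones.

```rust
/// Hypothetical model of a resolved classifier config.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Classifier {
    Simple { idle_threshold_ms: u64 },
    Claude { idle_threshold_ms: u64, debounce_ms: u64 },
    /// The `none` classifier: no parameters, always reports Idle.
    Null,
}

/// Documented built-in defaults.
const DEFAULT_IDLE_MS: u64 = 3000;
const DEFAULT_DEBOUNCE_MS: u64 = 200;

impl Classifier {
    /// Built-in defaults, as selected by the string shorthand or `--classifier`.
    fn with_defaults(name: &str) -> Option<Self> {
        match name {
            "simple" => Some(Classifier::Simple { idle_threshold_ms: DEFAULT_IDLE_MS }),
            "claude" => Some(Classifier::Claude {
                idle_threshold_ms: DEFAULT_IDLE_MS,
                debounce_ms: DEFAULT_DEBOUNCE_MS,
            }),
            "none" => Some(Classifier::Null),
            _ => None,
        }
    }
}

/// Classifier-related `hm run` flags; `None` means the flag was not given.
struct CliClassifierFlags<'a> {
    classifier: Option<&'a str>,
    idle_threshold_ms: Option<u64>,
    debounce_ms: Option<u64>,
}

/// `--classifier` replaces the config file's choice with a fresh classifier;
/// otherwise the config file's classifier is kept. Parameter flags then
/// override the matching field; flags a classifier doesn't use are ignored.
fn resolve(from_config: Classifier, cli: &CliClassifierFlags) -> Result<Classifier, String> {
    let mut resolved = match cli.classifier {
        Some(name) => Classifier::with_defaults(name)
            .ok_or_else(|| format!("unknown classifier: {name}"))?,
        None => from_config,
    };
    match &mut resolved {
        Classifier::Simple { idle_threshold_ms } => {
            if let Some(ms) = cli.idle_threshold_ms {
                *idle_threshold_ms = ms;
            }
        }
        Classifier::Claude { idle_threshold_ms, debounce_ms } => {
            if let Some(ms) = cli.idle_threshold_ms {
                *idle_threshold_ms = ms;
            }
            if let Some(ms) = cli.debounce_ms {
                *debounce_ms = ms;
            }
        }
        Classifier::Null => {}
    }
    Ok(resolved)
}

fn main() {
    // Config file picked claude with custom values; the user passed only
    // --idle-threshold-ms 5000, so debounce_ms keeps the config value.
    let from_config = Classifier::Claude { idle_threshold_ms: 4000, debounce_ms: 150 };
    let cli = CliClassifierFlags { classifier: None, idle_threshold_ms: Some(5000), debounce_ms: None };
    println!("{:?}", resolve(from_config, &cli));
    // Ok(Claude { idle_threshold_ms: 5000, debounce_ms: 150 })
}
```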
+ +| Parameter | Classifiers | Default | Description | +|---|---|---|---| +| `idle_threshold_ms` | simple, claude | 3000 | Silence duration (ms) before idle | +| `debounce_ms` | claude | 200 | State transition debounce (ms) | + +## CLI flags for `hm run` + +| Flag | Overrides | Description | +|---|---|---| +| `--socket-dir` | `socket_dir` | Socket/PID directory | +| `--scrollback-bytes` | `scrollback_bytes` | Scrollback buffer size | +| `--session-env-var` | `session_env_var` | Env var name for session ID | +| `--kill-process-group` | `kill_process_group` | Process group signalling | +| `--classifier` | `classifier` | Classifier type (simple/claude/none) | +| `--idle-threshold-ms` | per-classifier | Idle detection threshold | +| `--debounce-ms` | per-classifier | State transition debounce | +| `--log-file` | `log_file` | Supervisor log file (default: `/.log`) | +| `--log-level` | `log_level` | Log level: trace, debug, info, warn, error | + ## Per-project config Drop a `heimdall.toml` in your project root. When you run `hm run` from that diff --git a/docs/protocol.md b/docs/protocol.md index c7aee38..eaccaa9 100644 --- a/docs/protocol.md +++ b/docs/protocol.md @@ -22,9 +22,17 @@ Total overhead per frame: 5 bytes. On connect, the supervisor immediately writes a single **mode byte**: -| Byte | Meaning | -|--------|----------------| -| `0x00` | Binary framing | +| Byte | Meaning | +|--------|----------------------------| +| `0x00` | Binary framing (active) | +| `0x01` | Text/debug mode (reserved) | + +The mode byte exists so that future versions can offer a human-readable text +protocol (mode `0x01`) where you could connect with `socat` or `netcat` and +interact without a custom client. Today only binary framing (`0x00`) is +implemented — the supervisor always sends `0x00`, and clients should assert +this value. Mode `0x01` is reserved for future use and is not handled by the +supervisor or any built-in client. The client must read this byte before sending any frames. 
This byte is not framed — it's a raw single byte on the wire. @@ -116,3 +124,79 @@ loop: Any language with Unix socket support and the ability to read/write bytes can be a heimdall client. + +## Example: minimal Go client + +A self-contained subscriber that connects to a heimdall session, prints pty +output to stdout, and exits with the child's exit code. + +```go +package main + +import ( + "encoding/binary" + "fmt" + "io" + "net" + "os" +) + +func main() { + if len(os.Args) < 2 { + fmt.Fprintf(os.Stderr, "usage: %s \n", os.Args[0]) + os.Exit(1) + } + + conn, err := net.Dial("unix", os.Args[1]) + if err != nil { + fmt.Fprintf(os.Stderr, "connect: %v\n", err) + os.Exit(1) + } + defer conn.Close() + + // Read mode byte — must be 0x00 (binary framing). + mode := make([]byte, 1) + if _, err := io.ReadFull(conn, mode); err != nil { + fmt.Fprintf(os.Stderr, "read mode byte: %v\n", err) + os.Exit(1) + } + if mode[0] != 0x00 { + fmt.Fprintf(os.Stderr, "unsupported mode: 0x%02x\n", mode[0]) + os.Exit(1) + } + + // Send SUBSCRIBE frame: type=0x02, length=0. + subscribe := []byte{0x02, 0x00, 0x00, 0x00, 0x00} + if _, err := conn.Write(subscribe); err != nil { + fmt.Fprintf(os.Stderr, "send subscribe: %v\n", err) + os.Exit(1) + } + + // Read frames until EXIT. + header := make([]byte, 5) + for { + if _, err := io.ReadFull(conn, header); err != nil { + fmt.Fprintf(os.Stderr, "read frame: %v\n", err) + os.Exit(1) + } + msgType := header[0] + length := binary.BigEndian.Uint32(header[1:5]) + + payload := make([]byte, length) + if length > 0 { + if _, err := io.ReadFull(conn, payload); err != nil { + fmt.Fprintf(os.Stderr, "read payload: %v\n", err) + os.Exit(1) + } + } + + switch msgType { + case 0x81: // OUTPUT — write pty data to stdout. + os.Stdout.Write(payload) + case 0x83: // EXIT — child exited, payload is i32 BE exit code. 
+ code := int32(binary.BigEndian.Uint32(payload)) + os.Exit(int(code)) + } + } +} +``` diff --git a/docs/pty-primer.md b/docs/pty-primer.md new file mode 100644 index 0000000..e797545 --- /dev/null +++ b/docs/pty-primer.md @@ -0,0 +1,256 @@ +# PTY Primer + +A gentle introduction to pseudo-terminals for developers who use them +every day without thinking about it. + +--- + +## What is a PTY? + +When you open a terminal and type `ls`, what actually happens? + +You might assume your keystrokes go straight to bash and bash prints +directly to your screen. The reality is more interesting. Between your +terminal emulator (Alacritty, kitty, Terminal.app, whatever you use) and +the shell, there is a kernel-managed device called a **pseudo-terminal**, +or PTY. + +A PTY is a pair of virtual devices that simulate a hardware terminal. In +the old days, you had a physical device -- a VT100, a teletype -- wired +to a serial port. The kernel's terminal subsystem (the "TTY layer") was +built to talk to those devices. When physical terminals disappeared, the +kernel kept the same interface but made it virtual. That virtual version +is the PTY. + +The key insight: a PTY is not one thing, it is a *pair*. One end is +called the **master** (or, in newer POSIX terminology, the +"multiplexor"). The other is the **slave** (or "subsidiary"). They are +connected: bytes written to one come out the other, but with the +kernel's terminal line discipline sitting in the middle, processing them. + + +## Master and slave + +Think of it like a two-way mirror between a control room and a stage. + +The **master** side is held by whatever is pretending to be the terminal +-- your terminal emulator, tmux, screen, or (in our case) heimdall. It +represents the "keyboard and screen" side of the old physical terminal. + +The **slave** side is what the program (bash, vim, python) opens as its +stdin, stdout, and stderr. From the program's perspective, it looks +exactly like a real terminal device. 
It shows up as `/dev/pts/N` in the +filesystem. + +The flow works like this: + +1. You press a key. Your terminal emulator writes that byte to the + master fd. +2. The kernel's line discipline receives it. In "cooked" mode, it might + buffer it, handle backspace, echo it back. In "raw" mode, it passes + it through immediately. +3. The byte appears on the slave side. Bash (or whatever is running) + reads it from stdin. +4. Bash runs a command and writes output to stdout (the slave fd). +5. The output travels back through the line discipline to the master. +6. The terminal emulator reads from the master and renders the text on + screen. + +Every character you see in your terminal has made this round trip +through the kernel. + +``` + ┌───────────────────┐ ┌──────────────────┐ ┌──────────────┐ + │ Terminal Emulator │ │ Kernel │ │ Shell │ + │ (alacritty, etc) │ │ Line Discipline │ │ (bash, zsh) │ + │ │ │ │ │ │ + │ screen ◄─────────┼── read ─┤ master ← slave ┼─ write─┤ stdout/err │ + │ │ │ │ │ │ + │ keyboard ────────┼─ write ─┤ master → slave ┼─ read ─┤ stdin │ + └───────────────────┘ └──────────────────┘ └──────────────┘ + PTY pair +``` + +The master and slave are connected through the kernel's line discipline, +which handles echoing, line editing (cooked mode), and signal generation. + + +## Why not just pipes? + +If a PTY is just a bidirectional byte stream, why not use a pair of +pipes? You could connect stdin/stdout to pipes and call it a day. + +For simple programs like `cat` or `echo`, that actually works. But a +surprising number of programs depend on properties that only a real +terminal (or pseudo-terminal) provides: + +**Terminal size.** Programs like vim, less, and htop need to know how +many rows and columns they have. They get this by calling the +`TIOCGWINSZ` ioctl on their stdout fd. A pipe does not have a terminal +size. A PTY does. + +**Raw vs cooked mode.** When you type in bash, you can use backspace, +Ctrl-A, Ctrl-E, and other line-editing keys. 
That is the line +discipline doing "cooked mode" processing. When vim starts, it switches +the terminal to "raw mode" so it can handle every keystroke itself. +Pipes do not have modes. + +**`isatty()` detection.** Many programs change their behavior depending +on whether stdout is a terminal. `ls` uses colors when connected to a +terminal, plain output when piped. `grep` does the same. Python +line-buffers stdout on a terminal and block-buffers it otherwise. These programs call +`isatty()`, which returns true for a PTY slave and false for a pipe. + +**Job control signals.** When you press Ctrl-C, the kernel's terminal +driver sends SIGINT to the foreground process group. Ctrl-Z sends +SIGTSTP. This mechanism is tied to the controlling terminal, which must +be a TTY device, not a pipe. + +**The controlling terminal.** Each session has at most one controlling +terminal. It is what allows the kernel to deliver SIGHUP when the +terminal disconnects, and it is how job control (`fg`, `bg`, `jobs`) +works. You cannot get a controlling terminal from a pipe. + +In short: if you want a program to believe it is running interactively +in a real terminal, you need a PTY. + + +## What heimdall does with PTYs + +heimdall is a session supervisor. It sits in the position that a +terminal emulator normally occupies: it holds the master end of the PTY. + +The supervised process -- Claude Code, bash, or whatever command you +configure -- gets the slave end. From its perspective, nothing is +unusual. It calls `isatty()` and gets true. It queries the terminal size +and gets real dimensions. It can switch to raw mode, use colors, draw +full-screen TUIs. It has no idea it is being supervised. + +This is the fundamental trick. By holding the master fd, heimdall can: + +- **Read all output** the child produces, feeding it to connected + clients through the Unix socket. +- **Write input** from any connected client, as if someone were typing + on a keyboard.
+- **Multiplex** -- multiple clients can connect to the same session via + the Unix socket and see the same output, like a shared screen session. +- **Forward resize events** -- when a client's terminal changes size, + heimdall calls `TIOCSWINSZ` on the master fd and sends `SIGWINCH` to + the child, so it redraws at the correct dimensions. +- **Classify output** -- because all bytes flow through heimdall, it + can run classifiers on the stream to detect idle states, prompts, or + other patterns. + +``` + ┌────────────┐ + │ Client A │──┐ + └────────────┘ │ Unix + ┌────────────┐ │ socket ┌───────────┐ ┌─────────────────┐ + │ Client B │──┼────────────► │ heimdall │ master ──┤ PTY │ slave │──► child process + └────────────┘ │ │ supervisor│◄── fd ───┤ │ │ (claude, bash) + ┌────────────┐ │ └───────────┘ └─────────────────┘ + │ Client C │──┘ │ + └────────────┘ scrollback buffer + + idle classifier +``` + +Multiple clients share the same session. Each sees the full scrollback +on connect, then receives live output via broadcast. + + +## The fork-exec dance + +How does heimdall actually spawn the supervised process? The sequence +is the classic Unix pattern, with PTY-specific steps mixed in: + +**1. `openpty()`** -- Create the master/slave pair. This returns two +file descriptors: one for the master, one for the slave. Under the hood +this allocates a `/dev/pts/N` entry. + +**2. `fork()`** -- Create a copy of the current process. After this +call, there are two processes running the same code. The return value +tells you which one you are: zero in the child and the child's PID in +the parent (or -1 if the fork failed and there is no child). + +**3. In the child:** + +- Call `setsid()` to create a new session and become the session leader. + This detaches from the parent's controlling terminal. +- Open the slave fd (or use the ioctl `TIOCSCTTY`) to make it the + controlling terminal for this new session. +- Duplicate the slave fd onto stdin (fd 0), stdout (fd 1), and stderr + (fd 2) using `dup2()`.
+- Close the master fd -- the child has no business with it. +- Close the original slave fd -- it is already duplicated onto 0/1/2. +- Set any environment variables (like `HEIMDALL_SESSION_ID`). +- Call `exec()` to replace this process image with the target command + (e.g., `claude`). After exec, the child *is* the target command. The + Rust code, the fork setup, all of it is gone -- replaced by the new + program. + +**4. In the parent (heimdall):** + +- Close the slave fd -- only the child needs it. +- Keep the master fd. +- Start the event loop: read from the master (child output), write to + the master (client input), accept socket connections, and watch for + the child process to exit. + +The full sequence: + +``` + openpty() + │ + ▼ + ┌──────────────────────────────────────────────────┐ + │ Parent process (heimdall) │ + │ master_fd ◄──────────────────────┐ │ + │ slave_fd │ │ + └──────────┬─────────────────────┬──┘ │ + │ fork() │ │ + ▼ │ │ + ┌──────────────────────┐ │ │ + │ Child (pid == 0) │ │ │ + │ │ │ │ + │ 1. setsid() │ ┌─────┴──────────────┐ │ + │ 2. TIOCSCTTY │ │ Parent (pid > 0) │ │ + │ 3. dup2(slave → 0) │ │ │ │ + │ dup2(slave → 1) │ │ 1. close(slave_fd) │ │ + │ dup2(slave → 2) │ │ 2. keep master_fd │ │ + │ 4. close(master_fd) │ │ 3. event loop: │ │ + │ 5. close(slave_fd) │ │ read master │ │ + │ 6. set env vars │ │ accept sockets │ │ + │ 7. exec(command) │ │ watch SIGCHLD │ │ + │ ─── becomes ─── │ └────────────────────┘ │ + │ claude / bash │ │ + └──────────────────────┘ │ + │ │ + └──── bytes flow through PTY ───────────┘ +``` + +This is not unique to heimdall. Every terminal emulator, every `sshd` +session, every `script(1)` invocation does roughly the same thing. The +PTY is one of Unix's oldest and most reliable abstractions. + + +## A note on terminology + +POSIX has been moving away from "master/slave" terminology in favor of +"multiplexor/subsidiary" (or just "ptmx/pts").
The APIs still +use the old names (`openpty`, `forkpty`, `/dev/ptmx`), and most +documentation and man pages do too. This document uses both +interchangeably -- you will encounter both in the wild. + + +## Further reading + +- **`man 7 pty`** -- The Linux man page for pseudo-terminals. Covers + `posix_openpt()`, `grantpt()`, `unlockpt()`, and the `/dev/ptmx` + interface. +- **[The TTY Demystified](https://www.linusakesson.net/programming/tty/)** + -- Linus Åkesson's excellent deep dive into the full TTY subsystem, + including the line discipline, session management, and job control. + The best single resource on the topic. +- **`ARCH.md`** in this repository -- heimdall's architecture document, + which covers how the PTY integration fits into the broader supervisor + design. diff --git a/heimdall.example.toml b/heimdall.example.toml index 0891b2f..5423b8c 100644 --- a/heimdall.example.toml +++ b/heimdall.example.toml @@ -10,23 +10,50 @@ # Environment variable name injected into child processes with the session ID. # session_env_var = "HEIMDALL_SESSION_ID" -# State classifier: "claude", "simple", or "none". +# Signal the entire process group on kill/shutdown. +# When true (default), SIGTERM/SIGKILL reach all grandchild processes. +# Set to false to only signal the direct child. +# kill_process_group = true + +# Log file path for the supervisor. +# Default: /.log. Set to /dev/null to disable. +# log_file = "/var/log/heimdall/session.log" + +# Log level for the supervisor (trace, debug, info, warn, error). +# RUST_LOG env var overrides this. +# log_level = "info" + +# Detach key byte for the attach client. +# Default: 28 (0x1C, Ctrl-\). Set to 0 to disable detaching.
+# detach_key = 28 + +# --- State classifier --- +# +# String shorthand (uses built-in defaults for the classifier): +# classifier = "simple" +# +# Or table form with per-classifier parameters: # +# [classifier.simple] +# idle_threshold_ms = 3000 +# +# [classifier.claude] +# idle_threshold_ms = 3000 +# debounce_ms = 200 +# +# Available classifiers: +# +# simple — binary idle/active based on silence threshold (default). # claude — full state machine (idle/thinking/streaming/tool_use) # tuned for Claude Code spinner and token output patterns. -# simple — binary idle/active based on silence threshold. # none — no classification, always reports idle. -# -# classifier = "claude" - -# Idle detection threshold in milliseconds. -# idle_threshold_ms = 3000 -# State debounce period in milliseconds. -# Prevents rapid state flickering on ambiguous output. -# debounce_ms = 200 +# Default: simple with 3000ms idle threshold. +# classifier = "simple" -# Extra environment variables injected into the child process. +# --- Extra environment variables --- +# Injected into the child process at launch. +# # [[env]] # name = "MY_API_KEY" # value = "sk-..." diff --git a/justfile b/justfile index c908d43..cd5abdc 100644 --- a/justfile +++ b/justfile @@ -29,9 +29,9 @@ check: clippy fmt-check test-all test *args: cargo test {{ args }} -# Run Python attach tests (requires pexpect) +# Run Python attach tests test-attach: build - python3 tests/test_attach.py + uv run pytest tests/ -v # Run full integration suite (Rust + Python attach tests) test-all: test test-attach @@ -81,7 +81,7 @@ doctor: @echo "Checking dependencies..." 
@which cargo >/dev/null 2>&1 && echo " cargo: $(cargo --version)" || echo " cargo: MISSING" @which just >/dev/null 2>&1 && echo " just: $(just --version)" || echo " just: MISSING" - @python3 -c "import pexpect" 2>/dev/null && echo " pexpect: ok" || echo " pexpect: MISSING (pip install pexpect — needed for attach tests)" + @which uv >/dev/null 2>&1 && echo " uv: $(uv --version)" || echo " uv: MISSING (needed for Python attach tests)" @which cargo-llvm-cov >/dev/null 2>&1 && echo " cargo-llvm-cov: ok" || echo " cargo-llvm-cov: MISSING (optional, for coverage)" @echo "Done." diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..325acd7 --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,16 @@ +[project] +name = "heimdall-tests" +version = "0.1.0" +description = "Python integration tests for heimdall" +requires-python = ">=3.13" +dependencies = [] + +[dependency-groups] +dev = [ + "pexpect>=4.9", + "pytest>=8", +] + +[tool.pytest.ini_options] +testpaths = ["tests"] +python_files = ["test_*.py"] diff --git a/rust-toolchain.toml b/rust-toolchain.toml new file mode 100644 index 0000000..292fe49 --- /dev/null +++ b/rust-toolchain.toml @@ -0,0 +1,2 @@ +[toolchain] +channel = "stable" diff --git a/src/attach.rs b/src/attach.rs new file mode 100644 index 0000000..cac935c --- /dev/null +++ b/src/attach.rs @@ -0,0 +1,223 @@ +//! Attach subcommand: terminal passthrough with status bar. 
+ +use crate::cli::SessionParams; +use crate::config; +use crate::protocol; +use crate::terminal::{ + RestoreTermios, StatusInfo, draw_status_bar, reset_scroll_region, resize_status_bar, + setup_status_bar, terminal_size, +}; +use crate::util; +use bytes::Bytes; +use nix::sys::termios; +use std::os::fd::{AsRawFd, BorrowedFd}; +use std::os::unix::process::CommandExt; +use tokio::io::{AsyncReadExt, AsyncWriteExt}; + +// Number of seconds to wait for the supervisor socket to appear before giving up.
+const SOCKET_DEADLINE_SECS: u64 = 5; +const SOCKET_DEADLINE: std::time::Duration = std::time::Duration::from_secs(SOCKET_DEADLINE_SECS); + +// Interval between checks for the supervisor socket while waiting up to SOCKET_DEADLINE. +const SOCKET_POLL_INTERVAL_MS: u64 = 5; +const SOCKET_POLL_INTERVAL: std::time::Duration = + std::time::Duration::from_millis(SOCKET_POLL_INTERVAL_MS); + +// How often to poll the supervisor for status bar updates. +const STATUS_POLL_INTERVAL_MS: u64 = 1000; +const STATUS_POLL_INTERVAL: std::time::Duration = + std::time::Duration::from_millis(STATUS_POLL_INTERVAL_MS); + +/// Launch the supervisor as a background process and attach to it. +/// +/// Spawns `hm run --detach` as a child process, waits for the socket to +/// appear, then runs the normal attach flow. When the attach disconnects +/// (Ctrl-\ or session exit), the supervisor keeps running in the background. +pub fn launch_and_attach(params: SessionParams) -> anyhow::Result<()> { + let exe = std::env::current_exe()?; + let child_args = params.to_detach_args(); + + // Spawn supervisor in background with setsid() for terminal independence. + let _supervisor = unsafe { + std::process::Command::new(exe) + .args(&child_args) + .stdin(std::process::Stdio::null()) + .stdout(std::process::Stdio::null()) + .stderr(std::process::Stdio::null()) + .pre_exec(|| { + nix::unistd::setsid().map_err(|e| std::io::Error::from_raw_os_error(e as i32))?; + Ok(()) + }) + .spawn()? + }; + + // Wait for the socket to appear. + let socket_path = crate::util::socket_path(&params.socket_dir, &params.id); + let deadline = std::time::Instant::now() + SOCKET_DEADLINE; + while !socket_path.exists() { + if std::time::Instant::now() > deadline { + anyhow::bail!( + "Timed out waiting for supervisor socket at {}", + socket_path.display() + ); + } + // No tokio runtime yet, so just sleep the thread. The supervisor + // should be up and creating the socket within a few milliseconds.
+ std::thread::sleep(SOCKET_POLL_INTERVAL); + } + + attach(params.id, params.socket_dir, &params.cfg) +} + +pub fn attach( + id: String, + socket_dir: std::path::PathBuf, + cfg: &config::Config, +) -> anyhow::Result<()> { + let detach_key = cfg.detach_key; + let socket_path = util::session_socket(&id, &socket_dir); + + util::with_runtime(async move { + let mut sess = protocol::Session::connect(&socket_path).await?; + + // Save terminal state and set raw mode + let stdin_raw_fd = std::io::stdin().as_raw_fd(); + let stdin_borrowed = unsafe { BorrowedFd::borrow_raw(stdin_raw_fd) }; + let original_termios = termios::tcgetattr(stdin_borrowed) + .map_err(|e| std::io::Error::from_raw_os_error(e as i32))?; + let mut raw = original_termios.clone(); + termios::cfmakeraw(&mut raw); + termios::tcsetattr(stdin_borrowed, termios::SetArg::TCSANOW, &raw) + .map_err(|e| std::io::Error::from_raw_os_error(e as i32))?; + + let _restore = RestoreTermios { + fd: stdin_raw_fd, + original: original_termios, + }; + + let mut stdout = tokio::io::stdout(); + + // Set up status bar: reserve the bottom line via scroll region. + let (cols, rows) = terminal_size()?; + let inner_rows = setup_status_bar(&mut stdout, &id, cols, rows, None).await?; + + // Send RESIZE with inner_rows so the child sees the reduced height, + // then subscribe — order matters so scrollback replays at the right size. + sess.send_resize(cols, inner_rows).await?; + sess.subscribe().await?; + + // Destructure into raw halves for the select loop (borrow checker + // requires independent borrows of reader and writer across arms).
+ let protocol::Session { + reader: mut main_reader, + writer: mut main_writer, + } = sess; + + // Signal handlers + let mut sigwinch = + tokio::signal::unix::signal(tokio::signal::unix::SignalKind::window_change())?; + let mut sighup = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::hangup())?; + let mut sigterm = + tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())?; + let mut sigint = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::interrupt())?; + + // Periodic status poll for the status bar. + let mut status_tick = tokio::time::interval(STATUS_POLL_INTERVAL); + status_tick.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip); + + let mut cur_cols = cols; + let mut cur_rows = rows; + + // Async stdin reader + let stdin = tokio::io::stdin(); + let mut stdin_reader = tokio::io::BufReader::new(stdin); + let mut stdin_buf = [0u8; 1024]; + + // Second socket for STATUS polling (the main socket is in SUBSCRIBE mode). + let mut status_sess = protocol::Session::connect(&socket_path).await?; + + loop { + tokio::select! { + // Supervisor → client: child pty output or session exit notification. + result = protocol::read_frame(&mut main_reader) => { + let (msg_type, payload): (u8, Bytes) = result?; + match msg_type { + protocol::OUTPUT => { + // Write raw child output to the terminal. + stdout.write_all(&payload).await?; + stdout.flush().await?; + } + protocol::EXIT => { + let code = protocol::parse_exit_code(&payload)?; + reset_scroll_region(&mut stdout).await?; + drop(_restore); // terminal back to cooked mode + eprintln!("[session exited with code {code}]"); + std::process::exit(code); + } + _ => {} + } + } + + // Client → supervisor: User input available on stdin — check for detach key, then forward to supervisor. + n = stdin_reader.read(&mut stdin_buf) => { + let n = n?; + if n == 0 { + break; + } + // Detach key (default: Ctrl-\, configurable via detach_key). 
+ // Only trigger on a lone keypress — ignore detach bytes buried in pastes. + if detach_key != 0 && n == 1 && stdin_buf[0] == detach_key { + reset_scroll_region(&mut stdout).await?; + drop(_restore); // terminal back to cooked mode + eprintln!("[detached from session {id}]"); + std::process::exit(0); + } + protocol::write_frame(&mut main_writer, protocol::INPUT, &stdin_buf[..n]).await?; + + } + + // Terminal resized — update scroll region, status bar, and notify supervisor. + _ = sigwinch.recv() => { + let (new_cols, new_rows) = terminal_size()?; + cur_cols = new_cols; + cur_rows = new_rows; + let inner_rows = resize_status_bar(&mut stdout, &id, new_cols, new_rows, None).await?; + + protocol::write_frame(&mut main_writer, protocol::RESIZE, &protocol::pack_resize(new_cols, inner_rows)).await?; + } + + // Periodic status poll (STATUS_POLL_INTERVAL) to refresh the status bar. + _ = status_tick.tick() => { + if let Ok(status) = status_sess.recv_status().await { + let info = StatusInfo { + state_byte: status.state, + state_ms: status.state_ms, + }; + draw_status_bar(&mut stdout, &id, cur_cols, cur_rows, Some(&info)).await?; + } + } + + // Terminal gone (SSH disconnect, window closed) — nothing to clean up visually. + _ = sighup.recv() => { + drop(_restore); + std::process::exit(0); + } + // Explicit kill of the attach client — terminal still exists, reset it. + _ = sigterm.recv() => { + let _ = reset_scroll_region(&mut stdout).await; + drop(_restore); // terminal back to cooked mode + eprintln!("[terminated]"); + std::process::exit(0); + } + // Raw mode swallows Ctrl-C; forward it as input to the child. 
+ _ = sigint.recv() => { + protocol::write_frame(&mut main_writer, protocol::INPUT, &[0x03]).await?; + } + } + } + + reset_scroll_region(&mut stdout).await?; + + Ok(()) + }) +} diff --git a/src/broadcast.rs b/src/broadcast.rs index d05c3de..43ae791 100644 --- a/src/broadcast.rs +++ b/src/broadcast.rs @@ -39,11 +39,7 @@ pub struct OutputState { impl OutputState { pub fn new(config: &Config) -> Self { let (tx, _) = broadcast::channel(256); - let classifier = classify::from_config( - &config.classifier, - config.idle_threshold_ms, - config.debounce_ms, - ); + let classifier = classify::from_config(&config.classifier); Self { tx, last_output_at: AtomicU64::new(now_millis()), @@ -66,7 +62,8 @@ impl OutputState { pub fn idle_ms(&self) -> u32 { let last = self.last_output_at.load(Ordering::Relaxed); let now = now_millis(); - (now.saturating_sub(last)) as u32 + // Saturate at u32::MAX (~49 days) rather than wrapping. + now.saturating_sub(last).min(u32::MAX as u64) as u32 } /// Current process state. diff --git a/src/classify/claude.rs b/src/classify/claude.rs index 0a5f228..0e418ab 100644 --- a/src/classify/claude.rs +++ b/src/classify/claude.rs @@ -54,36 +54,56 @@ impl ClaudeClassifier { } // Check for tool use pattern: pause > 200ms followed by burst > 1KB. + // Iterate pairs directly from the deque — no Vec allocation needed. 
if self.window.len() >= 2 { - let events: Vec<&OutputEvent> = self.window.iter().rev().take(5).collect(); - for pair in events.windows(2) { - let newer = pair[0]; - let older = pair[1]; - let gap = newer.timestamp_ms.saturating_sub(older.timestamp_ms); - if gap > 200 && newer.byte_count > 1024 { - return ProcessState::ToolUse; + let skip = self.window.len().saturating_sub(5); + let mut iter = self.window.iter().skip(skip); + if let Some(mut prev) = iter.next() { + for event in iter { + let gap = event.timestamp_ms.saturating_sub(prev.timestamp_ms); + if gap > 200 && event.byte_count > 1024 { + return ProcessState::ToolUse; + } + prev = event; } } } // Analyse last 10 bursts for spinner vs streaming. - let recent: Vec<&OutputEvent> = self.window.iter().rev().take(10).collect(); - if recent.len() >= 5 { - let sizes: Vec = recent.iter().map(|e| e.byte_count as f64).collect(); - let mean_size = sizes.iter().sum::() / sizes.len() as f64; - let variance = - sizes.iter().map(|s| (s - mean_size).powi(2)).sum::() / sizes.len() as f64; - let stddev = variance.sqrt(); - - // Compute inter-burst gaps. - let gaps: Vec = recent - .windows(2) - .map(|pair| pair[0].timestamp_ms.saturating_sub(pair[1].timestamp_ms) as f64) - .collect(); - let mean_gap = if gaps.is_empty() { - 0.0 + // Compute statistics in a single pass — no Vec allocations. + let skip = self.window.len().saturating_sub(10); + let recent_count = self.window.len() - skip; + if recent_count >= 5 { + // Single pass: accumulate size sum, size sum-of-squares, gap sum. 
+ let mut size_sum = 0.0_f64; + let mut size_sq_sum = 0.0_f64; + let mut gap_sum = 0.0_f64; + let mut gap_count = 0u32; + let mut prev_ts = 0u64; + let mut first = true; + + for event in self.window.iter().skip(skip) { + let s = event.byte_count as f64; + size_sum += s; + size_sq_sum += s * s; + if !first { + gap_sum += event.timestamp_ms.saturating_sub(prev_ts) as f64; + gap_count += 1; + } + prev_ts = event.timestamp_ms; + first = false; + } + + let n = recent_count as f64; + let mean_size = size_sum / n; + // Var = E[X^2] - E[X]^2 (numerically stable enough for our ranges). + let variance = (size_sq_sum / n) - (mean_size * mean_size); + let stddev = if variance > 0.0 { variance.sqrt() } else { 0.0 }; + + let mean_gap = if gap_count > 0 { + gap_sum / gap_count as f64 } else { - gaps.iter().sum::() / gaps.len() as f64 + 0.0 }; // Spinner: uniform small bursts (40-120 bytes), regular intervals (30-200ms). diff --git a/src/classify/mod.rs b/src/classify/mod.rs index 10d5a46..2822a72 100644 --- a/src/classify/mod.rs +++ b/src/classify/mod.rs @@ -60,17 +60,15 @@ pub trait StateClassifier: Send { } /// Create a classifier from config. -pub fn from_config( - config: &ClassifierConfig, - idle_threshold_ms: u64, - debounce_ms: u64, -) -> Box { +pub fn from_config(config: &ClassifierConfig) -> Box { match config { - ClassifierConfig::Claude => Box::new(claude::ClaudeClassifier::new( - idle_threshold_ms, - debounce_ms, + ClassifierConfig::Claude { .. } => Box::new(claude::ClaudeClassifier::new( + config.idle_threshold_ms(), + config.debounce_ms(), )), - ClassifierConfig::Simple => Box::new(simple::SimpleClassifier::new(idle_threshold_ms)), + ClassifierConfig::Simple { .. } => { + Box::new(simple::SimpleClassifier::new(config.idle_threshold_ms())) + } ClassifierConfig::None => Box::new(none::NoneClassifier), } } @@ -93,14 +91,19 @@ mod tests { /// from_config produces the correct classifier type for each variant. 
#[test] fn from_config_claude() { - let c = from_config(&ClassifierConfig::Claude, 3000, 200); + let c = from_config(&ClassifierConfig::Claude { + idle_threshold_ms: 3000, + debounce_ms: 200, + }); assert_eq!(c.state(), ProcessState::Idle); assert_eq!(c.state_name(0x01), "thinking"); // only claude knows "thinking" } #[test] fn from_config_simple() { - let c = from_config(&ClassifierConfig::Simple, 3000, 200); + let c = from_config(&ClassifierConfig::Simple { + idle_threshold_ms: 3000, + }); assert_eq!(c.state(), ProcessState::Idle); assert_eq!(c.state_name(0x04), "active"); // simple knows "active" assert_eq!(c.state_name(0x01), "unknown"); // simple doesn't know "thinking" @@ -108,7 +111,7 @@ mod tests { #[test] fn from_config_none() { - let c = from_config(&ClassifierConfig::None, 3000, 200); + let c = from_config(&ClassifierConfig::None); assert_eq!(c.state(), ProcessState::Idle); assert_eq!(c.state_name(0x01), "idle"); // none reports everything as idle } @@ -116,12 +119,17 @@ mod tests { /// Issue #6: classifier orthogonality — simple uses Active, claude never does. #[test] fn classifier_orthogonality() { - let mut simple = from_config(&ClassifierConfig::Simple, 3000, 200); + let mut simple = from_config(&ClassifierConfig::Simple { + idle_threshold_ms: 3000, + }); simple.record(100, 1000); assert_eq!(simple.state(), ProcessState::Active); // Claude with output does NOT go to Active — it goes to Thinking or similar. - let mut claude = from_config(&ClassifierConfig::Claude, 3000, 0); + let mut claude = from_config(&ClassifierConfig::Claude { + idle_threshold_ms: 3000, + debounce_ms: 0, + }); claude.record(100, 1000); assert_ne!(claude.state(), ProcessState::Active); } diff --git a/src/cli.rs b/src/cli.rs new file mode 100644 index 0000000..a8841a8 --- /dev/null +++ b/src/cli.rs @@ -0,0 +1,277 @@ +//! CLI definition (clap structs) and config merge logic. 
+ +use crate::config; +use clap::{Parser, Subcommand}; +use std::path::PathBuf; + +#[derive(Parser)] +#[command( + name = "hm", + about = "PTY session supervisor", + version = concat!(env!("CARGO_PKG_VERSION"), " (", env!("HM_BUILD_TIME"), ")") +)] +pub struct Cli { + #[command(subcommand)] + pub command: Command, + + /// Path to config file. + #[arg(long, global = true)] + pub config: Option, +} + +#[derive(Subcommand)] +#[allow(clippy::large_enum_variant)] // CLI enum parsed once at startup +pub enum Command { + /// Launch a supervised session and attach to it. + Run { + /// Session identifier (used for socket filename). + #[arg(long)] + id: String, + /// Working directory for the child process. + #[arg(long, default_value = ".")] + workdir: PathBuf, + /// Directory for socket and pid files (overrides config). + #[arg(long)] + socket_dir: Option, + /// Terminal columns. + #[arg(long, default_value_t = 220)] + cols: u16, + /// Terminal rows. + #[arg(long, default_value_t = 50)] + rows: u16, + /// Run the supervisor in the background without attaching. + #[arg(long)] + detach: bool, + /// Log file path (overrides config). + /// Defaults to /.log. Set to /dev/null to disable. + #[arg(long)] + log_file: Option, + /// Log level: trace, debug, info, warn, error (overrides config). + #[arg(long)] + log_level: Option, + /// Additional tracing filter directives for dependency crates. + /// Uses EnvFilter syntax, e.g. "tokio=warn,nix=error". + #[arg(long)] + log_filter: Option, + /// Scrollback buffer size in bytes (overrides config). + #[arg(long)] + scrollback_bytes: Option, + /// State classifier: simple, claude, or none (overrides config). + /// When combined with --idle-threshold-ms / --debounce-ms, the + /// classifier is created with those params; otherwise it uses + /// the config file values or built-in defaults. + #[arg(long)] + classifier: Option, + /// Idle detection threshold in milliseconds (overrides classifier config). 
+ #[arg(long)] + idle_threshold_ms: Option, + /// State debounce period in milliseconds (overrides classifier config). + /// Only meaningful for the claude classifier. + #[arg(long)] + debounce_ms: Option, + /// Signal the process group on kill (overrides config). + /// Use --kill-process-group or --no-kill-process-group. + #[arg(long, action = clap::ArgAction::Set, num_args = 0..=1, default_missing_value = "true")] + kill_process_group: Option, + /// Environment variable name for the session ID (overrides config). + #[arg(long)] + session_env_var: Option, + /// Child command and arguments (everything after --). + #[arg(trailing_var_arg = true, required = true)] + cmd: Vec, + }, + /// Attach to a running session (terminal passthrough). + Attach { + /// Session identifier to attach to. + id: String, + /// Directory for socket files (overrides config). + #[arg(long)] + socket_dir: Option, + }, + /// Query status of a session. + Status { + /// Session identifier. + id: String, + /// Directory for socket files (overrides config). + #[arg(long)] + socket_dir: Option, + }, + /// List active sessions. + #[command(name = "ls")] + List { + /// Directory for socket files (overrides config). + #[arg(long)] + socket_dir: Option, + }, + /// Kill a session (graceful shutdown). + Kill { + /// Session identifier. + id: String, + /// Directory for socket files (overrides config). + #[arg(long)] + socket_dir: Option, + }, + /// Remove orphaned log files from dead sessions. + /// Dry-run by default — pass --force to actually delete. + Clean { + /// Directory for session files (overrides config). + #[arg(long)] + socket_dir: Option, + /// Keep logs modified within this duration (e.g. 24h, 7d, 1h). + /// Default: 24h. + #[arg(long, default_value = "24h")] + older_than: String, + /// Actually delete files (default is dry-run). + #[arg(long)] + force: bool, + }, +} + +/// Bundled parameters for session launch. 
+pub struct SessionParams { + pub id: String, + pub workdir: PathBuf, + pub socket_dir: PathBuf, + pub cols: u16, + pub rows: u16, + pub cmd: Vec<String>, + pub cfg: config::Config, + pub log_file: PathBuf, +} + +impl SessionParams { + /// Produce CLI arguments for `hm run --detach` re-exec. + /// + /// Used by `launch_and_attach` to spawn the supervisor as a background + /// process with the same resolved id, workdir, socket dir, geometry, log file, and command. + pub fn to_detach_args(&self) -> Vec<String> { + let mut args = vec![ + "run".into(), + "--id".into(), + self.id.clone(), + "--workdir".into(), + self.workdir.to_string_lossy().into_owned(), + "--socket-dir".into(), + self.socket_dir.to_string_lossy().into_owned(), + "--cols".into(), + self.cols.to_string(), + "--rows".into(), + self.rows.to_string(), + "--detach".into(), + "--log-file".into(), + self.log_file.to_string_lossy().into_owned(), + "--".into(), + ]; + args.extend(self.cmd.iter().cloned()); + args + } +} + +/// Raw CLI arguments for the `run` subcommand before config merge. +pub struct RunArgs { + pub id: String, + pub workdir: PathBuf, + pub socket_dir: Option<PathBuf>, + pub cols: u16, + pub rows: u16, + pub log_file: Option<PathBuf>, + pub log_level: Option<String>, + pub log_filter: Option<String>, + pub scrollback_bytes: Option<usize>, + pub classifier: Option<String>, + pub idle_threshold_ms: Option<u64>, + pub debounce_ms: Option<u64>, + pub kill_process_group: Option<bool>, + pub session_env_var: Option<String>, + pub cmd: Vec<String>, +} + +/// Apply CLI overrides to a loaded config, returning `SessionParams`. +/// +/// Classifier merge logic: `--classifier` switches the type (fresh defaults); +/// `--idle-threshold-ms` / `--debounce-ms` override per-classifier params.
+pub fn merge_run_args(cfg: config::Config, args: RunArgs) -> anyhow::Result<SessionParams> { + let RunArgs { + id, + workdir, + socket_dir, + cols, + rows, + log_file, + log_level, + log_filter, + scrollback_bytes, + classifier, + idle_threshold_ms, + debounce_ms, + kill_process_group, + session_env_var, + cmd, + } = args; + let mut cfg = cfg; + let dir = socket_dir.unwrap_or_else(|| cfg.socket_dir.clone()); + + if let Some(v) = scrollback_bytes { + cfg.scrollback_bytes = v; + } + if let Some(v) = kill_process_group { + cfg.kill_process_group = v; + } + if let Some(v) = session_env_var { + cfg.session_env_var = v; + } + if let Some(v) = log_level { + cfg.log_level = v; + } + if let Some(v) = log_filter { + cfg.log_filter = Some(v); + } + + if let Some(v) = classifier { + cfg.classifier = match v.as_str() { + "simple" => config::ClassifierConfig::Simple { + idle_threshold_ms: idle_threshold_ms.unwrap_or(config::DEFAULT_IDLE_THRESHOLD_MS), + }, + "claude" => config::ClassifierConfig::Claude { + idle_threshold_ms: idle_threshold_ms.unwrap_or(config::DEFAULT_IDLE_THRESHOLD_MS), + debounce_ms: debounce_ms.unwrap_or(config::DEFAULT_DEBOUNCE_MS), + }, + "none" => config::ClassifierConfig::None, + other => { + anyhow::bail!("unknown classifier: {other} (expected simple, claude, or none)") + } + }; + } else { + cfg.classifier = match cfg.classifier { + config::ClassifierConfig::Simple { + idle_threshold_ms: existing, + } => config::ClassifierConfig::Simple { + idle_threshold_ms: idle_threshold_ms.unwrap_or(existing), + }, + config::ClassifierConfig::Claude { + idle_threshold_ms: existing_idle, + debounce_ms: existing_debounce, + } => config::ClassifierConfig::Claude { + idle_threshold_ms: idle_threshold_ms.unwrap_or(existing_idle), + debounce_ms: debounce_ms.unwrap_or(existing_debounce), + }, + config::ClassifierConfig::None => config::ClassifierConfig::None, + }; + } + + // log_file precedence: CLI > config > <socket_dir>/<id>.log + let log = log_file + .or(cfg.log_file.take()) + .unwrap_or_else(|| 
dir.join(format!("{id}.log"))); + + Ok(SessionParams { + id, + workdir, + socket_dir: dir, + cols, + rows, + cmd, + cfg, + log_file: log, + }) +} diff --git a/src/commands.rs b/src/commands.rs new file mode 100644 index 0000000..c9a69e1 --- /dev/null +++ b/src/commands.rs @@ -0,0 +1,259 @@ +//! Simple subcommands: status, list, kill, clean. + +use crate::util::{session_socket, with_runtime}; +use crate::{classify, config, protocol}; +use std::path::PathBuf; +use std::time::Duration; + +pub fn status(id: String, socket_dir: PathBuf, cfg: &config::Config) -> anyhow::Result<()> { + let socket_path = session_socket(&id, &socket_dir); + let classifier_config = cfg.classifier.clone(); + + with_runtime(async move { + let mut sess = protocol::Session::connect(&socket_path).await?; + let status = sess.recv_status().await?; + + let cls = classify::from_config(&classifier_config); + let state_name = cls.state_name(status.state); + + println!("session: {id}"); + println!("pid: {}", status.pid); + println!("idle_ms: {}", status.idle_ms); + println!("alive: {}", status.alive); + println!("classifier: {}", classifier_config.name()); + println!("state: {state_name} ({}ms)", status.state_ms); + + Ok(()) + }) +} + +pub fn list(socket_dir: PathBuf) -> anyhow::Result<()> { + if !socket_dir.exists() { + println!("No sessions directory found at {}", socket_dir.display()); + return Ok(()); + } + + let mut sessions: Vec<String> = std::fs::read_dir(&socket_dir)? + .filter_map(|entry| { + let entry = entry.ok()?; + let path = entry.path(); + if path.extension().is_some_and(|ext| ext == "sock") { + path.file_stem().and_then(|s| s.to_str()).map(String::from) + } else { + None + } + }) + .collect(); + + sessions.sort(); + + // Filter to live sessions: check PID file liveness, clean up stale entries.
+ let mut live = Vec::new(); + for id in &sessions { + let pid_path = crate::util::pid_path(&socket_dir, id); + let socket_path = crate::util::socket_path(&socket_dir, id); + + let is_alive = crate::pidfile::PidFile::read(&pid_path).is_some_and(|pf| pf.any_alive()); + + if is_alive { + live.push(id.clone()); + } else { + let _ = std::fs::remove_file(&socket_path); + let _ = std::fs::remove_file(&pid_path); + } + } + + if live.is_empty() { + println!("No active sessions"); + } else { + println!("Active sessions:"); + for id in &live { + println!(" {id}"); + } + } + + Ok(()) +} + +pub fn kill(id: String, socket_dir: PathBuf) -> anyhow::Result<()> { + let socket_path = session_socket(&id, &socket_dir); + + with_runtime(async move { + let mut sess = protocol::Session::connect(&socket_path).await?; + + sess.send_kill().await?; + println!("Kill signal sent to session {id}"); + + Ok(()) + }) +} + +/// Parse a human-friendly duration string like "24h", "7d", "30m", "2h30m". +fn parse_duration(s: &str) -> anyhow::Result<Duration> { + let mut total_secs: u64 = 0; + let mut num_buf = String::new(); + + for ch in s.chars() { + if ch.is_ascii_digit() { + num_buf.push(ch); + } else { + let n: u64 = num_buf + .parse() + .map_err(|_| anyhow::anyhow!("invalid duration: {s}"))?; + num_buf.clear(); + total_secs += match ch { + 's' => n, + 'm' => n * 60, + 'h' => n * 3600, + 'd' => n * 86400, + _ => anyhow::bail!("unknown duration unit '{ch}' in \"{s}\" (use s/m/h/d)"), + }; + } + } + + if !num_buf.is_empty() { + anyhow::bail!("trailing number without unit in \"{s}\" (use s/m/h/d)"); + } + if total_secs == 0 { + anyhow::bail!("duration must be greater than zero: \"{s}\""); + } + + Ok(Duration::from_secs(total_secs)) +} + +pub fn clean(socket_dir: PathBuf, older_than: &str, dry_run: bool) -> anyhow::Result<()> { + let max_age = parse_duration(older_than)?; + + if !socket_dir.exists() { + println!("No sessions directory found at {}", socket_dir.display()); + return Ok(()); + } + + let now = 
std::time::SystemTime::now(); + let mut removed = 0u32; + let mut skipped_live = 0u32; + let mut skipped_young = 0u32; + + for entry in std::fs::read_dir(&socket_dir)? { + let entry = entry?; + let path = entry.path(); + + // Only consider .log files. + if path.extension().is_none_or(|ext| ext != "log") { + continue; + } + + let id = match path.file_stem().and_then(|s| s.to_str()) { + Some(id) => id.to_string(), + None => continue, + }; + + // Skip if there's a live session for this ID. + let pid_path = crate::util::pid_path(&socket_dir, &id); + if crate::pidfile::PidFile::read(&pid_path).is_some_and(|pf| pf.any_alive()) { + skipped_live += 1; + continue; + } + + // Skip if modified within the retention window. + if let Ok(meta) = entry.metadata() + && let Ok(modified) = meta.modified() + && let Ok(age) = now.duration_since(modified) + && age < max_age + { + skipped_young += 1; + continue; + } + + if dry_run { + println!("would remove: {}", path.display()); + } else { + match std::fs::remove_file(&path) { + Ok(()) => { + println!("removed: {}", path.display()); + removed += 1; + } + Err(e) => { + eprintln!("warning: failed to remove {}: {e}", path.display()); + } + } + } + } + + if dry_run { + println!( + "Dry run complete. 
{} live, {} within retention window.", + skipped_live, skipped_young + ); + } else if removed == 0 { + println!("Nothing to clean."); + } else { + println!( + "Cleaned {removed} log file{}.", + if removed == 1 { "" } else { "s" } + ); + } + + Ok(()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn parse_duration_hours() { + assert_eq!(parse_duration("24h").unwrap(), Duration::from_secs(86400)); + } + + #[test] + fn parse_duration_days() { + assert_eq!(parse_duration("7d").unwrap(), Duration::from_secs(604800)); + } + + #[test] + fn parse_duration_minutes() { + assert_eq!(parse_duration("30m").unwrap(), Duration::from_secs(1800)); + } + + #[test] + fn parse_duration_seconds() { + assert_eq!(parse_duration("90s").unwrap(), Duration::from_secs(90)); + } + + #[test] + fn parse_duration_compound() { + assert_eq!( + parse_duration("2h30m").unwrap(), + Duration::from_secs(2 * 3600 + 30 * 60) + ); + } + + #[test] + fn parse_duration_compound_days_hours() { + assert_eq!( + parse_duration("1d12h").unwrap(), + Duration::from_secs(86400 + 12 * 3600) + ); + } + + #[test] + fn parse_duration_trailing_number_errors() { + assert!(parse_duration("24").is_err()); + } + + #[test] + fn parse_duration_unknown_unit_errors() { + assert!(parse_duration("24w").is_err()); + } + + #[test] + fn parse_duration_empty_errors() { + assert!(parse_duration("").is_err()); + } + + #[test] + fn parse_duration_zero_errors() { + assert!(parse_duration("0h").is_err()); + } +} diff --git a/src/config.rs b/src/config.rs index fb8d28b..28d4cb1 100644 --- a/src/config.rs +++ b/src/config.rs @@ -9,6 +9,12 @@ use serde::Deserialize; use std::path::{Path, PathBuf}; +/// Default log level for the supervisor. +pub const DEFAULT_LOG_LEVEL: &str = "info"; + +/// Default detach key: Ctrl-\ (0x1C). +pub const DEFAULT_DETACH_KEY: u8 = 0x1C; + /// Root configuration. 
#[derive(Debug, Clone, Deserialize)] #[serde(default)] @@ -19,12 +25,8 @@ pub struct Config { pub scrollback_bytes: usize, /// Environment variable name set on child processes with the session ID. pub session_env_var: String, - /// State classifier to use. + /// State classifier to use, with per-classifier parameters. pub classifier: ClassifierConfig, - /// Idle detection threshold in milliseconds. - pub idle_threshold_ms: u64, - /// State debounce period in milliseconds. - pub debounce_ms: u64, /// Whether to signal the entire process group on kill/shutdown. /// /// When `true` (the default), SIGTERM/SIGKILL are sent to the process @@ -32,6 +34,19 @@ pub struct Config { /// Set to `false` to signal only the direct child, letting it manage its /// own descendants. pub kill_process_group: bool, + /// Log file path. When not set, defaults to `<socket_dir>/<id>.log`. + /// Set to `/dev/null` to disable logging. + pub log_file: Option<PathBuf>, + /// Log level for heimdall's own messages (trace, debug, info, warn, error). + pub log_level: String, + /// Additional tracing filter directives for dependency crates. + /// Uses `tracing_subscriber::EnvFilter` syntax, e.g. "tokio=warn,nix=error". + /// `RUST_LOG` env var takes precedence over both `log_level` and `log_filter`. + pub log_filter: Option<String>, + /// Detach key byte. When this byte appears in stdin input, the attach + /// client disconnects and the session keeps running in the background. + /// Default: `0x1C` (Ctrl-\). Set to `0` to disable. + pub detach_key: u8, /// Extra environment variables to inject into the child process. pub env: Vec<EnvVar>, } @@ -43,28 +58,191 @@ impl Default for Config { scrollback_bytes: 64 * 1024, session_env_var: "HEIMDALL_SESSION_ID".into(), classifier: ClassifierConfig::default(), - idle_threshold_ms: 3000, - debounce_ms: 200, kill_process_group: true, + log_file: None, + log_level: DEFAULT_LOG_LEVEL.into(), + log_filter: None, + detach_key: DEFAULT_DETACH_KEY, env: Vec::new(), } } } -/// State classifier selection.
-#[derive(Debug, Clone, Deserialize, Default)] -#[serde(rename_all = "lowercase")] +/// State classifier selection with per-classifier parameters. +/// +/// Supports two TOML representations: +/// +/// **String shorthand** (all defaults for the classifier): +/// ```toml +/// classifier = "simple" +/// ``` +/// +/// **Table form** (custom parameters): +/// ```toml +/// [classifier.claude] +/// idle_threshold_ms = 5000 +/// debounce_ms = 100 +/// ``` +#[derive(Debug, Clone, Deserialize)] +#[serde(try_from = "ClassifierRaw")] pub enum ClassifierConfig { /// Full state machine: idle, thinking, streaming, tool_use. /// Tuned for Claude Code's output patterns. - #[default] - Claude, + Claude { + idle_threshold_ms: u64, + debounce_ms: u64, + }, /// Simple binary: idle or active. - Simple, + Simple { idle_threshold_ms: u64 }, /// No state classification — always reports idle. None, } +impl Default for ClassifierConfig { + fn default() -> Self { + Self::Simple { + idle_threshold_ms: DEFAULT_IDLE_THRESHOLD_MS, + } + } +} + +impl ClassifierConfig { + /// Idle detection threshold in milliseconds. + pub fn idle_threshold_ms(&self) -> u64 { + match self { + Self::Claude { + idle_threshold_ms, .. + } => *idle_threshold_ms, + Self::Simple { idle_threshold_ms } => *idle_threshold_ms, + Self::None => 0, + } + } + + /// State debounce period in milliseconds (only meaningful for claude). + pub fn debounce_ms(&self) -> u64 { + match self { + Self::Claude { debounce_ms, .. } => *debounce_ms, + Self::Simple { .. } | Self::None => 0, + } + } + + /// The classifier type name as a string. + pub fn name(&self) -> &'static str { + match self { + Self::Claude { .. } => "claude", + Self::Simple { .. } => "simple", + Self::None => "none", + } + } +} + +/// Default idle detection threshold. +pub const DEFAULT_IDLE_THRESHOLD_MS: u64 = 3000; +/// Default debounce period. 
+pub const DEFAULT_DEBOUNCE_MS: u64 = 200; + +// -- Serde intermediate types for flexible TOML parsing -- + +/// Raw deserialization target that accepts both string and table forms. +#[derive(Deserialize)] +#[serde(untagged)] +enum ClassifierRaw { + /// `classifier = "simple"` — string shorthand with defaults. + Name(String), + /// `[classifier.claude]\nidle_threshold_ms = 5000` — table with one key. + Table(ClassifierTable), +} + +/// Table form: exactly one key (the classifier name) mapping to its params. +#[derive(Deserialize)] +struct ClassifierTable { + #[serde(default)] + claude: Option<ClaudeParams>, + #[serde(default)] + simple: Option<SimpleParams>, + #[serde(default)] + none: Option<NoneParams>, +} + +#[derive(Deserialize)] +#[serde(default)] +struct ClaudeParams { + idle_threshold_ms: u64, + debounce_ms: u64, +} + +impl Default for ClaudeParams { + fn default() -> Self { + Self { + idle_threshold_ms: DEFAULT_IDLE_THRESHOLD_MS, + debounce_ms: DEFAULT_DEBOUNCE_MS, + } + } +} + +#[derive(Deserialize)] +#[serde(default)] +struct SimpleParams { + idle_threshold_ms: u64, +} + +impl Default for SimpleParams { + fn default() -> Self { + Self { + idle_threshold_ms: DEFAULT_IDLE_THRESHOLD_MS, + } + } +} + +/// None classifier has no parameters, but we accept an empty table.
+#[derive(Deserialize, Default)] +#[serde(default)] +struct NoneParams {} + +impl TryFrom<ClassifierRaw> for ClassifierConfig { + type Error = String; + + fn try_from(raw: ClassifierRaw) -> Result<Self, Self::Error> { + match raw { + ClassifierRaw::Name(name) => match name.as_str() { + "claude" => Ok(Self::Claude { + idle_threshold_ms: DEFAULT_IDLE_THRESHOLD_MS, + debounce_ms: DEFAULT_DEBOUNCE_MS, + }), + "simple" => Ok(Self::Simple { + idle_threshold_ms: DEFAULT_IDLE_THRESHOLD_MS, + }), + "none" => Ok(Self::None), + other => Err(format!( + "unknown classifier: {other} (expected simple, claude, or none)" + )), + }, + ClassifierRaw::Table(table) => { + let count = table.claude.is_some() as u8 + + table.simple.is_some() as u8 + + table.none.is_some() as u8; + if count != 1 { + return Err(format!( + "classifier table must have exactly one key (claude, simple, or none), got {count}" + )); + } + if let Some(p) = table.claude { + Ok(Self::Claude { + idle_threshold_ms: p.idle_threshold_ms, + debounce_ms: p.debounce_ms, + }) + } else if let Some(p) = table.simple { + Ok(Self::Simple { + idle_threshold_ms: p.idle_threshold_ms, + }) + } else { + Ok(Self::None) + } + } + } + } +} + /// An extra environment variable to inject into the child. #[derive(Debug, Clone, Deserialize)] pub struct EnvVar { @@ -162,8 +340,10 @@ mod tests { let config = load(None).unwrap(); assert_eq!(config.scrollback_bytes, 64 * 1024); assert_eq!(config.session_env_var, "HEIMDALL_SESSION_ID"); - assert_eq!(config.idle_threshold_ms, 3000); - assert_eq!(config.debounce_ms, 200); + assert_eq!(config.classifier.idle_threshold_ms(), 3000); + assert_eq!(config.classifier.debounce_ms(), 0); // simple has no debounce + assert!(config.log_file.is_none()); + assert_eq!(config.log_level, "info"); assert!(config.env.is_empty()); std::env::set_current_dir(original).unwrap(); @@ -190,35 +370,80 @@ mod tests { std::env::set_current_dir(original).unwrap(); } - /// Issue #4: classifier = "claude" deserializes correctly.
+ // -- String shorthand tests -- + + #[test] + fn deserialize_classifier_claude_string() { + let config: Config = toml::from_str(r#"classifier = "claude""#).unwrap(); + assert!(matches!(config.classifier, ClassifierConfig::Claude { .. })); + assert_eq!(config.classifier.idle_threshold_ms(), 3000); + assert_eq!(config.classifier.debounce_ms(), 200); + } + + #[test] + fn deserialize_classifier_simple_string() { + let config: Config = toml::from_str(r#"classifier = "simple""#).unwrap(); + assert!(matches!(config.classifier, ClassifierConfig::Simple { .. })); + assert_eq!(config.classifier.idle_threshold_ms(), 3000); + } + + #[test] + fn deserialize_classifier_none_string() { + let config: Config = toml::from_str(r#"classifier = "none""#).unwrap(); + assert!(matches!(config.classifier, ClassifierConfig::None)); + } + + // -- Table form tests -- + + #[test] + fn deserialize_classifier_claude_table() { + let toml_str = r#" +[classifier.claude] +idle_threshold_ms = 5000 +debounce_ms = 100 +"#; + let config: Config = toml::from_str(toml_str).unwrap(); + assert!(matches!(config.classifier, ClassifierConfig::Claude { .. })); + assert_eq!(config.classifier.idle_threshold_ms(), 5000); + assert_eq!(config.classifier.debounce_ms(), 100); + } + #[test] - fn deserialize_classifier_claude() { - let toml_str = r#"classifier = "claude""#; + fn deserialize_classifier_simple_table() { + let toml_str = r#" +[classifier.simple] +idle_threshold_ms = 7000 +"#; let config: Config = toml::from_str(toml_str).unwrap(); - assert!(matches!(config.classifier, ClassifierConfig::Claude)); + assert!(matches!(config.classifier, ClassifierConfig::Simple { .. })); + assert_eq!(config.classifier.idle_threshold_ms(), 7000); } - /// Issue #4: classifier = "simple" deserializes correctly. 
#[test] - fn deserialize_classifier_simple() { - let toml_str = r#"classifier = "simple""#; + fn deserialize_classifier_simple_table_defaults() { + let toml_str = r#" +[classifier.simple] +"#; let config: Config = toml::from_str(toml_str).unwrap(); - assert!(matches!(config.classifier, ClassifierConfig::Simple)); + assert!(matches!(config.classifier, ClassifierConfig::Simple { .. })); + assert_eq!(config.classifier.idle_threshold_ms(), 3000); } - /// Issue #4: classifier = "none" deserializes correctly. #[test] - fn deserialize_classifier_none() { - let toml_str = r#"classifier = "none""#; + fn deserialize_classifier_none_table() { + let toml_str = r#" +[classifier.none] +"#; let config: Config = toml::from_str(toml_str).unwrap(); assert!(matches!(config.classifier, ClassifierConfig::None)); } - /// Issue #4: default classifier is Claude. + /// Default classifier is Simple (general-purpose). #[test] - fn default_classifier_is_claude() { + fn default_classifier_is_simple() { let config = Config::default(); - assert!(matches!(config.classifier, ClassifierConfig::Claude)); + assert!(matches!(config.classifier, ClassifierConfig::Simple { .. })); + assert_eq!(config.classifier.idle_threshold_ms(), 3000); } /// Issue #4: env var injection config deserializes. @@ -248,9 +473,9 @@ value = "debug" socket_dir = "/tmp/test" scrollback_bytes = 4096 session_env_var = "MY_SESSION" -classifier = "simple" + +[classifier.simple] idle_threshold_ms = 5000 -debounce_ms = 100 [[env]] name = "FOO" @@ -260,17 +485,15 @@ value = "bar" assert_eq!(config.socket_dir, PathBuf::from("/tmp/test")); assert_eq!(config.scrollback_bytes, 4096); assert_eq!(config.session_env_var, "MY_SESSION"); - assert!(matches!(config.classifier, ClassifierConfig::Simple)); - assert_eq!(config.idle_threshold_ms, 5000); - assert_eq!(config.debounce_ms, 100); + assert!(matches!(config.classifier, ClassifierConfig::Simple { .. 
})); + assert_eq!(config.classifier.idle_threshold_ms(), 5000); assert_eq!(config.env.len(), 1); } /// Invalid classifier value produces an error. #[test] fn invalid_classifier_errors() { - let toml_str = r#"classifier = "bogus""#; - let result: Result<Config, _> = toml::from_str(toml_str); + let result: Result<Config, _> = toml::from_str(r#"classifier = "bogus""#); assert!(result.is_err()); } @@ -279,7 +502,7 @@ value = "bar" fn empty_toml_is_defaults() { let config: Config = toml::from_str("").unwrap(); assert_eq!(config.scrollback_bytes, 64 * 1024); - assert!(matches!(config.classifier, ClassifierConfig::Claude)); + assert!(matches!(config.classifier, ClassifierConfig::Simple { .. })); assert!(config.env.is_empty()); } @@ -293,11 +516,10 @@ value = "bar" /// Partial config: only one field set, rest defaults. #[test] fn partial_config_fills_defaults() { - let config: Config = toml::from_str("debounce_ms = 999").unwrap(); - assert_eq!(config.debounce_ms, 999); - assert_eq!(config.idle_threshold_ms, 3000); // default - assert_eq!(config.scrollback_bytes, 64 * 1024); // default - assert!(matches!(config.classifier, ClassifierConfig::Claude)); // default + let config: Config = toml::from_str("scrollback_bytes = 999").unwrap(); + assert_eq!(config.scrollback_bytes, 999); + assert_eq!(config.classifier.idle_threshold_ms(), 3000); // default + assert!(matches!(config.classifier, ClassifierConfig::Simple { .. })); // default } /// Zero scrollback_bytes is valid. @@ -319,9 +541,13 @@ value = "bar" fn load_explicit_path_works() { let tmp = tempfile::tempdir().unwrap(); let config_path = tmp.path().join("custom.toml"); - std::fs::write(&config_path, "idle_threshold_ms = 7777").unwrap(); + std::fs::write( + &config_path, + "[classifier.simple]\nidle_threshold_ms = 7777", + ) + .unwrap(); let config = load(Some(&config_path)).unwrap(); - assert_eq!(config.idle_threshold_ms, 7777); + assert_eq!(config.classifier.idle_threshold_ms(), 7777); } /// kill_process_group defaults to true.
@@ -344,4 +570,73 @@ value = "bar" let result: Result<Config, _> = toml::from_str("scrollback_bytes = -1"); assert!(result.is_err()); } + + /// Classifier name() method returns correct strings. + #[test] + fn classifier_name_method() { + assert_eq!(ClassifierConfig::default().name(), "simple"); + assert_eq!( + ClassifierConfig::Claude { + idle_threshold_ms: 3000, + debounce_ms: 200 + } + .name(), + "claude" + ); + assert_eq!(ClassifierConfig::None.name(), "none"); + } + + /// log_file and log_level deserialize from config. + #[test] + fn deserialize_log_file_and_level() { + let toml_str = r#" +log_file = "/var/log/hm.log" +log_level = "debug" +"#; + let config: Config = toml::from_str(toml_str).unwrap(); + assert_eq!(config.log_file, Some(PathBuf::from("/var/log/hm.log"))); + assert_eq!(config.log_level, "debug"); + } + + /// log_file defaults to None, log_level defaults to "info". + #[test] + fn log_defaults() { + let config: Config = toml::from_str("").unwrap(); + assert!(config.log_file.is_none()); + assert_eq!(config.log_level, "info"); + } + + /// detach_key defaults to 0x1C (Ctrl-\). + #[test] + fn detach_key_default() { + let config: Config = toml::from_str("").unwrap(); + assert_eq!(config.detach_key, 0x1C); + } + + /// detach_key can be set to 0 to disable. + #[test] + fn detach_key_disabled() { + let config: Config = toml::from_str("detach_key = 0").unwrap(); + assert_eq!(config.detach_key, 0); + } + + /// detach_key can be set to a custom value. + #[test] + fn detach_key_custom() { + let config: Config = toml::from_str("detach_key = 17").unwrap(); + assert_eq!(config.detach_key, 17); // Ctrl-Q + } + + /// Multiple classifier keys in table form is an error.
+ #[test] + fn multiple_classifier_keys_errors() { + let toml_str = r#" +[classifier.claude] +idle_threshold_ms = 3000 +[classifier.simple] +idle_threshold_ms = 3000 +"#; + let result: Result<Config, _> = toml::from_str(toml_str); + assert!(result.is_err()); + } } diff --git a/src/main.rs b/src/main.rs index e83b1af..68182da 100644 --- a/src/main.rs +++ b/src/main.rs @@ -3,893 +3,95 @@ //! Owns the pty, manages process lifecycle, exposes a Unix socket for IPC. //! Everything else is a client. +mod attach; mod broadcast; mod classify; +mod cli; +mod commands; mod config; +mod pidfile; mod protocol; mod pty; mod socket; +mod supervisor; +mod terminal; +mod util; -use broadcast::OutputState; -use bytes::Bytes; -use clap::{Parser, Subcommand}; -use nix::sys::termios; -use socket::ServerState; -use std::os::fd::{AsFd, AsRawFd, BorrowedFd, FromRawFd, IntoRawFd}; -use std::os::unix::process::CommandExt; -use std::path::PathBuf; -use std::sync::Arc; -use std::sync::atomic::{AtomicBool, Ordering}; -use tokio::io::{AsyncReadExt, AsyncWriteExt}; -use tokio::net::UnixListener; - -#[derive(Parser)] -#[command( - name = "hm", - about = "PTY session supervisor", - version = concat!(env!("CARGO_PKG_VERSION"), " (", env!("HM_BUILD_TIME"), ")") -)] -struct Cli { - #[command(subcommand)] - command: Command, - - /// Path to config file. - #[arg(long, global = true)] - config: Option<PathBuf>, -} - -#[derive(Subcommand)] -enum Command { - /// Launch a supervised session and attach to it. - Run { - /// Session identifier (used for socket filename). - #[arg(long)] - id: String, - /// Working directory for the child process. - #[arg(long, default_value = ".")] - workdir: PathBuf, - /// Directory for socket and pid files (overrides config). - #[arg(long)] - socket_dir: Option<PathBuf>, - /// Terminal columns. - #[arg(long, default_value_t = 220)] - cols: u16, - /// Terminal rows. - #[arg(long, default_value_t = 50)] - rows: u16, - /// Run the supervisor in the background without attaching.
- #[arg(long)] - detach: bool, - /// Child command and arguments (everything after --). - #[arg(trailing_var_arg = true, required = true)] - cmd: Vec<String>, - }, - /// Attach to a running session (terminal passthrough). - Attach { - /// Session identifier to attach to. - id: String, - /// Directory for socket files (overrides config). - #[arg(long)] - socket_dir: Option<PathBuf>, - }, - /// Query status of a session. - Status { - /// Session identifier. - id: String, - /// Directory for socket files (overrides config). - #[arg(long)] - socket_dir: Option<PathBuf>, - }, - /// List active sessions. - #[command(name = "ls")] - List { - /// Directory for socket files (overrides config). - #[arg(long)] - socket_dir: Option<PathBuf>, - }, - /// Kill a session (graceful shutdown). - Kill { - /// Session identifier. - id: String, - /// Directory for socket files (overrides config). - #[arg(long)] - socket_dir: Option<PathBuf>, - }, -} +use clap::Parser; fn main() -> anyhow::Result<()> { // Parse CLI before any Tokio runtime — fork() must happen single-threaded.
- let cli = Cli::parse(); - let cfg = config::load(cli.config.as_deref())?; + let args = cli::Cli::parse(); + let cfg = config::load(args.config.as_deref())?; - match cli.command { - Command::Run { + match args.command { + cli::Command::Run { id, workdir, socket_dir, cols, rows, detach, + log_file, + log_level, + log_filter, + scrollback_bytes, + classifier, + idle_threshold_ms, + debounce_ms, + kill_process_group, + session_env_var, cmd, } => { - let dir = socket_dir.unwrap_or_else(|| cfg.socket_dir.clone()); + let params = cli::merge_run_args( + cfg, + cli::RunArgs { + id, + workdir, + socket_dir, + cols, + rows, + log_file, + log_level, + log_filter, + scrollback_bytes, + classifier, + idle_threshold_ms, + debounce_ms, + kill_process_group, + session_env_var, + cmd, + }, + )?; if detach { - run_supervisor(id, workdir, dir, cols, rows, cmd, cfg) + supervisor::supervise(params) } else { - run_and_attach(id, workdir, dir, cols, rows, cmd, cfg) + attach::launch_and_attach(params) } } - Command::Attach { id, socket_dir } => { + cli::Command::Attach { id, socket_dir } => { let dir = socket_dir.unwrap_or_else(|| cfg.socket_dir.clone()); - run_attach(id, dir, &cfg) + attach::attach(id, dir, &cfg) } - Command::Status { id, socket_dir } => { + cli::Command::Status { id, socket_dir } => { let dir = socket_dir.unwrap_or_else(|| cfg.socket_dir.clone()); - run_status(id, dir, &cfg) + commands::status(id, dir, &cfg) } - Command::List { socket_dir } => { + cli::Command::List { socket_dir } => { let dir = socket_dir.unwrap_or_else(|| cfg.socket_dir.clone()); - run_list(dir) + commands::list(dir) } - Command::Kill { id, socket_dir } => { + cli::Command::Kill { id, socket_dir } => { let dir = socket_dir.unwrap_or_else(|| cfg.socket_dir.clone()); - run_kill(id, dir) - } - } -} - -/// Launch the supervisor as a background process and attach to it. -/// -/// Spawns `hm run --detach` as a child process, waits for the socket to -/// appear, then runs the normal attach flow. 
When the attach disconnects -/// (Ctrl-\ or session exit), the supervisor keeps running in the background. -fn run_and_attach( - id: String, - workdir: PathBuf, - socket_dir: PathBuf, - cols: u16, - rows: u16, - cmd: Vec<String>, - cfg: config::Config, -) -> anyhow::Result<()> { - let exe = std::env::current_exe()?; - let mut child_args = vec![ - "run".to_string(), - "--id".to_string(), - id.clone(), - "--workdir".to_string(), - workdir.to_string_lossy().into_owned(), - "--socket-dir".to_string(), - socket_dir.to_string_lossy().into_owned(), - "--cols".to_string(), - cols.to_string(), - "--rows".to_string(), - rows.to_string(), - "--detach".to_string(), - "--".to_string(), - ]; - child_args.extend(cmd); - - // Spawn supervisor in background. Redirect stdio to /dev/null and - // call setsid() in the child so it becomes its own session leader. - // Without setsid(), closing the terminal (X button) sends SIGHUP to - // the entire session, killing the supervisor along with the attach client. - let _supervisor = unsafe { - std::process::Command::new(exe) - .args(&child_args) - .stdin(std::process::Stdio::null()) - .stdout(std::process::Stdio::null()) - .stderr(std::process::Stdio::null()) - .pre_exec(|| { - // Create a new session so the supervisor isn't killed when - // the parent terminal closes. - nix::unistd::setsid().map_err(|e| std::io::Error::from_raw_os_error(e as i32))?; - Ok(()) - }) - .spawn()? - }; - - // Wait for the socket to appear (supervisor needs a moment to bind). - let socket_path = socket_dir.join(format!("{id}.sock")); - let deadline = std::time::Instant::now() + std::time::Duration::from_secs(5); - while !socket_path.exists() { - if std::time::Instant::now() > deadline { - anyhow::bail!( - "Timed out waiting for supervisor socket at {}", - socket_path.display() - ); - } - std::thread::sleep(std::time::Duration::from_millis(20)); - } - - // Attach to the now-running session.
- run_attach(id, socket_dir, &cfg) -} - -/// RAII guard that removes socket and PID files on drop. -/// Ensures cleanup even on panic or early `?` return. -struct CleanupGuard { - socket_path: PathBuf, - pid_path: PathBuf, -} - -impl Drop for CleanupGuard { - fn drop(&mut self) { - let _ = std::fs::remove_file(&self.socket_path); - let _ = std::fs::remove_file(&self.pid_path); - } -} - -fn run_supervisor( - id: String, - workdir: PathBuf, - socket_dir: PathBuf, - cols: u16, - rows: u16, - cmd: Vec<String>, - cfg: config::Config, -) -> anyhow::Result<()> { - std::fs::create_dir_all(&socket_dir)?; - - let socket_path = socket_dir.join(format!("{id}.sock")); - let pid_path = socket_dir.join(format!("{id}.pid")); - - // Acquire exclusive lock on PID file to prevent TOCTOU races. - // Two `hm run` with the same ID will serialize on this lock. - use std::io::Write; - let pid_file = std::fs::OpenOptions::new() - .create(true) - .write(true) - .truncate(false) - .open(&pid_path)?; - use nix::fcntl::{Flock, FlockArg}; - let mut lock = match Flock::lock(pid_file, FlockArg::LockExclusiveNonblock) { - Ok(lock) => lock, - Err(_) => { - eprintln!( - "Session '{id}' is already running (PID file locked). \ - Use `hm kill {id}` first.", - ); - std::process::exit(1); - } - }; - - // Check if existing PID in the file is still alive. - if let Ok(contents) = std::fs::read_to_string(&pid_path) - && let Ok(pid) = contents.trim().parse::<i32>() - { - let alive = unsafe { nix::libc::kill(pid, 0) } == 0; - if alive { - eprintln!( - "Session '{id}' is already running (pid {pid}). \ - Use `hm kill {id}` first.", - ); - std::process::exit(1); + commands::kill(id, dir) } - } - - // Clean up stale socket - if socket_path.exists() { - std::fs::remove_file(&socket_path)?; - } - - let workdir = workdir.canonicalize()?; - - // Fork child BEFORE starting Tokio runtime (single-threaded requirement).
- let pty_child = pty::spawn(&cmd, &workdir, &id, cols, rows, &cfg)?; - let child_pid = pty_child.pid; - let master_raw_fd = pty_child.master.as_raw_fd(); - - // Write PID to the locked file. - // Flock derefs to File, so we can use it directly. - { - use std::io::Seek; - let f: &mut std::fs::File = &mut lock; - f.set_len(0)?; - f.seek(std::io::SeekFrom::Start(0))?; - write!(f, "{}", child_pid.as_raw())?; - } - - // RAII cleanup — removes socket + PID on drop (panic, early return, normal exit). - let _cleanup = CleanupGuard { - socket_path: socket_path.clone(), - pid_path: pid_path.clone(), - }; - - tracing_subscriber::fmt() - .with_env_filter( - tracing_subscriber::EnvFilter::from_default_env() - .add_directive("heimdall=info".parse().unwrap()), - ) - .with_target(false) - .init(); - - tracing::info!( - session_id = %id, - child_pid = child_pid.as_raw(), - socket = %socket_path.display(), - "supervisor started" - ); - - // Single-threaded runtime — sufficient for our I/O workload. - let rt = tokio::runtime::Builder::new_current_thread() - .enable_all() - .build()?; - - let exit_code = rt.block_on(async move { - let listener = UnixListener::bind(&socket_path)?; - - let output = Arc::new(OutputState::new(&cfg)); - let alive = Arc::new(AtomicBool::new(true)); - - let server_state = Arc::new(ServerState { - output: Arc::clone(&output), - child_pid, - master_fd: master_raw_fd, - alive: Arc::clone(&alive), - exit_code: std::sync::atomic::AtomicI32::new(0), - kill_process_group: cfg.kill_process_group, - }); - - // Transfer ownership of the master fd to AsyncFd. - // into_raw_fd() consumes the OwnedFd without closing it. - let owned_fd = pty_child.master.into_raw_fd(); - // SAFETY: we just consumed the only owner; no double-close possible. - let owned = unsafe { std::os::fd::OwnedFd::from_raw_fd(owned_fd) }; - - // AsyncFd requires the fd to be non-blocking. openpty() returns - // blocking fds, so set O_NONBLOCK before wrapping. 
Without this, - // libc::read inside try_io blocks the entire runtime when the pty - // has no data (instead of returning EAGAIN). - let raw = owned.as_fd().as_raw_fd(); - let flags = unsafe { nix::libc::fcntl(raw, nix::libc::F_GETFL) }; - if flags == -1 { - return Err(std::io::Error::last_os_error()); - } - if unsafe { nix::libc::fcntl(raw, nix::libc::F_SETFL, flags | nix::libc::O_NONBLOCK) } == -1 - { - return Err(std::io::Error::last_os_error()); - } - - let master_async = tokio::io::unix::AsyncFd::new(owned)?; - - // Signal handlers - let mut sigchld = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::child())?; - let mut sigterm = - tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())?; - - // Spawn socket server - let server_state_clone = Arc::clone(&server_state); - tokio::spawn(async move { - socket::serve(listener, server_state_clone).await; - }); - - // Main event loop - let mut buf = [0u8; 4096]; - let exit_code: i32; - - loop { - tokio::select! 
{ - // Read from pty master - ready = master_async.readable() => { - let mut guard = ready?; - match guard.try_io(|inner| { - let fd = inner.as_raw_fd(); - let n = unsafe { - nix::libc::read(fd, buf.as_mut_ptr().cast(), buf.len()) - }; - if n < 0 { - Err(std::io::Error::last_os_error()) - } else { - Ok(n as usize) - } - }) { - Ok(Ok(0)) => { - tracing::info!("pty EOF"); - exit_code = pty::wait_child(child_pid).unwrap_or(-1); - break; - } - Ok(Ok(n)) => { - let chunk = Bytes::copy_from_slice(&buf[..n]); - output.push(chunk); - } - Ok(Err(e)) => { - if e.raw_os_error() == Some(nix::libc::EIO) { - tracing::info!("pty EIO (child exited)"); - } else { - tracing::error!("pty read error: {e}"); - } - exit_code = pty::wait_child(child_pid).unwrap_or(-1); - break; - } - Err(_would_block) => { - continue; - } - } - } - - // SIGCHLD — child exited - _ = sigchld.recv() => { - tracing::info!("SIGCHLD received"); - exit_code = pty::wait_child(child_pid).unwrap_or(-1); - break; - } - - // SIGTERM — graceful shutdown - _ = sigterm.recv() => { - tracing::info!("SIGTERM received, shutting down"); - let _ = pty::send_sigterm(child_pid, cfg.kill_process_group); - tokio::time::sleep(std::time::Duration::from_secs(5)).await; - let _ = pty::send_sigkill(child_pid, cfg.kill_process_group); - exit_code = pty::wait_child(child_pid).unwrap_or(-1); - break; - } - } - } - - // Mark as dead and broadcast exit - alive.store(false, Ordering::Relaxed); - output.set_dead(); - socket::broadcast_exit(&server_state, exit_code).await; - - // Brief delay for clients to receive the exit frame - tokio::time::sleep(std::time::Duration::from_millis(100)).await; - - tracing::info!(exit_code, "supervisor exiting"); - - Ok::(exit_code) - })?; - - // Explicit cleanup before process::exit (which skips destructors). - // The CleanupGuard is a safety net for panics and early `?` returns only. 
- drop(_cleanup); - std::process::exit(exit_code); -} - -// ---- Attach subcommand ---- - -fn run_attach(id: String, socket_dir: PathBuf, _cfg: &config::Config) -> anyhow::Result<()> { - let socket_path = socket_dir.join(format!("{id}.sock")); - if !socket_path.exists() { - eprintln!("No session found: {id}"); - eprintln!("Socket not found at {}", socket_path.display()); - std::process::exit(1); - } - - let rt = tokio::runtime::Builder::new_current_thread() - .enable_all() - .build()?; - - rt.block_on(async move { - let stream = tokio::net::UnixStream::connect(&socket_path).await?; - let (read_half, mut write_half) = stream.into_split(); - let mut reader = tokio::io::BufReader::new(read_half); - - // Read mode byte - let mut mode = [0u8; 1]; - reader.read_exact(&mut mode).await?; - assert_eq!(mode[0], protocol::MODE_BINARY, "expected binary mode"); - - // Save terminal state and set raw mode - let stdin_raw_fd = std::io::stdin().as_raw_fd(); - let stdin_borrowed = unsafe { BorrowedFd::borrow_raw(stdin_raw_fd) }; - let original_termios = termios::tcgetattr(stdin_borrowed) - .map_err(|e| std::io::Error::from_raw_os_error(e as i32))?; - let mut raw = original_termios.clone(); - termios::cfmakeraw(&mut raw); - termios::tcsetattr(stdin_borrowed, termios::SetArg::TCSANOW, &raw) - .map_err(|e| std::io::Error::from_raw_os_error(e as i32))?; - - // Restore terminal on exit - let _restore = RestoreTermios(stdin_raw_fd, original_termios); - - let mut stdout = tokio::io::stdout(); - - // Set up status bar: reserve the bottom line via scroll region. - let (cols, rows) = terminal_size(); - let inner_rows = rows.saturating_sub(1).max(1); - setup_status_bar(&mut stdout, &id, cols, rows, None).await?; - - // Send RESIZE with inner_rows so the child sees the reduced height. 
- let mut resize_payload = [0u8; 4]; - resize_payload[0..2].copy_from_slice(&cols.to_be_bytes()); - resize_payload[2..4].copy_from_slice(&inner_rows.to_be_bytes()); - protocol::write_frame(&mut write_half, protocol::RESIZE, &resize_payload).await?; - - // Now subscribe — scrollback replay happens at the correct size. - protocol::write_frame(&mut write_half, protocol::SUBSCRIBE, &[]).await?; - - // Signal handlers - let mut sigwinch = - tokio::signal::unix::signal(tokio::signal::unix::SignalKind::window_change())?; - let mut sighup = - tokio::signal::unix::signal(tokio::signal::unix::SignalKind::hangup())?; - let mut sigterm = - tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())?; - let mut sigint = - tokio::signal::unix::signal(tokio::signal::unix::SignalKind::interrupt())?; - - // Periodic status poll for the status bar (1 second). - let mut status_tick = tokio::time::interval(std::time::Duration::from_secs(1)); - status_tick.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip); - - // Track current terminal dimensions for status bar redraws. - let mut cur_cols = cols; - let mut cur_rows = rows; - - // Async stdin reader - let stdin = tokio::io::stdin(); - let mut stdin_reader = tokio::io::BufReader::new(stdin); - let mut stdin_buf = [0u8; 1024]; - - // Second socket for STATUS polling (the main socket is in SUBSCRIBE mode). - let status_stream = tokio::net::UnixStream::connect(&socket_path).await?; - let (status_read, mut status_write) = status_stream.into_split(); - let mut status_reader = tokio::io::BufReader::new(status_read); - // Read mode byte. - let mut smode = [0u8; 1]; - status_reader.read_exact(&mut smode).await?; - - loop { - tokio::select! 
{ - // Socket -> stdout (pty output) - result = protocol::read_frame(&mut reader) => { - let (msg_type, payload): (u8, Bytes) = result?; - match msg_type { - protocol::OUTPUT => { - stdout.write_all(&payload).await?; - stdout.flush().await?; - } - protocol::EXIT => { - let code = if payload.len() >= 4 { - i32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]) - } else { - 0 - }; - // Reset scroll region before exiting - reset_scroll_region(&mut stdout).await?; - drop(_restore); - eprintln!("\r\n[session exited with code {code}]"); - std::process::exit(code); - } - _ => {} - } - } - - // stdin -> socket (user input) - n = stdin_reader.read(&mut stdin_buf) => { - let n = n?; - if n == 0 { - break; - } - // Detach key: Ctrl-\ (0x1C) - if stdin_buf[..n].contains(&0x1C) { - reset_scroll_region(&mut stdout).await?; - drop(_restore); - eprintln!("\r\n[detached from session {id}]"); - std::process::exit(0); - } - protocol::write_frame(&mut write_half, protocol::INPUT, &stdin_buf[..n]).await?; - } - - // SIGWINCH -> resize - _ = sigwinch.recv() => { - let (new_cols, new_rows) = terminal_size(); - cur_cols = new_cols; - cur_rows = new_rows; - let inner_rows = new_rows.saturating_sub(1).max(1); - setup_status_bar(&mut stdout, &id, new_cols, new_rows, None).await?; - - let mut payload = [0u8; 4]; - payload[0..2].copy_from_slice(&new_cols.to_be_bytes()); - payload[2..4].copy_from_slice(&inner_rows.to_be_bytes()); - protocol::write_frame(&mut write_half, protocol::RESIZE, &payload).await?; - } - - // Periodic status poll -> update status bar - _ = status_tick.tick() => { - // Send STATUS request on the dedicated connection. 
- if protocol::write_frame(&mut status_write, protocol::STATUS, &[]).await.is_ok() - && let Ok((msg_type, payload)) = protocol::read_frame(&mut status_reader).await - && msg_type == protocol::STATUS_RESP - && payload.len() >= 14 - { - let state_byte = payload[9]; - let state_ms = u32::from_be_bytes([ - payload[10], payload[11], payload[12], payload[13], - ]); - let info = StatusInfo { state_byte, state_ms }; - draw_status_bar(&mut stdout, &id, cur_cols, cur_rows, Some(&info)).await?; - } - } - - // SIGHUP/SIGTERM/SIGINT — terminal closed, killed, or interrupted. - _ = sighup.recv() => { - // Terminal is gone (X close) — can't write to stdout. - // Just restore termios and exit. - drop(_restore); - std::process::exit(0); - } - _ = sigterm.recv() => { - let _ = reset_scroll_region(&mut stdout).await; - drop(_restore); - eprintln!("\r\n[terminated]"); - std::process::exit(0); - } - _ = sigint.recv() => { - // Forward Ctrl-C to the session instead of exiting. - protocol::write_frame(&mut write_half, protocol::INPUT, &[0x03]).await?; - } - } - } - - // Clean up: reset scroll region - reset_scroll_region(&mut stdout).await?; - - Ok::<(), std::io::Error>(()) - })?; - - Ok(()) -} - -/// Info from a STATUS_RESP used to render the right side of the bar. -struct StatusInfo { - state_byte: u8, - state_ms: u32, -} - -/// Set up the scroll region, alt screen, and draw the initial status bar. -async fn setup_status_bar( - stdout: &mut tokio::io::Stdout, - session_id: &str, - cols: u16, - rows: u16, - info: Option<&StatusInfo>, -) -> std::io::Result<()> { - let inner_rows = rows.saturating_sub(1).max(1); - - // Switch to alternate screen buffer, clear, home cursor, set scroll region. - let setup = format!("\x1b[?1049h\x1b[2J\x1b[H\x1b[1;{inner_rows}r"); - stdout.write_all(setup.as_bytes()).await?; - - draw_status_bar(stdout, session_id, cols, rows, info).await -} - -/// Draw (or redraw) the status bar on the last line. 
-/// -/// Layout: -/// Left (green bg): [hm] session-id -/// Right (state color): state-name duration -/// Middle: dark fill -async fn draw_status_bar( - stdout: &mut tokio::io::Stdout, - session_id: &str, - cols: u16, - rows: u16, - info: Option<&StatusInfo>, -) -> std::io::Result<()> { - // Left segment: green background, black text. - let left = format!(" [hm] {session_id} "); - - // Right segment: state with colored background. - let (state_name, state_color) = match info { - Some(si) => match si.state_byte { - 0x00 => ("idle", "\x1b[42;30m"), // green bg - 0x01 => ("thinking", "\x1b[43;30m"), // yellow bg - 0x02 => ("streaming", "\x1b[44;37m"), // blue bg - 0x03 => ("tool_use", "\x1b[45;37m"), // magenta bg - 0x04 => ("active", "\x1b[46;30m"), // cyan bg - 0xFF => ("dead", "\x1b[41;37m"), // red bg - _ => ("unknown", "\x1b[47;30m"), // white bg - }, - None => ("...", "\x1b[100;37m"), // gray, waiting for first poll - }; - - let duration = info.map_or(String::new(), |si| { - let secs = si.state_ms / 1000; - if secs >= 60 { - format!(" {}m{}s ", secs / 60, secs % 60) - } else { - format!(" {}s ", secs) - } - }); - - let right = format!(" {state_name}{duration}"); - let right_visible_len = right.len(); - let left_visible_len = left.len(); - - // Middle fill: dark background. - let fill_len = (cols as usize).saturating_sub(left_visible_len + right_visible_len); - let fill = " ".repeat(fill_len); - - // Compose: save cursor, jump to last line, draw segments, restore cursor. - // \x1b[42;30m = green bg + black fg (left) - // \x1b[0m\x1b[48;5;236m = reset then dark gray bg (middle) - // state_color (right) - // \x1b[0m = reset - let bar = format!( - "\x1b7\x1b[{rows};1H\x1b[42;30m{left}\x1b[0m\x1b[48;5;236;37m{fill}\x1b[0m{state_color}{right}\x1b[0m\x1b8" - ); - - stdout.write_all(bar.as_bytes()).await?; - stdout.flush().await?; - Ok(()) -} - -/// Reset scroll region and switch back to the main screen buffer. 
-async fn reset_scroll_region(stdout: &mut tokio::io::Stdout) -> std::io::Result<()> { - // Reset scroll region, then leave alternate screen buffer. - // The original terminal content is restored (like exiting vim/tmux). - stdout.write_all(b"\x1b[r\x1b[?1049l").await?; - stdout.flush().await?; - Ok(()) -} - -/// RAII guard to restore terminal settings on drop. -struct RestoreTermios(i32, termios::Termios); - -impl Drop for RestoreTermios { - fn drop(&mut self) { - let fd = unsafe { BorrowedFd::borrow_raw(self.0) }; - let _ = termios::tcsetattr(fd, termios::SetArg::TCSANOW, &self.1); - } -} - -/// Get current terminal size via ioctl. -fn terminal_size() -> (u16, u16) { - unsafe { - let mut ws: nix::libc::winsize = std::mem::zeroed(); - if nix::libc::ioctl(std::io::stdin().as_raw_fd(), nix::libc::TIOCGWINSZ, &mut ws) == 0 { - (ws.ws_col, ws.ws_row) - } else { - (80, 24) - } - } -} - -// ---- Status subcommand ---- - -fn run_status(id: String, socket_dir: PathBuf, cfg: &config::Config) -> anyhow::Result<()> { - let socket_path = socket_dir.join(format!("{id}.sock")); - if !socket_path.exists() { - eprintln!("No session found: {id}"); - std::process::exit(1); - } - - let classifier_config = cfg.classifier.clone(); - let idle_threshold = cfg.idle_threshold_ms; - let debounce = cfg.debounce_ms; - - let rt = tokio::runtime::Builder::new_current_thread() - .enable_all() - .build()?; - - rt.block_on(async move { - let stream = tokio::net::UnixStream::connect(&socket_path).await?; - let (read_half, mut write_half) = stream.into_split(); - let mut reader = tokio::io::BufReader::new(read_half); - - let mut mode = [0u8; 1]; - reader.read_exact(&mut mode).await?; - - protocol::write_frame(&mut write_half, protocol::STATUS, &[]).await?; - - let (msg_type, payload) = protocol::read_frame(&mut reader).await?; - if msg_type == protocol::STATUS_RESP && payload.len() >= 9 { - let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); - let idle_ms = 
u32::from_be_bytes([payload[4], payload[5], payload[6], payload[7]]); - let alive = payload[8] != 0; - println!("session: {id}"); - println!("pid: {pid}"); - println!("idle_ms: {idle_ms}"); - println!("alive: {alive}"); - - // Use the configured classifier for state name resolution. - let cls = classify::from_config(&classifier_config, idle_threshold, debounce); - - if payload.len() >= 15 { - let state_byte = payload[9]; - let state_ms = - u32::from_be_bytes([payload[10], payload[11], payload[12], payload[13]]); - let state_name = cls.state_name(state_byte); - println!("state: {state_name} ({state_ms}ms)"); - } else if idle_ms > 3000 { - println!("state: idle"); - } else { - println!("state: active"); - } - } else { - eprintln!("unexpected response"); - } - - Ok::<(), std::io::Error>(()) - })?; - - Ok(()) -} - -// ---- List subcommand ---- - -fn run_list(socket_dir: PathBuf) -> anyhow::Result<()> { - if !socket_dir.exists() { - println!("No sessions directory found at {}", socket_dir.display()); - return Ok(()); - } - - let mut sessions: Vec = std::fs::read_dir(&socket_dir)? - .filter_map(|entry| { - let entry = entry.ok()?; - let path = entry.path(); - if path.extension().is_some_and(|ext| ext == "sock") { - path.file_stem().and_then(|s| s.to_str()).map(String::from) - } else { - None - } - }) - .collect(); - - sessions.sort(); - - // Filter to live sessions: check PID file liveness, clean up stale entries. - let mut live = Vec::new(); - for id in &sessions { - let pid_path = socket_dir.join(format!("{id}.pid")); - let socket_path = socket_dir.join(format!("{id}.sock")); - - let is_alive = pid_path - .exists() - .then(|| std::fs::read_to_string(&pid_path).ok()) - .flatten() - .and_then(|s| s.trim().parse::().ok()) - .is_some_and(|pid| unsafe { nix::libc::kill(pid, 0) } == 0); - - if is_alive { - live.push(id.clone()); - } else { - // Clean up stale socket + PID. 
- let _ = std::fs::remove_file(&socket_path); - let _ = std::fs::remove_file(&pid_path); - } - } - - if live.is_empty() { - println!("No active sessions"); - } else { - println!("Active sessions:"); - for id in &live { - println!(" {id}"); + cli::Command::Clean { + socket_dir, + older_than, + force, + } => { + let dir = socket_dir.unwrap_or_else(|| cfg.socket_dir.clone()); + commands::clean(dir, &older_than, !force) } } - - Ok(()) -} - -// ---- Kill subcommand ---- - -fn run_kill(id: String, socket_dir: PathBuf) -> anyhow::Result<()> { - let socket_path = socket_dir.join(format!("{id}.sock")); - if !socket_path.exists() { - eprintln!("No session found: {id}"); - std::process::exit(1); - } - - let rt = tokio::runtime::Builder::new_current_thread() - .enable_all() - .build()?; - - rt.block_on(async move { - let stream = tokio::net::UnixStream::connect(&socket_path).await?; - let (_read_half, mut write_half) = stream.into_split(); - - let mut mode = [0u8; 1]; - let mut reader = tokio::io::BufReader::new(_read_half); - reader.read_exact(&mut mode).await?; - - protocol::write_frame(&mut write_half, protocol::KILL, &[]).await?; - println!("Kill signal sent to session {id}"); - - Ok::<(), std::io::Error>(()) - })?; - - Ok(()) } diff --git a/src/pidfile.rs b/src/pidfile.rs new file mode 100644 index 0000000..4e45ae1 --- /dev/null +++ b/src/pidfile.rs @@ -0,0 +1,187 @@ +//! PID file abstraction. +//! +//! The PID file stores two PIDs, one per line: +//! +//! ```text +//! +//! +//! ``` +//! +//! Line 1 is the supervisor (the `hm` process that holds the flock). +//! Line 2 is the child (the pty-forked process: claude, bash, etc.). +//! Between fork and the child PID write, only line 1 is present. + +use std::io::{Seek, Write}; +use std::path::Path; + +/// Parsed contents of a PID file. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct PidFile { + pub supervisor: i32, + pub child: Option, +} + +impl PidFile { + /// Parse a PID file from disk. 
Returns `None` if the file is missing, + /// empty, or does not contain a valid supervisor PID on line 1. + pub fn read(path: &Path) -> Option { + let contents = std::fs::read_to_string(path).ok()?; + Self::parse(&contents) + } + + /// Parse PID file contents from a string. + fn parse(contents: &str) -> Option { + let mut lines = contents.lines(); + + let supervisor: i32 = lines.next()?.trim().parse().ok()?; + if supervisor <= 0 { + return None; + } + + let child = lines + .next() + .and_then(|line| line.trim().parse::().ok()) + .filter(|&pid| pid > 0); + + Some(Self { supervisor, child }) + } + + /// Write the supervisor PID (line 1). Call this immediately after + /// acquiring the flock, before fork. + pub fn write_supervisor(f: &mut std::fs::File, pid: u32) -> std::io::Result<()> { + f.set_len(0)?; + f.seek(std::io::SeekFrom::Start(0))?; + writeln!(f, "{pid}")?; + f.flush() + } + + /// Append the child PID (line 2). Call this after fork. + pub fn write_child(f: &mut std::fs::File, pid: i32) -> std::io::Result<()> { + write!(f, "{pid}")?; + f.flush() + } + + /// Check whether a PID is alive using `kill(pid, 0)`. + /// + /// kill(pid, 0) is a POSIX probe: signal 0 is never delivered. The kernel + /// runs all existence and permission checks, then does nothing. + /// + /// Returns true if the process exists (even if owned by another user). + pub fn is_pid_alive(pid: i32) -> bool { + let ret = unsafe { nix::libc::kill(pid, 0) }; + // ret == 0 → process exists, we can signal it + // ret == -1/EPERM → process exists, different owner + // ret == -1/ESRCH → no such process + ret == 0 + || (ret == -1 + && std::io::Error::last_os_error().raw_os_error() == Some(nix::libc::EPERM)) + } + + /// Check whether the supervisor process is still alive. + pub fn supervisor_alive(&self) -> bool { + Self::is_pid_alive(self.supervisor) + } + + /// Check whether the child process is still alive (false if no child PID). 
+ pub fn child_alive(&self) -> bool { + self.child.is_some_and(Self::is_pid_alive) + } + + /// Check whether either the supervisor or child is alive. + pub fn any_alive(&self) -> bool { + self.supervisor_alive() || self.child_alive() + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn parse_both_pids() { + let pf = PidFile::parse("1234\n5678").unwrap(); + assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, Some(5678)); + } + + #[test] + fn parse_supervisor_only() { + let pf = PidFile::parse("1234\n").unwrap(); + assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, None); + } + + #[test] + fn parse_supervisor_only_no_newline() { + let pf = PidFile::parse("1234").unwrap(); + assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, None); + } + + #[test] + fn parse_trailing_newline() { + let pf = PidFile::parse("1234\n5678\n").unwrap(); + assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, Some(5678)); + } + + #[test] + fn parse_whitespace() { + let pf = PidFile::parse(" 1234 \n 5678 \n").unwrap(); + assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, Some(5678)); + } + + #[test] + fn parse_empty_returns_none() { + assert!(PidFile::parse("").is_none()); + } + + #[test] + fn parse_garbage_returns_none() { + assert!(PidFile::parse("not_a_pid").is_none()); + } + + #[test] + fn parse_zero_supervisor_returns_none() { + assert!(PidFile::parse("0\n5678").is_none()); + } + + #[test] + fn parse_negative_supervisor_returns_none() { + assert!(PidFile::parse("-1\n5678").is_none()); + } + + #[test] + fn parse_zero_child_treated_as_none() { + let pf = PidFile::parse("1234\n0").unwrap(); + assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, None); + } + + #[test] + fn parse_negative_child_treated_as_none() { + let pf = PidFile::parse("1234\n-1").unwrap(); + assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, None); + } + + #[test] + fn parse_garbage_child_treated_as_none() { + let pf = PidFile::parse("1234\ngarbage").unwrap(); + 
assert_eq!(pf.supervisor, 1234); + assert_eq!(pf.child, None); + } + + #[test] + fn is_pid_alive_current_process() { + let pid = std::process::id() as i32; + assert!(PidFile::is_pid_alive(pid)); + } + + #[test] + fn is_pid_alive_nonexistent() { + // PID 2^22 - 1 is almost certainly not in use. + assert!(!PidFile::is_pid_alive(4_194_303)); + } +} diff --git a/src/protocol.rs b/src/protocol.rs index 458c99e..4ed0f8b 100644 --- a/src/protocol.rs +++ b/src/protocol.rs @@ -59,11 +59,55 @@ pub fn pack_status(pid: u32, idle_ms: u32, alive: bool, state: u8, state_ms: u32 pack_frame(STATUS_RESP, &payload) } +/// Pack a resize payload: `[cols: u16 BE][rows: u16 BE]`. +pub fn pack_resize(cols: u16, rows: u16) -> [u8; 4] { + let mut payload = [0u8; 4]; + payload[0..2].copy_from_slice(&cols.to_be_bytes()); + payload[2..4].copy_from_slice(&rows.to_be_bytes()); + payload +} + /// Pack an exit notification payload. pub fn pack_exit(code: i32) -> Bytes { pack_frame(EXIT, &code.to_be_bytes()) } +/// Parse a RESIZE frame payload into `(cols, rows)`. +/// +/// Payload must be exactly 4 bytes per the protocol spec. +pub fn parse_resize(payload: &[u8]) -> io::Result<(u16, u16)> { + if payload.len() != 4 { + return Err(io::Error::new( + io::ErrorKind::InvalidData, + format!( + "RESIZE payload must be exactly 4 bytes, got {}", + payload.len() + ), + )); + } + let cols = u16::from_be_bytes([payload[0], payload[1]]); + let rows = u16::from_be_bytes([payload[2], payload[3]]); + Ok((cols, rows)) +} + +/// Parse an EXIT frame payload into the exit code. +/// +/// Payload must be exactly 4 bytes per the protocol spec. +pub fn parse_exit_code(payload: &[u8]) -> io::Result { + if payload.len() != 4 { + return Err(io::Error::new( + io::ErrorKind::InvalidData, + format!( + "EXIT payload must be exactly 4 bytes, got {}", + payload.len() + ), + )); + } + Ok(i32::from_be_bytes([ + payload[0], payload[1], payload[2], payload[3], + ])) +} + /// Maximum frame payload size (1 MB). 
Frames larger than this are rejected /// to prevent OOM from malicious or buggy clients. pub const MAX_FRAME_SIZE: usize = 1 << 20; @@ -74,7 +118,9 @@ pub const MAX_FRAME_SIZE: usize = 1 << 20; pub async fn read_frame(reader: &mut R) -> io::Result<(u8, Bytes)> { let mut header = [0u8; 5]; reader.read_exact(&mut header).await?; + // First byte is the message type. let msg_type = header[0]; + // Length of the payload in bytes (big-endian). let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; if len == 0 { return Ok((msg_type, Bytes::new())); @@ -85,19 +131,147 @@ pub async fn read_frame(reader: &mut R) -> io::Result<( format!("frame payload {len} bytes exceeds maximum {MAX_FRAME_SIZE}"), )); } + // Allocate the payload buffer (`len` is already known from the header) and fill it. let mut payload = vec![0u8; len]; reader.read_exact(&mut payload).await?; Ok((msg_type, Bytes::from(payload))) } /// Write one frame to an async writer. +/// +/// Writes the 5-byte header and payload separately rather than allocating +/// a combined buffer via `pack_frame`. An unbuffered writer such as +/// `OwnedWriteHalf` issues at least two `write` syscalls; use a `BufWriter` (then flush) to coalesce. pub async fn write_frame( writer: &mut W, msg_type: u8, payload: &[u8], ) -> io::Result<()> { - let frame = pack_frame(msg_type, payload); - writer.write_all(&frame).await + let len = payload.len() as u32; + let mut header = [0u8; 5]; + header[0] = msg_type; + header[1..5].copy_from_slice(&len.to_be_bytes()); + writer.write_all(&header).await?; + writer.write_all(payload).await +} + +// -- Client session -- + +use std::path::Path; +use tokio::io::BufReader; +use tokio::net::unix::{OwnedReadHalf, OwnedWriteHalf}; + +/// Client-side session: connected socket with mode-byte handshake complete. +/// +/// For simple request-response commands, use [`send`](Self::send) and +/// [`recv`](Self::recv). For long-lived connections (subscribe + select loop), +/// access the raw `reader` and `writer` fields directly.
+pub struct Session { + pub reader: BufReader, + pub writer: OwnedWriteHalf, +} + +impl Session { + /// Connect to a session's Unix socket and perform the mode-byte handshake. + pub async fn connect(socket_path: &Path) -> io::Result { + let stream = tokio::net::UnixStream::connect(socket_path).await?; + let (read_half, writer) = stream.into_split(); + let mut reader = BufReader::new(read_half); + + let mut mode = [0u8; 1]; + reader.read_exact(&mut mode).await?; + if mode[0] != MODE_BINARY { + return Err(io::Error::new( + io::ErrorKind::InvalidData, + format!( + "expected binary mode (0x{MODE_BINARY:02x}), got 0x{:02x}", + mode[0] + ), + )); + } + + Ok(Self { reader, writer }) + } + + /// Send a frame to the supervisor. + pub async fn send(&mut self, msg_type: u8, payload: &[u8]) -> io::Result<()> { + write_frame(&mut self.writer, msg_type, payload).await + } + + /// Read one frame from the supervisor. + pub async fn recv(&mut self) -> io::Result<(u8, Bytes)> { + read_frame(&mut self.reader).await + } + + /// Send a RESIZE frame. + pub async fn send_resize(&mut self, cols: u16, rows: u16) -> io::Result<()> { + self.send(RESIZE, &pack_resize(cols, rows)).await + } + + /// Send a SUBSCRIBE frame to start receiving pty output. + pub async fn subscribe(&mut self) -> io::Result<()> { + self.send(SUBSCRIBE, &[]).await + } + + /// Request a status response. + pub async fn send_status(&mut self) -> io::Result<()> { + self.send(STATUS, &[]).await + } + + /// Send STATUS and parse the response into a [`StatusResponse`]. + pub async fn recv_status(&mut self) -> io::Result { + self.send_status().await?; + let (msg_type, payload) = self.recv().await?; + if msg_type != STATUS_RESP { + return Err(io::Error::new( + io::ErrorKind::InvalidData, + format!("expected STATUS_RESP (0x{STATUS_RESP:02x}), got 0x{msg_type:02x}"), + )); + } + StatusResponse::parse(&payload) + } + + /// Send a KILL frame to terminate the session. 
+ pub async fn send_kill(&mut self) -> io::Result<()> { + self.send(KILL, &[]).await + } +} + +/// Parsed status response from the supervisor. +#[derive(Debug, Clone)] +pub struct StatusResponse { + pub pid: u32, + pub idle_ms: u32, + pub alive: bool, + pub state: u8, + pub state_ms: u32, +} + +impl StatusResponse { + /// Parse from a STATUS_RESP payload (minimum 14 bytes). + fn parse(payload: &[u8]) -> io::Result { + if payload.len() < 14 { + return Err(io::Error::new( + io::ErrorKind::InvalidData, + format!( + "STATUS_RESP payload too short: {} bytes (need 14)", + payload.len() + ), + )); + } + let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); + let idle_ms = u32::from_be_bytes([payload[4], payload[5], payload[6], payload[7]]); + let alive = payload[8] != 0; + let state = payload[9]; + let state_ms = u32::from_be_bytes([payload[10], payload[11], payload[12], payload[13]]); + Ok(Self { + pid, + idle_ms, + alive, + state, + state_ms, + }) + } } #[cfg(test)] @@ -352,4 +526,150 @@ mod tests { assert_eq!(frame[0], STATUS); assert_eq!(&frame[1..5], &[0, 0, 0, 0]); } + + // -- Wire-level golden byte tests -- + // These assert exact byte sequences to catch symmetric pack/parse bugs. + // If both sides have the same field-swap bug, round-trip tests pass + // but the wire format is silently wrong. + + /// pack_frame produces exact wire bytes for a known input. + #[test] + fn pack_frame_golden_bytes() { + let frame = pack_frame(INPUT, b"\xDE\xAD"); + // [type=0x01][len=0x00000002][payload=0xDEAD] + assert_eq!(frame.as_ref(), &[0x01, 0x00, 0x00, 0x00, 0x02, 0xDE, 0xAD]); + } + + /// pack_exit produces exact 9-byte wire format. + #[test] + fn pack_exit_golden_bytes() { + let frame = pack_exit(137); // 128 + SIGKILL(9) + // [type=0x83][len=0x00000004][code=0x00000089] + assert_eq!( + frame.as_ref(), + &[0x83, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x89] + ); + } + + /// pack_exit with negative code produces correct two's complement bytes. 
+ #[test] + fn pack_exit_negative_golden_bytes() { + let frame = pack_exit(-1); + // [type=0x83][len=0x00000004][code=0xFFFFFFFF] + assert_eq!( + frame.as_ref(), + &[0x83, 0x00, 0x00, 0x00, 0x04, 0xFF, 0xFF, 0xFF, 0xFF] + ); + } + + /// pack_resize produces exact 4-byte payload layout. + #[test] + fn pack_resize_golden_bytes() { + let payload = pack_resize(120, 40); + // [cols=0x0078][rows=0x0028] + assert_eq!(payload, [0x00, 0x78, 0x00, 0x28]); + } + + /// pack_status produces exact 20-byte wire format with every field at a known offset. + #[test] + fn pack_status_golden_bytes() { + let frame = pack_status( + 0x0000_1234, // pid + 0x0000_0042, // idle_ms = 66 + true, // alive + 0x02, // state = streaming + 0x0000_04D2, // state_ms = 1234 + ); + assert_eq!(frame.len(), 20); + #[rustfmt::skip] + let expected: [u8; 20] = [ + 0x82, // type = STATUS_RESP + 0x00, 0x00, 0x00, 0x0F, // len = 15 + 0x00, 0x00, 0x12, 0x34, // pid + 0x00, 0x00, 0x00, 0x42, // idle_ms + 0x01, // alive + 0x02, // state + 0x00, 0x00, 0x04, 0xD2, // state_ms + 0x00, // reserved + ]; + assert_eq!(frame.as_ref(), &expected); + } + + /// parse_resize against hand-constructed bytes (not from pack_resize). + #[test] + fn parse_resize_from_raw_bytes() { + // 80 cols = 0x0050, 24 rows = 0x0018 + let raw = [0x00, 0x50, 0x00, 0x18]; + let (cols, rows) = parse_resize(&raw).unwrap(); + assert_eq!(cols, 80); + assert_eq!(rows, 24); + } + + /// parse_exit_code against hand-constructed bytes (not from pack_exit). + #[test] + fn parse_exit_code_from_raw_bytes() { + // exit code 42 = 0x0000002A + let raw = [0x00, 0x00, 0x00, 0x2A]; + let code = parse_exit_code(&raw).unwrap(); + assert_eq!(code, 42); + } + + /// parse_exit_code with signal death (negative via two's complement). 
+ #[test] + fn parse_exit_code_negative_from_raw_bytes() { + // -9 in two's complement = 0xFFFFFFF7 + let raw = [0xFF, 0xFF, 0xFF, 0xF7]; + let code = parse_exit_code(&raw).unwrap(); + assert_eq!(code, -9); + } + + /// parse_resize rejects short payload. + #[test] + fn parse_resize_rejects_short() { + assert!(parse_resize(&[0x00, 0x50]).is_err()); + assert!(parse_resize(&[]).is_err()); + } + + /// parse_resize rejects oversized payload (spec says exactly 4). + #[test] + fn parse_resize_rejects_oversized() { + assert!(parse_resize(&[0x00, 0x50, 0x00, 0x18, 0xFF]).is_err()); + } + + /// parse_exit_code rejects short payload. + #[test] + fn parse_exit_code_rejects_short() { + assert!(parse_exit_code(&[0x00, 0x00]).is_err()); + assert!(parse_exit_code(&[]).is_err()); + } + + /// parse_exit_code rejects oversized payload (spec says exactly 4). + #[test] + fn parse_exit_code_rejects_oversized() { + assert!(parse_exit_code(&[0x00, 0x00, 0x00, 0x2A, 0xFF]).is_err()); + } + + /// Dead process golden bytes: alive=0, state=0xFF at correct offsets. 
+ #[test] + fn pack_status_dead_golden_bytes() { + let frame = pack_status( + 0x0000_002A, // pid = 42 + 0x0000_270F, // idle_ms = 9999 + false, // alive = dead + 0xFF, // state = Dead + 0x0000_01F4, // state_ms = 500 + ); + #[rustfmt::skip] + let expected: [u8; 20] = [ + 0x82, // type = STATUS_RESP + 0x00, 0x00, 0x00, 0x0F, // len = 15 + 0x00, 0x00, 0x00, 0x2A, // pid = 42 + 0x00, 0x00, 0x27, 0x0F, // idle_ms = 9999 + 0x00, // alive = false + 0xFF, // state = Dead + 0x00, 0x00, 0x01, 0xF4, // state_ms = 500 + 0x00, // reserved + ]; + assert_eq!(frame.as_ref(), &expected); + } } diff --git a/src/socket.rs b/src/socket.rs index 9fd8151..cf5afe8 100644 --- a/src/socket.rs +++ b/src/socket.rs @@ -2,6 +2,7 @@ use crate::broadcast::OutputState; use crate::protocol::{self, INPUT, KILL, OUTPUT, RESIZE, STATUS, SUBSCRIBE}; +use bytes::Bytes; use nix::unistd::Pid; use std::sync::Arc; use std::sync::atomic::{AtomicBool, AtomicI32, Ordering}; @@ -55,7 +56,7 @@ async fn handle_client(stream: UnixStream, state: &ServerState) -> std::io::Resu match msg_type { INPUT => { - write_to_pty(state, payload.to_vec()).await?; + write_to_pty(state, payload).await?; } SUBSCRIBE => { handle_subscribed(state, reader, &mut write_half).await?; @@ -99,7 +100,7 @@ async fn handle_subscribed( result = rx.recv() => { match result { Ok(chunk) => { - if !state.alive.load(Ordering::Relaxed) { + if !state.alive.load(Ordering::Acquire) { // After alive=false, the only broadcast is the EXIT frame. // Write it directly (it's already fully framed). writer.write_all(&chunk).await?; @@ -112,8 +113,9 @@ async fn handle_subscribed( } Err(broadcast::error::RecvError::Closed) => { // Channel closed — send EXIT frame if we haven't already. - let code = state.exit_code.load(Ordering::Relaxed); - if !state.alive.load(Ordering::Relaxed) { + // Use Acquire to pair with the supervisor's Release store of exit_code.
+ let code = state.exit_code.load(Ordering::Acquire); + if !state.alive.load(Ordering::Acquire) { let exit_frame = protocol::pack_exit(code); let _ = writer.write_all(&exit_frame).await; } @@ -127,7 +129,7 @@ async fn handle_subscribed( let (msg_type, payload) = result?; match msg_type { INPUT => { - write_to_pty(state, payload.to_vec()).await?; + write_to_pty(state, payload).await?; } RESIZE => { handle_resize(state, &payload)?; @@ -149,8 +151,9 @@ async fn handle_subscribed( /// Write raw bytes to the pty master fd. /// Checks `alive` before writing to avoid writing to a closed/reused fd. -async fn write_to_pty(state: &ServerState, data: Vec<u8>) -> std::io::Result<()> { - if !state.alive.load(Ordering::Relaxed) { +/// Accepts `Bytes` to avoid copying the payload out of the frame buffer. +async fn write_to_pty(state: &ServerState, data: Bytes) -> std::io::Result<()> { + if !state.alive.load(Ordering::Acquire) { return Err(std::io::Error::new( std::io::ErrorKind::BrokenPipe, "child process has exited", @@ -173,7 +176,7 @@ async fn write_to_pty(state: &ServerState, data: Vec<u8>) -> std::io::Result<()> async fn send_status(state: &ServerState, writer: &mut OwnedWriteHalf) -> std::io::Result<()> { let pid = state.child_pid.as_raw() as u32; let idle_ms = state.output.idle_ms(); - let alive = state.alive.load(Ordering::Relaxed); + let alive = state.alive.load(Ordering::Acquire); let process_state = state.output.process_state() as u8; let state_ms = state.output.state_ms(); let frame = protocol::pack_status(pid, idle_ms, alive, process_state, state_ms); @@ -182,20 +185,8 @@ async fn send_status(state: &ServerState, writer: &mut OwnedWriteHalf) -> std::i /// Handle a RESIZE frame.
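The switch from `Relaxed` to `Acquire` loads relies on a Release/Acquire pairing with the stores the supervisor makes when the child exits. Below is a minimal std-only sketch of that pattern. The `Shared` struct and the `publish_exit`, `observe_exit` and `run_once` helpers are hypothetical stand-ins, not heimdall's types. The writer stores the exit code and then clears `alive` with `Release`. A reader whose `Acquire` load sees `alive == false` is then guaranteed to see that exit code. The guarantee only covers reads sequenced after the `Acquire` load of `alive`, so the order of the two loads matters. The sketch deliberately stores and loads the exit code itself with `Relaxed`, to show that the pair on `alive` is what carries the ordering.

```rust
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared state with the two fields the diff orders.
struct Shared {
    alive: AtomicBool,
    exit_code: AtomicI32,
}

/// Writer side: publish the exit code, then clear `alive` with Release.
fn publish_exit(s: &Shared, code: i32) {
    s.exit_code.store(code, Ordering::Relaxed);
    s.alive.store(false, Ordering::Release);
}

/// Reader side: load `alive` FIRST with Acquire; only a read of
/// `exit_code` sequenced after it is guaranteed to see the published value.
fn observe_exit(s: &Shared) -> Option<i32> {
    if s.alive.load(Ordering::Acquire) {
        None
    } else {
        Some(s.exit_code.load(Ordering::Relaxed))
    }
}

/// Spawn a reader that spins until it observes the exit, then publish.
fn run_once(code: i32) -> i32 {
    let shared = Arc::new(Shared {
        alive: AtomicBool::new(true),
        exit_code: AtomicI32::new(0),
    });
    let reader = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || -> i32 {
            loop {
                if let Some(c) = observe_exit(&shared) {
                    return c;
                }
                std::hint::spin_loop();
            }
        })
    };
    publish_exit(&shared, code);
    reader.join().unwrap()
}
```

In the sketch the reader can never observe `alive == false` together with the stale initial exit code of 0.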
fn handle_resize(state: &ServerState, payload: &[u8]) -> std::io::Result<()> { - if payload.len() >= 4 { - let cols = u16::from_be_bytes([payload[0], payload[1]]); - let rows = u16::from_be_bytes([payload[2], payload[3]]); - crate::pty::set_winsize_raw(state.master_fd, cols, rows)?; - let _ = crate::pty::send_sigwinch(state.child_pid); - } + let (cols, rows) = protocol::parse_resize(payload)?; + crate::pty::set_winsize_raw(state.master_fd, cols, rows)?; + let _ = crate::pty::send_sigwinch(state.child_pid); Ok(()) } - -/// Broadcast an exit notification to all subscribers. -/// -/// Stores the exit code and sends the fully-framed EXIT message through the -/// broadcast channel. Subscribers write it directly (not wrapped in OUTPUT). -pub async fn broadcast_exit(state: &Arc<ServerState>, exit_code: i32) { - state.exit_code.store(exit_code, Ordering::Relaxed); - let _ = state.output.tx.send(protocol::pack_exit(exit_code)); -} diff --git a/src/supervisor.rs b/src/supervisor.rs new file mode 100644 index 0000000..4a3ac30 --- /dev/null +++ b/src/supervisor.rs @@ -0,0 +1,421 @@ +//! Supervisor event loop: fork child, bind socket, multiplex I/O. + +use crate::broadcast::OutputState; +use crate::cli::SessionParams; +use crate::socket::ServerState; +use crate::{pty, socket}; +use bytes::BytesMut; +use std::os::fd::{AsFd, AsRawFd, FromRawFd, IntoRawFd}; +use std::path::{Path, PathBuf}; +use std::sync::Arc; +use std::sync::atomic::{AtomicBool, Ordering}; +use tokio::net::UnixListener; + +/// Grace period between SIGTERM and SIGKILL on shutdown. +const SIGKILL_GRACE: std::time::Duration = std::time::Duration::from_secs(5); +/// Delay after broadcasting exit to let in-flight socket writes drain. +const EXIT_DRAIN_DELAY: std::time::Duration = std::time::Duration::from_millis(100); + +/// Look up process details from `/proc` for a diagnostic message. +/// Returns a human-readable string like `"uptime 2h 15m, cmd: hm run --id foo -- bash"`.
+/// Returns empty string if procfs is unavailable (non-Linux, permission denied, etc.). +fn proc_detail(pid: i32) -> String { + let proc = PathBuf::from(format!("/proc/{pid}")); + let mut parts = Vec::new(); + + if let Some(uptime) = proc_uptime(&proc) { + parts.push(format_uptime(uptime)); + } + if let Some(cmd) = proc_cmdline(&proc) { + parts.push(format!("cmd: {cmd}")); + } + + parts.join(", ") +} + +/// Read process uptime in seconds from `/proc/<pid>/stat` and `/proc/uptime`. +fn proc_uptime(proc: &Path) -> Option<u64> { + let stat = std::fs::read_to_string(proc.join("stat")).ok()?; + let uptime_s = std::fs::read_to_string("/proc/uptime").ok()?; + + let ticks_per_sec = unsafe { nix::libc::sysconf(nix::libc::_SC_CLK_TCK) }; + if ticks_per_sec <= 0 { + return None; + } + + // Field 22 (1-indexed) is starttime in clock ticks since boot. + // Field 2 (comm) can contain spaces/parens, so find the closing ')' first. + let after_comm = stat.rfind(')')? + 2; + let fields: Vec<&str> = stat[after_comm..].split_whitespace().collect(); + // After ')': field 3 = index 0, so field 22 = index 19. + let starttime: u64 = fields.get(19)?.parse().ok()?; + + let boot_secs = uptime_s.split_whitespace().next()?.parse::<f64>().ok()? as u64; + let start_secs = starttime / ticks_per_sec as u64; + + Some(boot_secs.saturating_sub(start_secs)) +} + +/// Read the command line from `/proc/<pid>/cmdline`. +fn proc_cmdline(proc: &Path) -> Option<String> { + let raw = std::fs::read_to_string(proc.join("cmdline")).ok()?; + let cmd = raw.replace('\0', " ").trim().to_string(); + if cmd.is_empty() { None } else { Some(cmd) } +} + +/// Format seconds into a human-readable uptime string.
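`proc_uptime` depends on two details of the `/proc/<pid>/stat` format. First, the comm field is wrapped in parentheses and may itself contain spaces or `)`. Second, `starttime` is field 22, counted in clock ticks since boot. Below is a standalone sketch of the same parsing on plain strings, so it can be exercised without procfs. The function names are hypothetical, and the tick rate is passed in rather than read through `sysconf(_SC_CLK_TCK)`. One deliberate difference: it uses `str::get` instead of slicing at `rfind(')') + 2`, so input that ends right after the `)` yields `None` rather than panicking.

```rust
/// Extract `starttime` (field 22, clock ticks since boot) from the text of
/// `/proc/<pid>/stat`. Split after the LAST ')' because comm may contain ')'.
fn parse_starttime(stat: &str) -> Option<u64> {
    let close = stat.rfind(')')?;
    // `get` returns None instead of panicking on an out-of-range slice.
    let rest = stat.get(close + 1..)?;
    // Field 3 (state) is index 0 after ')', so field 22 is index 19.
    rest.split_whitespace().nth(19)?.parse().ok()
}

/// Seconds since the process started, given the text of `/proc/uptime`
/// (first number = seconds since boot), starttime in ticks, and the tick rate.
fn process_age_secs(uptime_text: &str, starttime_ticks: u64, ticks_per_sec: u64) -> Option<u64> {
    if ticks_per_sec == 0 {
        return None;
    }
    let boot_secs = uptime_text.split_whitespace().next()?.parse::<f64>().ok()? as u64;
    // Clamp at zero in case of clock rounding.
    Some(boot_secs.saturating_sub(starttime_ticks / ticks_per_sec))
}
```

For example, a comm of `(my (weird) cmd)` still yields the correct field 22, because only the last `)` is used as the split point.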
+fn format_uptime(secs: u64) -> String { + let days = secs / 86400; + let hours = (secs % 86400) / 3600; + let mins = (secs % 3600) / 60; + if days > 0 { + format!("uptime {days}d {hours}h") + } else if hours > 0 { + format!("uptime {hours}h {mins}m") + } else { + format!("uptime {mins}m") + } +} + +/// Called when flock fails — another supervisor holds the PID lock. +/// Retries reading the PID file (the holder may not have written yet), +/// then prints diagnostics and exits. +fn die_session_locked(id: &str, pid_path: &Path) -> ! { + use crate::pidfile::PidFile; + + // The lock holder may not have written the PID yet — retry briefly. + let mut pids = None; + for _ in 0..20 { + if let Some(pf) = PidFile::read(pid_path) { + pids = Some(pf); + break; + } + std::thread::sleep(std::time::Duration::from_millis(50)); + } + + match pids { + Some(pf) => { + let detail = proc_detail(pf.supervisor); + if detail.is_empty() { + eprintln!( + "Session '{id}' is locked by supervisor pid {}. \ + Use `hm kill {id}` first.", + pf.supervisor, + ); + } else { + eprintln!( + "Session '{id}' is locked by supervisor pid {} ({detail}). \ + Use `hm kill {id}` first.", + pf.supervisor, + ); + } + } + None => { + eprintln!( + "Session '{id}' is locked by another process \ + (could not read PID after 1s). Use `hm kill {id}` first.", + ); + } + } + + std::process::exit(1); +} + +/// RAII guard that removes socket and PID files on drop. +/// Ensures cleanup even on panic or early `?` return. 
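The guard below follows the same RAII pattern as the `CleanupGuard` in this diff, reduced to std only. The `run_with_guard` helper and the `demo.*` file names are made up for the example. Two details are easy to miss. First, the guard must be bound to a named variable such as `_guard`, because a bare `let _ = ...` drops it immediately. Second, `std::process::exit` does not run destructors, which is why `supervise` drops `_cleanup` explicitly before calling it.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Removes the listed files when dropped: on normal return, on an early
/// `?` return, or while unwinding from a panic.
struct CleanupGuard {
    paths: Vec<PathBuf>,
}

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        for p in &self.paths {
            // Ignore errors: the file may already be gone.
            let _ = fs::remove_file(p);
        }
    }
}

/// Create two marker files, then bail out early with `?` while the guard is live.
fn run_with_guard(dir: &Path) -> std::io::Result<()> {
    let sock = dir.join("demo.sock");
    let pid = dir.join("demo.pid");
    fs::write(&sock, b"")?;
    fs::write(&pid, b"1234\n")?;
    // Named binding keeps the guard alive until the end of the function.
    let _guard = CleanupGuard { paths: vec![sock, pid] };
    // This read fails with NotFound; `?` returns early and the guard still runs.
    fs::read_to_string(dir.join("missing.txt"))?;
    Ok(())
}
```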
+struct CleanupGuard { + socket_path: PathBuf, + pid_path: PathBuf, +} + +impl Drop for CleanupGuard { + fn drop(&mut self) { + let _ = std::fs::remove_file(&self.socket_path); + let _ = std::fs::remove_file(&self.pid_path); + } +} + +pub fn supervise(params: SessionParams) -> anyhow::Result<()> { + let SessionParams { + id, + workdir, + socket_dir, + cols, + rows, + cmd, + cfg, + log_file, + } = params; + std::fs::create_dir_all(&socket_dir)?; + + let socket_path = crate::util::socket_path(&socket_dir, &id); + let pid_path = crate::util::pid_path(&socket_dir, &id); + + // Acquire exclusive lock on PID file to prevent TOCTOU races. + let pid_file = std::fs::OpenOptions::new() + .create(true) + .read(true) + .write(true) + .truncate(false) + .open(&pid_path)?; + use nix::fcntl::{Flock, FlockArg}; + let mut lock = match Flock::lock(pid_file, FlockArg::LockExclusiveNonblock) { + Ok(lock) => lock, + Err(_) => die_session_locked(&id, &pid_path), + }; + + // kill(pid, 0) is a POSIX probe: signal 0 is never delivered. The kernel + // runs all existence and permission checks, then does nothing. It's the + // standard Unix "is this PID alive?" idiom. + // + // Check if existing PIDs in the file are still alive. + // kill(pid, 0) == 0 → process exists, we can signal it (alive → bail) + // kill(pid, 0) == -1/EPERM → process exists, different owner (alive → bail) + // kill(pid, 0) == -1/ESRCH → no such process (stale → fall through) + if let Some(pf) = crate::pidfile::PidFile::read(&pid_path) + && pf.any_alive() + { + let display_pid = pf.child.unwrap_or(pf.supervisor); + let detail = proc_detail(pf.supervisor); + if detail.is_empty() { + eprintln!( + "Session '{id}' is already running (pid {display_pid}). \ + Use `hm kill {id}` first.", + ); + } else { + eprintln!( + "Session '{id}' is already running (pid {display_pid}, {detail}). \ + Use `hm kill {id}` first.", + ); + } + std::process::exit(1); + } + + // Clean up stale socket. 
Ignore NotFound (race or already gone), + // but surface anything else (permissions, filesystem errors). + if let Err(e) = std::fs::remove_file(&socket_path) + && e.kind() != std::io::ErrorKind::NotFound + { + return Err(e.into()); + } + + let workdir = workdir.canonicalize()?; + + // Write supervisor PID (line 1) before fork — if we crash between here + // and fork, the PID file still identifies who held the lock. + { + let f: &mut std::fs::File = &mut lock; + crate::pidfile::PidFile::write_supervisor(f, std::process::id())?; + } + + // Fork child BEFORE starting Tokio runtime (single-threaded requirement). + let pty_child = pty::spawn(&cmd, &workdir, &id, cols, rows, &cfg)?; + let child_pid = pty_child.pid; + + // Append child PID (line 2) after fork. + { + let f: &mut std::fs::File = &mut lock; + crate::pidfile::PidFile::write_child(f, child_pid.as_raw())?; + } + + // RAII cleanup — removes socket + PID on drop (panic, early return, normal exit). + let _cleanup = CleanupGuard { + socket_path: socket_path.clone(), + pid_path: pid_path.clone(), + }; + + // Log to file, never to stderr (which would corrupt the terminal or + // vanish in detach mode). RUST_LOG env var takes precedence over both + // log_level (heimdall's own level) and log_filter (dependency crates). 
+ let log_writer = std::fs::OpenOptions::new() + .create(true) + .append(true) + .open(&log_file)?; + let mut env_filter = tracing_subscriber::EnvFilter::from_default_env() + .add_directive(format!("heimdall={}", cfg.log_level).parse().unwrap()); + if let Some(ref filter) = cfg.log_filter { + for directive in filter.split(',') { + let directive = directive.trim(); + if !directive.is_empty() { + if let Ok(d) = directive.parse() { + env_filter = env_filter.add_directive(d); + } else { + eprintln!("warning: ignoring invalid log_filter directive: {directive}"); + } + } + } + } + tracing_subscriber::fmt() + .with_env_filter(env_filter) + .with_target(false) + .with_ansi(false) + .with_writer(log_writer) + .init(); + + tracing::info!( + session_id = %id, + child_pid = child_pid.as_raw(), + socket = %socket_path.display(), + "supervisor started" + ); + + // Single-threaded runtime — sufficient for our I/O workload. + let rt = tokio::runtime::Builder::new_current_thread() + .enable_all() + .build()?; + + let exit_code = rt.block_on(event_loop(socket_path, cfg, child_pid, pty_child.master))?; + + drop(_cleanup); + std::process::exit(exit_code); +} + +/// Async event loop: bind socket, multiplex pty reads, signals, and client connections. +async fn event_loop( + socket_path: PathBuf, + cfg: crate::config::Config, + child_pid: nix::unistd::Pid, + master_fd: std::os::fd::OwnedFd, +) -> Result<i32, std::io::Error> { + let listener = UnixListener::bind(&socket_path)?; + let master_raw_fd = master_fd.as_raw_fd(); + + let output = Arc::new(OutputState::new(&cfg)); + let alive = Arc::new(AtomicBool::new(true)); + + let server_state = Arc::new(ServerState { + output: Arc::clone(&output), + child_pid, + master_fd: master_raw_fd, + alive: Arc::clone(&alive), + exit_code: std::sync::atomic::AtomicI32::new(0), + kill_process_group: cfg.kill_process_group, + }); + + // Prepare the master fd for tokio's async reactor. Four steps: + // + // 1. Consume the OwnedFd via into_raw_fd().
We already captured the + // raw fd number (line above) for ServerState before this point. + // 2. Wrap the raw fd back into a fresh OwnedFd. This isn't a no-op — + // it transfers ownership from the function parameter into a local + // that AsyncFd will consume. + // 3. Set O_NONBLOCK. openpty() returns blocking fds, but tokio's + // AsyncFd requires non-blocking so epoll can drive readiness + // without stalling the single-threaded runtime. + // 4. Register with tokio's reactor via AsyncFd::new(). After this, + // the event loop can `await` readability on the master fd. + let owned_fd = master_fd.into_raw_fd(); + // SAFETY: we just consumed the only owner; no double-close possible. + let owned = unsafe { std::os::fd::OwnedFd::from_raw_fd(owned_fd) }; + + let raw = owned.as_fd().as_raw_fd(); + let flags = unsafe { nix::libc::fcntl(raw, nix::libc::F_GETFL) }; + if flags == -1 { + return Err(std::io::Error::last_os_error()); + } + if unsafe { nix::libc::fcntl(raw, nix::libc::F_SETFL, flags | nix::libc::O_NONBLOCK) } == -1 { + return Err(std::io::Error::last_os_error()); + } + + let master_async = tokio::io::unix::AsyncFd::new(owned)?; + + // Signal handlers for the hm supervisor process. + let mut sigchld = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::child())?; + // recv when kill, kill -15, or kill -TERM (graceful shutdown) are sent + let mut sigterm = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())?; + + // Spawn socket server + let server_state_clone = Arc::clone(&server_state); + tokio::spawn(async move { + socket::serve(listener, server_state_clone).await; + }); + + // Main event loop + // Use BytesMut so we can freeze() a zero-copy Bytes handle per read, + // rather than Bytes::copy_from_slice which memcpy's into a new alloc. + let mut read_buf = BytesMut::with_capacity(8192); + let mut buf = [0u8; 4096]; + let exit_code: i32; + + loop { + tokio::select! 
{ + // Child pty master is readable — drain available bytes into scrollback. + ready = master_async.readable() => { + let mut guard = ready?; + match guard.try_io(|inner| { + let fd = inner.as_raw_fd(); + let n = unsafe { + nix::libc::read(fd, buf.as_mut_ptr().cast(), buf.len()) + }; + if n < 0 { + Err(std::io::Error::last_os_error()) + } else { + Ok(n as usize) + } + }) { + Ok(Ok(0)) => { + tracing::info!("pty EOF"); + exit_code = pty::wait_child(child_pid).unwrap_or(-1); + break; + } + Ok(Ok(n)) => { + // Extend the BytesMut and split off a frozen Bytes. + // When read_buf has no other outstanding Bytes handles + // (the common case — broadcast subscribers hold their + // own refcounted views), this reuses the same backing + // allocation instead of malloc+memcpy per read. + read_buf.extend_from_slice(&buf[..n]); + let chunk = read_buf.split().freeze(); + output.push(chunk); + } + Ok(Err(e)) => { + if e.raw_os_error() == Some(nix::libc::EIO) { + tracing::info!("pty EIO (child exited)"); + } else { + tracing::error!("pty read error: {e}"); + } + exit_code = pty::wait_child(child_pid).unwrap_or(-1); + break; + } + Err(_would_block) => { + continue; + } + } + } + + // Child exited — reap it and exit the loop. + _ = sigchld.recv() => { + tracing::info!("SIGCHLD received"); + exit_code = pty::wait_child(child_pid).unwrap_or(-1); + break; + } + + // Supervisor asked to stop — SIGTERM the child, SIGKILL after grace period. + _ = sigterm.recv() => { + tracing::info!("SIGTERM received, shutting down"); + let _ = pty::send_sigterm(child_pid, cfg.kill_process_group); + tokio::time::sleep(SIGKILL_GRACE).await; + let _ = pty::send_sigkill(child_pid, cfg.kill_process_group); + exit_code = pty::wait_child(child_pid).unwrap_or(-1); + break; + } + } + } + + output.set_dead(); + // Store the exit code before broadcasting so subscribers see it on Acquire load. + server_state.exit_code.store(exit_code, Ordering::Release); + // Broadcast the EXIT frame first, then mark alive=false. 
This ordering + // guarantees that any subscriber that observes alive=false has already + // received (or will receive) the EXIT frame as the last channel message, + // never raw OUTPUT chunks written without their frame wrapper. + let exit_frame = crate::protocol::pack_exit(exit_code); + let _ = server_state.output.tx.send(exit_frame); + alive.store(false, Ordering::Release); + + tokio::time::sleep(EXIT_DRAIN_DELAY).await; + + tracing::info!(exit_code, "supervisor exiting"); + + Ok(exit_code) +} diff --git a/src/terminal.rs b/src/terminal.rs new file mode 100644 index 0000000..c693062 --- /dev/null +++ b/src/terminal.rs @@ -0,0 +1,211 @@ +//! Terminal utilities: ANSI sequences, status bar rendering, termios guard. + +use nix::sys::termios; +use std::os::fd::{AsRawFd, BorrowedFd}; +use tokio::io::AsyncWriteExt; + +// -- ANSI escape sequences -- +// Named consts instead of inline literals for readability. + +/// Save cursor position. +const SAVE_CURSOR: &str = "\x1b7"; +/// Restore cursor position. +const RESTORE_CURSOR: &str = "\x1b8"; +/// Reset all attributes. +const RESET: &str = "\x1b[0m"; +/// Switch to alternate screen buffer. +const ALT_SCREEN_ON: &str = "\x1b[?1049h"; +/// Leave alternate screen buffer. +const ALT_SCREEN_OFF: &str = "\x1b[?1049l"; +/// Clear entire screen. +const CLEAR_SCREEN: &str = "\x1b[2J"; +/// Move cursor to home position (1,1). +const CURSOR_HOME: &str = "\x1b[H"; +/// Reset scroll region to full screen. +const SCROLL_REGION_RESET: &str = "\x1b[r"; +/// CSI prefix for parameterized sequences. +const CSI: &str = "\x1b["; + +// Status bar colors — SGR sequences. 
+const GREEN_BG_BLACK_FG: &str = "\x1b[42;30m"; +const DARK_GRAY_BG_WHITE_FG: &str = "\x1b[48;5;236;37m"; +const YELLOW_BG_BLACK_FG: &str = "\x1b[43;30m"; +const BLUE_BG_WHITE_FG: &str = "\x1b[44;37m"; +const MAGENTA_BG_WHITE_FG: &str = "\x1b[45;37m"; +const CYAN_BG_BLACK_FG: &str = "\x1b[46;30m"; +const RED_BG_WHITE_FG: &str = "\x1b[41;37m"; +const WHITE_BG_BLACK_FG: &str = "\x1b[47;30m"; +const GRAY_BG_WHITE_FG: &str = "\x1b[100;37m"; + +/// Info from a STATUS_RESP used to render the right side of the bar. +pub struct StatusInfo { + pub state_byte: u8, + pub state_ms: u32, +} + +/// Get current terminal size via ioctl. +pub fn terminal_size() -> std::io::Result<(u16, u16)> { + unsafe { + let mut ws: nix::libc::winsize = std::mem::zeroed(); + if nix::libc::ioctl(std::io::stdin().as_raw_fd(), nix::libc::TIOCGWINSZ, &mut ws) == 0 { + Ok((ws.ws_col, ws.ws_row)) + } else { + Err(std::io::Error::last_os_error()) + } + } +} + +/// Set up the scroll region, alt screen, and draw the initial status bar. +/// +/// Returns the inner row count (total rows minus the status bar line). +/// Callers should use this for pty RESIZE frames so the child sees the +/// correct usable height. +pub async fn setup_status_bar( + stdout: &mut tokio::io::Stdout, + session_id: &str, + cols: u16, + rows: u16, + info: Option<&StatusInfo>, +) -> std::io::Result<u16> { + let inner_rows = rows.saturating_sub(1).max(1); + + let setup = format!("{ALT_SCREEN_ON}{CLEAR_SCREEN}{CURSOR_HOME}{CSI}1;{inner_rows}r"); + stdout.write_all(setup.as_bytes()).await?; + + draw_status_bar(stdout, session_id, cols, rows, info).await?; + Ok(inner_rows) +} + +/// Update scroll region and redraw status bar after a terminal resize. +/// +/// Unlike [`setup_status_bar`], this does not switch to the alt screen or +/// clear — it just adjusts the scroll region to the new dimensions and +/// redraws the bar. +/// +/// Returns the inner row count for the RESIZE frame.
+pub async fn resize_status_bar( + stdout: &mut tokio::io::Stdout, + session_id: &str, + cols: u16, + rows: u16, + info: Option<&StatusInfo>, +) -> std::io::Result<u16> { + let inner_rows = rows.saturating_sub(1).max(1); + + let region = format!("{CSI}1;{inner_rows}r"); + stdout.write_all(region.as_bytes()).await?; + + draw_status_bar(stdout, session_id, cols, rows, info).await?; + Ok(inner_rows) +} + +/// Draw (or redraw) the status bar on the last line. +/// +/// Layout: +/// Left (green bg): [hm] session-id +/// Right (state color): state-name duration +/// Middle: dark fill +/// +/// Uses a single pre-sized buffer and `write!` to minimize allocations. +/// This runs every second for status bar updates. +pub async fn draw_status_bar( + stdout: &mut tokio::io::Stdout, + session_id: &str, + cols: u16, + rows: u16, + info: Option<&StatusInfo>, +) -> std::io::Result<()> { + use std::fmt::Write as FmtWrite; + + // Pre-size: ANSI escapes (~100 bytes) + session_id + fill (up to cols) + state name. + // 256 covers most terminals without reallocation. + let mut bar = String::with_capacity(256 + cols as usize); + + let (state_name, state_color) = match info { + Some(si) => match si.state_byte { + 0x00 => ("idle", GREEN_BG_BLACK_FG), + 0x01 => ("thinking", YELLOW_BG_BLACK_FG), + 0x02 => ("streaming", BLUE_BG_WHITE_FG), + 0x03 => ("tool_use", MAGENTA_BG_WHITE_FG), + 0x04 => ("active", CYAN_BG_BLACK_FG), + 0xFF => ("dead", RED_BG_WHITE_FG), + _ => ("unknown", WHITE_BG_BLACK_FG), + }, + None => ("...", GRAY_BG_WHITE_FG), + }; + + // Compute left/right content lengths for fill calculation. + let left_len = " [hm] ".len() + session_id.len() + 1; // " [hm] " + session_id + " " + let mut right_len = 1 + state_name.len(); // " " + state_name + + // Compute duration suffix length without allocating.
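The fill width is `cols` minus the visible width of the left and right segments. Any mismatch between the length arithmetic and the `write!` format strings therefore leaves the bar one column short, or pushes it one column past the edge. One way to keep the two in step is to measure the visible text directly. The sketch below makes two assumptions. Every `char` occupies one terminal column, which is not true for wide or combining characters. A few small allocations per redraw are acceptable, which the original code deliberately avoids. The helper names are hypothetical, and the duration is formatted as `" Xm Ys "`, the shape given in the diff's own comment.

```rust
use std::fmt::Write;

/// Visible text of the left segment, " [hm] <id> " (escape codes excluded).
fn left_text(session_id: &str) -> String {
    format!(" [hm] {session_id} ")
}

/// Visible text of the right segment: " <state>" plus an optional duration.
fn right_text(state_name: &str, duration_secs: Option<u32>) -> String {
    let mut s = format!(" {state_name}");
    if let Some(secs) = duration_secs {
        if secs >= 60 {
            let _ = write!(s, " {}m {}s ", secs / 60, secs % 60);
        } else {
            let _ = write!(s, " {secs}s ");
        }
    }
    s
}

/// Fill columns so that left + fill + right spans exactly `cols`,
/// saturating to 0 when the terminal is too narrow.
fn fill_len(cols: u16, session_id: &str, state_name: &str, duration_secs: Option<u32>) -> usize {
    let used = left_text(session_id).chars().count()
        + right_text(state_name, duration_secs).chars().count();
    (cols as usize).saturating_sub(used)
}
```

Because the lengths come from the same strings that get written, a change to either format string cannot silently desynchronize the fill.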
+ let duration_secs = info.map(|si| si.state_ms / 1000); + if let Some(secs) = duration_secs { + if secs >= 60 { + // " Xm Ys " — estimate digit count + right_len += 2 + digit_count(secs / 60) + 1 + digit_count(secs % 60) + 2; + } else { + // " Xs " + right_len += 1 + digit_count(secs) + 2; + } + } + + let fill_len = (cols as usize).saturating_sub(left_len + right_len); + + // Build the bar in one pass. + let _ = write!( + bar, + "{SAVE_CURSOR}{CSI}{rows};1H{GREEN_BG_BLACK_FG} [hm] {session_id} {RESET}{DARK_GRAY_BG_WHITE_FG}" + ); + for _ in 0..fill_len { + bar.push(' '); + } + let _ = write!(bar, "{RESET}{state_color} {state_name}"); + if let Some(secs) = duration_secs { + if secs >= 60 { + let _ = write!(bar, " {}m {}s ", secs / 60, secs % 60); + } else { + let _ = write!(bar, " {}s ", secs); + } + } + let _ = write!(bar, "{RESET}{RESTORE_CURSOR}"); + + stdout.write_all(bar.as_bytes()).await?; + stdout.flush().await?; + Ok(()) +} + +/// Count decimal digits in a u32 (used for status bar layout calculation). +fn digit_count(n: u32) -> usize { + if n == 0 { + return 1; + } + let mut count = 0; + let mut v = n; + while v > 0 { + count += 1; + v /= 10; + } + count +} + +/// Reset scroll region and switch back to the main screen buffer. +pub async fn reset_scroll_region(stdout: &mut tokio::io::Stdout) -> std::io::Result<()> { + stdout.write_all(SCROLL_REGION_RESET.as_bytes()).await?; + stdout.write_all(ALT_SCREEN_OFF.as_bytes()).await?; + stdout.flush().await?; + Ok(()) +} + +/// RAII guard to restore terminal settings on drop. +pub struct RestoreTermios { + pub fd: i32, + pub original: termios::Termios, +} + +impl Drop for RestoreTermios { + fn drop(&mut self) { + let fd = unsafe { BorrowedFd::borrow_raw(self.fd) }; + let _ = termios::tcsetattr(fd, termios::SetArg::TCSANOW, &self.original); + } +} diff --git a/src/util.rs b/src/util.rs new file mode 100644 index 0000000..575c021 --- /dev/null +++ b/src/util.rs @@ -0,0 +1,34 @@ +//!
Shared utilities used across subcommands. + +use std::path::{Path, PathBuf}; + +/// Socket path for a session: `<dir>/<id>.sock`. +pub fn socket_path(dir: &Path, id: &str) -> PathBuf { + dir.join(format!("{id}.sock")) +} + +/// PID file path for a session: `<dir>/<id>.pid`. +pub fn pid_path(dir: &Path, id: &str) -> PathBuf { + dir.join(format!("{id}.pid")) +} + +/// Resolve socket path and bail if the session doesn't exist. +pub fn session_socket(id: &str, socket_dir: &Path) -> PathBuf { + let path = socket_path(socket_dir, id); + if !path.exists() { + eprintln!("No session found: {id}"); + std::process::exit(1); + } + path +} + +/// Build a single-threaded tokio runtime and run a future to completion on it. +pub fn with_runtime<F, T>(f: F) -> anyhow::Result<T> +where + F: std::future::Future<Output = anyhow::Result<T>>, +{ + let rt = tokio::runtime::Builder::new_current_thread() + .enable_all() + .build()?; + rt.block_on(f) +} diff --git a/tests/integration.rs b/tests/integration.rs index 4d44788..649f614 100644 --- a/tests/integration.rs +++ b/tests/integration.rs @@ -90,8 +90,8 @@ fn duplicate_session_rejected_when_alive() { ); let stderr = String::from_utf8_lossy(&output.stderr); assert!( - stderr.contains("already running"), - "stderr should mention 'already running': {stderr}" + stderr.contains("already running") || stderr.contains("is locked by"), + "stderr should mention session conflict: {stderr}" ); // Clean up first session. @@ -224,11 +224,24 @@ fn status_over_socket() { let mut payload = [0u8; 15]; stream.read_exact(&mut payload).unwrap(); - // Parse fields.
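The integration tests here decode STATUS_RESP by hand at fixed offsets. The sketch below shows the same framing in self-contained form, which may help when writing such tests. The frame is `[type: u8][len: u32 BE][payload]`, and the 15-byte payload is `[pid: u32][idle_ms: u32][alive: u8][state: u8][state_ms: u32][reserved: u8]`. The `Status` struct and the three helpers are hypothetical; they only mirror the layout asserted by the golden-byte tests in `protocol.rs`.

```rust
/// Frame type byte for STATUS_RESP, as asserted in the golden-byte tests.
const STATUS_RESP: u8 = 0x82;

/// Decoded STATUS_RESP payload fields.
#[derive(Debug, PartialEq)]
struct Status {
    pid: u32,
    idle_ms: u32,
    alive: bool,
    state: u8,
    state_ms: u32,
}

/// Build a full frame: [type][len u32 BE][15-byte payload].
fn pack_status_frame(s: &Status) -> Vec<u8> {
    let mut payload = Vec::with_capacity(15);
    payload.extend_from_slice(&s.pid.to_be_bytes());
    payload.extend_from_slice(&s.idle_ms.to_be_bytes());
    payload.push(u8::from(s.alive));
    payload.push(s.state);
    payload.extend_from_slice(&s.state_ms.to_be_bytes());
    payload.push(0x00); // reserved
    let mut frame = Vec::with_capacity(5 + payload.len());
    frame.push(STATUS_RESP);
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(&payload);
    frame
}

/// Split a frame into (type, payload); None if the header or body is truncated.
fn unpack_frame(frame: &[u8]) -> Option<(u8, &[u8])> {
    let header = frame.get(..5)?;
    let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize;
    let payload = frame.get(5..5 + len)?;
    Some((header[0], payload))
}

/// Decode the fixed-offset fields; at least 14 bytes are required.
fn parse_status(p: &[u8]) -> Option<Status> {
    if p.len() < 14 {
        return None;
    }
    let be32 = |i: usize| u32::from_be_bytes([p[i], p[i + 1], p[i + 2], p[i + 3]]);
    Some(Status {
        pid: be32(0),
        idle_ms: be32(4),
        alive: p[8] != 0,
        state: p[9],
        state_ms: be32(10),
    })
}
```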
+ // Parse fields: [pid: u32][idle_ms: u32][alive: u8][state: u8][state_ms: u32] let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); + let idle_ms = u32::from_be_bytes([payload[4], payload[5], payload[6], payload[7]]); let alive = payload[8]; + let state = payload[9]; + let state_ms = u32::from_be_bytes([payload[10], payload[11], payload[12], payload[13]]); assert!(pid > 0, "pid should be nonzero"); assert_eq!(alive, 1, "alive should be 1"); + assert!( + idle_ms < 5000, + "idle_ms should be small for a just-started session, got {idle_ms}" + ); + // State: 0x00=idle, 0x01=thinking, 0x02=streaming, 0x03=tool_use, 0x04=active. + assert!( + state <= 0x04, + "state should be valid (0x00-0x04), got {state:#x}" + ); + let _ = state_ms; // state_ms is valid at any value let _ = child.kill(); let _ = child.wait(); @@ -278,8 +291,7 @@ fn kill_subcommand_terminates_session() { assert!(kill_output.status.success(), "kill command should succeed"); // The supervisor should exit within a few seconds. - let status = child.wait().expect("failed to wait for child"); - let _ = status; + let _status = child.wait().expect("failed to wait for child"); // Socket and PID files should be cleaned up after kill. 
let pid_path = socket_dir.join(format!("{session_id}.pid")); @@ -474,6 +486,15 @@ fn status_works_after_subscriber_disconnect() { header[0], 0x82, "should get STATUS_RESP after subscriber disconnect" ); + let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]); + assert_eq!(len, 15, "STATUS_RESP payload must be 15 bytes"); + + let mut payload = [0u8; 15]; + stream.read_exact(&mut payload).unwrap(); + let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); + let alive = payload[8]; + assert!(pid > 0, "pid should be nonzero after subscriber disconnect"); + assert_eq!(alive, 1, "alive should be 1 after subscriber disconnect"); } let _ = child.kill(); @@ -642,13 +663,24 @@ fn concurrent_status_queries() { let mut mode = [0u8; 1]; stream.read_exact(&mut mode).unwrap(); + assert_eq!(mode[0], 0x00, "mode byte should be MODE_BINARY"); let status_frame: [u8; 5] = [0x03, 0, 0, 0, 0]; stream.write_all(&status_frame).unwrap(); let mut header = [0u8; 5]; stream.read_exact(&mut header).unwrap(); - header[0] == 0x82 // STATUS_RESP + assert_eq!(header[0], 0x82, "expected STATUS_RESP"); + let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]); + assert_eq!(len, 15, "STATUS_RESP payload must be 15 bytes"); + + let mut payload = [0u8; 15]; + stream.read_exact(&mut payload).unwrap(); + let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); + let alive = payload[8]; + assert!(pid > 0, "pid should be nonzero"); + assert_eq!(alive, 1, "alive should be 1"); + true }) }) .collect(); @@ -794,18 +826,20 @@ fn subscriber_receives_exit_frame_on_child_exit() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x83 { got_exit = true; - if payload.len() >= 4 
{ - exit_code = Some(i32::from_be_bytes([ - payload[0], payload[1], payload[2], payload[3], - ])); - } + assert_eq!( + payload.len(), + 4, + "EXIT frame payload must be exactly 4 bytes, got {}", + payload.len() + ); + exit_code = Some(i32::from_be_bytes([ + payload[0], payload[1], payload[2], payload[3], + ])); break; } } @@ -948,6 +982,15 @@ fn partial_frame_disconnect_does_not_crash_supervisor() { header[0], 0x82, "should get STATUS_RESP after partial frame client" ); + let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]); + assert_eq!(len, 15, "STATUS_RESP payload must be 15 bytes"); + + let mut payload = [0u8; 15]; + stream.read_exact(&mut payload).unwrap(); + let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); + let alive = payload[8]; + assert!(pid > 0, "pid should be nonzero after partial frame client"); + assert_eq!(alive, 1, "alive should be 1 after partial frame client"); } let _ = child.kill(); @@ -1051,6 +1094,179 @@ fn list_cleans_stale_socket_and_pid_files() { assert!(!pid.exists(), "stale .pid should be cleaned up by ls"); } +// -- Clean subcommand -- + +/// `hm clean --force` removes orphaned .log files older than the retention window. +#[test] +fn clean_removes_old_orphan_logs() { + let tmp = tempfile::tempdir().unwrap(); + let socket_dir = tmp.path().to_path_buf(); + std::fs::create_dir_all(&socket_dir).unwrap(); + + // Create an old orphaned log (no .sock or .pid). + let old_log = socket_dir.join("dead-session.log"); + std::fs::write(&old_log, "some log output").unwrap(); + + // Backdate the file to 2 days ago. 
+ let two_days_ago = filetime::FileTime::from_system_time( + std::time::SystemTime::now() - Duration::from_secs(2 * 86400), + ); + filetime::set_file_mtime(&old_log, two_days_ago).unwrap(); + + assert!(old_log.exists()); + + let output = Command::new(hm_bin()) + .args([ + "clean", + "--socket-dir", + socket_dir.to_str().unwrap(), + "--older-than", + "24h", + "--force", + ]) + .output() + .expect("failed to run clean"); + + assert!(output.status.success(), "clean should succeed"); + assert!(!old_log.exists(), "old orphan log should be removed"); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("Cleaned 1 log file"), + "should report cleaning: {stdout}" + ); +} + +/// `hm clean --force` preserves logs within the retention window. +#[test] +fn clean_preserves_young_logs() { + let tmp = tempfile::tempdir().unwrap(); + let socket_dir = tmp.path().to_path_buf(); + std::fs::create_dir_all(&socket_dir).unwrap(); + + // Create a fresh orphaned log (just created, well within 24h). + let fresh_log = socket_dir.join("recent-session.log"); + std::fs::write(&fresh_log, "recent log output").unwrap(); + + let output = Command::new(hm_bin()) + .args([ + "clean", + "--socket-dir", + socket_dir.to_str().unwrap(), + "--older-than", + "24h", + "--force", + ]) + .output() + .expect("failed to run clean"); + + assert!(output.status.success()); + assert!(fresh_log.exists(), "young log should be preserved"); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("Nothing to clean"), + "should report nothing to clean: {stdout}" + ); +} + +/// `hm clean` without --force is dry-run by default. 
+#[test] +fn clean_default_is_dry_run() { + let tmp = tempfile::tempdir().unwrap(); + let socket_dir = tmp.path().to_path_buf(); + std::fs::create_dir_all(&socket_dir).unwrap(); + + let old_log = socket_dir.join("old-session.log"); + std::fs::write(&old_log, "old log").unwrap(); + + let two_days_ago = filetime::FileTime::from_system_time( + std::time::SystemTime::now() - Duration::from_secs(2 * 86400), + ); + filetime::set_file_mtime(&old_log, two_days_ago).unwrap(); + + let output = Command::new(hm_bin()) + .args([ + "clean", + "--socket-dir", + socket_dir.to_str().unwrap(), + "--older-than", + "24h", + ]) + .output() + .expect("failed to run clean"); + + assert!(output.status.success()); + assert!( + old_log.exists(), + "default (dry-run) should not delete files" + ); + + let stdout = String::from_utf8_lossy(&output.stdout); + assert!( + stdout.contains("would remove"), + "should show what would be removed: {stdout}" + ); +} + +/// `hm clean` skips logs belonging to live sessions. +#[test] +fn clean_preserves_live_session_logs() { + let tmp = tempfile::tempdir().unwrap(); + let socket_dir = tmp.path().to_path_buf(); + let session_id = "clean-live"; + let socket_path = socket_dir.join(format!("{session_id}.sock")); + + // Start a live session. + let mut child = Command::new(hm_bin()) + .args([ + "run", + "--detach", + "--id", + session_id, + "--socket-dir", + socket_dir.to_str().unwrap(), + "--", + "sleep", + "30", + ]) + .stdout(Stdio::null()) + .stderr(Stdio::null()) + .spawn() + .expect("failed to spawn"); + + assert!(wait_for_socket(&socket_path, Duration::from_secs(5))); + + // The log file exists and belongs to a live session. + let log_path = socket_dir.join(format!("{session_id}.log")); + + // Backdate it so it would be cleaned if orphaned. 
+ if log_path.exists() { + let old = filetime::FileTime::from_system_time( + std::time::SystemTime::now() - Duration::from_secs(2 * 86400), + ); + filetime::set_file_mtime(&log_path, old).unwrap(); + } + + let output = Command::new(hm_bin()) + .args([ + "clean", + "--socket-dir", + socket_dir.to_str().unwrap(), + "--older-than", + "1s", + "--force", + ]) + .output() + .expect("failed to run clean"); + + assert!(output.status.success()); + assert!(log_path.exists(), "live session log should not be removed"); + + let _ = child.kill(); + let _ = child.wait(); +} + // -- Tests that would have caught bugs found by code review -- /// Session startup should complete quickly. On systems where _SC_OPEN_MAX @@ -1190,6 +1406,20 @@ fn status_works_with_output_producing_children() { .read_exact(&mut header) .unwrap_or_else(|e| panic!("{session_id}: timed out reading status response: {e}")); assert_eq!(header[0], 0x82, "{session_id}: should get STATUS_RESP"); + let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]); + assert_eq!( + len, 15, + "{session_id}: STATUS_RESP payload must be 15 bytes" + ); + + let mut payload = [0u8; 15]; + stream + .read_exact(&mut payload) + .unwrap_or_else(|e| panic!("{session_id}: failed to read status payload: {e}")); + let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); + let alive = payload[8]; + assert!(pid > 0, "{session_id}: pid should be nonzero"); + assert_eq!(alive, 1, "{session_id}: alive should be 1"); let _ = child.kill(); let _ = child.wait(); @@ -1344,6 +1574,26 @@ fn oversized_frame_over_socket_rejected() { resp_header[0], 0x82, "supervisor should still respond after rejecting oversized frame" ); + let len = u32::from_be_bytes([ + resp_header[1], + resp_header[2], + resp_header[3], + resp_header[4], + ]); + assert_eq!(len, 15, "STATUS_RESP payload must be 15 bytes"); + + let mut payload = [0u8; 15]; + stream.read_exact(&mut payload).unwrap(); + let pid = u32::from_be_bytes([payload[0], 
payload[1], payload[2], payload[3]]); + let alive = payload[8]; + assert!( + pid > 0, + "pid should be nonzero after oversized frame rejection" + ); + assert_eq!( + alive, 1, + "alive should be 1 after oversized frame rejection" + ); } let _ = child.kill(); @@ -1621,10 +1871,8 @@ fn subscribe_receives_scrollback() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -1703,10 +1951,8 @@ fn scrollback_eviction() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { total_bytes += len; @@ -1780,10 +2026,8 @@ fn multiple_subscribers_see_same_output() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -1873,10 +2117,8 @@ fn input_round_trip() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -1943,6 +2185,7 @@ fn kill_frame_sends_sigterm() { // Read frames until EXIT (0x83) or timeout. 
let mut got_exit = false; + let mut exit_code: Option<i32> = None; let start = std::time::Instant::now(); while start.elapsed() < Duration::from_secs(5) { let mut header = [0u8; 5]; @@ -1953,18 +2196,30 @@ fn kill_frame_sends_sigterm() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x83 { got_exit = true; + assert_eq!( + payload.len(), + 4, + "EXIT frame payload must be exactly 4 bytes" + ); + exit_code = Some(i32::from_be_bytes([ + payload[0], payload[1], payload[2], payload[3], + ])); break; } } assert!(got_exit, "should receive EXIT frame after sending KILL"); + // SIGTERM kills sleep, so exit code should be non-zero (signal death). + assert!( + exit_code.unwrap() != 0, + "exit code after KILL should be non-zero (signal death), got {:?}", + exit_code + ); let _ = child.wait(); } @@ -2113,10 +2368,8 @@ fn custom_session_env_var() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -2198,10 +2451,8 @@ value = "integ_test_value_42" let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -2275,10 +2526,8 @@ fn workdir_applied_to_child() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2],
header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -2427,10 +2676,8 @@ fn binary_data_through_pty() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { got_output = true; @@ -2524,6 +2771,8 @@ fn zero_size_terminal() { let mut payload = [0u8; 15]; stream.read_exact(&mut payload).unwrap(); + let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); + assert!(pid > 0, "pid should be nonzero with zero-size terminal"); assert_eq!(payload[8], 1, "should be alive with zero-size terminal"); let _ = child.kill(); @@ -2581,7 +2830,11 @@ fn idle_ms_counter() { let mut payload = [0u8; 15]; stream.read_exact(&mut payload).unwrap(); + let pid = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]); let idle_ms = u32::from_be_bytes([payload[4], payload[5], payload[6], payload[7]]); + let alive = payload[8]; + assert!(pid > 0, "pid should be nonzero"); + assert_eq!(alive, 1, "should be alive"); assert!( idle_ms >= 1500, "idle_ms should be >= 1500 after 2s wait, got {idle_ms}" @@ -2622,20 +2875,25 @@ fn supervisor_sigterm_graceful_shutdown() { assert!(wait_for_socket(&socket_path, Duration::from_secs(5))); - // Read supervisor PID from the PID file. - let pid_str = std::fs::read_to_string(&pid_path) - .expect("failed to read PID file") + // Read supervisor PID from line 1 of the PID file. 
+ let pid_contents = std::fs::read_to_string(&pid_path).expect("failed to read PID file"); + let supervisor_pid: i32 = pid_contents + .lines() + .next() + .expect("PID file should have at least one line") .trim() - .to_string(); - let supervisor_pid: i32 = pid_str.parse().expect("PID file should contain a number"); + .parse() + .expect("PID file line 1 should be the supervisor PID"); // Send SIGTERM to the supervisor. signal::kill(Pid::from_raw(supervisor_pid), Signal::SIGTERM).expect("failed to send SIGTERM"); // Wait for socket to disappear (graceful shutdown). + // The supervisor sends SIGTERM to the child, waits up to 5s (SIGKILL_GRACE), + // then SIGKILLs. Allow enough time for the full grace period + cleanup. let start = std::time::Instant::now(); let mut cleaned_up = false; - while start.elapsed() < Duration::from_secs(5) { + while start.elapsed() < Duration::from_secs(10) { if !socket_path.exists() { cleaned_up = true; break; @@ -2644,6 +2902,10 @@ fn supervisor_sigterm_graceful_shutdown() { } assert!(cleaned_up, "socket file should be cleaned up after SIGTERM"); + assert!( + !pid_path.exists(), + "PID file should be cleaned up after SIGTERM" + ); let _ = child.wait(); } @@ -2711,10 +2973,8 @@ fn read_output_frames(stream: &mut UnixStream, timeout: Duration) -> (Vec<u8>, b let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -2828,7 +3088,7 @@ fn large_paste_burst() { // Build large input: "echo " + 4000 'A's + "\r" let mut input_data = b"echo ".to_vec(); - input_data.extend(std::iter::repeat(b'A').take(4000)); + input_data.extend(std::iter::repeat_n(b'A', 4000)); input_data.push(b'\r'); let frame = build_input_frame(&input_data);
stream.write_all(&frame).unwrap(); @@ -2920,10 +3180,8 @@ fn ctrl_c_forwarded_not_exit() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -2953,10 +3211,8 @@ fn ctrl_c_forwarded_not_exit() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); @@ -3236,10 +3492,8 @@ fn input_after_subscribe() { let msg_type = header[0]; let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize; let mut payload = vec![0u8; len]; - if len > 0 { - if stream.read_exact(&mut payload).is_err() { - break; - } + if len > 0 && stream.read_exact(&mut payload).is_err() { + break; } if msg_type == 0x81 { output_data.extend_from_slice(&payload); diff --git a/tests/test_attach.py b/tests/test_attach.py index da5f8b7..61002fe 100644 --- a/tests/test_attach.py +++ b/tests/test_attach.py @@ -1,13 +1,10 @@ -#!/usr/bin/env python3 """Attach-flow integration tests for heimdall. These tests exercise the terminal UX: alt screen, status bar, detach, signal handling. They require a real PTY (pexpect allocates one). 
-Usage: - python3 tests/test_attach.py [--hm PATH] - -Requires: pexpect (pip install pexpect) +Requires: uv sync (installs pexpect + pytest) +Run: uv run pytest tests/ or just test-attach """ from __future__ import annotations @@ -15,354 +12,240 @@ import os import signal import subprocess -import sys -import tempfile import time from pathlib import Path +from typing import TYPE_CHECKING import pexpect +import pytest -# ── Binary resolution ──────────────────────────────────────────────── - -HM_BIN = os.environ.get("HM_BIN") -if not HM_BIN: - # Try target/debug/hm relative to project root. - _project = Path(__file__).resolve().parent.parent - _debug = _project / "target" / "debug" / "hm" - _release = _project / "target" / "release" / "hm" - if _debug.exists(): - HM_BIN = str(_debug) - elif _release.exists(): - HM_BIN = str(_release) - else: - print("ERROR: hm binary not found. Run `cargo build` first.", file=sys.stderr) - sys.exit(1) - - -# ── Helpers ────────────────────────────────────────────────────────── - -def wait_for_socket(path: Path, timeout: float = 5.0) -> bool: - """Poll until a socket file appears.""" - deadline = time.monotonic() + timeout - while time.monotonic() < deadline: - if path.exists(): - return True - time.sleep(0.05) - return False - - -def start_detached_session( - session_id: str, - socket_dir: Path, - cmd: list[str], - *, - extra_args: list[str] | None = None, -) -> subprocess.Popen: - """Start a detached hm session and wait for the socket.""" - args = [ - HM_BIN, "run", "--detach", - "--id", session_id, - "--socket-dir", str(socket_dir), - ] - if extra_args: - args.extend(extra_args) - args.append("--") - args.extend(cmd) - proc = subprocess.Popen( - args, - stdout=subprocess.DEVNULL, - stderr=subprocess.DEVNULL, - ) - sock = socket_dir / f"{session_id}.sock" - if not wait_for_socket(sock): - proc.kill() - proc.wait() - raise RuntimeError(f"Socket never appeared: {sock}") - return proc - - -# ── Test infrastructure 
────────────────────────────────────────────── - -_results: list[tuple[str, bool, str]] = [] - - -def run_test(fn): - """Run a test function, record pass/fail.""" - name = fn.__name__ - try: - fn() - _results.append((name, True, "")) - print(f" ✓ {name}") - except Exception as e: - _results.append((name, False, str(e))) - print(f" ✗ {name}: {e}") - - -# ── Tests ──────────────────────────────────────────────────────────── - - -def test_attach_shows_status_bar(): - """Attaching shows the status bar with session name.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "attach-bar" - proc = start_detached_session(sid, socket_dir, ["bash"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - # Status bar should contain [hm] and the session name. - child.expect(r"\[hm\].*attach-bar", timeout=5) - - child.sendcontrol("\\") # Ctrl-\ to detach - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -def test_attach_alt_screen(): - """Attach enters alternate screen buffer (ESC[?1049h).""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "attach-alt" - proc = start_detached_session(sid, socket_dir, ["bash"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - # Alt screen escape should be in early output. - # pexpect captures raw bytes from the PTY. - child.expect(r"\[hm\]", timeout=5) - - # Detach and verify we get the alt screen leave sequence - # or at least the detach message (which means cleanup ran). 
- child.sendcontrol("\\") - child.expect("detached", timeout=3) - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -def test_detach_ctrl_backslash(): - """Ctrl-\\ detaches cleanly, prints message, supervisor stays alive.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "detach-test" - proc = start_detached_session(sid, socket_dir, ["sleep", "60"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - # Wait for status bar to confirm attach is live. - child.expect(r"\[hm\]", timeout=5) +if TYPE_CHECKING: + from collections.abc import Generator - # Send detach key. - child.sendcontrol("\\") - child.expect("detached.*detach-test", timeout=3) - child.expect(pexpect.EOF, timeout=3) +# ── Binary resolution ──────────────────────────────────────────────── - # Supervisor should still be alive — socket should still exist. - sock = socket_dir / f"{sid}.sock" - assert sock.exists(), "Socket should still exist after detach" - finally: - proc.kill() - proc.wait() +_PROJECT = Path(__file__).resolve().parent.parent +_DEBUG = _PROJECT / "target" / "debug" / "hm" +_RELEASE = _PROJECT / "target" / "release" / "hm" -def test_attach_receives_output(): - """Attach shows child process output.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "attach-output" - proc = start_detached_session( - sid, socket_dir, - ["bash", "-c", "echo ATTACH_OUTPUT_MARKER && sleep 60"], - ) +def _find_hm() -> str: + """Resolve the hm binary path.""" + env = os.environ.get("HM_BIN") + if env: + return env + if _DEBUG.exists(): + return str(_DEBUG) + if _RELEASE.exists(): + return str(_RELEASE) + pytest.skip("hm binary not found — run `cargo build` first") + return "" # unreachable, keeps type checker happy - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) 
- # Should see the output from the child. - child.expect("ATTACH_OUTPUT_MARKER", timeout=5) - - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() +HM_BIN = _find_hm() -def test_attach_input_forwarded(): - """Keystrokes in attach are forwarded to the child.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "attach-input" - proc = start_detached_session(sid, socket_dir, ["bash"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - # Wait for bash prompt (or at least status bar). - child.expect(r"\[hm\]", timeout=5) - time.sleep(0.5) # Let bash initialize. - # Type a command. - child.sendline("echo INPUT_FORWARD_TEST") +# ── Fixtures ───────────────────────────────────────────────────────── - # Should see the command output. - child.expect("INPUT_FORWARD_TEST", timeout=5) - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() +@pytest.fixture +def socket_dir(tmp_path: Path) -> Path: + """Provide a temporary socket directory.""" + return tmp_path -def test_attach_session_exit(): - """When the child exits, attach shows exit code and terminates.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "attach-exit" - proc = start_detached_session( - sid, socket_dir, - ["bash", "-c", "echo GOODBYE && sleep 1 && exit 0"], - ) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=10, - dimensions=(24, 80), - ) - # Should see exit message. 
- child.expect("session exited", timeout=10) - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() +def _wait_for_socket(path: Path, timeout: float = 5.0) -> bool: + """Poll until a socket file appears.""" + deadline = time.monotonic() + timeout + while time.monotonic() < deadline: + if path.exists(): + return True + time.sleep(0.05) + return False -def test_reattach_after_detach(): - """Can detach and reattach to the same session.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "reattach" - proc = start_detached_session( - sid, socket_dir, - ["bash", "-c", "echo REATTACH_MARKER && sleep 60"], +@pytest.fixture +def detached_session(socket_dir: Path) -> Generator[ + "function[[str, list[str], list[str] | None], subprocess.Popen[bytes]]", +]: + """Factory fixture: start a detached hm session, clean up on exit.""" + procs: list[subprocess.Popen[bytes]] = [] + + def _start( + session_id: str, + cmd: list[str], + extra_args: list[str] | None = None, + ) -> subprocess.Popen[bytes]: + args = [ + HM_BIN, "run", "--detach", + "--id", session_id, + "--socket-dir", str(socket_dir), + ] + if extra_args: + args.extend(extra_args) + args.append("--") + args.extend(cmd) + proc = subprocess.Popen( + args, + stdout=subprocess.DEVNULL, + stderr=subprocess.DEVNULL, ) - - try: - # First attach. - child1 = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - child1.expect("REATTACH_MARKER", timeout=5) - child1.sendcontrol("\\") - child1.expect(pexpect.EOF, timeout=3) - - time.sleep(0.3) - - # Second attach — should still get scrollback. 
- child2 = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - child2.expect("REATTACH_MARKER", timeout=5) - child2.sendcontrol("\\") - child2.expect(pexpect.EOF, timeout=3) - finally: + sock = socket_dir / f"{session_id}.sock" + if not _wait_for_socket(sock): proc.kill() proc.wait() + msg = f"Socket never appeared: {sock}" + raise RuntimeError(msg) + procs.append(proc) + return proc + yield _start -def test_sighup_kills_attach_not_supervisor(): - """SIGHUP (simulating X close) kills attach but supervisor survives.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "sighup-test" - proc = start_detached_session(sid, socket_dir, ["sleep", "60"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - child.expect(r"\[hm\]", timeout=5) - - # Send SIGHUP to the attach process. - os.kill(child.pid, signal.SIGHUP) - - # Attach should die. - child.expect(pexpect.EOF, timeout=5) + for p in procs: + p.kill() + p.wait() - # Supervisor should still be alive. 
- time.sleep(0.3) - sock = socket_dir / f"{sid}.sock" - assert sock.exists(), "Socket should still exist — supervisor must survive SIGHUP" - finally: - proc.kill() - proc.wait() +def _attach( + socket_dir: Path, + session_id: str, + *, + timeout: int = 5, + dimensions: tuple[int, int] = (24, 80), + maxread: int = 2000, +) -> pexpect.spawn: + """Spawn an attach client.""" + return pexpect.spawn( + HM_BIN, + ["attach", "--socket-dir", str(socket_dir), session_id], + timeout=timeout, + dimensions=dimensions, + maxread=maxread, + ) -def test_run_without_detach_auto_attaches(): - """`hm run` without --detach spawns supervisor and attaches.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "run-auto" +def _detach(child: pexpect.spawn) -> None: + """Send detach key, assert detach message, and wait for EOF.""" + child.sendcontrol("\\") + child.expect("detached", timeout=3) + child.expect(pexpect.EOF, timeout=3) + + +# ── Core attach tests ──────────────────────────────────────────────── + + +class TestAttachCore: + """Basic attach/detach behaviour.""" + + def test_shows_status_bar(self, socket_dir: Path, detached_session) -> None: + """Attaching shows the status bar with session name.""" + detached_session("bar", ["bash"]) + child = _attach(socket_dir, "bar") + child.expect(r"\[hm\].*bar", timeout=5) + _detach(child) + + def test_alt_screen(self, socket_dir: Path, detached_session) -> None: + """Attach enters alternate screen; detach exits cleanly.""" + detached_session("alt", ["bash"]) + child = _attach(socket_dir, "alt") + child.expect(r"\[hm\]", timeout=5) + child.sendcontrol("\\") + child.expect("detached", timeout=3) + child.expect(pexpect.EOF, timeout=3) + + def test_detach_ctrl_backslash(self, socket_dir: Path, detached_session) -> None: + """Ctrl-\\ detaches cleanly; supervisor stays alive.""" + detached_session("detach", ["sleep", "60"]) + child = _attach(socket_dir, "detach") + child.expect(r"\[hm\]", timeout=5) + 
child.sendcontrol("\\") + child.expect("detached.*detach", timeout=3) + child.expect(pexpect.EOF, timeout=3) + assert (socket_dir / "detach.sock").exists(), "Socket should survive detach" + + def test_receives_output(self, socket_dir: Path, detached_session) -> None: + """Attach shows child process output.""" + detached_session("output", ["bash", "-c", "echo OUTPUT_MARKER && sleep 60"]) + child = _attach(socket_dir, "output") + child.expect("OUTPUT_MARKER", timeout=5) + _detach(child) + + def test_input_forwarded(self, socket_dir: Path, detached_session) -> None: + """Keystrokes in attach are forwarded to the child.""" + detached_session("input", ["bash"]) + child = _attach(socket_dir, "input") + child.expect(r"\[hm\]", timeout=5) + time.sleep(0.5) + child.sendline("echo INPUT_FORWARD_TEST") + child.expect("INPUT_FORWARD_TEST", timeout=5) + _detach(child) + + def test_session_exit(self, socket_dir: Path, detached_session) -> None: + """When the child exits, attach shows exit code and terminates.""" + detached_session("exit", ["bash", "-c", "echo GOODBYE && sleep 1 && exit 0"]) + child = _attach(socket_dir, "exit", timeout=10) + child.expect("session exited", timeout=10) + child.expect(pexpect.EOF, timeout=3) + + def test_session_exit_cleans_up(self, socket_dir: Path, detached_session) -> None: + """Socket and PID files are removed after child exits.""" + detached_session("cleanup", ["bash", "-c", "sleep 1 && exit 0"]) + child = _attach(socket_dir, "cleanup", timeout=10) + child.expect("session exited", timeout=10) + child.expect(pexpect.EOF, timeout=3) + + # Give supervisor time to run its cleanup guard. 
+ time.sleep(0.5) + assert not (socket_dir / "cleanup.sock").exists(), "Socket file should be cleaned up" + assert not (socket_dir / "cleanup.pid").exists(), "PID file should be cleaned up" + + def test_reattach_after_detach(self, socket_dir: Path, detached_session) -> None: + """Can detach and reattach; scrollback preserved.""" + detached_session("reattach", ["bash", "-c", "echo REATTACH_MARKER && sleep 60"]) + + child1 = _attach(socket_dir, "reattach") + child1.expect("REATTACH_MARKER", timeout=5) + _detach(child1) + time.sleep(0.3) + + child2 = _attach(socket_dir, "reattach") + child2.expect("REATTACH_MARKER", timeout=5) + _detach(child2) + + def test_status_bar_shows_state(self, socket_dir: Path, detached_session) -> None: + """Status bar shows process state after poll interval.""" + detached_session("state", ["sleep", "60"]) + child = _attach(socket_dir, "state") + child.expect(r"\[hm\]", timeout=5) + child.expect(r"(idle|thinking|streaming|tool_use|active)", timeout=5) + _detach(child) + + def test_run_without_detach_auto_attaches(self, socket_dir: Path) -> None: + """`hm run` without --detach spawns supervisor and attaches.""" child = pexpect.spawn( HM_BIN, [ - "run", "--id", sid, + "run", "--id", "auto", "--socket-dir", str(socket_dir), - "--", "bash", "-c", "echo AUTO_ATTACH_TEST && sleep 60", + "--", "bash", "-c", "echo AUTO_ATTACH && sleep 60", ], timeout=10, dimensions=(24, 80), ) try: - # Should see both status bar and child output. - child.expect("AUTO_ATTACH_TEST", timeout=10) + child.expect("AUTO_ATTACH", timeout=10) child.expect(r"\[hm\]", timeout=5) - - # Detach. child.sendcontrol("\\") child.expect("detached", timeout=3) child.expect(pexpect.EOF, timeout=3) - # Supervisor should still be running (socket exists). 
time.sleep(0.3) - sock = socket_dir / f"{sid}.sock" - assert sock.exists(), "Supervisor should survive detach from run-and-attach" + assert (socket_dir / "auto.sock").exists(), "Supervisor should survive detach" finally: child.close(force=True) - # Clean up the supervisor. try: - pid_path = socket_dir / f"{sid}.pid" + pid_path = socket_dir / "auto.pid" if pid_path.exists(): pid = int(pid_path.read_text().strip()) os.kill(pid, signal.SIGTERM) @@ -370,343 +253,142 @@ def test_run_without_detach_auto_attaches(): pass -def test_status_bar_shows_state(): - """Status bar shows process state (idle/active/etc) after poll interval.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "bar-state" - proc = start_detached_session(sid, socket_dir, ["sleep", "60"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - child.expect(r"\[hm\]", timeout=5) +# ── Signal handling tests ──────────────────────────────────────────── - # Wait for at least one status poll (1s interval) to populate - # the right side with a state name. - # Any of: idle, thinking, streaming, tool_use, active - child.expect(r"(idle|thinking|streaming|tool_use|active)", timeout=5) - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -def test_sigterm_kills_attach(): - """SIGTERM to attach process exits cleanly.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "sigterm-attach" - proc = start_detached_session(sid, socket_dir, ["sleep", "60"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - child.expect(r"\[hm\]", timeout=5) - - os.kill(child.pid, signal.SIGTERM) - child.expect(pexpect.EOF, timeout=5) - - # Supervisor should still be alive. 
- time.sleep(0.3) - sock = socket_dir / f"{sid}.sock" - assert sock.exists(), "Supervisor should survive attach SIGTERM" - finally: - proc.kill() - proc.wait() - - -# ── Adversarial tests ──────────────────────────────────────────────── - - -def test_resize_during_attach(): - """Resize terminal while attached — status bar must survive.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "adv-resize-attach" - proc = start_detached_session( - sid, socket_dir, - ["bash", "-c", "for i in $(seq 1 100); do echo LINE_$i; sleep 0.02; done && sleep 60"], - ) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=10, - dimensions=(24, 80), - ) - child.expect(r"\[hm\]", timeout=5) - - # Rapid resize while output is streaming. - for cols, rows in [(120, 40), (40, 10), (200, 50), (80, 24), (60, 15)]: - child.setwinsize(rows, cols) - time.sleep(0.1) - - # Wait for output to settle, then verify status bar is still there. - time.sleep(2) - # Status bar should still render after resizes. - child.expect(r"\[hm\].*adv-resize-attach", timeout=5) - - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -def test_arrow_keys_work(): - """Arrow keys produce correct escape sequences, not garbage like ^[[A.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "adv-arrows" - proc = start_detached_session(sid, socket_dir, ["bash"]) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - child.expect(r"\[hm\]", timeout=5) - time.sleep(0.5) +class TestSignals: + """Signal handling during attach.""" - # Type a command, then use arrow keys to recall it. 
- child.sendline("echo ARROW_TEST_1") - child.expect("ARROW_TEST_1", timeout=5) + def test_sighup_kills_attach_not_supervisor( + self, socket_dir: Path, detached_session, + ) -> None: + """SIGHUP kills attach but supervisor survives.""" + detached_session("sighup", ["sleep", "60"]) + child = _attach(socket_dir, "sighup") + child.expect(r"\[hm\]", timeout=5) - child.sendline("echo ARROW_TEST_2") - child.expect("ARROW_TEST_2", timeout=5) + os.kill(child.pid, signal.SIGHUP) + child.expect(pexpect.EOF, timeout=5) - # Up arrow should recall previous command, not print ^[[A. - child.send("\x1b[A") # Up arrow escape sequence - time.sleep(0.3) - child.send("\r") - # Should see ARROW_TEST_2 again (recalled from history). - child.expect("ARROW_TEST_2", timeout=5) + time.sleep(0.3) + assert (socket_dir / "sighup.sock").exists(), "Supervisor must survive SIGHUP" - # Verify no literal ^[[A in output (sign of broken terminal). - # Read what's in the buffer. - remaining = child.before.decode("utf-8", errors="replace") if child.before else "" - assert "^[[A" not in remaining, f"Arrow key escaped as literal ^[[A: {remaining!r}" + def test_sigterm_kills_attach(self, socket_dir: Path, detached_session) -> None: + """SIGTERM to attach process exits cleanly.""" + detached_session("sigterm", ["sleep", "60"]) + child = _attach(socket_dir, "sigterm") + child.expect(r"\[hm\]", timeout=5) - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() + os.kill(child.pid, signal.SIGTERM) + child.expect(pexpect.EOF, timeout=5) + time.sleep(0.3) + assert (socket_dir / "sigterm.sock").exists(), "Supervisor must survive SIGTERM" -def test_ctrl_c_forwarded(): - """Ctrl-C is forwarded to the child, not caught by attach.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "adv-ctrlc" - proc = start_detached_session( - sid, socket_dir, - ["bash", "-c", 'trap "echo GOT_SIGINT_VIA_ATTACH" INT; echo TRAP_READY; while true; do sleep 
1; done'], + def test_ctrl_c_forwarded(self, socket_dir: Path, detached_session) -> None: + """Ctrl-C is forwarded to the child, not caught by attach.""" + detached_session( + "ctrlc", + ["bash", "-c", + 'trap "echo GOT_SIGINT" INT; echo TRAP_READY; while true; do sleep 1; done'], ) + child = _attach(socket_dir, "ctrlc", timeout=10) + child.expect("TRAP_READY", timeout=5) + child.sendcontrol("c") + child.expect("GOT_SIGINT", timeout=5) + child.expect(r"\[hm\]", timeout=5) + _detach(child) - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=10, - dimensions=(24, 80), - ) - child.expect("TRAP_READY", timeout=5) - # Send Ctrl-C. - child.sendcontrol("c") +# ── Adversarial tests ──────────────────────────────────────────────── - # Child's trap should fire. - child.expect("GOT_SIGINT_VIA_ATTACH", timeout=5) - # Attach should still be alive (not killed by Ctrl-C). - child.expect(r"\[hm\]", timeout=5) +class TestAdversarial: + """Edge cases and stress scenarios.""" - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -def test_output_flood_attach(): - """Massive output flood doesn't crash attach or corrupt status bar.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "adv-flood" - proc = start_detached_session( - sid, socket_dir, - ["bash", "-c", "seq 1 5000 && sleep 60"], + def test_resize_during_attach(self, socket_dir: Path, detached_session) -> None: + """Resize terminal while attached — status bar must survive.""" + detached_session( + "resize", + ["bash", "-c", + "for i in $(seq 1 100); do echo LINE_$i; sleep 0.02; done && sleep 60"], ) - - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=15, - dimensions=(24, 80), - maxread=65536, - ) - - # Wait for flood to finish (last line should be 5000). 
- child.expect("5000", timeout=10) - time.sleep(1) - - # Status bar should still be rendering. - child.expect(r"\[hm\]", timeout=5) - - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -def test_two_attaches_same_session(): - """Two simultaneous attaches to the same session both work.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "adv-dual-attach" - proc = start_detached_session( - sid, socket_dir, - ["bash", "-c", "sleep 1 && echo DUAL_MARKER && sleep 60"], + child = _attach(socket_dir, "resize", timeout=10) + child.expect(r"\[hm\]", timeout=5) + + for cols, rows in [(120, 40), (40, 10), (200, 50), (80, 24), (60, 15)]: + child.setwinsize(rows, cols) + time.sleep(0.1) + + time.sleep(2) + child.expect(r"\[hm\].*resize", timeout=5) + _detach(child) + + def test_arrow_keys_work(self, socket_dir: Path, detached_session) -> None: + """Arrow keys produce correct escape sequences.""" + detached_session("arrows", ["bash"]) + child = _attach(socket_dir, "arrows") + child.expect(r"\[hm\]", timeout=5) + time.sleep(0.5) + + child.sendline("echo ARROW_1") + child.expect("ARROW_1", timeout=5) + child.sendline("echo ARROW_2") + child.expect("ARROW_2", timeout=5) + + child.send("\x1b[A") # Up arrow + time.sleep(0.3) + child.send("\r") + child.expect("ARROW_2", timeout=5) + + remaining = child.before.decode("utf-8", errors="replace") if child.before else "" + assert "^[[A" not in remaining, f"Arrow key escaped as literal: {remaining!r}" + _detach(child) + + def test_output_flood(self, socket_dir: Path, detached_session) -> None: + """Massive output flood doesn't crash attach or corrupt status bar.""" + detached_session("flood", ["bash", "-c", "seq 1 5000 && sleep 60"]) + child = _attach(socket_dir, "flood", timeout=15, maxread=65536) + child.expect("5000", timeout=10) + time.sleep(1) + child.expect(r"\[hm\]", timeout=5) + _detach(child) + + def test_two_attaches_same_session( + self, 
socket_dir: Path, detached_session, + ) -> None: + """Two simultaneous attaches both receive output.""" + detached_session( + "dual", ["bash", "-c", "sleep 1 && echo DUAL_MARKER && sleep 60"], ) + child1 = _attach(socket_dir, "dual", timeout=10) + child2 = _attach(socket_dir, "dual", timeout=10) - try: - child1 = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=10, - dimensions=(24, 80), - ) - child2 = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=10, - dimensions=(24, 80), - ) - - # Both should see the marker. - child1.expect("DUAL_MARKER", timeout=10) - child2.expect("DUAL_MARKER", timeout=10) - - # Detach first, second should still work. - child1.sendcontrol("\\") - child1.expect(pexpect.EOF, timeout=3) - - # Second is still live. - time.sleep(0.3) - child2.sendcontrol("\\") - child2.expect("detached", timeout=3) - child2.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -def test_rapid_detach_reattach(): - """Rapid detach/reattach cycles don't leak or crash.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "adv-rapid-reattach" - proc = start_detached_session(sid, socket_dir, ["sleep", "60"]) - - try: - for i in range(5): - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), - ) - child.expect(r"\[hm\]", timeout=5) - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - time.sleep(0.1) - - # Supervisor should still be healthy. 
- sock = socket_dir / f"{sid}.sock" - assert sock.exists(), f"Socket gone after {i+1} reattach cycles" - finally: - proc.kill() - proc.wait() + child1.expect("DUAL_MARKER", timeout=10) + child2.expect("DUAL_MARKER", timeout=10) + _detach(child1) + time.sleep(0.3) + child2.sendcontrol("\\") + child2.expect("detached", timeout=3) + child2.expect(pexpect.EOF, timeout=3) -def test_long_session_name_status_bar(): - """Very long session name doesn't break status bar rendering.""" - with tempfile.TemporaryDirectory() as tmp: - socket_dir = Path(tmp) - sid = "a-very-long-session-name-that-might-overflow-the-status-bar-width" - proc = start_detached_session(sid, socket_dir, ["sleep", "60"]) + def test_rapid_detach_reattach(self, socket_dir: Path, detached_session) -> None: + """Rapid detach/reattach cycles don't leak or crash.""" + detached_session("rapid", ["sleep", "60"]) - try: - child = pexpect.spawn( - HM_BIN, ["attach", "--socket-dir", str(socket_dir), sid], - timeout=5, - dimensions=(24, 80), # 80 cols, name is 67 chars - ) - # Should still show [hm] prefix at minimum. 
+ for _i in range(5): + child = _attach(socket_dir, "rapid") child.expect(r"\[hm\]", timeout=5) - - child.sendcontrol("\\") - child.expect(pexpect.EOF, timeout=3) - finally: - proc.kill() - proc.wait() - - -# ── Runner ─────────────────────────────────────────────────────────── - -def main(): - print(f"\nAttach integration tests (hm={HM_BIN})\n") - - tests = [ - # Core attach tests - test_attach_shows_status_bar, - test_attach_alt_screen, - test_detach_ctrl_backslash, - test_attach_receives_output, - test_attach_input_forwarded, - test_attach_session_exit, - test_reattach_after_detach, - test_sighup_kills_attach_not_supervisor, - test_run_without_detach_auto_attaches, - test_status_bar_shows_state, - test_sigterm_kills_attach, - # Adversarial attach tests - test_resize_during_attach, - test_arrow_keys_work, - test_ctrl_c_forwarded, - test_output_flood_attach, - test_two_attaches_same_session, - test_rapid_detach_reattach, - test_long_session_name_status_bar, - ] - - for test in tests: - run_test(test) - - passed = sum(1 for _, ok, _ in _results if ok) - failed = sum(1 for _, ok, _ in _results if not ok) - total = len(_results) - - print(f"\n{passed}/{total} passed", end="") - if failed: - print(f", {failed} FAILED") - print("\nFailures:") - for name, ok, err in _results: - if not ok: - print(f" {name}: {err}") - sys.exit(1) - else: - print() - sys.exit(0) - - -if __name__ == "__main__": - main() + _detach(child) + time.sleep(0.1) + + assert (socket_dir / "rapid.sock").exists(), "Socket gone after reattach cycles" + + def test_long_session_name(self, socket_dir: Path, detached_session) -> None: + """Very long session name doesn't break status bar.""" + sid = "long-name-that-overflows-bar" + detached_session(sid, ["sleep", "60"]) + child = _attach(socket_dir, sid) + child.expect(r"\[hm\]", timeout=5) + _detach(child) diff --git a/uv.lock b/uv.lock new file mode 100644 index 0000000..3f11d87 --- /dev/null +++ b/uv.lock @@ -0,0 +1,104 @@ +version = 1 +revision = 3 
+requires-python = ">=3.13" + +[[package]] +name = "colorama" +version = "0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" }, +] + +[[package]] +name = "heimdall-tests" +version = "0.1.0" +source = { virtual = "." } + +[package.dev-dependencies] +dev = [ + { name = "pexpect" }, + { name = "pytest" }, +] + +[package.metadata] + +[package.metadata.requires-dev] +dev = [ + { name = "pexpect", specifier = ">=4.9" }, + { name = "pytest", specifier = ">=8" }, +] + +[[package]] +name = "iniconfig" +version = "2.3.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/72/34/14ca021ce8e5dfedc35312d08ba8bf51fdd999c576889fc2c24cb97f4f10/iniconfig-2.3.0.tar.gz", hash = "sha256:c76315c77db068650d49c5b56314774a7804df16fee4402c1f19d6d15d8c4730", size = 20503, upload-time = "2025-10-18T21:55:43.219Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" }, +] + +[[package]] +name = "packaging" +version = "26.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/65/ee/299d360cdc32edc7d2cf530f3accf79c4fca01e96ffc950d8a52213bd8e4/packaging-26.0.tar.gz", hash = "sha256:00243ae351a257117b6a241061796684b084ed1c516a08c48a3f7e147a9d80b4", size = 143416, upload-time = "2026-01-21T20:50:39.064Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" }, +] + +[[package]] +name = "pexpect" +version = "4.9.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "ptyprocess" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/42/92/cc564bf6381ff43ce1f4d06852fc19a2f11d180f23dc32d9588bee2f149d/pexpect-4.9.0.tar.gz", hash = "sha256:ee7d41123f3c9911050ea2c2dac107568dc43b2d3b0c7557a33212c398ead30f", size = 166450, upload-time = "2023-11-25T09:07:26.339Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/9e/c3/059298687310d527a58bb01f3b1965787ee3b40dce76752eda8b44e9a2c5/pexpect-4.9.0-py2.py3-none-any.whl", hash = "sha256:7236d1e080e4936be2dc3e326cec0af72acf9212a7e1d060210e70a47e253523", size = 63772, upload-time = "2023-11-25T06:56:14.81Z" }, +] + +[[package]] +name = "pluggy" +version = "1.6.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = 
"2025-05-15T12:30:06.134Z" }, +] + +[[package]] +name = "ptyprocess" +version = "0.7.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/20/e5/16ff212c1e452235a90aeb09066144d0c5a6a8c0834397e03f5224495c4e/ptyprocess-0.7.0.tar.gz", hash = "sha256:5c5d0a3b48ceee0b48485e0c26037c0acd7d29765ca3fbb5cb3831d347423220", size = 70762, upload-time = "2020-12-28T15:15:30.155Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/22/a6/858897256d0deac81a172289110f31629fc4cee19b6f01283303e18c8db3/ptyprocess-0.7.0-py2.py3-none-any.whl", hash = "sha256:4b41f3967fce3af57cc7e94b888626c18bf37a083e3651ca8feeb66d492fef35", size = 13993, upload-time = "2020-12-28T15:15:28.35Z" }, +] + +[[package]] +name = "pygments" +version = "2.19.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/b0/77/a5b8c569bf593b0140bde72ea885a803b82086995367bf2037de0159d924/pygments-2.19.2.tar.gz", hash = "sha256:636cb2477cec7f8952536970bc533bc43743542f70392ae026374600add5b887", size = 4968631, upload-time = "2025-06-21T13:39:12.283Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" }, +] + +[[package]] +name = "pytest" +version = "9.0.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = "packaging" }, + { name = "pluggy" }, + { name = "pygments" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/d1/db/7ef3487e0fb0049ddb5ce41d3a49c235bf9ad299b6a25d5780a89f19230f/pytest-9.0.2.tar.gz", hash = "sha256:75186651a92bd89611d1d9fc20f0b4345fd827c41ccd5c299a868a05d70edf11", size = 1568901, upload-time = 
"2025-12-06T21:30:51.014Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/3b/ab/b3226f0bd7cdcf710fbede2b3548584366da3b19b5021e74f5bde2a8fa3f/pytest-9.0.2-py3-none-any.whl", hash = "sha256:711ffd45bf766d5264d487b917733b453d917afd2b0ad65223959f59089f875b", size = 374801, upload-time = "2025-12-06T21:30:49.154Z" }, +]