Skip to content

Conversation

@drlucaa
Copy link
Member

@drlucaa drlucaa commented Jan 28, 2026

No description provided.

…utdown

Implements phase 1 of the daemon architecture:

Domain Layer:
- Add daemon path constants (socket, PID, log paths)
- Add daemon-specific sentinel errors (ErrDaemonNotRunning, etc.)
- Add 3-hour inactivity timeout constant

Port Interfaces:
- Define DaemonClient interface (Ping, Status, Shutdown, Close)
- Define DaemonConnector interface (Connect, IsRunning, Spawn)
- Define DaemonStatus struct with runtime metrics

gRPC Service:
- Add daemon.proto with DaemonService (Ping/Status/Shutdown RPCs)
- Configure buf for protocol buffer generation
- Generate pb.go and grpc.pb.go stubs

Adapter Implementation:
- Lifecycle manager with time.Timer-based inactivity tracking
- gRPC server listening on Unix Domain Socket
- gRPC client with automatic connection retry
- Background process spawner with detached execution
- Graft DI node registration

CLI Integration:
- Add 'daemon serve' subcommand (hidden, for internal spawning)
- Add 'daemon status' subcommand (shows uptime, idle time, PID)
- Add 'daemon stop' subcommand (graceful shutdown)
- Wire daemon commands into root CLI

Testing:
- Add lifecycle_test.go with synctest-based timer validation
- Test auto-shutdown after inactivity timeout
- Test timer reset prevents premature shutdown

Infrastructure:
- Add grpc, protobuf dependencies to go.mod
- Update flake.nix with buf tooling
- Register daemon adapter in wiring
- Extend daemon proto with GetGraph and GetEnvironment RPCs
- Add GraphCacheEntry domain type for cache metadata
- Extend DaemonClient interface with GetGraph/GetEnvironment methods
- Implement thread-safe ServerCache for graph and environment caching
- Add daemon RPC handlers for GetGraph and GetEnvironment
- Implement daemon client methods with string interning
- Add DiscoverConfigPaths to config loader for cache validation
- Refactor Scheduler to optionally use daemon client with fallback
- Refactor App layer to request graph from daemon with local fallback
Must Fix:
- Add nil pointer guards in GetGraph/GetEnvironment RPC handlers
  Prevents panic when Server is created via NewServer() without deps

Should Fix:
- Fix daemon client availability check: verify both err and client
- Add Graph.Validate() call after deserialization to compute
dependencies
- Use boolean flag for clearer daemon availability tracking
- Document ctx parameter limitation in GetGraph (no cancellation
propagation)
- Document ServerCache mtime validation assumes shared filesystem view
1. Consistent guard order: Both GetGraph and GetEnvironment now check
   cache first (line 159, 203), then their specific dependencies in
   struct declaration order (configLoader, envFactory respectively).

2. Explicit return semantics: GetGraph now uses named return values
   (graph *domain.Graph, cacheHit bool, err error) to match
GetEnvironment's
   pattern, making the three-value return (result, cacheHit, error)
explicit.
Add background file-watching service in daemon that proactively computes
and caches input hashes, eliminating blocking hash computation during
runs.

- Add watcher adapter using fsnotify for recursive directory monitoring
- Implement debouncer to coalesce rapid file changes (50ms window)
- Add hash cache with path-to-task index for efficient invalidation
- Extend gRPC API with GetInputHash RPC (Ready/Pending/Unknown states)
- Wire watcher service into daemon server lifecycle
- Use unique.Handle[string] for path interning to reduce memory
- Add ExecuteTask server-side streaming RPC to DaemonService
- Define ExecuteTaskRequest with task details and PTY dimensions
- Define ExecuteTaskResponse for log chunks (raw bytes)
- Import google/rpc/status.proto for error details
- Add buf dependency on googleapis for rpc types

Exit code handling:
- Success (exit=0): gRPC trailer x-exit-code: 0
- Failure (exit!=0): gRPC status UNKNOWN with exit code in trailer
- Regenerate daemon.pb.go and daemon_grpc.pb.go
- Generated by buf generate with googleapis dependency
- Extend DaemonClient interface with ExecuteTask method
- Accept context, task, nixEnv, stdout, stderr parameters
- Return error with exit_code field for non-zero exits
- Implement ExecuteTask method on daemon Client
- Handle streaming RPC and forward log chunks to io.Writer
- Extract exit code from gRPC trailers and status
- Add handleExecuteError helper for error handling
- Use constants for default PTY dimensions

Streaming behavior:
- Receives LogChunk messages and writes to stdout
- Extracts x-exit-code trailer on completion
- Handles UNKNOWN status for task failures
- Add executor field to Server struct
- Update NewServerWithDeps to accept executor parameter
- Implement ExecuteTask RPC handler with streaming
- Add streamWriter helper for io.Writer interface
- Set exit code in gRPC trailer (x-exit-code)
- Return UNKNOWN status for non-zero exit codes

Implementation details:
- Reconstruct domain.Task from ExecuteTaskRequest
- Execute via injected executor with PTY support
- Stream output back to client via LogChunk messages
- Handle exit codes consistently with local execution
- Add noDaemon field to Scheduler struct
- Add WithNoDaemon method for CLI flag propagation
- Refactor executeTask to use executeWithStrategy
- Implement executeWithFallback for remote/local execution
- Add isConnectionError helper for fallback detection
- Pass NoDaemon option through app.Run to scheduler

Execution strategy:
- Try remote execution via daemon when available
- Fallback to local on connection errors
- Respect --no-daemon flag to bypass remote execution
- Maintain backward compatibility with local-only mode
- Add --no-daemon flag to same run command
- Pass flag through app.Run to scheduler.WithNoDaemon
- Allow explicit bypass of remote daemon execution
- Update help text to document flag behavior

Use cases:
- Force local execution even when daemon is available
- Debugging remote execution issues
- Testing local vs remote behavior differences
- Lock buf dependency on googleapis for google/rpc/status.proto
- Required for ExecuteTask RPC error handling
- Extract actual exit codes from executor errors instead of always
returning 1
- Handle malformed exit code trailer values and preserve original errors
- Replace string-based error detection with proper gRPC status codes
- Remove unused google/rpc/status.proto import
- Document stderr parameter handling in PTY mode
- Add robust error chain traversal for wrapped gRPC errors
- Include both Unavailable and DeadlineExceeded as connection errors
- Add graph.Validate() call in GetGraph to populate executionOrder
- Use absolute socket path in client
- Use absolute working directory in app for robustness
- Update tests to accept any path with gomock.Any()
- Add backoff constants (50ms base, 200ms max, 1.5 multiplier)
- Configure grpc.WithConnectParams for faster connection establishment
- Optimizes local Unix Domain Socket connection behavior"
- Remove ResetTimer() from Ping() RPC (liveness check shouldn't count as
activity)
- Remove ResetTimer() from Status() RPC (read-only query shouldn't count
as activity)
- Add IsRunning() guard in DaemonStatus() to prevent daemon auto-spawn
- Format Last Activity as HH:MM:SS (X ago) for better readability

Changes only real work (GetGraph, GetEnvironment, GetInputHash,
ExecuteTask) to reset the idle timer.
Previously, the --no-daemon flag only prevented task execution via
daemon
but still connected to fetch the graph, updating the daemon's last
activity
timestamp. This caused 'same daemon status' to show recent activity even
when --no-daemon was used.

Changes:
- Check opts.NoDaemon before any daemon connection
- Load graph locally when --no-daemon is set
- Eliminate duplicated fallback code for cleaner logic
- Fix variable shadowing issue with client

The daemon's last activity now correctly remains unchanged when using
--no-daemon flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants