
feat: SQLite Worker Coordination — cross-process worker tracking (Phase 2) #89

@dean0x

Description


Epic: #87 — Architectural Simplification v0.6.0

Goal

Enable multiple CLI/MCP processes to coordinate worker spawning through a shared workers database table. Currently, each beat run or beat mcp start process creates its own in-memory runtime with no awareness of workers spawned by other processes. The result is uncoordinated spawning and inaccurate resource monitoring.

Sub-tasks

2a. Add workers Table

Schema (migration v9):

CREATE TABLE workers (
  id TEXT PRIMARY KEY,
  task_id TEXT NOT NULL UNIQUE,
  pid INTEGER NOT NULL,
  owner_pid INTEGER NOT NULL,  -- PID of the process that spawned this worker
  agent TEXT NOT NULL DEFAULT 'claude',
  started_at INTEGER NOT NULL,
  FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
);
CREATE INDEX idx_workers_owner ON workers(owner_pid);

Design principle: This is a coordination registry, not full worker state. ChildProcess handles, timeout timers, and output streams stay in-memory. The table answers one question: "how many workers exist across all processes?"

File: src/implementations/database.ts — Add migration v9.
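A minimal sketch of how migration v9 could be expressed, assuming a migration list keyed by version number. The SqlExecutor and Migration interfaces and the applyMigrations runner are hypothetical stand-ins, not the existing API in database.ts. The sketch uses IF NOT EXISTS as a defensive choice; the schema above uses plain CREATE TABLE.

```typescript
// Hypothetical: the only capability assumed from the database handle.
interface SqlExecutor {
  exec(sql: string): void;
}

// Hypothetical migration shape; the real list in database.ts may differ.
interface Migration {
  version: number;
  description: string;
  up(db: SqlExecutor): void;
}

const migrationV9: Migration = {
  version: 9,
  description: 'Add workers table for cross-process worker coordination',
  up(db) {
    // ON DELETE CASCADE is only enforced on connections where foreign keys are
    // enabled (PRAGMA foreign_keys = ON). SQLite's default is OFF, although
    // some builds change that default.
    db.exec(`
      CREATE TABLE IF NOT EXISTS workers (
        id TEXT PRIMARY KEY,
        task_id TEXT NOT NULL UNIQUE,
        pid INTEGER NOT NULL,
        owner_pid INTEGER NOT NULL,
        agent TEXT NOT NULL DEFAULT 'claude',
        started_at INTEGER NOT NULL,
        FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
      );
      CREATE INDEX IF NOT EXISTS idx_workers_owner ON workers(owner_pid);
    `);
  },
};

// Applies every migration newer than currentVersion in ascending order and
// returns the resulting schema version. A production runner would also wrap
// each step in a transaction and persist the new version number.
function applyMigrations(db: SqlExecutor, currentVersion: number, migrations: Migration[]): number {
  let version = currentVersion;
  const ordered = [...migrations].sort((a, b) => a.version - b.version);
  for (const migration of ordered) {
    if (migration.version > version) {
      migration.up(db);
      version = migration.version;
    }
  }
  return version;
}
```

Running it from schema version 8 executes the v9 DDL once. Running it again at version 9 is a no-op.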

2b. Worker Lifecycle Writes

On spawn: INSERT into workers table.
On completion/kill: DELETE from workers table.
On startup (recovery): DELETE stale rows where owner_pid is no longer running.
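The spawn and completion/kill writes can be sketched as a small tracking helper. WorkerRegistry, InMemoryWorkerRegistry, ExitSource and trackWorker are hypothetical names. The in-memory registry only stands in for the table so the logic can run. The SQL in the comments shows what a SQLite-backed registry would execute. Two assumptions are built in. First, the INSERT should only happen once the child has a PID: Node leaves ChildProcess.pid undefined when spawning fails. Second, completion and kill may both fire for one worker, so the DELETE path is made idempotent.

```typescript
interface WorkerRow {
  id: string;
  taskId: string;
  pid: number;
  ownerPid: number;
  agent: string;
  startedAt: number; // epoch milliseconds (assumed unit)
}

// Hypothetical registry facade over the workers table.
interface WorkerRegistry {
  // INSERT INTO workers (id, task_id, pid, owner_pid, agent, started_at) VALUES (?, ?, ?, ?, ?, ?)
  insert(row: WorkerRow): void;
  // DELETE FROM workers WHERE id = ?  (true if a row was removed)
  delete(id: string): boolean;
  // SELECT COUNT(*) FROM workers
  count(): number;
}

// Test double that mimics the PRIMARY KEY and UNIQUE(task_id) constraints.
class InMemoryWorkerRegistry implements WorkerRegistry {
  private readonly rows = new Map<string, WorkerRow>();

  insert(row: WorkerRow): void {
    for (const existing of this.rows.values()) {
      if (existing.id === row.id || existing.taskId === row.taskId) {
        throw new Error(`UNIQUE constraint failed for worker ${row.id}`);
      }
    }
    this.rows.set(row.id, row);
  }

  delete(id: string): boolean {
    return this.rows.delete(id);
  }

  count(): number {
    return this.rows.size;
  }
}

// Anything that emits a one-shot 'exit' event (a ChildProcess fits this shape).
interface ExitSource {
  once(event: 'exit', listener: () => void): unknown;
}

// Inserts the row for a spawned worker and returns a release function.
// The release runs on 'exit' and can also be called from the kill path.
// It deletes the row at most once.
function trackWorker(registry: WorkerRegistry, row: WorkerRow, child: ExitSource): () => void {
  registry.insert(row);
  let released = false;
  const release = (): void => {
    if (released) return;
    released = true;
    registry.delete(row.id);
  };
  child.once('exit', release);
  return release;
}
```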

Files:

  • src/implementations/event-driven-worker-pool.ts — Add Database injection. INSERT after successful spawn, DELETE on completion/kill.
  • src/services/recovery-manager.ts — On startup, query the workers table and check whether each owner_pid is still running with process.kill(owner_pid, 0). DELETE rows for dead processes and mark their tasks as FAILED.
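The startup cleanup can be sketched as follows. RecoveryActions and recoverStaleWorkers are hypothetical names, and the callbacks stand in for the DELETE and the task-status update. process.kill(pid, 0) delivers no signal; it only checks for existence. An EPERM error therefore still means the process exists. The sketch makes one extra assumption: recovery runs before this process spawns anything. Under that assumption, any row claiming the current PID must come from a dead process whose PID was reused, and is treated as stale. PID reuse by some other live process can still make a dead owner look alive, a limitation of a PID-only check.

```typescript
interface StaleCandidate {
  id: string;
  taskId: string;
  ownerPid: number;
}

// Hypothetical callbacks for "DELETE FROM workers WHERE id = ?" and
// the FAILED status update on the task.
interface RecoveryActions {
  deleteWorker(workerId: string): void;
  markTaskFailed(taskId: string): void;
}

// Signal 0 performs only the existence/permission check.
// ESRCH: no such process. EPERM: it exists but we may not signal it.
function isProcessAlive(pid: number): boolean {
  // pid <= 0 would address process groups, not a single owner process.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === 'EPERM';
  }
}

// Deletes rows whose owner is gone and fails their tasks. Each owner PID is
// checked once. Returns the ids of the removed worker rows.
function recoverStaleWorkers(
  rows: StaleCandidate[],
  actions: RecoveryActions,
  isAlive: (pid: number) => boolean = isProcessAlive,
  ownPid: number = process.pid,
): string[] {
  const aliveByPid = new Map<number, boolean>();
  const removed: string[] = [];
  for (const row of rows) {
    let alive = aliveByPid.get(row.ownerPid);
    if (alive === undefined) {
      alive = row.ownerPid !== ownPid && isAlive(row.ownerPid);
      aliveByPid.set(row.ownerPid, alive);
    }
    if (!alive) {
      actions.deleteWorker(row.id);
      actions.markTaskFailed(row.taskId);
      removed.push(row.id);
    }
  }
  return removed;
}
```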

2c. Cross-Process Resource Checks

Change: ResourceMonitor.canSpawnWorker() queries the workers table for global worker count instead of relying on in-memory workerCount.

Files:

  • src/implementations/resource-monitor.ts — Add Database injection. canSpawnWorker() runs SELECT COUNT(*) FROM workers for global count. Keep settlingWorkers array in-memory (it's per-process and still relevant).
  • src/core/interfaces.ts — Update ResourceMonitor interface if method signatures change.
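A minimal sketch of the worker-count part of canSpawnWorker() reading a global count instead of an in-memory counter. WorkerCountSource, SpawnDecision and GlobalWorkerLimit are hypothetical. The real ResourceMonitor signature and return type may differ. The per-process settlingWorkers bookkeeping and CPU/memory checks are left out. On its own, this check can still race with another process that reads the same count at the same moment.

```typescript
// Hypothetical source of the global count. A SQLite-backed version would run
// SELECT COUNT(*) AS count FROM workers on every call.
interface WorkerCountSource {
  countWorkers(): number;
}

// Hypothetical result shape.
interface SpawnDecision {
  allowed: boolean;
  activeWorkers: number;
  reason?: string;
}

class GlobalWorkerLimit {
  private readonly source: WorkerCountSource;
  private readonly maxWorkers: number;

  constructor(source: WorkerCountSource, maxWorkers: number) {
    this.source = source;
    this.maxWorkers = maxWorkers;
  }

  // Reads the count fresh on each call (no caching), so workers started by
  // other processes since the previous check are included.
  canSpawnWorker(): SpawnDecision {
    const activeWorkers = this.source.countWorkers();
    if (activeWorkers >= this.maxWorkers) {
      return { allowed: false, activeWorkers, reason: `global limit of ${this.maxWorkers} workers reached` };
    }
    return { allowed: true, activeWorkers };
  }
}
```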

2d. Spawn Serialization Across Processes

Change: Use SQLite's built-in locking for cross-process spawn serialization. The existing in-process mutex (spawnLock in WorkerHandler) handles within-process serialization. For cross-process, use a BEGIN IMMEDIATE transaction around the spawn-decision + INSERT.

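File: src/services/handlers/worker-handler.ts — Wrap the spawn decision (resource check + dequeue + spawn + INSERT) in a database.runInTransaction() that opens with BEGIN IMMEDIATE rather than SQLite's default deferred BEGIN. This acquires the write lock before the worker count is read, so two processes cannot both pass the check.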
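The serialized spawn decision can be sketched as a single callback run inside an immediate transaction. WorkerStore, runImmediate, SpawnDeps and trySpawnSerialized are hypothetical names, and the in-memory store only mimics COMMIT/ROLLBACK so the logic can be exercised. If the project uses better-sqlite3 (an assumption), its transaction functions expose an .immediate() variant, and the callback must stay synchronous: no await between the count and the INSERT. Other processes waiting on the lock also need a busy timeout, or they fail with SQLITE_BUSY instead of waiting.

```typescript
interface WorkerInsert {
  id: string;
  taskId: string;
  pid: number;
  ownerPid: number;
}

// Hypothetical store. A SQLite-backed runImmediate would issue BEGIN IMMEDIATE,
// call fn, then COMMIT, or ROLLBACK if fn throws.
interface WorkerStore {
  runImmediate<T>(fn: () => T): T;
  countWorkers(): number;
  insertWorker(row: WorkerInsert): void;
}

interface SpawnDeps {
  maxWorkers: number;
  ownerPid: number;
  dequeueTaskId: () => string | undefined; // hypothetical queue accessor
  spawnWorker: (taskId: string) => { id: string; pid: number }; // synchronous spawn
}

// Returns the spawned task id, or undefined if the global limit is reached or
// the queue is empty. Count, dequeue, spawn and INSERT share one transaction.
function trySpawnSerialized(store: WorkerStore, deps: SpawnDeps): string | undefined {
  return store.runImmediate(() => {
    if (store.countWorkers() >= deps.maxWorkers) return undefined;
    const taskId = deps.dequeueTaskId();
    if (taskId === undefined) return undefined;
    const worker = deps.spawnWorker(taskId);
    // If this INSERT throws, the transaction rolls back. A real implementation
    // would also have to kill the child that was already spawned.
    store.insertWorker({ id: worker.id, taskId, pid: worker.pid, ownerPid: deps.ownerPid });
    return taskId;
  });
}

// Test double: snapshots rows on entry and restores them if fn throws (ROLLBACK).
class InMemoryWorkerStore implements WorkerStore {
  rows: WorkerInsert[] = [];

  runImmediate<T>(fn: () => T): T {
    const snapshot = [...this.rows];
    try {
      return fn();
    } catch (err) {
      this.rows = snapshot;
      throw err;
    }
  }

  countWorkers(): number {
    return this.rows.length;
  }

  insertWorker(row: WorkerInsert): void {
    if (this.rows.some((r) => r.taskId === row.taskId)) {
      throw new Error(`UNIQUE constraint failed: workers.task_id (${row.taskId})`);
    }
    this.rows.push(row);
  }
}
```

With a shared store and maxWorkers set to 1, the first caller spawns and the second is refused, regardless of which process each caller belongs to.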

Files Changed

Modified

  • src/implementations/database.ts — migration v9
  • src/implementations/event-driven-worker-pool.ts — INSERT/DELETE on spawn/completion
  • src/services/recovery-manager.ts — stale worker cleanup
  • src/implementations/resource-monitor.ts — global worker count from DB
  • src/core/interfaces.ts — ResourceMonitor interface update
  • src/services/handlers/worker-handler.ts — cross-process spawn serialization

Risk

Medium — new table, cross-process logic. No existing behavior should break since the workers table is additive.

Verification

  • npm run build — clean compilation
  • npx biome check src/ tests/ — no lint issues
  • All test groups pass
  • Two concurrent beat run processes — workers table lists workers from both, and resource checks count both
  • Kill process mid-task → restart → stale workers cleaned, tasks marked failed

Metadata


Assignees

No one assigned

    Labels

    architecture (Architecture improvement), enhancement (New feature or request)
