
epic: Architectural Simplification — Hybrid Event Model + SQLite Coordination (v0.6.0) #87

@dean0x


Overview

Backbeat's event-driven architecture is over-applied. A full audit of the codebase found that 18 of the 42 events are pure overhead: query events that add latency, linear chains that only look event-driven, informational events consumed only for debug logging, and dead code (events defined but never emitted or subscribed to). The architecture works, but it is more complex than necessary.
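To make the cost of a query event concrete, here is a minimal TypeScript sketch that contrasts the TaskStatusQuery/TaskStatusResponse round trip with a direct taskRepo.findById() call. The EventBus, TaskRepository, registerQueryHandler, getStatusViaEvents, and getStatusDirect names are hypothetical stand-ins, not Backbeat's actual interfaces, and the bus is synchronous for simplicity.

```typescript
// Illustrative sketch only: these shapes are hypothetical stand-ins,
// not Backbeat's actual EventBus or TaskRepository interfaces.
type TaskId = string;
type TaskStatus = "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";
interface Task {
  id: TaskId;
  status: TaskStatus;
}

// In-memory stand-in for the SQLite-backed task repository.
class TaskRepository {
  private readonly rows = new Map<TaskId, Task>();
  save(task: Task): void {
    this.rows.set(task.id, task);
  }
  findById(id: TaskId): Task | undefined {
    return this.rows.get(id);
  }
}

// Minimal synchronous bus; subscribe() returns an unsubscribe function.
type Listener = (payload: unknown) => void;
class EventBus {
  private readonly listeners = new Map<string, Listener[]>();
  subscribe(type: string, fn: Listener): () => void {
    this.listeners.set(type, [...(this.listeners.get(type) ?? []), fn]);
    return () => {
      const current = this.listeners.get(type) ?? [];
      this.listeners.set(type, current.filter((l) => l !== fn));
    };
  }
  emit(type: string, payload: unknown): void {
    for (const fn of this.listeners.get(type) ?? []) fn(payload);
  }
}

// Before: a query handler answers TaskStatusQuery with TaskStatusResponse.
function registerQueryHandler(bus: EventBus, repo: TaskRepository): void {
  bus.subscribe("TaskStatusQuery", (id) => {
    bus.emit("TaskStatusResponse", repo.findById(id as TaskId));
  });
}

function getStatusViaEvents(bus: EventBus, id: TaskId): TaskStatus | undefined {
  let answer: TaskStatus | undefined;
  const off = bus.subscribe("TaskStatusResponse", (task) => {
    answer = (task as Task | undefined)?.status;
  });
  bus.emit("TaskStatusQuery", id); // two hops through the bus for one read
  off();
  return answer;
}

// After: the caller reads the repository directly.
function getStatusDirect(repo: TaskRepository, id: TaskId): TaskStatus | undefined {
  return repo.findById(id)?.status;
}
```

Both paths return the same answer, but with an asynchronous bus the event path would typically also need a correlation ID to match each response to its query; the direct call needs neither the extra hops nor that bookkeeping.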

Separately, there is a critical concurrency gap: every beat run and beat mcp start invocation calls bootstrap(), which creates an independent in-memory runtime (queue, worker pool, event bus). Multiple CLI invocations share the SQLite database but have no awareness of each other's workers, which leads to uncoordinated worker spawning and inaccurate resource monitoring.
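One possible shape for the workers table proposed in Decision 2, sketched in TypeScript. The column names, the WorkerStore interface, and the tryRegisterWorker/cleanStaleWorkers helpers are all hypothetical; an in-memory store stands in for SQLite so the coordination logic can run on its own.

```typescript
// Sketch under stated assumptions: the workers table columns, WorkerStore
// interface, and helper names below are hypothetical, not Backbeat's schema.

// DDL a SQLite-backed store might run at startup.
const CREATE_WORKERS_TABLE = `
CREATE TABLE IF NOT EXISTS workers (
  worker_id  TEXT PRIMARY KEY,
  task_id    TEXT NOT NULL,
  owner_pid  INTEGER NOT NULL, -- PID of the beat process that spawned the worker
  started_at INTEGER NOT NULL  -- epoch milliseconds
)`;

interface WorkerRow {
  workerId: string;
  taskId: string;
  ownerPid: number;
  startedAt: number;
}

// Narrow storage interface; a real implementation would run SQL against the shared database file.
interface WorkerStore {
  listAll(): WorkerRow[];
  insert(row: WorkerRow): void;
  deleteByOwner(pid: number): number;
}

// In-memory stand-in so the logic can be exercised without SQLite.
class InMemoryWorkerStore implements WorkerStore {
  private rows: WorkerRow[] = [];
  listAll(): WorkerRow[] {
    return [...this.rows];
  }
  insert(row: WorkerRow): void {
    this.rows.push(row);
  }
  deleteByOwner(pid: number): number {
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => r.ownerPid !== pid);
    return before - this.rows.length;
  }
}

// Spawn decisions count workers from every process, not just this one.
function tryRegisterWorker(store: WorkerStore, row: WorkerRow, maxWorkers: number): boolean {
  if (store.listAll().length >= maxWorkers) return false;
  store.insert(row);
  return true;
}

// Startup cleanup: remove rows owned by processes that no longer exist.
function cleanStaleWorkers(store: WorkerStore, isAlive: (pid: number) => boolean): number {
  const owners = store.listAll().map((r) => r.ownerPid);
  const deadPids = Array.from(new Set(owners)).filter((pid) => !isAlive(pid));
  let removed = 0;
  for (const pid of deadPids) removed += store.deleteByOwner(pid);
  return removed;
}
```

Against a real SQLite file, the count-then-insert in tryRegisterWorker would need to run inside a single write transaction (for example BEGIN IMMEDIATE) so two beat processes cannot both pass the limit check. In Node, isAlive could be built on process.kill(pid, 0), which throws with ESRCH when no such process exists; PID reuse makes this a heuristic rather than a guarantee.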

Additionally, task output is captured in memory only (OutputCapture uses Map<TaskId, OutputBuffer>), so output is not visible across processes: running beat task logs from a different process returns nothing. A SQLite-backed OutputRepository (task_output table) already exists but is never wired into the live capture path.
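A sketch of the write-through idea behind Decision 3: capture keeps its in-process buffer for same-process reads but also appends every chunk to an OutputRepository, so a different process can rebuild the log. The OutputChunk shape, the append/findByTask methods, and PersistingOutputCapture are hypothetical names; the issue proposes wiring the repository into ProcessConnector, so where the write-through lives here is illustrative only.

```typescript
// Hypothetical sketch: OutputChunk, the append/findByTask methods, and
// PersistingOutputCapture are illustrative names, not Backbeat's real API.
type TaskId = string;
type Stream = "stdout" | "stderr";

interface OutputChunk {
  taskId: TaskId;
  stream: Stream;
  data: string;
  seq: number; // per-task sequence number so readers can restore order
}

// Persistence port; in the proposal this would write rows to the task_output table.
interface OutputRepository {
  append(chunk: OutputChunk): void;
  findByTask(taskId: TaskId): OutputChunk[];
}

// In-memory stand-in for the SQLite-backed repository.
class InMemoryOutputRepository implements OutputRepository {
  private readonly chunks: OutputChunk[] = [];
  append(chunk: OutputChunk): void {
    this.chunks.push(chunk);
  }
  findByTask(taskId: TaskId): OutputChunk[] {
    return this.chunks.filter((c) => c.taskId === taskId).sort((a, b) => a.seq - b.seq);
  }
}

// Keeps the in-process buffer for same-process reads and writes every chunk through.
class PersistingOutputCapture {
  private readonly buffers = new Map<TaskId, string[]>();
  private readonly nextSeq = new Map<TaskId, number>();
  private readonly repo: OutputRepository;

  constructor(repo: OutputRepository) {
    this.repo = repo;
  }

  capture(taskId: TaskId, stream: Stream, data: string): void {
    const seq = this.nextSeq.get(taskId) ?? 0;
    this.nextSeq.set(taskId, seq + 1);
    const buffer = this.buffers.get(taskId) ?? [];
    buffer.push(data);
    this.buffers.set(taskId, buffer);
    this.repo.append({ taskId, stream, data, seq });
  }

  // Same-process fast path.
  getOutput(taskId: TaskId): string {
    return (this.buffers.get(taskId) ?? []).join("");
  }
}

// What another process (e.g. beat task logs) could do with only the repository.
function readLogsFromRepository(repo: OutputRepository, taskId: TaskId): string {
  return repo.findByTask(taskId).map((c) => c.data).join("");
}
```

The per-task seq column lets a reader in another process reassemble chunks in capture order without relying on row insertion order.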

Decisions

  1. Hybrid event model: keep events for terminal-state fan-out, where multiple independent handlers react to the same event. Replace query events, linear trigger chains, and informational events with direct calls.
  2. SQLite coordination over a daemon: add a workers table for cross-process worker tracking instead of running a daemon. Backbeat's orchestration is infrequent (spawns happen at intervals of 10 seconds or more), so a daemon is overkill.
  3. Output persistence: wire the existing OutputRepository into ProcessConnector so output is persisted to SQLite as it is captured. This enables beat task logs across processes and makes the lightweight CLI path possible.
  4. Lightweight CLI path: read-only CLI commands (task status, task logs) should not bootstrap the full runtime; direct repository access is sufficient.
  5. Do this before new features: this refactoring affects the core runtime. Issues #84 (RecoveryManager should check dependencies before re-enqueuing QUEUED tasks), #83 (wrap ScheduleExecutor handleMissedRun in a transaction), and #82 (cancelTasks on CancelSchedule only covers the latest execution) all become simpler after the simplification.
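Decision 4 could be realized by routing read-only commands away from bootstrap() before any runtime is built. In this sketch, ReadOnlyContext, CliDeps, READ_ONLY_COMMANDS, and runCommand are hypothetical names (the issue only says a new src/cli/read-only-context.ts will be added), and the dependencies are injected so a test can confirm that task status and task logs never trigger a bootstrap.

```typescript
// Hypothetical sketch: these names are assumptions; the issue only says a new
// src/cli/read-only-context.ts will exist, not what it contains.
interface TaskRecord {
  id: string;
  status: string;
}

// Read-only view over persisted state: repositories only, no queue, worker pool, or event bus.
interface ReadOnlyContext {
  findTask(id: string): TaskRecord | undefined;
  readOutput(id: string): string;
}

interface CliDeps {
  openReadOnly(): ReadOnlyContext; // e.g. open the SQLite file and build repositories
  bootstrap(): void; // full runtime: queue, worker pool, event bus, handlers
}

// Commands that only read persisted state and can therefore skip bootstrap().
const READ_ONLY_COMMANDS = new Set<string>(["task status", "task logs"]);

function runCommand(command: string, taskId: string, deps: CliDeps): string {
  if (READ_ONLY_COMMANDS.has(command)) {
    const ctx = deps.openReadOnly();
    if (command === "task status") {
      return ctx.findTask(taskId)?.status ?? "not found";
    }
    return ctx.readOutput(taskId);
  }
  // Anything that may spawn or coordinate work still gets the full runtime.
  deps.bootstrap();
  return `bootstrapped for ${command}`;
}
```

The task logs branch only returns useful output from another process once output is persisted to SQLite, which is why Decision 3 is listed as enabling this path.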

Phases

All phases ship together as v0.6.0 (includes 3 already-merged PRs: #78, #85, #86). Each phase gets its own branch/PR for reviewability.

| Phase | Issue | Scope | Risk |
|---|---|---|---|
| Phase 1 | #88 | Simplify Event System: remove 18 events, replace with direct calls | Medium |
| Phase 2 | #89 | SQLite Worker Coordination + Output Persistence | Medium |
| Phase 3 | #90 | Lightweight CLI Path: read-only commands skip bootstrap | Low |

Impact on Open Issues

| Issue | Effect |
|---|---|
| #84 RecoveryManager dependency checks | Simpler after Phase 1: direct repo calls available |
| #83 ScheduleExecutor handleMissedRun tx | Still relevant and independent; can be done during or after |
| #82 cancelTasks coverage | Still relevant, independent |
| #79 Task loops | Build on top of the simplified architecture |
| #31 Tech debt: QueryHandler findAllUnbounded() | Resolved by Phase 1 (query events removed) |
| #31 Tech debt: pagination in TaskStatusQuery | Resolved by Phase 1 (no more query events) |

Complete Event Disposition (42 events)

Removed — 18 events

| Event | Phase | Reason |
|---|---|---|
| TaskStatusQuery | 1a | Query event → direct taskRepo.findById() |
| TaskStatusResponse | 1a | Query response → removed with query |
| TaskLogsQuery | 1a | Query event → direct outputCapture.getOutput() |
| TaskLogsResponse | 1a | Query response → removed with query |
| NextTaskQuery | 1a | Queue query → direct queue.dequeue() |
| ScheduleQuery | 1a | Vestigial: never emitted; ScheduleManager already uses direct repo calls |
| ScheduleQueryResponse | 1a | Vestigial: response for a never-emitted query |
| TaskPersisted | 1b | Linear chain → TaskManager saves directly |
| OutputCaptured | 1c | Informational: debug logging only |
| WorkerSpawned | 1c | Informational: already logged at spawn site |
| WorkerKilled | 1c | Only subscriber (AutoscalingManager) is advisory |
| RecoveryStarted | 1c | Informational: no functional subscribers |
| RecoveryCompleted | 1c | Informational: no functional subscribers |
| SystemResourcesUpdated | 1c | Only subscriber (AutoscalingManager) is advisory |
| TaskResumed | 1c | Informational: emitted but no subscribers |
| TaskConfigured | 1c | Dead code: never emitted, never subscribed |
| TaskDeleted | 1c | Orphaned: DependencyHandler subscribes, but it is never emitted |
| LogsRequested | 1c | Orphaned: OutputHandler subscribes, but it is never emitted |

Retained — 24 events

| Event | Purpose | Subscribers |
|---|---|---|
| TaskDelegated | Fan-out: dependency validation | DependencyHandler |
| TaskQueued | Trigger: worker spawning | WorkerHandler |
| TaskStarting | Status update: pre-start | PersistenceHandler |
| TaskStarted | Status update: running | PersistenceHandler |
| TaskCompleted | Fan-out: persist, deps, checkpoint, schedule | PersistenceHandler, DependencyHandler, CheckpointHandler, ScheduleHandler |
| TaskFailed | Fan-out: persist, deps, checkpoint, schedule | PersistenceHandler, DependencyHandler, CheckpointHandler, ScheduleHandler |
| TaskCancelled | Fan-out: persist, deps, checkpoint, schedule | PersistenceHandler, DependencyHandler, CheckpointHandler, ScheduleHandler |
| TaskTimeout | Fan-out: persist, deps, schedule | PersistenceHandler, DependencyHandler, ScheduleHandler |
| TaskCancellationRequested | Command: cancel task | WorkerHandler |
| TaskUnblocked | Trigger: enqueue unblocked task | QueueHandler |
| TaskDependencyAdded | Dependency lifecycle | DependencyHandler |
| TaskDependencyResolved | Dependency resolution tracking | DependencyHandler |
| TaskDependencyFailed | Dependency cascade failure | DependencyHandler |
| ScheduleCreated | Schedule lifecycle: persist + compute nextRunAt | ScheduleHandler |
| ScheduleTriggered | Schedule lifecycle: execution audit | ScheduleHandler |
| ScheduleExecuted | Schedule lifecycle: execution tracking | ScheduleHandler |
| ScheduleMissed | Schedule lifecycle: missed run policy | ScheduleHandler |
| ScheduleCancelled | Schedule lifecycle: cancel | ScheduleHandler |
| SchedulePaused | Schedule lifecycle: pause | ScheduleHandler |
| ScheduleResumed | Schedule lifecycle: resume | ScheduleHandler |
| ScheduleExpired | Schedule lifecycle: expiry | ScheduleHandler |
| ScheduleUpdated | Schedule lifecycle: update | ScheduleHandler |
| CheckpointCreated | Checkpoint persistence notification | (consumers optional) |
| RequeueTask | Retry mechanism | QueueHandler |
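The terminal events above are retained because several handlers react to each one independently. Below is a minimal fan-out sketch for TaskCompleted; the handler names come from the table, but the FanOutBus API, wireTaskCompletedHandlers, and the handler bodies are hypothetical placeholders.

```typescript
// Illustrative fan-out sketch: handler names mirror the retained-events table,
// but the bus API and handler bodies are hypothetical placeholders.
interface TaskCompleted {
  type: "TaskCompleted";
  taskId: string;
}
type TerminalHandler = (event: TaskCompleted) => void;

class FanOutBus {
  private readonly handlers: TerminalHandler[] = [];

  subscribe(handler: TerminalHandler): void {
    this.handlers.push(handler);
  }

  // Every subscriber receives the event; the emitter does not know who they are.
  emit(event: TaskCompleted): void {
    for (const handler of this.handlers) handler(event);
  }
}

// Wire four independent reactions to TaskCompleted, recording what each would do.
function wireTaskCompletedHandlers(bus: FanOutBus, log: string[]): void {
  bus.subscribe((e) => { log.push(`PersistenceHandler: persist ${e.taskId} as COMPLETED`); });
  bus.subscribe((e) => { log.push(`DependencyHandler: resolve dependents of ${e.taskId}`); });
  bus.subscribe((e) => { log.push(`CheckpointHandler: checkpoint ${e.taskId}`); });
  bus.subscribe((e) => { log.push(`ScheduleHandler: record schedule run for ${e.taskId}`); });
}
```

With direct calls, the code that completes a task would have to know about and invoke all four handlers; with fan-out it only emits the event, which is exactly the case the hybrid model keeps.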

Files Changed (Summary)

Modified

| File | Phases |
|---|---|
| src/services/task-manager.ts | 1a, 1b, 1c |
| src/core/events/events.ts | 1a, 1b, 1c |
| src/services/handlers/worker-handler.ts | 1a, 1c, 2d |
| src/services/handlers/queue-handler.ts | 1a, 1b |
| src/services/handlers/persistence-handler.ts | 1b |
| src/services/handlers/dependency-handler.ts | 1c, 1d |
| src/services/handlers/schedule-handler.ts | 1a |
| src/services/handler-setup.ts | 1a, 1c |
| src/bootstrap.ts | 1a, 1b, 2e, 3b |
| src/implementations/event-driven-worker-pool.ts | 1c, 2b |
| src/implementations/output-capture.ts | 1c |
| src/implementations/resource-monitor.ts | 2c |
| src/implementations/database.ts | 2a |
| src/services/recovery-manager.ts | 1c, 2b |
| src/services/process-connector.ts | 2e |
| src/core/interfaces.ts | 2c |

Removed

| File | Phase | Reason |
|---|---|---|
| src/services/handlers/query-handler.ts | 1a | Replaced by direct repo calls |
| src/services/handlers/output-handler.ts | 1c | Only consumed orphaned/debug events |
| src/services/autoscaling-manager.ts | 1c | Purely advisory: logs scaling opportunities but never spawns |
| tests/unit/services/handlers/query-handler.test.ts | 1a | Handler removed |
| tests/unit/services/handlers/output-handler.test.ts | 1c | Handler removed |
| tests/unit/services/autoscaling-manager.test.ts | 1c | Manager removed |

Added

| File | Phase |
|---|---|
| src/cli/read-only-context.ts | 3a |

Verification (End-to-End)

  1. Single beat run "hello": the task completes normally.
  2. Two concurrent beat run invocations: the workers table shows both workers, and resource checks account for both.
  3. beat task status: returns quickly (no bootstrap) and reads directly from SQLite.
  4. beat task logs from a different process: returns the output persisted via OutputRepository.
  5. beat mcp start: the MCP server starts, the schedule executor runs, and recovery works.
  6. Kill the MCP server mid-task → restart → stale workers are cleaned up and tasks are marked failed.
  7. Event audit: grep -r "emit\|subscribe" src/ shows only the 24 retained events.
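Step 7 relies on grep; a slightly stricter variant could extract the event names passed to emit/subscribe and compare them against the 24 retained events. This sketch assumes event names appear as the first string-literal argument (for example eventBus.emit('TaskCompleted', ...)); RETAINED_EVENTS and findUnexpectedEvents are hypothetical names, and if Backbeat passes typed event objects or constants instead, the pattern would need adjusting.

```typescript
// Sketch: assumes event names appear as the first string-literal argument of
// emit()/subscribe(); other calling conventions would need a different pattern.
const RETAINED_EVENTS = new Set<string>([
  "TaskDelegated", "TaskQueued", "TaskStarting", "TaskStarted",
  "TaskCompleted", "TaskFailed", "TaskCancelled", "TaskTimeout",
  "TaskCancellationRequested", "TaskUnblocked", "TaskDependencyAdded",
  "TaskDependencyResolved", "TaskDependencyFailed", "ScheduleCreated",
  "ScheduleTriggered", "ScheduleExecuted", "ScheduleMissed", "ScheduleCancelled",
  "SchedulePaused", "ScheduleResumed", "ScheduleExpired", "ScheduleUpdated",
  "CheckpointCreated", "RequeueTask",
]);

// Returns the sorted, de-duplicated event names that are not in the retained list.
function findUnexpectedEvents(source: string): string[] {
  // Fresh regex per call so lastIndex state is never shared between calls.
  const pattern = /\b(?:emit|subscribe)\(\s*['"`](\w+)['"`]/g;
  const unexpected = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1];
    if (name && !RETAINED_EVENTS.has(name)) unexpected.add(name);
  }
  return Array.from(unexpected).sort();
}
```

Run over the concatenated contents of src/, an empty result would correspond to the audit passing.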

Metadata


Labels: architecture, enhancement
