nullclaw · DonPrus · Mar 14, 2026 · Mar 13, 2026 · Mar 13, 2026 · Mar 13, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,76 +1,104 @@
 # NullBoiler
 
-DAG-based workflow orchestrator for NullClaw AI bot agents. Part of the Null ecosystem (NullTracker, NullClaw).
+Graph-based workflow orchestrator with unified state model for NullClaw AI bot agents. Part of the Null ecosystem (NullTracker, NullClaw).
 
 ## Tech Stack
 
 - **Language**: Zig 0.15.2
 - **Database**: SQLite (vendored in `deps/sqlite/`), WAL mode
 - **Protocol**: HTTP/1.1 REST API with JSON payloads
-- **Dispatch**: HTTP (webhook/api_chat/openai_chat), MQTT, Redis Streams
+- **Dispatch**: HTTP (webhook/api_chat/openai_chat/a2a), MQTT, Redis Streams
 - **Vendored C libs**: SQLite (`deps/sqlite/`), hiredis (`deps/hiredis/`), libmosquitto (`deps/mosquitto/`)
 
 ## Module Map
 
 | File | Role |
 |------|------|
-| `main.zig` | CLI args (`--port`, `--db`, `--config`, `--version`), HTTP accept loop, engine thread, tracker thread |
-| `api.zig` | REST API routing and 19 endpoint handlers (incl. signal, chat, tracker status) |
-| `store.zig` | SQLite layer, 30+ CRUD methods, schema migrations |
-| `engine.zig` | DAG scheduler: tick loop, 14 step type handlers, graph cycles, worker handoff |
-| `dispatch.zig` | Worker selection (tags, capacity), protocol-aware dispatch (`webhook`, `api_chat`, `openai_chat`, `mqtt`, `redis_stream`) |
+| `main.zig` | CLI args (`--port`, `--db`, `--config`, `--version`, `--export-manifest`, `--from-json`), HTTP accept loop, engine thread, tracker thread |
+| `api.zig` | REST API routing and 30+ endpoint handlers (runs, workers, workflows, checkpoints, state, SSE stream, tracker) |
+| `store.zig` | SQLite layer, CRUD methods for all tables, schema migrations (4 migration files) |
+| `engine.zig` | Graph-based state scheduler: tick loop, 7 node type handlers, checkpoints, reducers, goto, breakpoints, deferred nodes, reconciliation |
+| `state.zig` | Unified state model: 7 reducer types (last_value, append, merge, add, min, max, add_messages), overwrite bypass, ephemeral keys, state path resolution |
+| `sse.zig` | Server-Sent Events hub: per-run event queues, 5 stream modes (values, updates, tasks, debug, custom) |
+| `dispatch.zig` | Worker selection (tags, capacity, A2A preference), protocol-aware dispatch |
 | `async_dispatch.zig` | Thread-safe response queue for async MQTT/Redis dispatch (keyed by correlation_id) |
 | `redis_client.zig` | Hiredis wrapper: connect, XADD, listener thread for response streams |
 | `mqtt_client.zig` | Libmosquitto wrapper: connect, publish, subscribe, listener thread for response topics |
-| `templates.zig` | Prompt template rendering: `{{input.X}}`, `{{steps.ID.output}}`, `{{item}}`, `{{task.X}}`, `{{debate_responses}}`, `{{chat_history}}`, `{{role}}` |
+| `templates.zig` | Prompt template rendering: state-based `{{state.X}}`, legacy `{{input.X}}`, `{{item}}`, `{{task.X}}`, `{{attempt}}`, conditional blocks |
 | `callbacks.zig` | Fire-and-forget webhook callbacks on step/run events |
 | `config.zig` | JSON config loader (`Config`, `WorkerConfig`, `EngineConfig`, `TrackerConfig`) |
-| `types.zig` | `RunStatus`, `StepStatus`, `StepType` (14 types), `WorkerStatus`, `TrackerTaskState`, row types |
+| `types.zig` | `RunStatus`, `StepStatus`, `StepType` (7 types), `WorkerStatus`, `ReducerType`, row types |
 | `tracker.zig` | Pull-mode tracker thread: poll NullTickets, claim tasks, heartbeat leases, stall detection |
 | `tracker_client.zig` | HTTP client for NullTickets API (claim, heartbeat, transition, fail, artifacts) |
 | `workspace.zig` | Workspace lifecycle: create, hook execution, cleanup, path sanitization |
 | `subprocess.zig` | NullClaw subprocess: spawn, health check, prompt sending, kill |
-| `workflow_loader.zig` | Load JSON workflow definitions from `workflows/` directory |
+| `workflow_loader.zig` | Load JSON workflow definitions from `workflows/` directory, hot-reload watcher |
+| `workflow_validation.zig` | Graph-based workflow validation: reachability, cycles, state key refs, route/send targets |
 | `ids.zig` | UUID v4 generation, `nowMs()` |
-| `migrations/001_init.sql` | 6 tables: workers, runs, steps, step_deps, events, artifacts |
-| `migrations/002_advanced_steps.sql` | 3 tables: cycle_state, chat_messages, saga_state + ALTER TABLE |
+| `metrics.zig` | Prometheus-style metrics counters |
+| `strategy.zig` | Pluggable strategy map for workflow execution |
+| `worker_protocol.zig` | Protocol-specific request body builders |
+| `worker_response.zig` | Protocol-specific response parsers |
+| `export_manifest.zig` | Export tool manifest for CLI integration |
+| `from_json.zig` | Import workflow from JSON CLI command |
 
 ## Build / Test / Run
 
 ```sh
 zig build              # build
-zig build test         # unit tests
+zig build test         # unit tests (320 tests)
 zig build && bash tests/test_e2e.sh   # e2e tests (requires Python 3 for mock workers)
 ./zig-out/bin/nullboiler --port 8080 --db nullboiler.db --config config.json
 ```
 
+## Step Types (7)
+
+`task`, `route`, `interrupt`, `agent`, `send`, `transform`, `subgraph`
+
+## Reducers (7)
+
+`last_value`, `append`, `merge`, `add`, `min`, `max`, `add_messages`
+
 ## API Endpoints
 
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/health` | Health check |
+| GET | `/metrics` | Prometheus metrics |
 | POST | `/workers` | Register worker |
 | GET | `/workers` | List workers |
 | DELETE | `/workers/{id}` | Remove worker |
-| POST | `/runs` | Create workflow run |
-| GET | `/runs` | List runs |
+| POST | `/runs` | Create workflow run (legacy step-array or graph format) |
+| GET | `/runs` | List runs (supports ?status= filter) |
 | GET | `/runs/{id}` | Get run details |
 | POST | `/runs/{id}/cancel` | Cancel run |
 | POST | `/runs/{id}/retry` | Retry failed run |
+| POST | `/runs/{id}/resume` | Resume interrupted run (with optional state updates) |
+| POST | `/runs/{id}/state` | Inject state into running run (pending injection) |
+| POST | `/runs/{id}/replay` | Replay run from a checkpoint |
+| POST | `/runs/fork` | Fork run from a checkpoint into a new run |
 | GET | `/runs/{id}/steps` | List steps for run |
 | GET | `/runs/{id}/steps/{step_id}` | Get step details |
-| POST | `/runs/{id}/steps/{step_id}/approve` | Approve approval step |
-| POST | `/runs/{id}/steps/{step_id}/reject` | Reject approval step |
 | GET | `/runs/{id}/events` | List run events |
-| POST | `/runs/{id}/steps/{step_id}/signal` | Signal a waiting step |
-| GET | `/runs/{id}/steps/{step_id}/chat` | Get group_chat transcript |
-| GET | `/tracker/status` | Pull-mode tracker status (running tasks, concurrency, counters) |
+| GET | `/runs/{id}/checkpoints` | List checkpoints for run |
+| GET | `/runs/{id}/checkpoints/{cpId}` | Get checkpoint details |
+| GET | `/runs/{id}/stream` | SSE stream (supports ?mode=values\|updates\|tasks\|debug\|custom) |
+| POST | `/workflows` | Create workflow definition |
+| GET | `/workflows` | List workflow definitions |
+| GET | `/workflows/{id}` | Get workflow definition |
+| PUT | `/workflows/{id}` | Update workflow definition |
+| DELETE | `/workflows/{id}` | Delete workflow definition |
+| POST | `/workflows/{id}/validate` | Validate workflow definition |
+| GET | `/workflows/{id}/mermaid` | Export workflow as Mermaid diagram |
+| POST | `/workflows/{id}/run` | Start a run from a stored workflow |
+| GET | `/rate-limits` | Get current rate limit info per worker |
+| POST | `/admin/drain` | Enable drain mode |
+| GET | `/tracker/status` | Pull-mode tracker status |
 | GET | `/tracker/tasks` | List running pull-mode tasks |
 | GET | `/tracker/tasks/{task_id}` | Get single pull-mode task details |
-
-## Step Types
-
-`task`, `fan_out`, `map`, `condition`, `approval`, `reduce`, `loop`, `sub_workflow`, `wait`, `router`, `transform`, `saga`, `debate`, `group_chat`
+| GET | `/tracker/stats` | Tracker statistics |
+| POST | `/tracker/refresh` | Force tracker poll |
+| POST | `/internal/agent-events/{run_id}/{step_id}` | Agent event callback (from NullClaw) |
 
 ## Coding Conventions
 
@@ -83,16 +111,47 @@ zig build && bash tests/test_e2e.sh   # e2e tests (requires Python 3 for mock wo
 
 ## Architecture
 
-- Single-threaded HTTP accept loop on main thread
-- Background engine thread polls DB for active runs (+ polls async response queue for MQTT/Redis steps)
-- `std.atomic.Value(bool)` for coordinated shutdown
-- Config workers seeded into DB on startup (source = "config")
-- Schema in `migrations/001_init.sql` + `002_advanced_steps.sql`, applied on `Store.init`
-- Graph cycles: condition/router can route back to completed steps, engine creates new step instances per iteration
-- Worker handoff: dispatch result can include `handoff_to` for chained delegation (max 5)
-- Async dispatch: MQTT/Redis workers use two-phase dispatch (publish → engine polls response queue)
-- Background listener threads (MQTT/Redis) started conditionally when async workers are configured
-- Pull-mode tracker thread (conditional): polls NullTickets for tasks, claims work, manages subprocess lifecycles
+- **Unified state model**: Every node reads from state and returns partial updates; the engine applies reducers to fold them into the run state
+- **Graph-based execution**: Workflow = `{nodes: {}, edges: [], state_schema: {}}` with `__start__` and `__end__` synthetic nodes
+- **Checkpoints**: State snapshot after every node, enabling fork/replay/resume
+- **Conditional edges**: Route nodes produce values; an edge like `["router:yes", "next"]` is taken when the route result matches
+- **Deferred nodes**: Nodes with `"defer": true` execute right before `__end__`
+- **Command primitive**: Workers can return `{"goto": "node_name"}` to override normal graph traversal
+- **Breakpoints**: `interrupt_before` / `interrupt_after` arrays pause execution
+- **Subgraph**: Inline child workflow execution with input/output mapping (max recursion depth 10)
+- **Multi-turn agents**: Agent nodes can loop with `continuation_prompt` up to `max_turns`
+- **Configurable runs**: Per-run config stored as `state.__config`
+- **Node-level cache**: FNV hash of (node_name, rendered_prompt) with configurable TTL
+- **Token accounting**: Cumulative input/output token tracking per step and per run
+- **Workflow hot-reload**: `WorkflowWatcher` polls `workflows/` directory for JSON changes, upserts into DB
+- **Reconciliation**: Check NullTickets task status between steps, cancel if task is terminal
+
+### Thread Model
+
+```
+Main thread:       HTTP accept loop (push API)
+Engine thread:     Graph tick loop (state-based scheduler)
+Tracker thread:    Poll NullTickets -> claim -> workspace -> subprocess/dispatch
+MQTT listener:     (conditional, for async MQTT workers)
+Redis listener:    (conditional, for async Redis workers)
+```
+
+### SSE Streaming
+
+5 modes for real-time consumption via `GET /runs/{id}/stream?mode=X`:
+- `values` -- full state after each step
+- `updates` -- node name + partial state updates
+- `tasks` -- task start/finish with metadata
+- `debug` -- everything with step number + timestamp
+- `custom` -- user-defined events from worker output (`ui_messages`, `stream_messages`)
+
+## Database
+
+SQLite with WAL mode. Schema across 4 migrations:
+- `001_init.sql`: workers, runs, steps, step_deps, events, artifacts
+- `002_advanced_steps.sql`: cycle_state, chat_messages, saga_state (legacy, unused by current engine)
+- `003_tracker.sql`: tracker_runs
+- `004_orchestration.sql`: workflows, checkpoints, agent_events, pending_state_injections, node_cache, pending_writes + ALTER TABLE extensions for state_json, config_json, parent_run_id, token accounting
 
 ## Pull-Mode (NullTickets Integration)
 
@@ -131,27 +190,3 @@ Optional pull-mode where NullBoiler acts as an agent polling NullTickets for wor
 ```
 
 If `tracker` is absent or null, the tracker thread does not start and push-mode operates unchanged.
-
-### Workflow Definitions
-
-JSON files in `workflows/` directory. Two execution modes:
-- `subprocess` — spawn NullClaw child process per task (isolated workspace)
-- `dispatch` — use existing registered workers (no workspace)
-
-Three-axis concurrency: global (`max_concurrent_tasks`) + per-pipeline + per-role limits.
-
-### Thread Model
-
-```
-Main thread:       HTTP accept loop (push API — unchanged)
-Engine thread:     DAG tick loop (unchanged)
-Tracker thread:    Poll NullTickets → claim → workspace → subprocess/dispatch
-MQTT listener:     (unchanged, conditional)
-Redis listener:    (unchanged, conditional)
-```
-
-## Database
-
-SQLite with WAL mode. Schema: 9 tables across 2 migrations.
-- `001_init.sql`: workers, runs, steps, step_deps, events, artifacts
-- `002_advanced_steps.sql`: cycle_state, chat_messages, saga_state + iteration_index/child_run_id columns on steps
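
Note on the unified state model introduced in the CLAUDE.md changes above: nodes return partial updates and the engine folds them into the run state with a per-key reducer. The following is a minimal Python sketch of that idea, not the code in `state.zig`. The reducer semantics, the `apply_updates` helper, and the shape of the schema (a plain key-to-reducer-name mapping) are all assumptions made for illustration; `add_messages` is left out because the diff does not describe its behavior.

```python
from typing import Any, Callable, Dict

# Hypothetical reducer semantics; the real ones live in state.zig and may differ.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {
    "last_value": lambda old, new: new,
    "append": lambda old, new: (old or []) + [new],
    "merge": lambda old, new: {**(old or {}), **new},
    "add": lambda old, new: (old or 0) + new,
    "min": lambda old, new: new if old is None else min(old, new),
    "max": lambda old, new: new if old is None else max(old, new),
}


def apply_updates(state: Dict[str, Any],
                  updates: Dict[str, Any],
                  schema: Dict[str, str]) -> Dict[str, Any]:
    """Return a new state with each partial update folded in by its reducer.

    Keys missing from the schema fall back to last_value (an assumption).
    """
    next_state = dict(state)  # shallow copy so the caller's state is untouched
    for key, value in updates.items():
        reducer = REDUCERS[schema.get(key, "last_value")]
        next_state[key] = reducer(next_state.get(key), value)
    return next_state


# Example: one node's partial update applied to the current run state.
schema = {"count": "add", "log": "append", "meta": "merge"}
state = {"count": 1, "log": ["a"]}
updated = apply_updates(
    state, {"count": 2, "log": "b", "meta": {"k": 1}, "note": "x"}, schema
)
```

Returning a fresh dictionary rather than mutating in place mirrors the checkpoint idea in the same section: each node's result becomes a new snapshot while earlier ones stay intact.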
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 nullclaw contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -67,6 +67,22 @@ This keeps the architecture modular, simpler to reason about, and easier to evol
 
 See additional integration docs in [`docs/`](./docs).
 
+## Workflow Graph Features
+
+The orchestration graph runtime supports:
+
+- `task`, `agent`, `route`, `interrupt`, `send`, `transform`, and `subgraph` nodes
+- run replay, checkpoint forking, breakpoint interrupts, and post-start state injection
+- `send` fan-out with canonical `items_key` and configurable `output_key`
+- task/agent output shaping via `output_key` and `output_mapping`
+- template access to `state.*`, `input.*`, `item.*`, `config.*`, and `store.<namespace>.<key>`
+- `transform.store_updates` for writing durable workflow memory back to NullTickets
+
+Store-backed templates and `store_updates` require a NullTickets base URL. The
+runtime resolves it from workflow fields such as `tracker_url` or from run config
+(`config.tracker_url` / `config.tracker_api_token`), which are injected into
+state as `__config`.
+
 ## Config Location
 
 - Default config path: `~/.nullboiler/config.json`
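
Note on the "Workflow Graph Features" section added to the README above: template access to `state.*` and `input.*` amounts to resolving a dotted path against a scope and substituting the result. The sketch below is one way this could look in Python. It is not the logic in `templates.zig`; the `render` and `resolve_path` names, the regular expression, and the choice to raise on a missing key are assumptions, and conditional blocks and `store.<namespace>.<key>` lookups are not covered.

```python
import re
from typing import Any, Mapping

# Matches {{ dotted.path }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve_path(scope: Mapping[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'state.ticket.title' through nested mappings."""
    current: Any = scope
    for part in path.split("."):
        if not isinstance(current, Mapping) or part not in current:
            raise KeyError(path)  # assumption: unknown paths are errors
        current = current[part]
    return current


def render(template: str, scope: Mapping[str, Any]) -> str:
    """Replace every placeholder with the string form of its resolved value."""
    return _PLACEHOLDER.sub(
        lambda match: str(resolve_path(scope, match.group(1))), template
    )


scope = {"state": {"ticket": {"title": "bug 7"}}, "input": {"user": "ana"}}
prompt = render("Fix {{state.ticket.title}} for {{ input.user }}", scope)
```

Failing loudly on an unknown path is a design choice of this sketch; a runtime could equally substitute an empty string, and the diff does not say which the project does.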

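Note on the `a2a` worker registered in the config change that follows: CLAUDE.md describes `dispatch.zig` as selecting workers by tags, capacity, and A2A preference. A hedged Python sketch of such a selection rule is shown here. The `Worker` class, the `in_flight` counter, the rule that every required tag must be present, and the least-loaded tie-break are assumptions for illustration, not the project's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    id: str
    protocol: str
    tags: List[str]
    max_concurrent: int
    in_flight: int = 0  # hypothetical count of currently dispatched steps


def select_worker(workers: List[Worker],
                  required_tags: List[str]) -> Optional[Worker]:
    """Pick a worker that has all required tags and spare capacity.

    Among eligible workers, prefer protocol 'a2a', then the lowest load ratio.
    """
    eligible = [
        w for w in workers
        if set(required_tags) <= set(w.tags) and w.in_flight < w.max_concurrent
    ]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda w: (w.protocol != "a2a", w.in_flight / w.max_concurrent),
    )


# Hypothetical worker pool loosely modeled on the example config.
pool = [
    Worker("worker-webhook", "webhook", ["coder"], max_concurrent=2),
    Worker("worker-a2a", "a2a", ["coder", "agent"], max_concurrent=3),
]
chosen = select_worker(pool, ["coder"])
```

With this rule a saturated `a2a` worker simply drops out of the eligible set, so dispatch falls back to the other protocols without special handling.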
diff --git a/config.example.json b/config.example.json
@@ -3,6 +3,7 @@
   "port": 8080,
   "db": "nullboiler.db",
   "api_token": null,
+  "self_url": null,
   "workers": [
     {
       "id": "nullclaw-1",
@@ -28,6 +29,14 @@
       "model": "anthropic/claude-sonnet-4-6",
       "tags": ["writer", "editor"],
       "max_concurrent": 2
+    },
+    {
+      "id": "nullclaw-a2a",
+      "url": "http://localhost:3000",
+      "token": "set_same_value_as_nullclaw_gateway_paired_tokens",
+      "protocol": "a2a",
+      "tags": ["coder", "agent"],
+      "max_concurrent": 3
     }
   ],
   "engine": {