feat(runbooks): runbook execution engine + pup workflows PoC by platinummonkey · Pull Request #146 · datadog-labs/pup

platinummonkey · 2026-03-03T00:19:29Z

Summary

Proof-of-concept implementation of the runbook execution engine proposed in #143 (discussion). Adds two new command groups — pup runbooks and pup workflows — with no new dependencies.

What's included

`pup runbooks`

YAML runbooks live in ~/.config/pup/runbooks/. Each file defines sequential steps that mix pup commands, shell tools, Datadog Workflow triggers, HTTP calls, and interactive confirm gates.

pup runbooks list [--tag=key:value ...]   # discover by tag
pup runbooks describe <name>              # show steps + vars
pup runbooks run <name> [--set K=V ...]   # execute
pup runbooks validate <name>              # lint without running
pup runbooks import <path-or-url>         # fetch into runbooks dir

Example runbook (~/.config/pup/runbooks/hello.yaml):

name: hello
description: Test runbook
vars:
  NAME:
    default: world
steps:
  - name: Say hello
    kind: shell
    run: echo "Hello, {{ NAME }}!"
  - name: List monitors
    kind: pup
    run: monitors list --limit=3

Output while running:

runbook: hello  (2 steps)  2026-03-03 00:16:08 UTC
  Test runbook

[1/2] Say hello  (shell)  2026-03-03 00:16:08 UTC
  $ echo "Hello, pup!"
  stdout:
Hello, pup!
  ✓ done  10ms  ·  next: step 2/2 — List monitors (pup)

[2/2] List monitors  (pup)  2026-03-03 00:16:08 UTC
  $ monitors list --limit=3
  stdout:
{ ... }
  ✓ done  320ms  ·  last step

✓ done  hello  2/2 steps  330ms  2026-03-03 00:16:08 UTC

`pup workflows`

Raw REST access to the Datadog Workflows API (not covered by the typed SDK client):

pup workflows run --id=<id> [--input k=v ...] [--watch]
pup workflows instances list --workflow-id=<id>
pup workflows instances get  --workflow-id=<id> --instance-id=<id>

--watch polls every 15 s until the workflow instance reaches a terminal state, printing elapsed time and status to stderr. Without --watch, agent mode output includes a watch_command metadata hint.
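The `--watch` behaviour described above can be sketched as a plain polling loop. This is a minimal illustration using only the standard library, not the PR's actual code: the `Status` enum, the `watch` function, and the timeout parameter are hypothetical names introduced here, and the status fetch is stubbed with a closure rather than a real call to the Workflows API.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical instance states; the real API's status values may differ.
#[allow(dead_code)] // `Failed` is not exercised by the demo in `main`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Running,
    Succeeded,
    Failed,
}

impl Status {
    /// A terminal state is one after which polling can stop.
    fn is_terminal(self) -> bool {
        matches!(self, Status::Succeeded | Status::Failed)
    }
}

/// Call `fetch` every `interval` until it reports a terminal state,
/// logging elapsed time and status to stderr on each poll.
fn watch<F>(mut fetch: F, interval: Duration, timeout: Duration) -> Result<Status, String>
where
    F: FnMut() -> Status,
{
    let started = Instant::now();
    loop {
        let status = fetch();
        eprintln!("[{:>4}s] status: {:?}", started.elapsed().as_secs(), status);
        if status.is_terminal() {
            return Ok(status);
        }
        if started.elapsed() >= timeout {
            return Err(format!("timed out after {:?}", timeout));
        }
        sleep(interval);
    }
}

fn main() {
    // Stub: report Running twice, then Succeeded.
    let mut polls = 0;
    let result = watch(
        || {
            polls += 1;
            if polls < 3 { Status::Running } else { Status::Succeeded }
        },
        Duration::from_millis(10), // the PR uses a 15 s interval
        Duration::from_secs(5),
    );
    println!("{:?}", result);
}
```

Writing progress to stderr keeps stdout free for the final result, which matches the description of `--watch` above.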

Step kinds

| Kind | What it does |
| --- | --- |
| `pup` | Shells out to the current `pup` binary with `--output json`; supports `poll:` loops |
| `shell` | `sh -c "..."` with template rendering; surfaces stderr even on success |
| `datadog-workflow` | POST trigger + auto-poll to terminal state; emits `watch_command` hint in agent mode |
| `confirm` | Prompts `[y/N]`; bypassed by `--yes` / agent mode |
| `http` | Authenticated GET/POST via existing `client::raw_get` / `raw_post` |
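As an illustration of the `shell` kind, the sketch below runs a script through `sh -c` and keeps stderr alongside stdout so it can be shown even when the step succeeds. It is an assumption-laden sketch rather than the engine's code: the PR uses `tokio::process::Command`, while this version swaps in the blocking `std::process::Command` to stay dependency-free, and `ShellOutput` and `run_shell` are hypothetical names.

```rust
use std::process::Command;

/// Captured result of one shell step (hypothetical type).
struct ShellOutput {
    stdout: String,
    stderr: String,
    success: bool,
}

/// Run `script` via `sh -c` and capture both output streams.
fn run_shell(script: &str) -> std::io::Result<ShellOutput> {
    let output = Command::new("sh").arg("-c").arg(script).output()?;
    Ok(ShellOutput {
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
        success: output.status.success(),
    })
}

fn main() -> std::io::Result<()> {
    // The command exits 0 but still writes a warning to stderr.
    let out = run_shell("echo hello; echo 'warning: careful' >&2")?;
    println!("  stdout:\n{}", out.stdout.trim_end());
    if !out.stderr.trim().is_empty() {
        // Shown even on success, so warnings are not silently dropped.
        println!("  stderr:\n{}", out.stderr.trim_end());
    }
    println!("  {}", if out.success { "✓ done" } else { "✗ failed" });
    Ok(())
}
```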

Control flow

{{ VAR }} and {{ VAR | default: "x" }} template substitution in all string fields
on_failure: warn | confirm | fail per step
when: always | on_success to run cleanup steps after failure
optional: true to swallow errors silently
capture: VAR_NAME to pipe step stdout into a variable for later steps
poll: { interval, timeout, until } with conditions: empty, status == X, value < N, decreasing
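The `{{ VAR }}` and `{{ VAR | default: "x" }}` substitution listed above could look roughly like the following. This is a minimal sketch, not the contents of src/runbooks/template.rs: the `render` function and its error behaviour (an undefined variable with no default is an error) are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Render `{{ VAR }}` and `{{ VAR | default: "x" }}` placeholders.
/// Assumption: a variable with no value and no default is an error.
fn render(template: &str, vars: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find("}}")
            .ok_or_else(|| "unclosed `{{`".to_string())?;
        let expr = after[..end].trim();
        // Split off an optional `| default: "..."` filter.
        let (name, default) = match expr.split_once('|') {
            Some((name, filter)) => {
                let value = filter
                    .trim()
                    .strip_prefix("default:")
                    .ok_or_else(|| format!("unsupported filter in `{expr}`"))?
                    .trim()
                    .trim_matches('"');
                (name.trim(), Some(value))
            }
            None => (expr, None),
        };
        // A supplied variable wins over the inline default.
        match vars.get(name).map(String::as_str).or(default) {
            Some(value) => out.push_str(value),
            None => return Err(format!("undefined variable `{name}`")),
        }
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("NAME".to_string(), "pup".to_string());
    println!("{}", render("Hello, {{ NAME }}!", &vars).unwrap());
    println!("{}", render("env={{ ENV | default: \"staging\" }}", &vars).unwrap());
}
```

With `NAME=pup` set, the first call yields `Hello, pup!`, matching the sample run output earlier in this description.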

Reference runbooks

Three annotated examples in docs/examples/runbooks/:

deploy-service.yaml — SLO check → incident gate → DD Workflow trigger → monitor poll → Slack notify
incident-triage.yaml — fetch incident → search logs → check monitors → auto-mitigation workflow → shell diagnostics
maintenance-window.yaml — create downtime (capture ID) → drain → metric poll → confirm → delete downtime
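The monitor and metric poll steps in these runbooks depend on the `until` conditions listed under Control flow (`empty`, `status == X`, `value < N`, `decreasing`). The sketch below shows one way such conditions might be parsed and evaluated. All names here (`Until`, `Sample`, `parse_until`, `is_met`) are hypothetical, and how the real engine extracts a status or numeric value from step output is not specified in this PR, so the `Sample` struct simply assumes those fields are already available.

```rust
/// Parsed form of a poll `until` condition (hypothetical type).
#[derive(Debug, PartialEq)]
enum Until {
    Empty,
    StatusEq(String),
    ValueLt(f64),
    Decreasing,
}

/// Parse one of: `empty`, `status == X`, `value < N`, `decreasing`.
fn parse_until(expr: &str) -> Result<Until, String> {
    let expr = expr.trim();
    if expr == "empty" {
        return Ok(Until::Empty);
    }
    if expr == "decreasing" {
        return Ok(Until::Decreasing);
    }
    if let Some(rest) = expr.strip_prefix("status ==") {
        return Ok(Until::StatusEq(rest.trim().to_string()));
    }
    if let Some(rest) = expr.strip_prefix("value <") {
        return rest
            .trim()
            .parse::<f64>()
            .map(Until::ValueLt)
            .map_err(|e| format!("bad threshold in `{expr}`: {e}"));
    }
    Err(format!("unsupported condition `{expr}`"))
}

/// One poll observation; assumes status and value were already extracted.
struct Sample<'a> {
    output: &'a str,
    status: Option<&'a str>,
    value: Option<f64>,
}

/// Decide whether polling can stop, given the previous numeric value.
fn is_met(until: &Until, previous: Option<f64>, current: &Sample<'_>) -> bool {
    match until {
        Until::Empty => current.output.trim().is_empty(),
        Until::StatusEq(want) => current.status == Some(want.as_str()),
        Until::ValueLt(limit) => current.value.map_or(false, |v| v < *limit),
        // Needs two observations: the first poll can never satisfy it.
        Until::Decreasing => match (previous, current.value) {
            (Some(prev), Some(now)) => now < prev,
            _ => false,
        },
    }
}

fn main() {
    let until = parse_until("value < 5").expect("valid condition");
    let mut previous = None;
    for value in [9.0, 6.0, 4.0] {
        let sample = Sample { output: "...", status: None, value: Some(value) };
        let met = is_met(&until, previous, &sample);
        println!("value={value} met={met}");
        previous = Some(value);
        if met {
            break;
        }
    }
}
```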

Not in this PoC

Parallel step execution
Step retry logic
Remote runbook registry / sync
Web UI

Platform support

pup runbooks and pup workflows are native-only and excluded from all wasm builds via #[cfg(not(target_arch = "wasm32"))]. They rely on capabilities that wasm targets don't provide:

| Dependency | Why it rules out wasm |
| --- | --- |
| `tokio::process::Command` | subprocess spawning — unavailable in wasm |
| `std::fs` + `dirs` | filesystem access for runbook discovery and import — unavailable in browser wasm, restricted in WASI |
| `chrono::Utc` (step timestamps) | fine in native/WASI but not pulled in for browser |
| `client::raw_post` / `raw_get` | these work in wasm, but they're only reached via the engine which can't run |

The gating covers every touch point — the mod runbooks declaration, both pub mod entries in commands/mod.rs, the Commands::Runbooks and Commands::Workflows enum variants, the three subcommand enums, and the dispatch arms in main_inner(). All three build targets pass cleanly:

cargo build (native) ✓
cargo check --target wasm32-wasip2 --features wasi ✓
cargo check --target wasm32-unknown-unknown --lib --features browser ✓
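The gating pattern described above, where the module, the enum variant, and the dispatch arm all carry the same `cfg` attribute, can be sketched in a single file. This is a simplified stand-in for the real crate layout: the `Monitors` variant, the `dispatch` function, and the body of `runbooks::run` are placeholders, and only the `#[cfg(not(target_arch = "wasm32"))]` attribute itself is taken from the PR.

```rust
// Native-only module: compiled out entirely on wasm32 targets.
#[cfg(not(target_arch = "wasm32"))]
mod runbooks {
    pub fn run(name: &str) -> String {
        format!("running runbook `{name}`")
    }
}

enum Commands {
    // Available on every target.
    Monitors,
    // The variant only exists on native builds.
    #[cfg(not(target_arch = "wasm32"))]
    Runbooks { name: String },
}

fn dispatch(cmd: Commands) -> String {
    match cmd {
        Commands::Monitors => "listing monitors".to_string(),
        // The arm must be gated too, or the match would name a
        // variant that does not exist on wasm32.
        #[cfg(not(target_arch = "wasm32"))]
        Commands::Runbooks { name } => runbooks::run(&name),
    }
}

fn main() {
    println!("{}", dispatch(Commands::Monitors));
    #[cfg(not(target_arch = "wasm32"))]
    println!("{}", dispatch(Commands::Runbooks { name: "hello".into() }));
}
```

Gating every touch point with the same predicate is what keeps the match exhaustive on both native and wasm targets without any runtime check.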

Testing

# Build
cargo build

# Create a test runbook
mkdir -p ~/.config/pup/runbooks
cat > ~/.config/pup/runbooks/hello.yaml <<'YAML'
name: hello
description: Test runbook
vars:
  NAME:
    default: world
steps:
  - name: Say hello
    kind: shell
    run: echo "Hello, {{ NAME }}!"
YAML

pup runbooks list
pup runbooks validate hello
pup runbooks run hello --set NAME=pup

All existing tests and lint checks pass (cargo test, cargo clippy -- -D warnings, cargo fmt --check).

Discussion: #143

🤖 Generated with Claude Code

Implements the runbooks PoC as specified:
- `pup runbooks list/describe/run/validate/import` — execute YAML runbooks from ~/.config/pup/runbooks/ with {{ VAR }} templating, sequential step execution, poll loops, and confirm gates
- `pup workflows run/instances list/get` — trigger Datadog Workflows and poll to completion via raw REST (POST/GET /api/v2/workflows/...)

New files:
- src/runbooks/mod.rs — Runbook, Step, VarDef, PollConfig types
- src/runbooks/template.rs — {{ VAR }} and | default: "x" rendering
- src/runbooks/loader.rs — scan runbooks dir, load/import by name
- src/runbooks/engine.rs — sequential executor with polling, confirm, on_failure handling, variable capture
- src/commands/runbooks.rs — list/describe/run/validate/import CLI
- src/commands/workflows.rs — trigger + watch, instances list/get
- docs/examples/runbooks/ — deploy-service, incident-triage, maintenance-window reference templates

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…hints

Each step now shows:
- Header: ► Step N/M · <name> · <kind> [HH:MM:SS] with command preview
- Labeled sections: ── stdout ── and ── stderr ── blocks wrapping output
- Footer: ✓/✗/⊘ <elapsed> · next: step N/M — <name> (<kind>)
- Summary line with total elapsed time and pass/fail count

Shell steps surface non-empty stderr even on success so warnings from curl, grep, etc. aren't silently dropped.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Remove RULE constant and all long separator lines
- Timestamps now show full UTC date+time: 2026-03-02 18:11:01 UTC
- Step header: [N/M] <name> (<kind>) <timestamp>
- Output labeled with indented "stdout:" / "stderr:" markers
- Summary line: ✓/⚠ done <name> N/M steps <elapsed> <timestamp>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ch = "wasm32"))]

Neither feature is compatible with wasm targets:
- loader.rs uses std::fs and reqwest::Client::new()
- engine.rs uses tokio::process::Command and chrono::Utc
- both require dirs-based config path resolution (native-only)

Gated items:
- mod runbooks; in main.rs
- pub mod runbooks/workflows; in commands/mod.rs
- Commands::Runbooks/Workflows variants and their subcommand enums
- dispatch arms in main_inner()

Verified: native build, wasm32-wasip2 (wasi feature), and wasm32-unknown-unknown --lib (browser feature) all pass.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

platinummonkey and others added 4 commits March 2, 2026 18:03

platinummonkey added the enhancement and product:automation labels Mar 3, 2026

platinummonkey wants to merge 4 commits into main from feat/runbooks-poc
