# Operations Runbook

This runbook is the day-2 operating guide for RepoAgents maintainers.

## Scope

Use this document when you need to:

- start or stop routine repository maintenance
- inspect a failed or stuck run
- re-run a specific issue safely
- validate webhook-driven execution
- review runtime artifacts, logs, and dashboard output

## Primary operator loop

Normal operating flow:

1. Check environment health with `uv run repoagents doctor`
2. Inspect the latest state with `uv run repoagents status`
3. Render the local dashboard with `uv run repoagents dashboard`
4. Run the polling loop with `uv run repoagents run`
5. Use `uv run repoagents service start` when you want a detached repo-local worker
6. Use `uv run repoagents trigger <issue-id>` or `uv run repoagents webhook ...` for targeted intervention
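
The read-only part of this loop (doctor, status, dashboard) can be chained so the polling loop only starts after all three succeed. The sketch below is one way to do that. `health_pass` is a hypothetical helper name, not part of the RepoAgents CLI, and the sketch assumes each command exits non-zero on failure.

```bash
# health_pass: hypothetical wrapper around the first three steps of the loop.
# Assumption: each repoagents command exits non-zero when it fails.
health_pass() {
  for step in doctor status dashboard; do
    echo "==> uv run repoagents ${step}"
    if ! uv run repoagents "${step}"; then
      echo "step '${step}' failed; resolve it before starting the polling loop" >&2
      return 1
    fi
  done
  echo "health pass ok"
}
```

A typical call would be `health_pass && uv run repoagents run`, so a failing doctor or status check stops the run before it starts.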

## Command reference

- `uv run repoagents doctor`
- `uv run repoagents doctor --format all`
- `uv run repoagents run`
- `uv run repoagents run --once`
- `uv run repoagents run --dry-run`
- `uv run repoagents service start`
- `uv run repoagents service status`
- `uv run repoagents service restart`
- `uv run repoagents service stop`
- `uv run repoagents trigger 123`
- `uv run repoagents trigger 123 --dry-run`
- `uv run repoagents webhook --event issues --payload webhook.json --dry-run`
- `uv run repoagents status`
- `uv run repoagents status --issue 123`
- `uv run repoagents status --format all`
- `uv run repoagents ops snapshot --archive`
- `uv run repoagents ops status`
- `uv run repoagents ops status --format all`
- `uv run repoagents github smoke --require-write-ready`
- `uv run repoagents ops snapshot --include-cleanup-preview --include-cleanup-result --include-sync-check --include-sync-repair-preview --archive`
- `uv run repoagents ops snapshot --archive --history-limit 10 --prune-history`

`uv run repoagents service restart` waits for the current worker to stop before launching a replacement. `uv run repoagents service stop` also clears a stale worker record when the saved pid is already gone.

`uv run repoagents ops snapshot` also refreshes:

- `.ai-repoagents/reports/ops/latest.json`
- `.ai-repoagents/reports/ops/latest.md`
- `.ai-repoagents/reports/ops/history.json`
- `.ai-repoagents/reports/ops/history.md`
- `.ai-repoagents/reports/ops-status.json`
- `.ai-repoagents/reports/ops-status.md`
- `.ai-repoagents/reports/ops-brief.json`
- `.ai-repoagents/reports/ops-brief.md`

Use `--prune-history` only for bundle/archive paths managed under `.ai-repoagents/reports/ops/`. External custom output directories remain indexed but are not deleted by ops history pruning.

- `uv run repoagents sync ls`
- `uv run repoagents sync show local-markdown/issue-1/-comment.md`
- `uv run repoagents sync health --issue 1 --format all`
- `uv run repoagents sync check --issue 1`
- `uv run repoagents sync repair --issue 1 --dry-run`
- `uv run repoagents sync audit --format all`
- `uv run repoagents sync apply --issue 1 --tracker local-file --action comment --latest`
- `uv run repoagents sync apply --issue 1 --tracker local-markdown --action comment --latest`
- `uv run repoagents clean --sync-applied --dry-run`
- `uv run repoagents clean --sync-applied --dry-run --report --report-format all`
- `uv run repoagents retry 123`
- `uv run repoagents clean --dry-run`
- `uv run repoagents clean`
- `uv run repoagents dashboard`
- `uv run repoagents dashboard --format all`


## Runtime locations

- config: `.ai-repoagents/repoagents.yaml`
- state: `.ai-repoagents/state/runs.json`
- artifacts: `.ai-repoagents/artifacts/issue-<id>/<run-id>/`
- workspaces: `.ai-repoagents/workspaces/issue-<id>/<run-id>/repo/`
- dashboard: `.ai-repoagents/dashboard/index.html`
- dashboard JSON snapshot: `.ai-repoagents/dashboard/index.json`
- doctor snapshots: `.ai-repoagents/reports/doctor.json`, `.ai-repoagents/reports/doctor.md`
- status snapshots: `.ai-repoagents/reports/status.json`, `.ai-repoagents/reports/status.md`
- ops status snapshots: `.ai-repoagents/reports/ops-status.json`, `.ai-repoagents/reports/ops-status.md`
- dashboard Markdown snapshot: `.ai-repoagents/dashboard/index.md`
- sync health reports: `.ai-repoagents/reports/sync-health.json`, `.ai-repoagents/reports/sync-health.md`
- sync audit reports: `.ai-repoagents/reports/sync-audit.json`, `.ai-repoagents/reports/sync-audit.md`
- cleanup reports: `.ai-repoagents/reports/cleanup-preview.json`, `.ai-repoagents/reports/cleanup-result.json`
- logs when enabled: `.ai-repoagents/logs/repoagents.jsonl`
- sync staging: `.ai-repoagents/sync/<tracker>/issue-<id>/`
- sync applied archive: `.ai-repoagents/sync-applied/<tracker>/issue-<id>/`
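
When a report or dashboard looks stale, it can help to confirm which of these files exist before digging further. The sketch below checks a handful of the paths listed above. `check_runtime_paths` is a hypothetical helper, and a missing file may only mean the matching command has not been run yet.

```bash
# check_runtime_paths: hypothetical helper that reports which runtime files exist.
# Usage: check_runtime_paths [repo-root]   (defaults to the current directory)
check_runtime_paths() {
  root="${1:-.}/.ai-repoagents"
  missing=0
  for rel in repoagents.yaml state/runs.json dashboard/index.html \
             reports/doctor.json reports/status.json; do
    if [ -e "${root}/${rel}" ]; then
      echo "ok       ${rel}"
    else
      echo "missing  ${rel}"
      missing=$((missing + 1))
    fi
  done
  echo "${missing} missing"
  # Non-zero exit when anything is missing, so the helper can gate other steps.
  [ "${missing}" -eq 0 ]
}
```

Extend the path list with any of the other locations above that matter for your setup.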

## Dashboard sync handoffs and retention

The dashboard now includes `Sync handoffs` and `Sync retention` sourced from `.ai-repoagents/sync-applied/**/manifest.json`, an `Ops snapshots` section sourced from `.ai-repoagents/reports/ops/latest.*` and `history.*`, plus direct `Reports` links for sync audit, sync health, GitHub smoke, ops status, ops brief, and cleanup exports under `.ai-repoagents/reports/`.

Use `repoagents ops status` when you want the same ops index posture in one CLI/export surface, with the latest bundle manifest component summaries, the current handoff brief headline and severity, landing paths, and a recent history preview included directly in the output.

When `ops-status.json` or `ops-brief.json` exists, the dashboard `Reports` section renders matching cards and cross-links them to the related report exports referenced by the latest bundle.

`repoagents ops snapshot` now writes `ops-status.json|md`, `ops-brief.json|md`, and, for live GitHub REST trackers, `github-smoke.json|md` inside the handoff bundle, along with the bundle landing files `index.html` and `README.md`. It also refreshes the root-level `ops-status.json|md`, `ops-brief.json|md`, `sync-health.json|md`, and live `github-smoke.json|md` under `.ai-repoagents/reports/`. Both the handoff bundle and the dashboard/report surfaces can therefore follow the latest sync posture, landing summary, and GitHub publish readiness without a separate command.

Use the `Sync handoffs` and `Sync retention` sections when you need to:

- inspect which staged publish proposals were already handled
- open archived `branch` / `pr` / `pr-body` bundle members from one place
- follow normalized links such as `metadata_artifact` after the original staged file has moved
- review which applied issue archives are `stable`, `prunable`, or `repair-needed`
- estimate cleanup impact with prunable group counts, prunable bytes, and oldest prunable age before running `clean`

Refresh all exports with:

```bash
uv run repoagents dashboard --format all
```

## Normal checks

Before starting live runs:

- confirm `codex --version` and `codex login`
- confirm `GITHUB_TOKEN` if the tracker is in live GitHub REST mode
- run `uv run repoagents doctor`
- run `uv run repoagents github smoke --require-write-ready` before enabling unattended live writes
  - the smoke gate now expects default-branch protection, PR review requirements, required status checks, and readable GitHub repo permissions for draft-PR publish
- use a dedicated sandbox repo/issue for `REPOREPUBLIC_GITHUB_WRITE_E2E=1` and `REPOREPUBLIC_GITHUB_PR_E2E=1` tests; the comment test deletes its comment and the draft PR test closes the PR and deletes the branch during cleanup
- if the repo uses `workspace.strategy: worktree`, confirm the target repo is a valid Git work tree
- inspect `workspace.dirty_policy` before running against a locally modified repository
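
The scriptable parts of this checklist can be collected into one preflight function. This is a sketch under assumptions: `preflight` is a hypothetical name, the commands are assumed to exit non-zero on failure, and the `GITHUB_TOKEN` check only applies when the tracker is in live GitHub REST mode. Codex login state, the worktree check, and `workspace.dirty_policy` still need a manual look.

```bash
# preflight: hypothetical wrapper for the scriptable pre-run checks.
# Assumption: doctor and github smoke exit non-zero when they find a problem.
preflight() {
  if ! command -v codex >/dev/null 2>&1; then
    echo "codex CLI not found on PATH" >&2
    return 1
  fi
  codex --version || return 1

  # Only meaningful for live GitHub REST mode; drop this check for local trackers.
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi

  uv run repoagents doctor || return 1
  uv run repoagents github smoke --require-write-ready || return 1
  echo "preflight ok"
}
```

Running `preflight` before `uv run repoagents service start` keeps an unattended worker from starting against a repository that is not write-ready.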

## Failure handling

### A run is `retry_pending`

1. Inspect the latest run: `uv run repoagents status --issue <id>`
2. Open the role artifacts for the failing run
3. Fix the underlying cause if needed
4. Force a new retry window: `uv run repoagents retry <id>`
5. Rebuild the dashboard: `uv run repoagents dashboard`

### A run is `failed`

Check these first:

- Codex CLI availability and login state
- GitHub auth, rate limit, or network health
- policy findings in reviewer artifacts
- dirty working tree or worktree setup problems

Then either:

- re-run one issue directly with `uv run repoagents trigger <id>`
- or schedule it back into retry with `uv run repoagents retry <id>`

### The polling loop appears idle

Use:

- `uv run repoagents status`
- `uv run repoagents run --once`
- `uv run repoagents dashboard`

If `run --once` finds nothing:

- verify the tracker input source
- verify issue state and labels
- verify the issue fingerprint did not already complete
- use `trigger` for a one-off forced rerun when appropriate

### Webhook payload did not start a run

1. Save the payload to disk
2. Validate it with `uv run repoagents webhook --event issues --payload webhook.json --dry-run`
3. Confirm the payload maps to an open issue number
4. If the issue is intentionally already complete, use `trigger --force` only after human review
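
For a quick local check of the steps above, a minimal payload can be written to disk and inspected before the dry run. The field names below (`action`, `issue.number`, `issue.state`, `repository.full_name`) follow GitHub's `issues` webhook event. Whether RepoAgents accepts a payload this small is an assumption, and both helper names are hypothetical.

```bash
# write_sample_payload: hypothetical helper that writes a minimal issues-event payload.
# The values are placeholders; a real delivery carries many more fields.
write_sample_payload() {
  cat > "$1" <<'JSON'
{
  "action": "opened",
  "issue": { "number": 123, "state": "open", "title": "Example issue" },
  "repository": { "full_name": "example-org/example-repo" }
}
JSON
}

# payload_looks_open: crude text check that the payload names an open issue number.
# This is a grep-based sanity check, not a JSON parser.
payload_looks_open() {
  grep -Eq '"number"[[:space:]]*:[[:space:]]*[0-9]+' "$1" &&
    grep -Eq '"state"[[:space:]]*:[[:space:]]*"open"' "$1"
}
```

With a payload saved as `webhook.json`, `payload_looks_open webhook.json && uv run repoagents webhook --event issues --payload webhook.json --dry-run` covers steps 2 and 3 in one line.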

## Safe manual intervention

Use the least destructive option first:

1. `status --issue <id>` to inspect
2. `dashboard` to rebuild the view
3. `retry <id>` to reopen the run
4. `trigger <id> --dry-run` to preview one issue
5. `trigger <id>` to execute one issue
6. `clean --dry-run` before any cleanup

Avoid deleting state or workspace files by hand unless the CLI cleanup path cannot recover.
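
The "preview before cleanup" rule can be enforced with a small wrapper that always shows the dry run and asks before deleting anything. `guarded_clean` is a hypothetical helper, sketched on the assumption that `clean --dry-run` exits non-zero if the preview itself fails.

```bash
# guarded_clean: hypothetical wrapper that previews cleanup and asks for confirmation.
guarded_clean() {
  # Always show the preview first; stop if the preview cannot be produced.
  uv run repoagents clean --dry-run || return 1

  printf 'Apply this cleanup? [y/N] ' >&2
  read -r answer || answer=""

  case "${answer}" in
    y|Y) uv run repoagents clean ;;
    *)   echo "cleanup skipped" ;;
  esac
}
```

Anything other than `y` or `Y` leaves the repository untouched, which matches the least-destructive-first ordering above.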

## Offline publish handoff

When a tracker stages publish proposals locally instead of applying them directly:

- inspect the inventory with `uv run repoagents sync ls`
- open one artifact with `uv run repoagents sync show ...`
- apply supported tracker helpers with `uv run repoagents sync apply ...` when appropriate, for example `local-file` or `local-markdown` comment and label proposals
- copy any remaining handoff proposal manually
- review the archive under `.ai-repoagents/sync-applied/` and the dashboard `Sync handoffs` / `Sync retention` sections
- use `uv run repoagents clean --sync-applied --dry-run` before pruning old applied handoff groups; capture a shareable machine-readable cleanup preview with `--report --report-format all` when the cleanup needs review
- use `uv run repoagents sync health --issue <id> --format all` when you want one combined snapshot before choosing between repair, audit, or cleanup
- if manifest drift is suspected, run `uv run repoagents sync check --issue <id>` before `sync repair`
- run `uv run repoagents sync audit --issue <id> --format all` when you need a narrower shareable audit snapshot
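
The health, check, and repair steps above can be strung together for a single issue. The sketch below only previews a repair and never applies one. `sync_triage` is a hypothetical helper, and it assumes `sync check` exits non-zero when it finds drift, which the source does not confirm.

```bash
# sync_triage: hypothetical helper for one issue's sync archive.
# Assumption: `sync check` exits non-zero when it detects manifest drift.
sync_triage() {
  issue="$1"

  # Combined snapshot first, so there is a report to share either way.
  uv run repoagents sync health --issue "${issue}" --format all || return 1

  if ! uv run repoagents sync check --issue "${issue}"; then
    echo "sync check reported problems; previewing repair only" >&2
    uv run repoagents sync repair --issue "${issue}" --dry-run
  fi
}
```

Call it as `sync_triage 1`, then decide by hand whether the previewed repair should be applied.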

## Human approval boundary

RepoAgents remains conservative by default:

- reviewer approval does not merge code
- dangerous diffs still require human judgment
- docs/tests changes may open a draft PR depending on policy, but merge stays manual
- secrets, CI/CD changes, auth-sensitive paths, and large deletions should be reviewed as incidents
- use `repoagents approval ls` / `repoagents approval show <issue-id>` to inspect pending publication actions
- `repoagents approval approve|reject <issue-id>` records the maintainer decision and artifacts; publish remains manual in this slice

## Recommended routine

Daily:

- `doctor`
- `status`
- `dashboard`

For each incident:

- inspect the failing run
- collect artifacts and logs
- decide whether to retry, trigger, or leave the issue pending

Weekly:

- clean stale local data with `clean --dry-run` then `clean`
- review template drift with `repoagents init --upgrade`

## Related examples

- [live GitHub ops blueprint](../examples/live-github-ops/README.md)
- [live GitHub rollout walkthrough](./live-github-ops.md)
- [sandbox publish rollout example](../examples/live-github-sandbox-rollout/README.md)
- [sandbox publish rollout walkthrough](./live-github-sandbox-rollout.md)
- [local webhook receiver](../examples/webhook-receiver/README.md)
- [signed local webhook receiver](../examples/webhook-signature-receiver/README.md)
