[WHM] Workflow Health Dashboard — 2026-04-04 #24477
Overview
Health monitoring report for 179 agentic workflows in the github/gh-aw repository. Run §23978397450 — 2026-04-04.
Score: 70/100 (↓2 from 72 last run) — API rate limiting is a new systemic issue; ongoing P1s persist; stale lock files reduced from 19→13.
| Metric | Value |
|---|---|
| Total workflows | 179 |
| Lock files present | 179/179 ✅ |
| Stale lock files | 13 |
| P1 failures tracked | 2 (ongoing) |
| New systemic issues | 1 (API rate limiting) |
| Resolved this run | 0 |
Critical Issues 🚨
1. Daily Issues Report Generator — AGENT FAILURE (P1, ongoing)
- Status: 13+ consecutive schedule failures (since Mar 24)
- Error: Agent job fails (likely at the `Fetch issues data` step)
- Auto-issue: #24461 (open, Apr 4); previous issue "[aw] Daily Issues Report Generator failed" (#24266) closed as `not_planned` Apr 3
- Recommendation: Root cause still unresolved; investigate step-level logs in recent run §23976894248
- Priority: P1
2. Duplicate Code Detector — CODEX API RESTRICTION (P1, ongoing)
- Status: 8+ consecutive failures (since Mar 28)
- Error: `This user's access to this model has been temporarily limited for potentially suspicious activity related to cybersecurity.`
- Auto-issue: #24471 (open, Apr 4); previous issue "[aw] Duplicate Code Detector failed" (#24284) closed as `not_planned` Apr 3
- Note: Codex API safety restriction is externally controlled; team had previously marked `not_planned`
- Priority: P1 (externally blocked)
New Systemic Issue: API Rate Limiting ⚠️
Pattern detected — multiple workflows failing at pre_activation step with API rate limit exceeded for installation:
| Workflow | Run | Time (UTC) | Stage |
|---|---|---|---|
| Issue Monster | §23971928758 | 05:07 | pre_activation |
| Daily CLI Performance Agent | §23972374984 | 05:35 | pre_activation |
| Agentic Maintenance | §23971979119 | 05:11 | zizmor-scan (API rate limit) |
Error: API rate limit exceeded for installation. Request ID: ... during pre_activation check runs API.
Root cause: Many workflows scheduled at the same time window (~05:00-05:40 UTC) hitting the GitHub installation API rate limit simultaneously.
Impact: Issue Monster fails in ~15% of runs (over the last 40 runs); a pre_activation failure prevents the rest of the workflow from executing at all.
Recommendation: Stagger schedule times for high-frequency workflows; or retry on rate limit errors in the pre_activation step.
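The retry option in the recommendation can be sketched as exponential backoff with jitter. This is a minimal illustration, not the pre_activation implementation: `RateLimitError` and `call_with_backoff` are hypothetical names, and the delay values are assumptions chosen for the example.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                # Out of attempts: surface the original error to the caller.
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter scales the delay by a random factor in [0, 1) so that
            # concurrent workflows do not all retry at the same instant.
            sleep(delay * rng())
```

Jitter matters here because the failures are clustered in one time window: retrying on a fixed delay would simply move the burst a few seconds later.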
Warnings ⚠️
Stale Lock Files (13)
Down from 19 last run (good progress), but 13 workflows are still running outdated compiled definitions:
View 13 stale lock files
| Workflow | File |
|---|---|
| Tidy | tidy.md |
| Daily Security Red Team | daily-security-red-team.md |
| Agentic Observability Kit | agentic-observability-kit.md |
| Layout Spec Maintainer | layout-spec-maintainer.md |
| Dev Hawk | dev-hawk.md |
| Firewall | firewall.md |
| Prompt Clustering Analysis | prompt-clustering-analysis.md |
| GPClean | gpclean.md |
| Weekly Safe Outputs Spec Review | weekly-safe-outputs-spec-review.md |
| Release | release.md |
| Daily CLI Tools Tester | daily-cli-tools-tester.md |
| Video Analyzer | video-analyzer.md |
| Daily Malicious Code Scan | daily-malicious-code-scan.md |
Run make recompile or gh aw compile to update.
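This report does not say how staleness is detected. As a rough local check, the sketch below assumes each `<name>.md` source has a compiled `<name>.lock.yml` beside it and treats a lock file as stale when it is missing or older than its source. Both the file layout and the mtime comparison are assumptions, and `find_stale_locks` is a hypothetical helper; modification times are not reliable on a fresh git checkout.

```python
from pathlib import Path


def find_stale_locks(workflow_dir):
    """Return names of .md sources whose lock file is missing or older."""
    stale = []
    for source in sorted(Path(workflow_dir).glob("*.md")):
        # Assumed naming convention: tidy.md -> tidy.lock.yml
        lock = source.with_name(source.stem + ".lock.yml")
        if not lock.exists() or lock.stat().st_mtime < source.stat().st_mtime:
            stale.append(source.name)
    return stale
```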
Previously Marked not_planned (P2, stable)
Team decision to not investigate further:
- Smoke Codex: Codex API restriction
- Smoke Gemini: Exit code 41
- Smoke Create Cross-Repo PR: push_repo_memory git branch bug
- Smoke Update Cross-Repo PR: Same root cause
Intermittent Failures (Single-run, monitoring)
View single-run failures this cycle
| Workflow | Run | Failure | Notes |
|---|---|---|---|
| Workflow Normalizer | §23966459696 | safe_outputs job | Artifact uploaded OK; processing error |
| Auto-Triage Issues | §23957755831 | safe_outputs job | Artifact uploaded OK; processing error |
| Daily Observability Report | §23966346682 | agent job | Docker build completed; agent failure |
| Super Linter Report | §23949152392 | EACCES on super-linter.log | Permission error uploading artifact |
| GitHub MCP Structural Analysis | §23945544719 | 1/1 runs | Isolated failure |
| Contribution Check | §23947460225 | 1/6 runs | Likely rate limit |
Healthy Workflows ✅
~160+ workflows operating normally. Notable recent successes:
- Terminal Stylist, Issue Monster (85% success), PR Triage Agent, Smoke Agent workflows all passing.
Systemic Issues Summary
- API Rate Limiting: ~05:00-05:40 UTC window saturating GitHub installation API; affects pre_activation for high-frequency workflows. Multiple independent failures on Apr 4. Recommend schedule staggering.
- Codex API restrictions (ongoing): Any Codex workflow performing security/code analysis may trigger OpenAI safety check. Currently affecting Duplicate Code Detector; watch for others.
- Safe-outputs processing errors (new, isolated): Workflow Normalizer and Auto-Triage Issues both had safe_outputs job failures despite artifact upload succeeding. May indicate an issue with safe-output processing (downstream webhook or label validation).
- Stale lock files (13): Reduced from 19 but still elevated. Likely from active .md edits. Run `make recompile`.
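For the rate-limiting item, schedule staggering could look like the sketch below, which spreads workflows evenly across an hour and formats the result as a standard 5-field cron expression. `stagger_minutes` and `to_cron` are hypothetical helpers, and the once-daily schedule at hour 5 is an assumption for illustration; the actual schedules of these workflows are not stated in this report.

```python
def stagger_minutes(workflows, window_minutes=60):
    """Assign each workflow an evenly spaced minute offset in the window."""
    if not workflows:
        return {}
    step = window_minutes / len(workflows)
    # Sorting makes the assignment deterministic across runs.
    return {name: int(i * step) for i, name in enumerate(sorted(workflows))}


def to_cron(minute, hour):
    """Format a daily schedule as a 5-field cron expression."""
    return f"{minute} {hour} * * *"


offsets = stagger_minutes(
    ["Issue Monster", "Daily CLI Performance Agent", "Agentic Maintenance"]
)
for name, minute in offsets.items():
    print(f"{name}: {to_cron(minute, 5)}")
```

With three workflows the offsets are 0, 20 and 40 minutes past the hour, so no two of them start in the same minute.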
Trends
- Overall score: 70/100 (↓2 from 72)
- Score trajectory: 73 → 74 → 75 → 72 → 70 (↓↓)
- New failures this cycle: API rate limit cluster (+1 systemic pattern)
- Resolved this cycle: 0
- Stale lock files: 13 (↓6 from 19) ✅
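The trajectory above can be checked with a few lines of arithmetic. The scores are taken from this report; `score_deltas` and `trailing_declines` are hypothetical helper names.

```python
def score_deltas(scores):
    """Return the run-over-run change between consecutive scores."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def trailing_declines(scores):
    """Count how many of the most recent runs in a row lowered the score."""
    count = 0
    for delta in reversed(score_deltas(scores)):
        if delta >= 0:
            break
        count += 1
    return count


scores = [73, 74, 75, 72, 70]  # trajectory as listed in this report
print(score_deltas(scores))       # [1, 1, -3, -2]
print(trailing_declines(scores))  # 2
```

The two trailing declines match the ↓↓ marker, and the score is now 5 points below its recent peak of 75.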
Actions Taken
- Confirmed Daily Issues Report auto-issue: #24461
- Confirmed Duplicate Code Detector auto-issue: #24471
- Identified new API rate limit systemic pattern (no prior issue — part of this dashboard)
- Updated shared memory
Last updated: 2026-04-04T12:00Z
Next check: 2026-04-05T12:00Z
Run: §23978397450
Note
🔒 Integrity filter blocked 1 item
The following item was blocked because it doesn't meet the GitHub integrity level.
- #19099 `search_issues`: has lower integrity than the agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

    tools:
      github:
        min-integrity: approved # merged | approved | unapproved | none

Generated by Workflow Health Manager - Meta-Orchestrator · ● 2.7M
- expires on Apr 5, 2026, 12:06 PM UTC