Agentic Workflow Audit — 2026-03-30 #23592
Closed
Replies: 1 comment
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #23784.
Daily audit covering the last 24 hours of agentic workflow runs in github/gh-aw. 27 runs observed (23 completed, 4 still in progress at audit time).
Summary
Workflow Health Timeline
The 65% success rate is below the expected baseline. Three distinct failure patterns were identified: lockdown check rejections, engine startup errors, and agentic control issues. Issue Monster ran reliably across 5 event-triggered runs (all succeeded).
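The 65% figure is consistent with the failure counts listed under Failure Analysis, assuming the rate is computed over the 23 completed runs only (the report does not state the denominator, so that is an assumption). A quick check:

```python
# Failure counts as listed in the Failure Analysis section of this audit.
failures = {
    "engine_startup_errors": 3,
    "lockdown_check_failures": 3,
    "auto_triage_issues": 1,
    "difc_events_analyzer": 1,
}

# Assumption: the 4 in-progress runs are excluded from the denominator.
completed_runs = 23

failed = sum(failures.values())
succeeded = completed_runs - failed
success_rate = succeeded / completed_runs

print(f"{succeeded}/{completed_runs} succeeded ({success_rate:.0%})")
# prints: 15/23 succeeded (65%)
```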
Token & Cost Breakdown
Sergo ($1.56) and Copilot Agent Prompt Clustering ($1.34) together account for 66% of the day's cost. Four workflows have resource_heavy_for_domain (high) assessments, suggesting model downgrade or pre-computation opportunities exist.
Failure Analysis
❌ Engine Startup Errors (3 runs — high severity)
Three workflows failed with "6 engine error messages in agent-stdio.log" and 0 turns / 0 tokens consumed, indicating the agent engine never started:
Triggers: issues:opened, issue_comment, pull_request
Pattern: All three triggered on repository events (not scheduled). The engine errors occurred before any inference was attempted. This may indicate an API authentication failure, engine binary issue, or resource provisioning problem affecting event-driven runs during this window.
Recommendation: Check if the engine version or secrets were rotated around 19:47–20:32 UTC. Review agent-stdio.log from these runs for the specific error codes.
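The signature described above (engine error messages present, but 0 turns and 0 tokens) can be expressed as a simple filter when reviewing run records. This is a minimal sketch: the RunRecord type, its field names, and the sample rows are hypothetical placeholders, not the audit tool's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical record shape; field names are assumptions for illustration.
    workflow: str
    trigger: str
    turns: int
    tokens: int
    engine_errors: int  # count of engine error messages in agent-stdio.log


def engine_never_started(run: RunRecord) -> bool:
    """True when the engine logged errors and no inference was attempted."""
    return run.engine_errors > 0 and run.turns == 0 and run.tokens == 0


# Placeholder rows, not real runs from this audit.
runs = [
    RunRecord("workflow-a", "issue_comment", turns=0, tokens=0, engine_errors=6),
    RunRecord("workflow-b", "schedule", turns=12, tokens=48_000, engine_errors=0),
    RunRecord("workflow-c", "pull_request", turns=3, tokens=9_000, engine_errors=1),
]

startup_failures = [r.workflow for r in runs if engine_never_started(r)]
print(startup_failures)
```

Requiring all three conditions keeps runs that logged an engine error but still made progress (like workflow-c here) out of the startup-failure bucket.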
🔒 Lockdown Check Failures (3 runs — low severity, expected)
Three workflows failed due to lockdown_check_failed=true, triggered between 20:26–20:34 UTC.
These failures are expected during a lockdown/freeze window. No action required for the failures themselves, but if the lockdown was unplanned or longer than expected, the smoke tests and doc updater may need to be re-run manually.
Auto-Triage Issues (§23761615004) — failure
The agent reported success internally, but the harness marked the run as failed due to missing safe outputs
Assessments: resource_heavy_for_domain (high), poor_agentic_control (medium)
The agent did not call a safeoutputs tool. This is the #1 cause of workflow failures per the safe-outputs guidelines.
Call noop when no triageable issues are found. Consider adding partially_reducible pre-computation steps to reduce turns.
Daily DIFC Integrity-Filtered Events Analyzer (§23765157332) — failure
The agent reported success but workflow conclusion is failure
resource_heavy_for_domain (high) despite no token usage
Performance Concerns
Resource-Heavy Runs (high severity assessments)
8 runs flagged as resource_heavy_for_domain (high):
claude-haiku-4-5 for simpler Go questions
poor_agentic_control: reduce turns, add noop
poor_agentic_control: move to deterministic steps
Cross-cutting recommendation: For runs with partially_reducible (medium) assessments, move data-gathering logic to deterministic pre-activation steps. This can reduce turns by ~50% per the assessment estimates.
Patterns & Recommendations
noop call when no work found
noop call as fallback when no issues to triage
claude-haiku-4-5 or gpt-4.1-mini
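The noop fallback recommended in this audit amounts to always emitting at least one safe output, even when there is nothing to do. The sketch below illustrates that control flow only. The output record shape (type, message, issue, labels) and the add_labels action are hypothetical and are not taken from the gh-aw safe-outputs schema.

```python
def build_safe_outputs(triageable_issues: list[dict]) -> list[dict]:
    """Return at least one safe output so the harness never sees an empty result."""
    if not triageable_issues:
        # Fallback: record explicitly that no work was found.
        return [{"type": "noop", "message": "No triageable issues found."}]
    # Hypothetical action shape, for illustration only.
    return [
        {"type": "add_labels", "issue": issue["number"], "labels": issue["labels"]}
        for issue in triageable_issues
    ]


print(build_safe_outputs([]))
print(build_safe_outputs([{"number": 101, "labels": ["bug"]}]))
```

Making the empty case return an explicit noop, rather than an empty list, is what separates "nothing to triage" from "the agent produced no output" on the harness side.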