Agent Performance Report — Week of 2026-03-20 #22002
Closed
This discussion was automatically closed because it expired on 2026-03-21T17:51:05.849Z.
Executive Summary
The ecosystem is in a strong recovery phase following last week's dual crisis (GH_AW_GITHUB_TOKEN outage + lockdown mode failure wave). Most critical workflows are healthy again.
Top performers: Issue Monster, Auto-Triage Issues, The Great Escapi, Lockfile Statistics Analysis Agent
Needs attention: Issue Triage Agent (P0), Smoke Gemini (P1), Contribution Check (safe_outputs infrastructure issue)
Performance Rankings
Top Performing Agents 🏆
Issue Monster (Effectiveness: 100%, 4/4 runs)
Engine: copilot | Avg duration: 5.5m | Tokens/run: ~171K
Delivered consistent performance across all 4 schedule runs after full GH_AW_GITHUB_TOKEN recovery. High volume, reliable execution. The benchmark for copilot-engine stability.
Auto-Triage Issues (Effectiveness: 100%, 2/2 runs)
Engine: copilot | Avg duration: 3.4m | Tokens/run: ~51K
Fast, efficient event-driven triage. Low token cost, high responsiveness to issue events. Excellent resource efficiency.
The Great Escapi (Effectiveness: 100%, 1/1 run)
Engine: copilot | Duration: 3.5m | Tokens: 77K
Rated A+ on prompt injection detection last run. Best-in-class security posture, consistent operation.
Daily Safe Outputs Conformance Checker (Effectiveness: 100%, 1/1 run)
Engine: claude | Duration: 3.4m | Turns: 3 | Cost: $0.17
Most efficient Claude workflow. Concise, targeted analysis with low turn count.
Lockfile Statistics Analysis Agent (Effectiveness: 100%, 1/1 run)
Engine: claude | Duration: 8.1m | Turns: 25 | Cost: $0.87
Deep technical analysis, high turn count reflects thorough investigation. Quality output.
AI Moderator (Effectiveness: 83%, 5/6 runs)
Engine: codex | 6 event-triggered runs across PR and issue events
1 failure on Mar 20 (§23352224372): environment issue (git checkout failure on already-deleted branch copilot/sub-pr-21993), not an agent quality problem. Operational resilience is good.
Agents Needing Improvement 📉
Contribution Check — Safe Outputs Infrastructure Failure
Run §23353982199: Agent completed successfully (LGTM on PR Fix GH_AW_AGENT_OUTPUT nested path by enforcing /tmp/gh-aw/ artifact root #21968), but the safe_outputs job failed. Root cause: pr-filter-results.json missing from pre-agent step. Agent adapted gracefully, created issue [Contribution Check Report] Contribution Check — 2026-03-20 #21996 with correct findings. Failure is in the job orchestration, not agent logic.
Recommendation: Investigate why the pr-filter-results.json pre-filter step is absent in scheduled runs.
Issue Triage Agent — P0 Critical (14+ days, from Workflow Health)
100% failure rate since March 6. Pre-dates GH_AW_GITHUB_TOKEN crisis — this is an independent structural failure.
Recommendation: Dedicated investigation required. See Workflow Health Report.
Smoke Gemini — P1 High
6+ consecutive schedule failures (Mar 15–20). Last success was Mar 17. Likely Gemini API or model key issue.
Recommendation: Verify Gemini API key and model endpoint availability.
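A minimal sketch of what such a verification could look like, assuming the smoke test can run a short Python step. The endpoint (the public Generative Language API model listing), the `x-goog-api-key` header, and the status-to-diagnosis mapping are assumptions for illustration, not details taken from the Smoke Gemini workflow. The names `check_gemini`, `classify`, and `http_status` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable

# Assumed endpoint: model listing on the public Generative Language API.
MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def http_status(url: str, api_key: str, timeout: float = 10.0) -> int:
    """GET the URL with the key in a header and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"x-goog-api-key": api_key})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is what we want.
        return err.code


def classify(status: int) -> str:
    """Map a status code to a coarse diagnosis (assumed mapping)."""
    if status == 200:
        return "ok"
    if status in (400, 401, 403):
        return "bad_key"
    if status == 429:
        return "rate_limited"
    if status >= 500:
        return "endpoint_down"
    return "unknown"


def check_gemini(
    api_key: str,
    get_status: Callable[[str, str], int] = http_status,
) -> str:
    """Return a diagnosis string for the given API key."""
    return classify(get_status(MODELS_URL, api_key))
```

Passing `get_status` as a parameter keeps the classification testable without network access. Network-level failures (DNS, timeouts) are not caught here and would surface as exceptions.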
Inactive / Not Triggered This Week
Many on-demand workflows ran but were correctly skipped (Scout, /cloclo, Q, Archie, PR Nitpick Reviewer 🔍, Resource Summarizer, Workflow Craft Agent, Poem Bot, etc.). These are event-driven and only activate on matching triggers — skip behavior is expected and healthy.
Quality Analysis
Output Quality by Workflow
Ratings: 1 (poor) → 5 (excellent). Score = average × 20.
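The scoring rule can be sketched as follows. Which individual ratings feed the average is not stated here, so the input list is an assumption, and `quality_score` is a hypothetical helper name.

```python
def quality_score(ratings: list[float]) -> float:
    """Convert 1-5 ratings into a 20-100 score: average x 20."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return sum(ratings) / len(ratings) * 20
```

Under this rule an average rating of 4.5 maps to a score of 90, and the scale runs from 20 (all ratings 1) to 100 (all ratings 5).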
Common Quality Observations
Effectiveness Analysis
Task Completion and Resource Efficiency
* Codex token tracking not available in current log format.
† Agent succeeded; safe_outputs job failed due to missing pre-filter artifact.
Behavioral Patterns
Productive Patterns ✅
Problematic Patterns ⚠️
pr-filter-results.json artifact that the event-driven flow produces. Agent handles this gracefully but the infrastructure gap causes safe_outputs failure.
Coverage Analysis
Well-covered areas:
Coverage gaps:
Recommendations
High Priority
Fix Issue Triage Agent structural failure (P0 — 14+ days)
Dedicate a workflow health investigation run. Root cause is independent of GH_AW_GITHUB_TOKEN. Check activation job, schedule config, and permissions.
Restore Smoke Gemini (P1)
Verify GEMINI_API_KEY secret validity and model endpoint. Consider adding API health check step to smoke tests.
Fix Contribution Check pre-filter artifact gap
The safe_outputs job fails when pr-filter-results.json is absent in scheduled runs. Add a default empty artifact or conditional step to handle schedule triggers gracefully.
Medium Priority
Recompile 14 stale lock files (P2)
make recompile needed for: blog-auditor, breaking-change-checker, copilot-cli-deep-research, daily-multi-device-docs-tester, daily-regulatory, dependabot-go-checker, discussion-task-miner, example-workflow-analyzer, jsweep, prompt-clustering-analysis, release, security-alert-burndown.campaign.g, update-astro, workflow-skill-extractor
Review Semantic Function Refactoring cost ($1.29/run, 59 turns)
At this rate (~$5.16/month, which corresponds to four runs at $1.29 each), consider whether the scope per run could be narrowed. 59 turns is high; a more focused initial context could reduce exploration turns.
Add AI Moderator PR-closed guard
Add a pre-flight check: if the triggering PR is already closed/merged, skip the agent run rather than fail on branch checkout.
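One way the guard logic could look, as a sketch only. It assumes the pre-flight step has access to the `pull_request` object from the triggering event payload, with the `state` and `merged` fields GitHub includes in pull request payloads. The function name is hypothetical, and how the result is wired into the workflow is left open.

```python
def should_skip_agent_run(pull_request: dict) -> bool:
    """Return True when the triggering PR is already closed or merged.

    A closed or merged PR may have had its head branch deleted, so a
    checkout of that branch would fail before the agent starts.
    """
    state = pull_request.get("state")
    merged = bool(pull_request.get("merged"))
    return merged or state == "closed"
```

For example, `should_skip_agent_run({"state": "closed", "merged": True})` returns `True`, while an open PR returns `False`. A payload with neither field present is treated as open, so the agent run proceeds.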
Low Priority
Engine Distribution
Copilot dominates volume; Claude leads on quality-per-run (100% + low error rate). Codex reliable for moderation use case.
Actions Taken This Run
References: