Agent Performance Report — Week of 2026-03-20 #22002

2026-03-20T17:51:06Z

github-actions[bot]
bot Mar 20, 2026

Executive Summary

The ecosystem is in a strong recovery phase following last week's dual crisis (GH_AW_GITHUB_TOKEN outage + lockdown mode failure wave). Most critical workflows are healthy again.

Metric	This Week	Last Week	Trend
Quality Score	79/100	76/100	↑ +3
Effectiveness Score	72/100	55/100	↑ +17
Health Score	66/100	40/100	↑ +26
Active Run Success Rate	91% (20/22)	~45%	↑ +46 pp
Total Token Usage (7d)	7.52M	—	—
Total Cost (7d)	$2.80	—	—
Active Workflows Analyzed	13	13	→

Top performers: Issue Monster, Auto-Triage Issues, The Great Escapi, Lockfile Statistics Analysis Agent
Needs attention: Issue Triage Agent (P0), Smoke Gemini (P1), Contribution Check (safe_outputs infrastructure issue)

Performance Rankings

Top Performing Agents 🏆

Issue Monster (Effectiveness: 100%, 4/4 runs)
Engine: copilot | Avg duration: 5.5m | Tokens/run: ~171K
Delivered consistent performance across all 4 schedule runs after full GH_AW_GITHUB_TOKEN recovery. High volume, reliable execution. The benchmark for copilot-engine stability.
Auto-Triage Issues (Effectiveness: 100%, 2/2 runs)
Engine: copilot | Avg duration: 3.4m | Tokens/run: ~51K
Fast, efficient event-driven triage. Low token cost, high responsiveness to issue events. Excellent resource efficiency.
The Great Escapi (Effectiveness: 100%, 1/1 run)
Engine: copilot | Duration: 3.5m | Tokens: 77K
Rated A+ on prompt injection detection last run. Best-in-class security posture, consistent operation.
Daily Safe Outputs Conformance Checker (Effectiveness: 100%, 1/1 run)
Engine: claude | Duration: 3.4m | Turns: 3 | Cost: $0.17
Most efficient Claude workflow. Concise, targeted analysis with low turn count.
Lockfile Statistics Analysis Agent (Effectiveness: 100%, 1/1 run)
Engine: claude | Duration: 8.1m | Turns: 25 | Cost: $0.87
Deep technical analysis; the high turn count reflects thorough investigation. Quality output.
AI Moderator (Effectiveness: 83%, 5/6 runs)
Engine: codex | 6 event-triggered runs across PR and issue events
1 failure on Mar 20 (§23352224372) — environment issue (git checkout failure on already-deleted branch copilot/sub-pr-21993), not an agent quality problem. Operational resilience is good.

Agents Needing Improvement 📉

Contribution Check — Safe Outputs Infrastructure Failure
Run §23353982199: The agent completed successfully (LGTM on PR #21968, "Fix GH_AW_AGENT_OUTPUT nested path by enforcing /tmp/gh-aw/ artifact root"), but the safe_outputs job failed. Root cause: pr-filter-results.json was missing from the pre-agent step. The agent adapted gracefully and created issue #21996, "[Contribution Check Report] Contribution Check — 2026-03-20", with correct findings. The failure is in the job orchestration, not the agent logic.
Recommendation: Investigate why pr-filter-results.json pre-filter step is absent in scheduled runs.
Issue Triage Agent — P0 Critical (14+ days, from Workflow Health)
100% failure rate since March 6. Pre-dates GH_AW_GITHUB_TOKEN crisis — this is an independent structural failure.
Recommendation: Dedicated investigation required. See Workflow Health Report.
Smoke Gemini — P1 High
6+ consecutive schedule failures (Mar 15–20). Last success was Mar 17. Likely Gemini API or model key issue.
Recommendation: Verify Gemini API key and model endpoint availability.

Inactive / Not Triggered This Week

Many on-demand workflows were started but correctly skipped (Scout, /cloclo, Q, Archie, PR Nitpick Reviewer 🔍, Resource Summarizer, Workflow Craft Agent, Poem Bot, etc.). These are event-driven and only activate on matching triggers, so the skip behavior is expected and healthy.

Quality Analysis

Output Quality by Workflow

Workflow	Engine	Clarity	Accuracy	Completeness	Actionability	Score
Issue Monster	copilot	5	5	5	5	100
Auto-Triage Issues	copilot	5	5	4	5	95
The Great Escapi	copilot	5	5	5	4	95
Daily Repo Chronicle	copilot	5	4	5	4	90
Lockfile Stats Analysis	claude	5	5	5	4	95
Daily Safe Output Conf.	claude	5	5	4	5	95
Daily Team Evolution	claude	4	5	4	4	85
Slide Deck Maintainer	copilot	4	4	4	3	75
Semantic Function Refactoring	claude	5	5	5	3	90
Daily Safe Output Integrator	copilot	4	4	4	4	80
AI Moderator	codex	4	4	4	4	80
Contribution Check	copilot	5	5	4	5	95 (agent output)
Issue Triage Agent	copilot	—	—	—	—	0 (not running)

Ratings: 1 (poor) → 5 (excellent). Score = average × 20.
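
As a worked instance of the scoring rule above, the following minimal Python sketch reproduces two rows of the table. The function name `quality_score` is illustrative only and is not part of the report tooling.

```python
def quality_score(clarity: int, accuracy: int, completeness: int, actionability: int) -> float:
    """Average the four 1-5 ratings and scale the result to a 0-100 score."""
    ratings = [clarity, accuracy, completeness, actionability]
    return sum(ratings) / len(ratings) * 20


# Ratings copied from the table above.
print(quality_score(5, 5, 4, 5))  # Auto-Triage Issues
print(quality_score(4, 4, 4, 3))  # Slide Deck Maintainer
```

These calls yield 95.0 and 75.0, matching the Score column for those two workflows.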

Common Quality Observations

Strengths: Most outputs are well-structured, use proper markdown headers, include concrete examples and links.
Slide Deck Maintainer: High token usage (1.2M) relative to visible output complexity. May benefit from prompt refinement to reduce context window usage.
Semantic Function Refactoring: 2.3M tokens / $1.29 per run is expensive. The 59 turns indicate deep work, which is appropriate for the task scope but worth monitoring.

Effectiveness Analysis

Task Completion and Resource Efficiency

Workflow	Runs	Success Rate	Avg Duration	Tokens/Run	Cost/Run	Efficiency
Auto-Triage Issues	2	100%	3.4m	51K	$0.00	⭐⭐⭐⭐⭐
The Great Escapi	1	100%	3.5m	77K	$0.00	⭐⭐⭐⭐⭐
Daily Safe Output Conf.	1	100%	3.4m	146K	$0.17	⭐⭐⭐⭐⭐
Issue Monster	4	100%	5.5m	171K	$0.00	⭐⭐⭐⭐
Daily Safe Output Integr.	1	100%	4.2m	350K	$0.00	⭐⭐⭐⭐
Daily Team Evolution	1	100%	5.5m	252K	$0.46	⭐⭐⭐⭐
Lockfile Stats Analysis	1	100%	8.1m	1.1M	$0.87	⭐⭐⭐
Daily Repo Chronicle	1	100%	7.7m	786K	$0.00	⭐⭐⭐
Slide Deck Maintainer	1	100%	7.1m	1.2M	$0.00	⭐⭐
Semantic Function Refact.	1	100%	8.2m	2.3M	$1.29	⭐⭐
AI Moderator	6	83%	7.8m	~0*	$0.00	⭐⭐⭐
Contribution Check	1	0%†	5.9m	558K	$0.00	—

* Codex token tracking not available in current log format.
† Agent succeeded; safe_outputs job failed due to missing pre-filter artifact.

Behavioral Patterns

Productive Patterns ✅

Issue Monster → Auto-Triage coordination: Issues created by Issue Monster are immediately triaged by Auto-Triage Issues within the same event cycle.
Daily Safe Output conformance → integrator pipeline: The conformance checker validates before the integrator runs, which is good quality-gate behavior.
Recovery resilience: After last week's GH_AW_GITHUB_TOKEN crisis, Issue Monster recovered to 4/4 this week. Zero over-creation despite backlog.

Problematic Patterns ⚠️

Contribution Check pre-filter gap: The scheduled run lacks the pr-filter-results.json artifact that the event-driven flow produces. Agent handles this gracefully but the infrastructure gap causes safe_outputs failure.
AI Moderator PR branch checkout race: When PRs are merged/closed before the AI Moderator completes, checkout fails. This is an inherent race condition for PR-triggered workflows.

Coverage Analysis

Well-covered areas:

Issue triage and management (Issue Monster + Auto-Triage Issues)
Safe output quality (Conformance Checker + Integrator + Optimizer)
Code quality analysis (Semantic Function Refactoring, Contribution Check)
Repository narrative and docs (Daily Repo Chronicle, Daily Team Evolution, Slide Deck Maintainer)
Security/injection detection (The Great Escapi)

Coverage gaps:

Issue Triage Agent (P0 failure, 14+ days) — a significant triage gap
Smoke Gemini — Gemini engine coverage is a blind spot (P1 failure)

Recommendations

High Priority

Fix Issue Triage Agent structural failure (P0 — 14+ days)
Dedicate a workflow health investigation run. Root cause is independent of GH_AW_GITHUB_TOKEN. Check activation job, schedule config, and permissions.
Restore Smoke Gemini (P1)
Verify GEMINI_API_KEY secret validity and model endpoint. Consider adding API health check step to smoke tests.
Fix Contribution Check pre-filter artifact gap
The safe_outputs job fails when pr-filter-results.json is absent in scheduled runs. Add a default empty artifact or conditional step to handle schedule triggers gracefully.
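
The "default empty artifact" option for the Contribution Check fix could look roughly like the sketch below. It is a sketch under assumptions: the helper name `ensure_prefilter_artifact` is hypothetical, and because the real schema of pr-filter-results.json is not shown in this report, an empty JSON list is used as a placeholder "no results" value.

```python
import json
from pathlib import Path


def ensure_prefilter_artifact(path: Path, default=None) -> bool:
    """Write a placeholder artifact if the pre-filter step did not produce one.

    Returns True if a placeholder was written, False if the file already existed.
    """
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # ASSUMPTION: an empty JSON list is an acceptable "no results" value.
    # The actual schema of pr-filter-results.json is not given in the report.
    path.write_text(json.dumps([] if default is None else default))
    return True
```

Run before the safe_outputs job on schedule triggers, a step like this would leave an existing artifact untouched and only fill the gap when the file is absent. Whether downstream consumers accept an empty list would need to be checked against the real workflow.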

Medium Priority

Recompile 14 stale lock files (P2)
make recompile needed for: blog-auditor, breaking-change-checker, copilot-cli-deep-research, daily-multi-device-docs-tester, daily-regulatory, dependabot-go-checker, discussion-task-miner, example-workflow-analyzer, jsweep, prompt-clustering-analysis, release, security-alert-burndown.campaign.g, update-astro, workflow-skill-extractor
Review Semantic Function Refactoring cost ($1.29/run, 59 turns)
At roughly four runs per month (~$5.16/month), consider whether the scope per run could be narrowed. 59 turns is high, and a more focused initial context could reduce exploration turns.
Add AI Moderator PR-closed guard
Add a pre-flight check: if the triggering PR is already closed/merged, skip the agent run rather than fail on branch checkout.
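
The pre-flight check for the AI Moderator could be sketched as follows. This assumes the workflow has a freshly fetched pull request object available as a dict with GitHub's `state` and `merged` fields. The function name `should_skip_checkout` is hypothetical, and fetching the PR is left out of the sketch.

```python
def should_skip_checkout(pr: dict) -> bool:
    """Return True when the PR's head branch may no longer exist."""
    # In GitHub's pull request object, `state` is "open" or "closed";
    # a merged PR is reported as closed with `merged` set to True.
    # A missing `state` is treated conservatively as "skip".
    return pr.get("state") != "open" or bool(pr.get("merged"))


print(should_skip_checkout({"state": "open", "merged": False}))
print(should_skip_checkout({"state": "closed", "merged": True}))
```

The first call prints False (proceed with checkout) and the second prints True (skip). Because the race occurs after the trigger fires, the check is only useful if the PR state is re-read immediately before checkout rather than taken from the original event payload.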

Low Priority

Enable token tracking for Codex engine — AI Moderator runs show 0 tokens; blind spot in cost visibility.
Slide Deck Maintainer token optimization — 1.2M tokens per deck update seems high; review if full repo context is necessary.

Engine Distribution

Engine	Workflows	Runs (7d)	Success Rate
copilot	~8	13	92%
claude	4	4	100%
codex	1	6	83%
? (undetected)	1	1	100%

Copilot dominates volume; Claude leads on per-run reliability (100% success rate). Codex is reliable for the moderation use case.

Actions Taken This Run

Generated this performance report discussion
Identified 3 high-priority action items (Issue Triage, Smoke Gemini, Contribution Check)
Updated repo memory with current scores and status
No new improvement issues created (all issues already tracked in Workflow Health)

References:

§23354846206 — This run (Agent Performance Analyzer)
§23353982199 — Contribution Check failure (safe_outputs)
§23352224372 — AI Moderator failure (branch checkout)

Analysis period: 2026-03-13 to 2026-03-20 | Next report: 2026-03-27

AI generated by Agent Performance Analyzer - Meta-Orchestrator · history

expires on Mar 21, 2026, 5:51 PM UTC

2026-03-21T18:53:01Z

github-actions[bot]
bot Mar 21, 2026
Author

This discussion was automatically closed because it expired on 2026-03-21T17:51:05.849Z.

Closed by Workflow

0 replies
