[prompt-clustering] Copilot Agent Prompt Clustering Analysis — 2026-03-31 #23775
Closed
Replies: 1 comment
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #23948.
Daily NLP clustering analysis of 1,000 copilot agent task prompts from PRs in github/gh-aw, covering 2026-01-21 → 2026-03-31.
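The per-cluster "Top Terms" listed below can be produced with a cluster-level TF-IDF ranking. A minimal stdlib-only sketch of that idea (the cluster names, prompt strings, and scoring details here are hypothetical stand-ins, not the pipeline actually used for this report):

```python
import math
from collections import Counter

def top_terms(clusters, k=3):
    """clusters: dict of cluster name -> list of prompt strings.
    Returns cluster name -> top-k terms, ranked by a simple
    TF-IDF score computed at the cluster level."""
    # Term frequency per cluster
    tf = {name: Counter(w for p in prompts for w in p.lower().split())
          for name, prompts in clusters.items()}
    # Document frequency: in how many clusters does each term appear?
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(clusters)
    result = {}
    for name, counts in tf.items():
        # Smoothed IDF downweights terms shared by every cluster
        scored = {t: c * math.log((1 + n) / (1 + df[t]))
                  for t, c in counts.items()}
        result[name] = [t for t, _ in
                        sorted(scored.items(), key=lambda kv: -kv[1])[:k]]
    return result

# Toy example with hypothetical prompts
clusters = {
    "A": ["fix ci failure in test", "diagnose ci failure logs"],
    "B": ["update mcp server version", "bump mcp gateway version"],
}
print(top_terms(clusters))
```

Terms that dominate one cluster but rarely appear elsewhere (like "mcp" above) float to the top, which is exactly the behavior the "Top Terms" lists reflect.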
Summary
Cluster Overview
Cluster Detail: A — CI Failure & Issue-Driven Repairs (221 PRs, 53% success)
Top Terms: section, issue, gh aw, gh, aw, failure, test
Profile: Tasks that arrive as auto-generated issue reports from the CI failure doctor or deep-report workflows. The prompt contains a structured issue_title/issue_body block with a CI failure description and asks the agent to diagnose and fix it.
Why lower success rate? These tasks are inherently open-ended — the failure cause is unknown at prompt time. The agent must triage the log, identify the root cause, and produce a fix without a direct specification, making partial or incorrect fixes more likely.
Sample prompts:
Example PRs: #11058, #11059, #11064
Cluster Detail: B — MCP Server Updates (89 PRs, 64% success)
Top Terms: mcp, server, mcp server, gateway, mcp gateway, update, version
Profile: Version-bump tasks for MCP server packages (e.g. @sentry/mcp-server, network gateway config). Also includes network configuration fixes for MCP servers.
Why lower success rate? Version bumps tend to succeed, but network/gateway config changes often involve trial-and-error or network sandbox constraints that require multiple iterations.
Sample prompts:
Example PRs: #11050, #11082, #12664
Cluster Detail: C — Reference-Guided Fixes (74 PRs, 70% success)
Top Terms: reference, fix, debug, tests, review, workflow run
Profile: Tasks that include a Reference: section pointing to a specific workflow-run URL. The agent is expected to look at the failing run and apply a targeted fix (file path corrections, asset copies, HTTP transport setup).
Sample prompts:
Example PRs: #11065, #11066, #11080
Cluster Detail: D — General Feature & Fix Requests (260 PRs, 73% success)
Top Terms: fix, update, project, add, agent, command, workflow
Profile: The largest cluster — a broad mix of feature additions, command-behavior changes, and general fixes. These are direct human-written prompts with a clear specification.
Sample prompts:
Example PRs: #11054, #11067, #11068
Cluster Detail: E — Campaign Management (58 PRs, 74% success)
Top Terms: campaign, fix, security, project, remove, label, worker
Profile: Tasks related to the campaign orchestration system — label-based discovery, tracker-id requirements, campaign worker workflow changes, and security scanning integration.
Sample prompts:
"...agentic-campaign..."
Example PRs: #11053, #11059, #11074
Cluster Detail: F — Maintenance Workflow Logic (89 PRs, 74% success)
Top Terms: issue, workflow, section, failure, details, maintenance, job
Profile: Changes to the agentic maintenance workflow jobs — merging close/sync jobs, adding logging, fixing validation rules, disallowing invalid YAML shorthands.
Sample prompts:
"...permissions: read but this creates invalid YAML and I think we should really only allow read-all..."
Example PRs: #11053, #11060, #11069
Cluster Detail: G — Agentic Workflow Configuration (100 PRs, 78% success)
Top Terms: agentic, agentic workflows, md, workflows, workflow, template, sync
Profile: Changes to workflow .md templates, sync behaviors, auto-assignment rules, and conclusion job setup. These are well-scoped configuration tasks where the agent knows exactly what file to edit.
Sample prompts:
"...@copilot..."
Example PRs: #11050, #11053, #11054
Cluster Detail: H — Safe Outputs Feature Work (79 PRs, 80% success)
Top Terms: safe, safe outputs, outputs, safe output, project, validate, compile
Profile: Feature development and hardening of the safe-outputs subsystem — compile-time validation, better error messages, target field checks. Well-defined scope with clear success criteria.
Sample prompts:
Example PRs: #11064, #11065, #11074
Cluster Detail: I — CI Job Failure Analysis (30 PRs, 80% success)
Top Terms: job, fix, implement, failing, root, analyze, logs
Profile: Smallest but most effective cluster. Prompts follow a strict template: "Fix the failing GitHub Actions workflow <name>. Analyze the workflow logs, identify the root cause, and implement a fix. Job ID: XXXXX Job URL: ...". The structured format with a direct job URL appears to significantly improve agent success.
Sample prompts:
"...js. Analyze the workflow logs, identify the root cause of the failure, and implement a fix. Job ID: 61070763482 Job URL:..."
Example PRs: #11083, #11070, #11077
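The Cluster I template quoted above is simple to generate programmatically when dispatching repair tasks. A minimal sketch (the function name and arguments are illustrative, not the repo's actual dispatcher):

```python
def ci_fix_prompt(workflow_name: str, job_id: int, job_url: str) -> str:
    """Build a structured CI-repair prompt following the
    high-success Cluster I template described above."""
    return (
        f"Fix the failing GitHub Actions workflow {workflow_name}. "
        "Analyze the workflow logs, identify the root cause of the "
        "failure, and implement a fix. "
        f"Job ID: {job_id} Job URL: {job_url}"
    )

# Hypothetical usage; job ID taken from the sample prompt above
print(ci_fix_prompt("ci.yml", 61070763482, "https://github.com/..."))
```

The key property is that every field the agent needs to locate the failure is present verbatim, so no open-ended triage is required before work can begin.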
Key Findings
Structured prompts with direct URLs outperform open-ended issue descriptions. Cluster I (CI Job URLs, 80%) vs Cluster A (issue report prose, 53%) — a 27-percentage-point gap. When the agent can look up exactly what failed, outcomes improve dramatically.
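As a rough sanity check on that gap, a two-proportion z-test on the reported counts (success counts rounded from the stated percentages; this test is my addition, not part of the original analysis) suggests the difference is unlikely to be sampling noise:

```python
import math

# Counts from the report: Cluster I (30 PRs, 80% success)
# vs Cluster A (221 PRs, 53% success).
n1, x1 = 30, 24      # Cluster I: 0.80 * 30 = 24 successes
n2, x2 = 221, 117    # Cluster A: 0.53 * 221 ≈ 117 successes

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(f"gap = {p1 - p2:.1%}, z = {z:.2f}")  # z ≈ 2.8, well above 1.96
```

A z around 2.8 corresponds to p ≈ 0.005 under a two-sided test, so even with Cluster I's small sample the gap looks real.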
The largest cluster is also mid-tier for success. Cluster D (General Feature & Fix, 260 PRs) at 73% — the broad "fix/update/add" category. Splitting vague prompts into more specific sub-tasks (like Cluster H "safe outputs" does) appears to help.
Issue-driven automated repairs are the hardest. Cluster A at 53% merge rate — these come from CI Failure Doctor and deep-report auto-generated prompts where the agent must both diagnose and fix unknown failures.
MCP server tasks underperform version-bump expectations. Cluster B at 64% — likely because gateway/network config tasks are mixed in with simpler version bumps. Separating these into two prompt templates could improve both.
Campaign management is maturing. Cluster E at 74% and steady across the period suggests the prompt templates for campaign tasks are well-established.
Recommendations
Adopt the CI job URL pattern more broadly. Cluster I's structured Job ID + Job URL format achieves 80% success. Apply this to Cluster A (issue-driven repairs) by enriching auto-generated issue prompts with direct workflow run URLs and job IDs wherever available.
Split MCP cluster prompts. Separate version-bump tasks (Update X to vY.Z) from network/gateway config tasks. The latter need more diagnostic context (firewall rules, allowed domains) baked into the prompt.
Invest in reference-guided prompt templates. Cluster C (70%) uses Reference: sections pointing to workflow run logs — this pattern could be extended to more task types as a standard enrichment step before dispatching to the agent.
For open-ended issue repairs, add failure categorization. Before sending to copilot, classify the CI failure type (test assertion, compile error, lint, flaky) and include the category in the prompt. This gives the agent a narrower search space.
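The failure-categorization step could start as a simple keyword matcher run over the failing job's log before the prompt is assembled. A rough sketch (patterns and category names are illustrative, not an actual taxonomy from the repo):

```python
import re

# Ordered (category, pattern) pairs; first match wins.
CATEGORIES = [
    ("test assertion", re.compile(r"assert|expected .* got|FAIL:", re.I)),
    ("compile error",  re.compile(r"cannot find|undefined reference|syntax error", re.I)),
    ("lint",           re.compile(r"lint|gofmt|eslint|unused variable", re.I)),
    ("flaky",          re.compile(r"timeout|connection reset|retry", re.I)),
]

def categorize(log_excerpt: str) -> str:
    """Return a coarse failure category for a CI log excerpt,
    falling back to 'unknown' when nothing matches."""
    for name, pattern in CATEGORIES:
        if pattern.search(log_excerpt):
            return name
    return "unknown"

print(categorize("FAIL: TestCompile expected 3 got 4"))
```

Even a crude classifier like this lets the prompt say "this looks like a test assertion failure," shrinking the agent's search space as the recommendation describes.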
Track turn counts per cluster. The current data lacks workflow turn-count enrichment (MCP logs not available). Joining PR timestamps with workflow run data would reveal which clusters require the most agent iterations, enabling better max_turns tuning per task type.
Full PR Data Table (first 100 of 1,000)
(Table rows truncated in this copy; 881 more rows in full dataset.)