[prompt-clustering] Copilot Agent Prompt Clustering Analysis — 2026-03-31 #23775
Closed
Replies: 1 comment
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #23948.
Daily NLP clustering analysis of 1,000 copilot agent task prompts from PRs in github/gh-aw, covering 2026-01-21 → 2026-03-31.
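The per-cluster "Top Terms" listed below can be produced with a cluster-level TF-IDF ranking. A minimal stdlib-only sketch of that idea (the cluster names, prompt strings, and scoring details here are hypothetical stand-ins, not the pipeline actually used for this report):

```python
import math
from collections import Counter

def top_terms(clusters, k=3):
    """clusters: dict of cluster name -> list of prompt strings.
    Returns cluster name -> top-k terms, ranked by a simple
    TF-IDF score computed at the cluster level."""
    # Term frequency per cluster
    tf = {name: Counter(w for p in prompts for w in p.lower().split())
          for name, prompts in clusters.items()}
    # Document frequency: in how many clusters does each term appear?
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(clusters)
    result = {}
    for name, counts in tf.items():
        # Smoothed IDF downweights terms shared by every cluster
        scored = {t: c * math.log((1 + n) / (1 + df[t]))
                  for t, c in counts.items()}
        result[name] = [t for t, _ in
                        sorted(scored.items(), key=lambda kv: -kv[1])[:k]]
    return result

# Toy example with hypothetical prompts
clusters = {
    "A": ["fix ci failure in test", "diagnose ci failure logs"],
    "B": ["update mcp server version", "bump mcp gateway version"],
}
print(top_terms(clusters))
```

Terms that dominate one cluster but rarely appear elsewhere (like "mcp" above) float to the top, which is exactly the behavior the "Top Terms" lists reflect.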
Summary
Cluster Overview
Cluster Detail: A — CI Failure & Issue-Driven Repairs (221 PRs, 53% success)
Top Terms: section, issue, gh aw, gh, aw, failure, test
Profile: Tasks that arrive as auto-generated issue reports from the CI failure doctor or deep-report workflows. The prompt contains a structured issue_title/issue_body block with a CI failure description and asks the agent to diagnose and fix it.
Why lower success rate? These tasks are inherently open-ended — the failure cause is unknown at prompt time. The agent must triage the log, identify the root cause, and produce a fix without a direct specification, making partial or incorrect fixes more likely.
Sample prompts:
Example PRs: #11058, #11059, #11064
Cluster Detail: B — MCP Server Updates (89 PRs, 64% success)
Top Terms: mcp, server, mcp server, gateway, mcp gateway, update, version
Profile: Version-bump tasks for MCP server packages (e.g. @sentry/mcp-server, network gateway config). Also includes network configuration fixes for MCP servers.
Why lower success rate? Version bumps tend to succeed, but network/gateway config changes often involve trial-and-error or network sandbox constraints that require multiple iterations.
Sample prompts:
Example PRs: #11050, #11082, #12664
Cluster Detail: C — Reference-Guided Fixes (74 PRs, 70% success)
Top Terms: reference, fix, debug, tests, review, workflow run
Profile: Tasks that include a Reference: section pointing to a specific workflow-run URL. The agent is expected to look at the failing run and apply a targeted fix (file path corrections, asset copies, HTTP transport setup).
Sample prompts:
Example PRs: #11065, #11066, #11080
Cluster Detail: D — General Feature & Fix Requests (260 PRs, 73% success)
Top Terms: fix, update, project, add, agent, command, workflow
Profile: The largest cluster — a broad mix of feature additions, command-behavior changes, and general fixes. These are direct human-written prompts with a clear specification.
Sample prompts:
Example PRs: #11054, #11067, #11068
Cluster Detail: E — Campaign Management (58 PRs, 74% success)
Top Terms: campaign, fix, security, project, remove, label, worker
Profile: Tasks related to the campaign orchestration system — label-based discovery, tracker-id requirements, campaign worker workflow changes, and security scanning integration.
Sample prompts:
"...agentic-campaign..."
Example PRs: #11053, #11059, #11074
Cluster Detail: F — Maintenance Workflow Logic (89 PRs, 74% success)
Top Terms: issue, workflow, section, failure, details, maintenance, job
Profile: Changes to the agentic maintenance workflow jobs — merging close/sync jobs, adding logging, fixing validation rules, disallowing invalid YAML shorthands.
Sample prompts:
"...permissions: read but this creates invalid YAML and I think we should really only allow read-all..."
Example PRs: #11053, #11060, #11069
Cluster Detail: G — Agentic Workflow Configuration (100 PRs, 78% success)
Top Terms: agentic, agentic workflows, md, workflows, workflow, template, sync
Profile: Changes to workflow .md templates, sync behaviors, auto-assignment rules, and conclusion job setup. These are well-scoped configuration tasks where the agent knows exactly what file to edit.
Sample prompts:
"...@copilot..."
Example PRs: #11050, #11053, #11054
Cluster Detail: H — Safe Outputs Feature Work (79 PRs, 80% success)
Top Terms: safe, safe outputs, outputs, safe output, project, validate, compile
Profile: Feature development and hardening of the safe-outputs subsystem — compile-time validation, better error messages, target field checks. Well-defined scope with clear success criteria.
Sample prompts:
Example PRs: #11064, #11065, #11074
Cluster Detail: I — CI Job Failure Analysis (30 PRs, 80% success)
Top Terms: job, fix, implement, failing, root, analyze, logs
Profile: Smallest but most effective cluster. Prompts follow a strict template: "Fix the failing GitHub Actions workflow <name>. Analyze the workflow logs, identify the root cause, and implement a fix. Job ID: XXXXX Job URL: ...". The structured format with a direct job URL appears to significantly improve agent success.
Sample prompts:
"...js. Analyze the workflow logs, identify the root cause of the failure, and implement a fix. Job ID: 61070763482 Job URL:..."
Example PRs: #11083, #11070, #11077
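The Cluster I template quoted above is simple to generate programmatically when dispatching repair tasks. A minimal sketch (the function name and arguments are illustrative, not the repo's actual dispatcher):

```python
def ci_fix_prompt(workflow_name: str, job_id: int, job_url: str) -> str:
    """Build a structured CI-repair prompt following the
    high-success Cluster I template described above."""
    return (
        f"Fix the failing GitHub Actions workflow {workflow_name}. "
        "Analyze the workflow logs, identify the root cause of the "
        "failure, and implement a fix. "
        f"Job ID: {job_id} Job URL: {job_url}"
    )

# Hypothetical usage; job ID taken from the sample prompt above
print(ci_fix_prompt("ci.yml", 61070763482, "https://github.com/..."))
```

The key property is that every field the agent needs to locate the failure is present verbatim, so no open-ended triage is required before work can begin.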
Key Findings
Structured prompts with direct URLs outperform open-ended issue descriptions. Cluster I (CI Job URLs, 80%) vs Cluster A (issue report prose, 53%) — a 27-percentage-point gap. When the agent can look up exactly what failed, outcomes improve dramatically.
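As a rough sanity check on that gap, a two-proportion z-test on the reported counts (success counts rounded from the stated percentages; this test is my addition, not part of the original analysis) suggests the difference is unlikely to be sampling noise:

```python
import math

# Counts from the report: Cluster I (30 PRs, 80% success)
# vs Cluster A (221 PRs, 53% success).
n1, x1 = 30, 24      # Cluster I: 0.80 * 30 = 24 successes
n2, x2 = 221, 117    # Cluster A: 0.53 * 221 ≈ 117 successes

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(f"gap = {p1 - p2:.1%}, z = {z:.2f}")  # z ≈ 2.8, well above 1.96
```

A z around 2.8 corresponds to p ≈ 0.005 under a two-sided test, so even with Cluster I's small sample the gap looks real.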
The largest cluster is also mid-tier for success. Cluster D (General Feature & Fix, 260 PRs) at 73% — the broad "fix/update/add" category. Splitting vague prompts into more specific sub-tasks (like Cluster H "safe outputs" does) appears to help.
Issue-driven automated repairs are the hardest. Cluster A at 53% merge rate — these come from CI Failure Doctor and deep-report auto-generated prompts where the agent must both diagnose and fix unknown failures.
MCP server tasks underperform version-bump expectations. Cluster B at 64% — likely because gateway/network config tasks are mixed in with simpler version bumps. Separating these into two prompt templates could improve both.
Campaign management is maturing. Cluster E at 74% and steady across the period suggests the prompt templates for campaign tasks are well-established.
Recommendations
Adopt the CI job URL pattern more broadly. Cluster I's structured Job ID + Job URL format achieves 80% success. Apply this to Cluster A (issue-driven repairs) by enriching auto-generated issue prompts with direct workflow run URLs and job IDs wherever available.
Split MCP cluster prompts. Separate version-bump tasks (Update X to vY.Z) from network/gateway config tasks. The latter need more diagnostic context (firewall rules, allowed domains) baked into the prompt.
Invest in reference-guided prompt templates. Cluster C (70%) uses Reference: sections pointing to workflow run logs — this pattern could be extended to more task types as a standard enrichment step before dispatching to the agent.
For open-ended issue repairs, add failure categorization. Before sending to copilot, classify the CI failure type (test assertion, compile error, lint, flaky) and include the category in the prompt. This gives the agent a narrower search space.
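The failure-categorization step could start as a simple keyword matcher run over the failing job's log before the prompt is assembled. A rough sketch (patterns and category names are illustrative, not an actual taxonomy from the repo):

```python
import re

# Ordered (category, pattern) pairs; first match wins.
CATEGORIES = [
    ("test assertion", re.compile(r"assert|expected .* got|FAIL:", re.I)),
    ("compile error",  re.compile(r"cannot find|undefined reference|syntax error", re.I)),
    ("lint",           re.compile(r"lint|gofmt|eslint|unused variable", re.I)),
    ("flaky",          re.compile(r"timeout|connection reset|retry", re.I)),
]

def categorize(log_excerpt: str) -> str:
    """Return a coarse failure category for a CI log excerpt,
    falling back to 'unknown' when nothing matches."""
    for name, pattern in CATEGORIES:
        if pattern.search(log_excerpt):
            return name
    return "unknown"

print(categorize("FAIL: TestCompile expected 3 got 4"))
```

Even a crude classifier like this lets the prompt say "this looks like a test assertion failure," shrinking the agent's search space as the recommendation describes.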
Track turn counts per cluster. The current data lacks workflow turn-count enrichment (MCP logs not available). Joining PR timestamps with workflow run data would reveal which clusters require the most agent iterations, enabling better max_turns tuning per task type.
Full PR Data Table (first 100 of 1,000)
(Table rows truncated in this copy; 881 more rows in full dataset.)