[prompt-clustering] Copilot Agent Prompt Clustering Analysis - December 2025 #6553
Daily NLP-based clustering analysis of 951 Copilot agent task prompts from the last 30 days reveals 10 distinct work patterns and actionable insights for improving agent performance.
Summary
Analysis of 951 Copilot agent tasks identified 10 distinct clusters of work patterns, with an overall success rate of 77.0% (732/951 PRs merged). Bugfix tasks are the most common pattern (350 tasks, 78.9% success rate).
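The per-cluster sizes and success rates reported here can be recomputed from a table of agent PRs. A minimal sketch, assuming a hypothetical frame with `cluster` and `merged` columns (the report does not publish its data schema, so these names are illustrative):

```python
# Sketch of the per-cluster success-rate computation. The "cluster" and
# "merged" column names are assumptions, not the report's actual schema.
import pandas as pd

def cluster_success_rates(prs: pd.DataFrame) -> pd.DataFrame:
    """Return task count and merge rate per cluster, sorted by size."""
    stats = (
        prs.groupby("cluster")["merged"]
        .agg(tasks="count", success_rate="mean")
        .sort_values("tasks", ascending=False)
    )
    stats["success_rate"] = (stats["success_rate"] * 100).round(1)
    return stats

# Tiny worked example: 4 of 5 PRs merged -> 80.0% for the "bugfix" cluster.
demo = pd.DataFrame({
    "cluster": ["bugfix"] * 5 + ["feature"] * 2,
    "merged":  [1, 1, 1, 1, 0, 1, 0],
})
print(cluster_success_rates(demo))
```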
Key Highlights:
Cluster Overview
Full Analysis Report
Detailed Cluster Analysis
Cluster 1: Bugfix Tasks (Largest Cluster)
Size: 350 tasks (36.8% of total)
Outcomes:
Complexity Metrics:
Characteristics: This is the dominant cluster, representing more than one-third of all agent tasks. These are primarily issue-driven bugfixes with standardized prompt formatting (e.g., "This section details on the original issue you should resolve"). The success rate is solid at 78.9%, and complexity is moderate with an average of 13 files changed.
Top Keywords: section details original, section details, issue resolve, original issue resolve, details original issue
Example Tasks:
Cluster 2: Update Tasks (High Complexity)
Size: 139 tasks (14.6% of total)
Outcomes:
Complexity Metrics:
Characteristics: This cluster shows the highest complexity with 19.6 files changed on average. These are substantial update tasks involving documentation, error handling, and JavaScript modifications. Despite high complexity, the success rate remains strong at 79.1%.
Top Keywords: update, error, docs, file, javascript, md, actions, script
Example Tasks:
Cluster 3: Feature Tasks (General)
Size: 111 tasks (11.7% of total)
Outcomes:
Complexity Metrics:
Characteristics: General feature development tasks with moderate complexity and slightly lower success rate (73.9%). These tasks involve making code work with GitHub and Copilot integrations, requiring more iterations (4.1 commits on average).
Top Keywords: code, make, run, agent, use, github, copilot, sure
Example Tasks:
Cluster 4: Agentic Workflow Tasks (Second-Best Success Rate)
Size: 98 tasks (10.3% of total)
Outcomes:
Complexity Metrics:
Characteristics: This cluster achieves the second-highest success rate (84.7%) and lowest complexity (9.0 files). These tasks focus on creating and updating agentic workflows themselves - a meta-pattern where the agent works on workflow automation. The high success rate suggests clear, well-scoped tasks.
Top Keywords: agentic, workflow, agentic workflow, workflows, agentic workflows, update, file, daily
Example Tasks:
Cluster 5: MCP & Command Feature Tasks
Size: 72 tasks (7.6% of total)
Outcomes:
Complexity Metrics:
Characteristics: Tasks focused on adding commands, MCP server integrations, and compilation features. Lower success rate (69.4%) suggests these are more challenging tasks requiring domain-specific knowledge.
Top Keywords: add, command, json, mcp, support, server, mcp server, compile
Example Tasks:
Cluster 6: Small Bugfix Tasks
Size: 58 tasks (6.1% of total)
Outcomes:
Complexity Metrics:
Characteristics: Smaller, focused bugfix tasks with standardized formatting (issue_title, issue_description). Lower complexity (6.4 files) and fewer review comments (0.7) indicate these are straightforward fixes.
Top Keywords: issue_title, workflow, summary, summary section details, issue_title issue_description
Example Tasks:
Cluster 7: Pull Request & GitHub Integration Tasks
Size: 39 tasks (4.1% of total)
Outcomes:
Complexity Metrics:
Characteristics: Tasks involving pull requests, discussions, and GitHub integrations. Despite adding the most lines (1949.8 on average), these tasks achieve a strong 79.5% success rate.
Top Keywords: pull, pull request, request, comment, discussion, branch, workflow, commit
Example Tasks:
Cluster 8: Firewall & Feature Tasks (Lowest Success Rate)
Size: 36 tasks (3.8% of total)
Outcomes:
Complexity Metrics:
Characteristics: This cluster has the lowest success rate (58.3%). Many tasks involve enabling firewall features or adding logs - areas that may require more careful testing and validation.
Top Keywords: firewall, add, logs, workflow, feature, update, enable, field
Example Tasks:
Cluster 9: CLI & Version Update Tasks
Size: 25 tasks (2.6% of total)
Outcomes:
Complexity Metrics:
Characteristics: CLI-related updates with high complexity (19.8 files changed). These tasks involve updating CLI tools, checking versions, and ensuring consistency across commands.
Top Keywords: cli, version, copilot, help, issue_title, tools, update, commands
Example Tasks:
Cluster 10: Test-Focused Bugfix Tasks (Best Performance)
Size: 23 tasks (2.4% of total)
Outcomes:
Complexity Metrics:
Characteristics: The highest-performing cluster, at an 87.0% success rate. These are bugfix tasks with a strong focus on testing and JavaScript; the combination of test focus and clear scope leads to excellent outcomes.
Top Keywords: fix, tests, test, javascript, issues, add, warning, task
Example Tasks:
Key Findings
1. Bugfix Tasks Dominate Agent Work
Observation: Bugfix tasks represent 36.8% of all agent work (350 tasks), with an additional 81 bugfixes distributed across other clusters.
Insight: Nearly half of all Copilot agent work involves fixing issues rather than building new features. This suggests:
Recommendation: Continue to optimize bugfix workflows, as they represent the largest use case. Consider creating specialized sub-workflows for common bugfix patterns.
2. Success Rate Varies Significantly Across Task Types
Observation: Success rates range from 58.3% (Cluster 8 - Firewall features) to 87.0% (Cluster 10 - Test-focused bugfixes), with a standard deviation of 8.1%.
Insight: Task type significantly impacts success probability:
Recommendation:
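The spread cited above can be partially recomputed from the success rates this report states. Only eight of the ten clusters list an explicit rate (Clusters 6 and 9 do not), so this recomputation lands near, not exactly at, the reported 8.1%:

```python
# Partial recomputation of the success-rate spread, using only the eight
# cluster rates quoted explicitly in this report.
import statistics

rates = {
    "C1 bugfix": 78.9, "C2 update": 79.1, "C3 feature": 73.9,
    "C4 agentic": 84.7, "C5 mcp": 69.4, "C7 pr": 79.5,
    "C8 firewall": 58.3, "C10 tests": 87.0,
}
spread = statistics.pstdev(rates.values())
print(f"std dev over {len(rates)} clusters: {spread:.1f}%")  # ~8.6% on this subset
```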
3. Complexity Does Not Directly Correlate with Failure
Observation: Cluster 4 (Agentic workflows) has the lowest complexity (9.0 files) and high success rate (84.7%), but Cluster 7 (PR integrations) has high complexity (1950 lines added) and still achieves 79.5% success.
Insight: Task complexity (measured by files changed or lines added) is not a strong predictor of success. Instead, success correlates more with:
Recommendation: Focus on task scoping and prompt engineering rather than avoiding complex tasks. Large, well-scoped tasks can succeed; small, vague tasks may fail.
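A rough numerical check of the weak complexity-success link, using the only three clusters for which this report states both average files changed and success rate (Clusters 1, 2, and 4). Three points is far too few for real inference; this only illustrates how such a correlation would be computed:

```python
# Pearson correlation between avg files changed and success rate for the
# three clusters with both figures stated in the report (C1, C2, C4).
# Illustrative only: n=3 is not a meaningful sample.
import numpy as np

files_changed = np.array([13.0, 19.6, 9.0])   # C1, C2, C4
success_rate  = np.array([78.9, 79.1, 84.7])

r = np.corrcoef(files_changed, success_rate)[0, 1]
print(f"Pearson r = {r:.2f}")
```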
4. Review Iteration Patterns Reveal Engagement
Observation: Average review comments range from 0.7 (Cluster 6 - small bugfixes) to 2.1 (Cluster 4 - agentic workflows).
Insight: Higher review engagement (more comments) doesn't necessarily indicate problems. Clusters with more review comments (Cluster 4: 2.1 comments, 84.7% success) actually perform better than those with fewer (Cluster 8: 1.7 comments, 58.3% success). This suggests active review collaboration improves outcomes.
Recommendation: Encourage iterative feedback on agent PRs. Active review engagement helps refine implementations and catch issues early.
5. Meta-Tasks Show Highest Success
Observation: Cluster 4 (agentic workflow tasks) - where the agent works on workflow automation itself - achieves 84.7% success rate with low complexity.
Insight: The agent performs best when working on domain-aligned tasks (creating/updating agentic workflows). This "meta-work" pattern shows:
Recommendation: Consider expanding agent use for meta-tasks like workflow maintenance, documentation generation, and tooling improvements where domain alignment is strongest.
Recommendations
1. Leverage High-Performing Patterns
Action: Study and replicate patterns from Cluster 10 (87.0% success) and Cluster 4 (84.7% success).
Specific steps:
Expected impact: 5-10% improvement in success rate for similar task types
2. Improve Low-Performing Task Types
Action: Target Cluster 8 (firewall features, 58.3% success) and Cluster 5 (MCP integration, 69.4% success) for improvement.
Specific steps:
Expected impact: Increase success rate to 70%+ for these task types
3. Optimize for Task Clarity Over Complexity Reduction
Action: Focus prompt engineering efforts on clarity, scope, and context rather than reducing task size.
Specific steps:
Expected impact: Enable successful completion of larger, more impactful tasks
4. Standardize High-Volume Task Templates
Action: Since bugfixes represent 36.8% of work, invest in optimizing bugfix prompt templates.
Specific steps:
Expected impact: More consistent bugfix success across all categories
5. Establish Task-Type Routing Guidelines
Action: Help users choose appropriate task types based on success patterns.
Specific steps:
Expected impact: Better task-agent matching, higher overall success rate
Methodology
This analysis applied NLP clustering techniques to 951 Copilot agent task prompts:
Validation: Manual inspection of sample tasks from each cluster confirmed semantic coherence and meaningful groupings.
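A minimal sketch of the kind of pipeline the methodology describes: TF-IDF vectors over prompt text, grouped with k-means. The report does not name its exact algorithm or parameters, so treat this as one plausible reconstruction, not the pipeline that produced these numbers:

```python
# Assumed pipeline: TF-IDF features + k-means clustering of prompt text.
# The sample prompts below are invented to mirror the report's keywords.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

prompts = [
    "Fix the original issue described in this section",
    "Resolve the issue: workflow fails on compile",
    "Update docs and error handling in the javascript files",
    "Update the markdown docs for actions scripts",
    "Create a daily agentic workflow that updates files",
    "Update the agentic workflows to run daily",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(prompts)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for prompt, label in zip(prompts, labels):
    print(label, prompt)
```

Cluster keywords like those listed per cluster above can then be read off the highest-weight TF-IDF terms nearest each centroid.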
Visualization Charts: See the workflow run artifacts for visual analysis:
Data Access: Full analysis report available at `/tmp/gh-aw/pr-data/clustering-report.md`