[prompt-clustering] Copilot Agent Prompt Clustering Analysis - Dec 3, 2025 #5453
🔬 Copilot Agent Prompt Clustering Analysis - December 3, 2025
Daily NLP-based clustering analysis of copilot agent task prompts from the last 30 days.
Executive Summary
This analysis examined 1,392 copilot agent tasks from GitHub Pull Requests using advanced NLP clustering techniques. The analysis identified 7 distinct task clusters with varying success rates, complexity metrics, and characteristics. The overall success rate (merged PRs) is 75.4%, with significant variation across task types.
Key Findings:
Full Analysis Report
Analysis Methodology
Data Collection
NLP Techniques Applied
Cluster Analysis
Cluster 0: General Updates & Enhancements (41.7%)
Size: 580 tasks | Success Rate: 73.8% | Complexity: Medium-High
Characteristics:
Top Keywords:
update, add, agent, firewall, file, use, make, remove
Representative Tasks:
Insights:
Cluster 5: Repository-Specific Tasks (20.7%)
Size: 288 tasks | Success Rate: 79.2% | Complexity: Medium
Characteristics:
githubnext/gh repository
Top Keywords:
githubnext gh, githubnext, gh, comments, files, issuetitle, functions, validation
Representative Tasks:
Insights:
Cluster 3: Agentic Workflow Tasks (13.4%)
Size: 187 tasks | Success Rate: 77.5% | Complexity: High
Characteristics:
Top Keywords:
agentic, workflow, agentic workflow, daily, shared, github, update, add
Representative Tasks:
Insights:
Cluster 1: Issue-Driven Development (9.3%)
Size: 129 tasks | Success Rate: 77.5% | Complexity: Medium-High
Characteristics:
Top Keywords:
comments, cli, issue, issuetitle, version, section, issuedescription, copilot
Representative Tasks:
Insights:
Cluster 4: MCP & Infrastructure (8.9%)
Size: 124 tasks | Success Rate: 65.3% | Complexity: Very High
Characteristics:
Top Keywords:
mcp, safe, output, safe output, server, mcp server, add, tool
Representative Tasks:
Insights:
Cluster 2: Code Quality & Refactoring (3.3%)
Size: 46 tasks | Success Rate: 82.6% | Complexity: Medium-High
Characteristics:
Top Keywords:
code, duplicate code, duplicate, analysis, refactoring, duplication, helper, commit
Representative Tasks:
Insights:
Cluster 6: Bug Fixes & Testing (2.7%)
Size: 38 tasks | Success Rate: 78.9% | Complexity: Low
Characteristics:
Top Keywords:
fix, tests, javascript, test, issues, workflows, error, agentic
Representative Tasks:
Insights:
Success Rate Analysis
Success Rate by Cluster (Sorted by Rate)
Overall Average: 75.4%
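The per-cluster rates above are merged-PR fractions. A sketch of how such a table can be computed (cluster sizes are from the sections above; merged counts are back-derived from the reported percentages, so they are approximate reconstructions, not source data):

```python
# Success rate (merged PRs / total PRs) per cluster and overall.
# Sizes match the cluster sections above; merged counts are back-derived
# from the reported success rates and rounded.
clusters = {
    2: (46, 38),    # Code Quality & Refactoring: 82.6%
    5: (288, 228),  # Repository-Specific: 79.2%
    6: (38, 30),    # Bug Fixes & Testing: 78.9%
    1: (129, 100),  # Issue-Driven Development: 77.5%
    3: (187, 145),  # Agentic Workflow: 77.5%
    0: (580, 428),  # General Updates: 73.8%
    4: (124, 81),   # MCP & Infrastructure: 65.3%
}

total = sum(size for size, _ in clusters.values())
merged = sum(m for _, m in clusters.values())

# Print clusters sorted by success rate, highest first.
for cid, (size, m) in sorted(clusters.items(), key=lambda kv: -kv[1][1] / kv[1][0]):
    print(f"Cluster {cid}: {m / size:.1%} ({m}/{size})")
print(f"Overall: {merged / total:.1%} ({merged}/{total})")
```

The reconstructed counts sum to 1,392 tasks and reproduce the 75.4% overall average reported above.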
Key Observations
Complexity Metrics Analysis
Files Changed vs Success Rate
Finding: There's generally an inverse correlation between files changed and success rate, with Cluster 2 (refactoring) as a notable exception where focused objectives overcome complexity.
Comments Count (Interaction Metric)
Finding: Fewer comments correlate with higher success rates, suggesting that well-specified tasks require less clarification.
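Both findings in this section are correlation claims, which can be checked with a plain Pearson coefficient. In the sketch below, the success rates are the reported per-cluster values, but the comment counts are hypothetical placeholders, since the report does not list the per-cluster averages:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Success rates per cluster 0..6 are from the report; the paired comment
# averages are ILLUSTRATIVE stand-ins (not published in the report).
success = [73.8, 77.5, 82.6, 77.5, 65.3, 79.2, 78.9]
avg_comments = [6.0, 4.5, 3.8, 5.0, 7.5, 4.0, 3.5]  # hypothetical

r = pearson_r(avg_comments, success)
print(f"comments vs success: r = {r:+.2f}")  # negative r matches the finding
```

With these placeholder comment counts the coefficient comes out negative, consistent with the stated finding; the real check would substitute the actual per-cluster averages.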
Key Findings
1. Task Type Significantly Impacts Success
Refactoring and code quality tasks (Cluster 2) achieve 82.6% success, while infrastructure tasks (Cluster 4) only reach 65.3%. This 17.3 percentage point gap highlights the importance of task categorization.
2. Structured Prompts Drive Better Outcomes
Clusters 1 and 5, which use structured issue templates with clear sections ((issue_title), (issue_description), (comments)), show above-average success rates of 77.5% and 79.2% respectively.
3. Complexity is Manageable for Focused Tasks
Cluster 2 demonstrates that even complex tasks (19.1 files changed) can succeed when objectives are specific and well-defined. The agent handles large-scale refactoring effectively when given clear patterns to follow.
4. Infrastructure Work Needs More Support
Cluster 4 (MCP & Infrastructure) shows:
5. Small Scope Correlates with Success
Clusters 5 and 6 with the fewest file changes (9.6 and 8.9 respectively) achieve 79.2% and 78.9% success rates. Breaking large tasks into smaller pieces likely improves outcomes.
Recommendations
Based on the clustering analysis, here are actionable improvements for copilot agent task design:
1. Adopt Structured Issue Templates for All Tasks
Why: Clusters 1 and 5 with structured templates show 77-79% success vs. 73.8% for general tasks.
Action:
(issue_title), (issue_description), (acceptance_criteria)
(comments) section for maintainer guidance
Expected Impact: +3-5% success rate improvement
2. Break Down Infrastructure Tasks
Why: Cluster 4 (MCP/Infrastructure) has only 65.3% success due to complexity.
Action:
Expected Impact: +10-15% success rate improvement for infrastructure tasks
3. Prioritize Refactoring and Code Quality Tasks
Why: Cluster 2 has highest success rate (82.6%) and clear value proposition.
Action:
Expected Impact: More high-confidence task assignments, improved codebase quality
4. Optimize General Update Tasks (Cluster 0)
Why: The largest cluster (41.7% of tasks) has below-average success (73.8%), making it a high-impact opportunity.
Action:
Expected Impact: +2-4% overall success rate (due to large cluster size)
5. Standardize Prompt Length and Detail Level
Why: Current average prompt length is 689 characters with high variance.
Action:
Expected Impact: More consistent task understanding, fewer clarifying comments needed
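A pre-submission length check is one simple way to enforce this recommendation. In this sketch, the 689-character mean is from the report, but the acceptable band (roughly 0.5x to 2x the mean) is an assumed example threshold, not a documented target:

```python
# Sketch: flag task prompts whose length deviates far from the corpus
# average. The 689-character mean is reported above; the band boundaries
# are assumed example thresholds.
AVG_PROMPT_CHARS = 689
MIN_CHARS = AVG_PROMPT_CHARS // 2   # 344
MAX_CHARS = AVG_PROMPT_CHARS * 2    # 1378

def check_prompt_length(prompt: str) -> str:
    n = len(prompt)
    if n < MIN_CHARS:
        return f"too short ({n} chars): add context or acceptance criteria"
    if n > MAX_CHARS:
        return f"too long ({n} chars): consider splitting the task"
    return f"ok ({n} chars)"

print(check_prompt_length("Fix the failing test."))
print(check_prompt_length("Update the agent firewall config. " * 20))
```

Such a check could run as part of issue-template validation, steering authors toward the consistent detail level this recommendation calls for.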
6. Create Task Type Guidelines
Why: Success rates vary dramatically by task type (65% to 83%).
Action:
Expected Impact: Better task routing, appropriate resource allocation
Conclusion
This clustering analysis reveals significant patterns in copilot agent task types and success factors. The 7 distinct clusters identified show that:
By implementing the recommendations above—especially standardizing prompts and breaking down complex tasks—we can target a 5-10% overall improvement in success rates, with even larger gains for infrastructure work.
The analysis establishes a baseline for ongoing monitoring and demonstrates the value of data-driven task assignment strategies.
Analysis Period: Last 30 days
Total Tasks: 1,392
Clusters: 7
Overall Success Rate: 75.4%
Generated: 2025-12-03
Analysis performed using NLP clustering (TF-IDF + K-means) on copilot agent task prompts