[prompt-clustering] Copilot Agent Prompt Clustering Analysis - December 2025 #6553
Daily NLP-based clustering analysis of 951 Copilot agent task prompts from the last 30 days reveals 10 distinct work patterns and actionable insights for improving agent performance.
Summary
Analysis of 951 Copilot agent tasks identified 10 distinct clusters of work patterns, with an overall success rate of 77.0% (732/951 PRs merged). Bugfix tasks are the most common pattern (350 tasks, 78.9% success rate).
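The per-cluster sizes and success rates reported here can be recomputed from a table of agent PRs. A minimal sketch, assuming a hypothetical frame with `cluster` and `merged` columns (the report does not publish its data schema, so these names are illustrative):

```python
# Sketch of the per-cluster success-rate computation. The "cluster" and
# "merged" column names are assumptions, not the report's actual schema.
import pandas as pd

def cluster_success_rates(prs: pd.DataFrame) -> pd.DataFrame:
    """Return task count and merge rate per cluster, sorted by size."""
    stats = (
        prs.groupby("cluster")["merged"]
        .agg(tasks="count", success_rate="mean")
        .sort_values("tasks", ascending=False)
    )
    stats["success_rate"] = (stats["success_rate"] * 100).round(1)
    return stats

# Tiny worked example: 4 of 5 PRs merged -> 80.0% for the "bugfix" cluster.
demo = pd.DataFrame({
    "cluster": ["bugfix"] * 5 + ["feature"] * 2,
    "merged":  [1, 1, 1, 1, 0, 1, 0],
})
print(cluster_success_rates(demo))
```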
Key Highlights:
Cluster Overview
Full Analysis Report
Detailed Cluster Analysis
Cluster 1: Bugfix Tasks (Largest Cluster)
Size: 350 tasks (36.8% of total)
Outcomes:
Complexity Metrics:
Characteristics: This is the dominant cluster, representing more than one-third of all agent tasks. These are primarily issue-driven bugfixes with standardized prompt formatting (e.g., "This section details on the original issue you should resolve"). The success rate is solid at 78.9%, and complexity is moderate with an average of 13 files changed.
Top Keywords: section details original, section details, issue resolve, original issue resolve, details original issue
Example Tasks:
Cluster 2: Update Tasks (High Complexity)
Size: 139 tasks (14.6% of total)
Outcomes:
Complexity Metrics:
Characteristics: This cluster shows the highest complexity with 19.6 files changed on average. These are substantial update tasks involving documentation, error handling, and JavaScript modifications. Despite high complexity, the success rate remains strong at 79.1%.
Top Keywords: update, error, docs, file, javascript, md, actions, script
Example Tasks:
Cluster 3: Feature Tasks (General)
Size: 111 tasks (11.7% of total)
Outcomes:
Complexity Metrics:
Characteristics: General feature development tasks with moderate complexity and slightly lower success rate (73.9%). These tasks involve making code work with GitHub and Copilot integrations, requiring more iterations (4.1 commits on average).
Top Keywords: code, make, run, agent, use, github, copilot, sure
Example Tasks:
Cluster 4: Agentic Workflow Tasks (Second-Best Success Rate)
Size: 98 tasks (10.3% of total)
Outcomes:
Complexity Metrics:
Characteristics: This cluster achieves the second-highest success rate (84.7%) and lowest complexity (9.0 files). These tasks focus on creating and updating agentic workflows themselves - a meta-pattern where the agent works on workflow automation. The high success rate suggests clear, well-scoped tasks.
Top Keywords: agentic, workflow, agentic workflow, workflows, agentic workflows, update, file, daily
Example Tasks:
Cluster 5: MCP & Command Feature Tasks
Size: 72 tasks (7.6% of total)
Outcomes:
Complexity Metrics:
Characteristics: Tasks focused on adding commands, MCP server integrations, and compilation features. Lower success rate (69.4%) suggests these are more challenging tasks requiring domain-specific knowledge.
Top Keywords: add, command, json, mcp, support, server, mcp server, compile
Example Tasks:
Cluster 6: Small Bugfix Tasks
Size: 58 tasks (6.1% of total)
Outcomes:
Complexity Metrics:
Characteristics: Smaller, focused bugfix tasks with standardized formatting (issue_title, issue_description). Lower complexity (6.4 files) and fewer review comments (0.7) indicate these are straightforward fixes.
Top Keywords: issue_title, workflow, summary, summary section details, issue_title issue_description
Example Tasks:
Cluster 7: Pull Request & GitHub Integration Tasks
Size: 39 tasks (4.1% of total)
Outcomes:
Complexity Metrics:
Characteristics: Tasks involving pull requests, discussions, and GitHub integrations. Despite adding the most lines (1949.8 on average), these tasks achieve a strong 79.5% success rate.
Top Keywords: pull, pull request, request, comment, discussion, branch, workflow, commit
Example Tasks:
Cluster 8: Firewall & Feature Tasks (Lowest Success Rate)
Size: 36 tasks (3.8% of total)
Outcomes:
Complexity Metrics:
Characteristics: This cluster has the lowest success rate (58.3%). Many tasks involve enabling firewall features or adding logs - areas that may require more careful testing and validation.
Top Keywords: firewall, add, logs, workflow, feature, update, enable, field
Example Tasks:
Cluster 9: CLI & Version Update Tasks
Size: 25 tasks (2.6% of total)
Outcomes:
Complexity Metrics:
Characteristics: CLI-related updates with high complexity (19.8 files changed). These tasks involve updating CLI tools, checking versions, and ensuring consistency across commands.
Top Keywords: cli, version, copilot, help, issue_title, tools, update, commands
Example Tasks:
Cluster 10: Test-Focused Bugfix Tasks (Best Performance)
Size: 23 tasks (2.4% of total)
Outcomes:
Complexity Metrics:
Characteristics: The highest-performing cluster, at an 87.0% success rate. These are bugfix tasks with a strong focus on testing and JavaScript; the combination of test focus and clear scope leads to excellent outcomes.
Top Keywords: fix, tests, test, javascript, issues, add, warning, task
Example Tasks:
Key Findings
1. Bugfix Tasks Dominate Agent Work
Observation: Bugfix tasks represent 36.8% of all agent work (350 tasks), with an additional 81 bugfixes distributed across other clusters.
Insight: Nearly half of all Copilot agent work involves fixing issues rather than building new features. This suggests:
Recommendation: Continue to optimize bugfix workflows, as they represent the largest use case. Consider creating specialized sub-workflows for common bugfix patterns.
2. Success Rate Varies Significantly Across Task Types
Observation: Success rates range from 58.3% (Cluster 8 - Firewall features) to 87.0% (Cluster 10 - Test-focused bugfixes), with a standard deviation of 8.1%.
Insight: Task type significantly impacts success probability:
Recommendation:
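The spread cited above can be partially recomputed from the success rates this report states. Only eight of the ten clusters list an explicit rate (Clusters 6 and 9 do not), so this recomputation lands near, not exactly at, the reported 8.1%:

```python
# Partial recomputation of the success-rate spread, using only the eight
# cluster rates quoted explicitly in this report.
import statistics

rates = {
    "C1 bugfix": 78.9, "C2 update": 79.1, "C3 feature": 73.9,
    "C4 agentic": 84.7, "C5 mcp": 69.4, "C7 pr": 79.5,
    "C8 firewall": 58.3, "C10 tests": 87.0,
}
spread = statistics.pstdev(rates.values())
print(f"std dev over {len(rates)} clusters: {spread:.1f}%")  # ~8.6% on this subset
```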
3. Complexity Does Not Directly Correlate with Failure
Observation: Cluster 4 (Agentic workflows) has the lowest complexity (9.0 files) and high success rate (84.7%), but Cluster 7 (PR integrations) has high complexity (1950 lines added) and still achieves 79.5% success.
Insight: Task complexity (measured by files changed or lines added) is not a strong predictor of success. Instead, success correlates more with:
Recommendation: Focus on task scoping and prompt engineering rather than avoiding complex tasks. Large, well-scoped tasks can succeed; small, vague tasks may fail.
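A rough numerical check of the weak complexity-success link, using the only three clusters for which this report states both average files changed and success rate (Clusters 1, 2, and 4). Three points is far too few for real inference; this only illustrates how such a correlation would be computed:

```python
# Pearson correlation between avg files changed and success rate for the
# three clusters with both figures stated in the report (C1, C2, C4).
# Illustrative only: n=3 is not a meaningful sample.
import numpy as np

files_changed = np.array([13.0, 19.6, 9.0])   # C1, C2, C4
success_rate  = np.array([78.9, 79.1, 84.7])

r = np.corrcoef(files_changed, success_rate)[0, 1]
print(f"Pearson r = {r:.2f}")
```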
4. Review Iteration Patterns Reveal Engagement
Observation: Average review comments range from 0.7 (Cluster 6 - small bugfixes) to 2.1 (Cluster 4 - agentic workflows).
Insight: Higher review engagement (more comments) doesn't necessarily indicate problems. Clusters with more review comments (Cluster 4: 2.1 comments, 84.7% success) actually perform better than those with fewer (Cluster 8: 1.7 comments, 58.3% success). This suggests active review collaboration improves outcomes.
Recommendation: Encourage iterative feedback on agent PRs. Active review engagement helps refine implementations and catch issues early.
5. Meta-Tasks Show Highest Success
Observation: Cluster 4 (agentic workflow tasks) - where the agent works on workflow automation itself - achieves 84.7% success rate with low complexity.
Insight: The agent performs best when working on domain-aligned tasks (creating/updating agentic workflows). This "meta-work" pattern shows:
Recommendation: Consider expanding agent use for meta-tasks like workflow maintenance, documentation generation, and tooling improvements where domain alignment is strongest.
Recommendations
1. Leverage High-Performing Patterns
Action: Study and replicate patterns from Cluster 10 (87.0% success) and Cluster 4 (84.7% success).
Specific steps:
Expected impact: 5-10% improvement in success rate for similar task types
2. Improve Low-Performing Task Types
Action: Target Cluster 8 (firewall features, 58.3% success) and Cluster 5 (MCP integration, 69.4% success) for improvement.
Specific steps:
Expected impact: Increase success rate to 70%+ for these task types
3. Optimize for Task Clarity Over Complexity Reduction
Action: Focus prompt engineering efforts on clarity, scope, and context rather than reducing task size.
Specific steps:
Expected impact: Enable successful completion of larger, more impactful tasks
4. Standardize High-Volume Task Templates
Action: Since bugfixes represent 36.8% of work, invest in optimizing bugfix prompt templates.
Specific steps:
Expected impact: More consistent bugfix success across all categories
5. Establish Task-Type Routing Guidelines
Action: Help users choose appropriate task types based on success patterns.
Specific steps:
Expected impact: Better task-agent matching, higher overall success rate
Methodology
This analysis applied NLP clustering techniques to 951 Copilot agent task prompts:
Validation: Manual inspection of sample tasks from each cluster confirmed semantic coherence and meaningful groupings.
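A minimal sketch of the kind of pipeline the methodology describes: TF-IDF vectors over prompt text, grouped with k-means. The report does not name its exact algorithm or parameters, so treat this as one plausible reconstruction, not the pipeline that produced these numbers:

```python
# Assumed pipeline: TF-IDF features + k-means clustering of prompt text.
# The sample prompts below are invented to mirror the report's keywords.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

prompts = [
    "Fix the original issue described in this section",
    "Resolve the issue: workflow fails on compile",
    "Update docs and error handling in the javascript files",
    "Update the markdown docs for actions scripts",
    "Create a daily agentic workflow that updates files",
    "Update the agentic workflows to run daily",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(prompts)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for prompt, label in zip(prompts, labels):
    print(label, prompt)
```

Cluster keywords like those listed per cluster above can then be read off the highest-weight TF-IDF terms nearest each centroid.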
Visualization Charts: See the workflow run artifacts for visual analysis:
Data Access: Full analysis report available at `/tmp/gh-aw/pr-data/clustering-report.md`