[prompt-clustering] Copilot Agent Prompt Clustering Analysis - 2025-12-13 #6369

2025-12-13T19:21:28Z

github-actions[bot]
bot Dec 13, 2025

Daily NLP-based clustering analysis of Copilot agent task prompts, intended to surface recurring task patterns, optimization opportunities, and differences in agent performance.

Executive Summary

This analysis covers 1,943 Copilot-created PRs from the last 30 days, clustered by task prompt using TF-IDF features and K-means. It identified 7 distinct task clusters with an overall success rate (share of PRs merged) of 74.3% (1,444 of 1,943 PRs).

Key Findings:

Most Common: Update tasks (36.4% of all tasks)
Highest Success: Agentic workflow tasks (78.3% merge rate)
Most Complex: Safe-output tasks (avg 31.4 files changed, 4,014 lines added)
Average Task: 18.2 files changed, 3.7 commits

Cluster Overview

1. Update Tasks (Cluster 6) - 36.4%

708 tasks | 74.7% success rate

The largest cluster focuses on general updates, fixes, and additions across the repository.

Keywords: update, agent, add, github, fix
Metrics: 19.6 files/task, 979 lines added, 3.7 commits
Characteristics: Broad maintenance and enhancement tasks
Examples:
- Recompile workflows
- Fix TypeScript errors
- Update configurations

2. CLI/Command Tasks (Cluster 3) - 19.0%

370 tasks | 73.2% success rate

Tasks related to the gh-aw CLI tool, command-line functionality, and workflow management.

Keywords: gh, aw, gh aw, issue, workflow
Metrics: 13.1 files/task, 494 lines added, 3.1 commits
Characteristics: CLI feature additions, flag support, command enhancements
Examples:
- Add --ref flag to gh aw status
- Enable GH_DEBUG for gh commands
- Document workflow features

3. Agentic Workflow Tasks (Cluster 2) - 12.1%

235 tasks | 78.3% success rate ⭐ Highest Success Rate

Tasks involving agentic workflow creation, modification, and management.

Keywords: agentic, workflow, workflows, update, add
Metrics: 12.1 files/task, 1,496 lines added, 3.7 commits
Characteristics: Workflow creation and management with high success
Notable: Best performing cluster despite moderate complexity
Examples:
- Add workflow_dispatch triggers
- Recompile agentic workflows
- Remove obsolete workflows

4. Package/Compiler Tasks (Cluster 1) - 11.6%

226 tasks | 77.0% success rate

Tasks related to the pkg/ directory, workflow compilation, and validation logic.

Keywords: pkg, pkg workflow, workflow, code, validation
Metrics: 13.7 files/task, 988 lines added, 3.6 commits
Characteristics: Core compiler and package functionality
Examples:
- Extend action SHA pinning
- Add validation for workflows
- Refactor compiler code

5. Safe-Output Tasks (Cluster 4) - 7.9%

154 tasks | 67.5% success rate 🔥 Most Complex

Tasks involving safe-output mechanisms, security boundaries, and output handling.

Keywords: safe, output, add, outputs, issue
Metrics: 31.4 files/task, 4,014 lines added, 4.7 commits
Characteristics: Highly complex, touches many files, below-average success
Challenge: The highest complexity of any cluster coincides with the lowest success rate (67.5%)
Examples:
- Add safe-output types
- Add safe-inputs documentation
- Set default modes

6. MCP Server Tasks (Cluster 5) - 6.9%

134 tasks | 68.7% success rate

Tasks related to MCP (Model Context Protocol) servers, tools, and integrations.

Keywords: mcp, server, tools, tool, add
Metrics: 21.7 files/task, 2,526 lines added, 4.3 commits
Characteristics: Integration work, server configuration, tool additions
Examples:
- Upgrade MCP server versions
- Add MCP capabilities
- Configure server timeouts

7. Version/Release Tasks (Cluster 0) - 6.0%

116 tasks | 77.6% success rate

Tasks focused on versioning, releases, changesets, and CLI updates.

Keywords: version, release, cli, changes, package
Metrics: 25.3 files/task, 366 lines added, 3.0 commits
Characteristics: Version management with high success despite touching many files
Examples:
- Update CLI versions
- Manage changesets
- Configure version checkers

Full Success Metrics Table

Cluster	Type	Tasks	Success Rate	Avg Files Changed	Avg Lines Added	Avg Commits
2	Agentic	235	78.3%	12.1	1,496	3.7
0	Version	116	77.6%	25.3	366	3.0
1	Package	226	77.0%	13.7	988	3.6
6	Update	708	74.7%	19.6	979	3.7
3	CLI	370	73.2%	13.1	494	3.1
5	MCP	134	68.7%	21.7	2,526	4.3
4	Safe-Output	154	67.5%	31.4	4,014	4.7
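The headline figures in this report can be cross-checked from the table alone. The following is a minimal sketch, assuming the overall averages are task-weighted means of the per-cluster values; the `weighted_mean` helper and variable names are illustrative and are not taken from the analysis workflow.

```python
# Per-cluster figures copied from the table above:
# (cluster id, type, tasks, success rate %, avg files, avg commits)
CLUSTERS = [
    (2, "Agentic", 235, 78.3, 12.1, 3.7),
    (0, "Version", 116, 77.6, 25.3, 3.0),
    (1, "Package", 226, 77.0, 13.7, 3.6),
    (6, "Update", 708, 74.7, 19.6, 3.7),
    (3, "CLI", 370, 73.2, 13.1, 3.1),
    (5, "MCP", 134, 68.7, 21.7, 4.3),
    (4, "Safe-Output", 154, 67.5, 31.4, 4.7),
]


def weighted_mean(values, weights):
    """Mean of `values` where values[i] counts weights[i] times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


tasks = [row[2] for row in CLUSTERS]
total_tasks = sum(tasks)

# Share of all tasks per cluster type, in percent.
shares = {row[1]: round(100 * row[2] / total_tasks, 1) for row in CLUSTERS}

overall_success = weighted_mean([row[3] for row in CLUSTERS], tasks)
overall_files = weighted_mean([row[4] for row in CLUSTERS], tasks)
overall_commits = weighted_mean([row[5] for row in CLUSTERS], tasks)

print(total_tasks)                # 1943
print(shares["Update"])           # 36.4
print(round(overall_success, 1))  # 74.3
print(round(overall_files, 1))    # 18.2
print(round(overall_commits, 1))  # 3.7
```

The results agree with the task count, the 36.4% update share, the 74.3% overall success rate, and the 18.2 files / 3.7 commits reported for the average task.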

Key Insights

1. Task Complexity vs Success Rate

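Success rate generally falls as task complexity (files changed, lines added, commits) rises, although the relationship is not strict:

Lower-complexity tasks (agentic, version): High success (77.6-78.3%)
Medium tasks (update, CLI, package): Good success (73.2-77.0%)
Complex tasks (MCP, safe-output): Lower success (67.5-68.7%)

Version tasks are the exception on file count: they touch 25.3 files on average but add only 366 lines and need the fewest commits (3.0).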

Safe-output tasks are particularly challenging:

Touch 31.4 files on average (vs 18.2 overall)
Add 4,014 lines on average (vs 1,184 overall)
Require 4.7 commits (vs 3.7 overall)
Only 67.5% success rate
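One way to quantify the complexity/success relationship is a rank correlation over the seven clusters. The sketch below computes Spearman's rho by hand from the per-cluster averages in the metrics table; it is an illustrative check, not part of the original analysis, and the `ranks` and `spearman` helpers are hypothetical names that assume tie-free data.

```python
# (avg files changed, avg lines added, success rate %) per cluster,
# copied from the metrics table in this report.
CLUSTERS = {
    "Agentic": (12.1, 1496, 78.3),
    "Version": (25.3, 366, 77.6),
    "Package": (13.7, 988, 77.0),
    "Update": (19.6, 979, 74.7),
    "CLI": (13.1, 494, 73.2),
    "MCP": (21.7, 2526, 68.7),
    "Safe-Output": (31.4, 4014, 67.5),
}


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


files = [v[0] for v in CLUSTERS.values()]
lines = [v[1] for v in CLUSTERS.values()]
success = [v[2] for v in CLUSTERS.values()]

rho_files = spearman(files, success)
rho_lines = spearman(lines, success)
print(f"files vs success: {rho_files:.2f}")  # -0.54
print(f"lines vs success: {rho_lines:.2f}")  # -0.50
```

Both coefficients are moderately negative. With only seven data points this is a rough indication rather than a statistical result, and it works on cluster averages, not individual PRs.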

2. Agentic Workflow Tasks Perform Best

Despite moderate complexity (12.1 files changed and 1,496 lines added per task), agentic workflow tasks achieve the highest success rate (78.3%). Likely contributing factors:

Well-defined patterns and templates
Clear structure and conventions
Strong existing examples in codebase
Agent familiarity with workflow syntax

Opportunity: Use agentic workflow tasks as a template for other task types.

3. Task Distribution Insights

The three largest clusters (update, CLI, agentic) account for 67.5% of all tasks
Update tasks dominate at 36.4% (general maintenance burden)
Specialized tasks (MCP, safe-output) represent only 14.8% of all tasks

4. Moderate Task Scope is Optimal

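Clusters averaging 12-20 files and 3-4 commits per task tend to perform best:

Cluster 2 (agentic): 12.1 files, 3.7 commits → 78.3% success
Cluster 1 (package): 13.7 files, 3.6 commits → 77.0% success

Clusters above this range mostly see lower success rates (MCP: 21.7 files, 68.7%; safe-output: 31.4 files, 67.5%). Version tasks are the exception at 25.3 files and 77.6% success.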

Recommendations

1. 🎯 Focus on Strengths

Prioritize agentic workflow tasks for critical work - they have the highest success rate (78.3%). Consider using similar patterns for other task types.

2. 🔧 Improve Complex Task Support

Safe-output and MCP tasks need better support:

Break down into smaller subtasks
Provide more context and examples upfront
Consider increasing turn limits for these categories
Create better documentation and patterns

3. 📊 Task Scope Management

Limit task scope to 12-20 files when possible:

The only cluster averaging more than 30 files per task (safe-output, 31.4 files) has the lowest success rate (67.5%)
Consider splitting large tasks into phases
Use incremental approaches for complex changes
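As an illustration of splitting a large task into phases, the sketch below groups a task's changed files by directory and packs the groups into phases of at most 20 files. The function `plan_phases`, the 20-file default, and the example paths are all hypothetical; nothing like this is described as part of the existing tooling.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_phases(changed_files, max_files=20):
    """Pack files into phases of at most `max_files`, keeping files from
    the same directory together where possible."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")

    by_dir = defaultdict(list)
    for path in changed_files:
        by_dir[str(PurePosixPath(path).parent)].append(path)

    phases, current = [], []
    for directory in sorted(by_dir):
        group = by_dir[directory]
        # Start a new phase if this directory would overflow the current one.
        if current and len(current) + len(group) > max_files:
            phases.append(current)
            current = []
        current.extend(group)
        # A single directory larger than the cap is cut into full phases.
        while len(current) > max_files:
            phases.append(current[:max_files])
            current = current[max_files:]
    if current:
        phases.append(current)
    return phases


# Hypothetical 31-file task, close to the safe-output cluster average.
files = (
    [f"pkg/workflow/file_{i}.go" for i in range(14)]
    + [f"docs/page_{i}.md" for i in range(9)]
    + [f"cmd/cmd_{i}.go" for i in range(8)]
)
phases = plan_phases(files)
print([len(phase) for phase in phases])  # [17, 14]
```

Grouping by directory is only one possible heuristic; splitting by feature or by dependency order may suit a given change better.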

4. 📚 Documentation & Examples

Top-performing clusters benefit from clear patterns:

Agentic workflows have strong conventions → 78.3% success
Version tasks follow established patterns → 77.6% success
Action: Create similar pattern libraries for lower-performing categories

5. 🔍 Monitor Update Task Volume

Update tasks represent 36.4% of all work:

Consider whether this indicates a high maintenance burden
Look for opportunities to automate common update patterns
May indicate a need for better initial implementations

6. 🚀 Optimize for 3-4 Commits

The most successful clusters average 3-4 commits per task:

Cluster 2: 3.7 commits, 78.3% success
Cluster 1: 3.6 commits, 77.0% success
Action: Calibrate turn limits and guidance to target this range

Visualizations

The analysis generated visualizations showing:

Cluster size distribution
Success rate by cluster
Average files changed by cluster
PCA visualization of task similarity

Methodology

Data Sources:

1,943 copilot-created PRs with valid task prompts
Full PR data including body, title, comments, reviews, commits, and files
Analysis period: Last 30 days

NLP Techniques:

Text Preprocessing: Extracted task prompts from PR bodies, cleaned markdown and formatting
Feature Extraction: TF-IDF vectorization with 1-3 gram features (100 features max)
Clustering: K-means with optimal k=7 (determined via elbow method + silhouette score)
Validation: Manual review of cluster coherence and representative examples
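The exact cleaning rules used in the preprocessing step are not given in this report. As a rough sketch of what "cleaned markdown and formatting" could look like, the hypothetical `clean_prompt` function below strips common markdown syntax with regular expressions before vectorization; the rules and the example prompt are assumptions made for illustration.

```python
import re


def clean_prompt(text):
    """Strip common markdown syntax, then normalize whitespace and case."""
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)          # images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)       # links -> link text
    text = re.sub(r"`([^`]*)`", r"\1", text)                   # inline code -> content
    # Leading heading, list, blockquote, and numbered-list markers.
    text = re.sub(
        r"^[ \t]{0,3}(?:#{1,6}|[-*+]|>|\d+\.)[ \t]+", "", text, flags=re.MULTILINE
    )
    text = re.sub(r"\*{1,3}|~~", "", text)                     # emphasis markers
    return re.sub(r"\s+", " ", text).strip().lower()


FENCE = "`" * 3  # built at runtime to avoid a literal fence in this example
raw = (
    "## Task\n\n"
    "- Add `--ref` flag to **gh aw status**\n"
    "- See [docs](https://example.com/docs)\n\n"
    f"{FENCE}sh\ngh aw status --ref main\n{FENCE}\n"
)
print(clean_prompt(raw))  # task add --ref flag to gh aw status see docs
```

Underscores are deliberately left alone so that identifiers such as `workflow_dispatch` survive as single tokens.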

Tools:

Python 3.12 with scikit-learn, pandas, matplotlib, seaborn
TF-IDF for term-weighted (lexical) similarity between prompts
K-means for cluster assignment
PCA for dimensionality reduction and visualization
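The pipeline described above can be sketched with scikit-learn as follows. This is a minimal illustration on a handful of made-up prompts, not the workflow's actual code: the toy corpus, the candidate range for k, and the choice to pick k by silhouette score alone are assumptions. Only the 1-3 gram range and the 100-feature cap come from this report.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Toy prompts standing in for the real task prompts (illustrative only).
prompts = [
    "update workflow configuration",
    "fix typescript errors in workflow",
    "add flag to gh aw status command",
    "enable debug for gh aw commands",
    "upgrade mcp server version",
    "configure mcp server timeouts",
    "add safe output types",
    "document safe output modes",
    "recompile agentic workflows",
    "remove obsolete agentic workflows",
]

# 1- to 3-gram TF-IDF features, capped at 100 as stated in the report.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), max_features=100)
features = vectorizer.fit_transform(prompts)

# Score each candidate k by silhouette; inertia is kept for an elbow plot.
scores = {}
inertias = {}
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    scores[k] = silhouette_score(features, model.labels_)
    inertias[k] = model.inertia_

best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(features)

# Project to 2D for a scatter plot; the sparse matrix is made dense first.
points = PCA(n_components=2, random_state=0).fit_transform(features.toarray())
print(best_k, labels.tolist(), points.shape)
```

On a corpus this small the chosen k is not meaningful; the point is the shape of the pipeline. Fixing `random_state` keeps cluster assignments reproducible between runs.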

Data Access

Full analysis outputs are available as workflow run artifacts:

clustering-report.md - Complete report
clustered-prs.json - All PRs with cluster assignments
cluster-summary.json - Cluster statistics and metrics
cluster_analysis.png - Visualization charts

Analysis Date: 2025-12-13
Workflow Run: §20196505612
Generated by: Prompt Clustering Analysis Agent

AI generated by Copilot Agent Prompt Clustering Analysis

2025-12-14T19:22:13Z

github-actions[bot]
bot Dec 14, 2025
Author

⚓ Avast! This discussion be marked as outdated by Copilot Agent Prompt Clustering Analysis.
🗺️ A newer treasure map awaits ye at Discussion #6449.
Fair winds, matey! 🏴‍☠️
