[prompt-clustering] Daily Copilot Agent Prompt Clustering Analysis - Dec 8, 2025 #5900
This report presents an NLP-based clustering analysis of 946 Copilot agent task prompts covering Oct 22 - Nov 18, 2025. Using TF-IDF vectorization and K-means clustering, we identified 7 distinct task-pattern clusters with an overall merge success rate of 77.0% (728/946 merged).
Key Findings: Function-focused refactoring tasks (Cluster 5) achieved the highest success rate at 83.3%, and tasks involving documentation updates and command improvements (Cluster 1) also performed well at 80.5%. The most common task type involves adding agent functionality and JSON configuration (Cluster 3, 29.1% of tasks). Notably, merged PRs average slightly more commits than unmerged ones (3.7 vs. 3.3), so raw commit count alone is a weak predictor of success; focused scope appears to matter more.
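The TF-IDF plus K-means approach described above can be sketched as follows. This is an illustrative reconstruction, not the report's actual script: the prompt texts are made up, and `k` is reduced from the report's 7 to fit the toy corpus.

```python
# Hypothetical sketch of the clustering pipeline: TF-IDF vectorization
# of task prompts followed by K-means. Prompts below are illustrative
# stand-ins, not the analyzed dataset.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

prompts = [
    "Refactor validation functions into a shared file",
    "Update documentation for the workflow command",
    "Add agent JSON configuration and error handling",
    "Deduplicate generated code across files",
    "Bump CLI version and resolve open issues",
    "Create a new agentic workflow for GitHub",
    "Manage gh-aw workflow configuration",
]

# Vectorize: unigrams + bigrams, English stop words removed.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(prompts)

# Cluster (the report used k=7 on 946 prompts; k=3 here for the toy set).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels)
```

Each prompt receives a cluster label; the report's per-cluster profiles (success rate, average commits, top keywords) are then aggregated over these labels.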
Summary Statistics
Top Performing Clusters:
Largest Task Categories:
Cluster Visualizations
Cluster Distribution (PCA Projection)
This scatter plot shows the 7 identified clusters projected into 2D space using Principal Component Analysis.
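A 2-D projection like the one in the figure could be produced along these lines (a sketch; a random dense matrix stands in for the real TF-IDF features):

```python
# Sketch: project per-prompt feature vectors to 2-D with PCA for
# plotting. A random dense matrix stands in for the TF-IDF features
# (a sparse TF-IDF matrix would first be densified or SVD-reduced).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_dense = rng.random((20, 50))  # stand-in for the 946 x n_features matrix

coords = PCA(n_components=2).fit_transform(X_dense)
print(coords.shape)
```

The resulting `coords` array gives one (x, y) point per prompt, typically colored by cluster label in the scatter plot.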
Success Rate by Cluster
Bar chart showing merge success rates for each cluster. Cluster 5 (Functions) leads with 83.3% success.
Detailed Cluster Analysis
Cluster 5: Function Refactoring & File Operations (83.3% Success) ✅
Profile: 78 tasks (8.2%) | 3.4 avg commits | 7.3 avg files | +442 avg lines
Characteristics: Tasks focused on refactoring functions, file operations, and validation logic. These tasks show the highest success rate, likely due to their focused scope and clear objectives.
Top Keywords: functions, function, file, files, validation
Representative Examples:
Cluster 1: Documentation & Command Updates (80.5% Success) ✅
Profile: 77 tasks (8.1%) | 3.9 avg commits | 14.7 avg files | +663 avg lines
Characteristics: Tasks involving documentation updates, command improvements, and markdown file changes. High success rate indicates these are well-understood task types.
Top Keywords: update, md, command, use, documentation, workflow, agent, add
Representative Examples:
Cluster 7: Code Analysis & Deduplication (80.0% Success) ✅
Profile: 60 tasks (6.3%) | 3.7 avg commits | 18.8 avg files | +724 avg lines
Characteristics: Tasks focused on code analysis, duplicate code elimination, and shared logic extraction. Despite touching many files, these tasks maintain high success rates.
Top Keywords: code, duplicate, analysis, generated, shared, logic, files, lines
Representative Examples:
Cluster 4: Agentic Workflow Development (77.0% Success)
Profile: 139 tasks (14.7%) | 3.7 avg commits | 10.2 avg files | +1644 avg lines
Characteristics: The most complex tasks in terms of code volume, involving new agentic workflows and major feature additions. Success rate is solid despite high complexity.
Top Keywords: agentic, workflow, workflows, update, add, create, file, github
Representative Examples:
Cluster 2: Workflow Management (76.6% Success)
Profile: 201 tasks (21.2%) | 3.3 avg commits | 11.2 avg files | +349 avg lines
Characteristics: Second-largest cluster, focused on workflow configuration, gh-aw tooling, and repository management tasks.
Top Keywords: workflow, workflows, gh, gh aw, aw, githubnext
Representative Examples:
Cluster 6: Version Updates & CLI Issues (75.0% Success)
Profile: 116 tasks (12.3%) | 3.3 avg commits | 18.5 avg files | +381 avg lines
Characteristics: Tasks involving version updates, CLI improvements, and issue resolution. Despite touching many files, these tasks maintain a reasonable success rate.
Top Keywords: version, cli, issue, section, copilot, update, changes, resolve
Representative Examples:
Cluster 3: Agent Configuration & JSON Management (74.5% Success)
Profile: 275 tasks (29.1%) | 4.0 avg commits | 16.3 avg files | +580 avg lines
Characteristics: The largest cluster, representing core agent functionality additions, JSON configuration, and error handling. Lower success rate may reflect the broad scope and complexity of these tasks.
Top Keywords: add, agent, run, command, copilot, error, json, file
Representative Examples:
Statistical Analysis
Cluster Performance Metrics
Success Correlations
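The commit-count relationship discussed in the summary can be quantified with a point-biserial correlation (Pearson r between a numeric metric and the binary merged/unmerged outcome). The data below is illustrative, not the report's dataset:

```python
# Sketch: point-biserial correlation between commit count and merge
# outcome. Values are illustrative stand-ins for the real PR data.
import numpy as np

commits = np.array([2, 3, 3, 4, 5, 6, 3, 4])  # commits per PR
merged = np.array([1, 1, 1, 1, 0, 0, 1, 0])   # 1 = merged, 0 = closed

# Pearson r with one binary variable is the point-biserial correlation.
r = np.corrcoef(commits, merged)[0, 1]
print(f"point-biserial r = {r:.3f}")
```

The same computation applies to files changed or lines added; a value near zero, as the report's near-equal commit averages suggest, indicates the metric is a weak standalone predictor.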
Task Complexity Analysis
Low Complexity (< 10 files, < 500 lines):
Medium Complexity (10-20 files, 300-800 lines):
High Complexity (> 20 files, > 1000 lines):
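One plausible encoding of the complexity tiers above, as a small classifier. The bucket boundaries in the report overlap, so the exact cutoffs and the high-to-low resolution order here are assumptions:

```python
# Sketch: assign a task to a complexity tier from its file and line
# counts. Thresholds mirror the buckets above; overlap handling is an
# assumption, resolved by checking tiers from high to low.
def complexity_tier(files_changed: int, lines_added: int) -> str:
    if files_changed > 20 or lines_added > 1000:
        return "high"
    if files_changed >= 10 or lines_added >= 300:
        return "medium"
    return "low"

print(complexity_tier(7, 442))    # e.g. Cluster 5 averages -> "medium"
print(complexity_tier(25, 1644))  # e.g. Cluster 4 averages -> "high"
```

Tiering each of the 946 tasks this way is what allows success rates to be compared across complexity bands.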
Key Insights & Recommendations
✅ What Works Well
1. Focused Refactoring Tasks (83.3% success)
Tasks with clear, focused objectives like function refactoring and file operations show the highest success rates. These tasks typically:
Recommendation: Frame prompts with specific refactoring goals rather than broad improvement requests.
2. Documentation Updates (80.5% success)
Documentation and command improvement tasks perform exceptionally well, suggesting:
Recommendation: Continue prioritizing documentation tasks for the agent.
3. Code Analysis & Deduplication (80.0% success)
Despite touching many files, deduplication tasks succeed at high rates because:
Recommendation: Leverage the agent for code quality and refactoring workflows.
⚠️ Where the Agent Struggles
1. Large-Scale Agent Configuration Tasks (74.5% success)
The largest cluster (29.1% of tasks) has the lowest success rate. Challenges include:
Recommendation: Break large configuration tasks into smaller, focused subtasks.
2. High Code Volume Tasks
Tasks with 1000+ line changes show slightly lower success rates, though Cluster 4 (agentic workflows) performs surprisingly well at 77.0% despite 1644 avg lines.
Recommendation: For high-volume tasks, provide detailed context and examples.
💡 General Patterns
Success Factors:
Risk Factors:
🎯 Prompt Engineering Recommendations
For High Success Rates:
DO:
DON'T:
Example Transformations:
Low Success Prompt:
High Success Prompt:
Complete Task Data (Most Recent 100 PRs)
[Table truncated - Full data includes all 946 analyzed tasks]
Methodology
Analysis Pipeline
1. Data Collection
2. Text Processing
3. Feature Extraction
4. Clustering
5. Cluster Analysis
6. Visualization
Limitations
Reproducibility
All analysis code and data are available:
/tmp/gh-aw/analyze-prompts.py
/tmp/gh-aw/prompt-cache/pr-full-data/
/tmp/gh-aw/pr-data/clustering-report.md
/tmp/gh-aw/cache-memory/trending/prompt-clustering/history.jsonl
References:
Analysis generated on 2025-12-08 19:20 UTC, analyzing 946 Copilot agent tasks from October-November 2025