[prompt-clustering] Daily Copilot Agent Prompt Clustering Analysis - Dec 8, 2025 #5900
This report presents an NLP-based clustering analysis of 946 Copilot agent task prompts covering Oct 22 - Nov 18, 2025. Using TF-IDF vectorization and K-means clustering, we identified 7 distinct task-pattern clusters with an overall merge success rate of 77.0% (728/946 merged).
Key Findings: Function-focused refactoring tasks (Cluster 5) achieved the highest success rate at 83.3%, and tasks involving documentation updates and command improvements (Cluster 1) also performed well at 80.5%. The most common task type involves adding agent functionality and JSON configuration (Cluster 3, 29.1% of tasks). Notably, merged PRs average slightly more commits than unmerged ones (3.7 vs. 3.3), so raw commit count alone is a weak predictor of success; focused scope appears to matter more.
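The TF-IDF plus K-means approach described above can be sketched as follows. This is an illustrative reconstruction, not the report's actual script: the prompt texts are made up, and `k` is reduced from the report's 7 to fit the toy corpus.

```python
# Hypothetical sketch of the clustering pipeline: TF-IDF vectorization
# of task prompts followed by K-means. Prompts below are illustrative
# stand-ins, not the analyzed dataset.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

prompts = [
    "Refactor validation functions into a shared file",
    "Update documentation for the workflow command",
    "Add agent JSON configuration and error handling",
    "Deduplicate generated code across files",
    "Bump CLI version and resolve open issues",
    "Create a new agentic workflow for GitHub",
    "Manage gh-aw workflow configuration",
]

# Vectorize: unigrams + bigrams, English stop words removed.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(prompts)

# Cluster (the report used k=7 on 946 prompts; k=3 here for the toy set).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels)
```

Each prompt receives a cluster label; the report's per-cluster profiles (success rate, average commits, top keywords) are then aggregated over these labels.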
Summary Statistics
Top Performing Clusters:
Largest Task Categories:
Cluster Visualizations
Cluster Distribution (PCA Projection)
This scatter plot shows the 7 identified clusters projected into 2D space using Principal Component Analysis.
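A 2-D projection like the one in the figure could be produced along these lines (a sketch; a random dense matrix stands in for the real TF-IDF features):

```python
# Sketch: project per-prompt feature vectors to 2-D with PCA for
# plotting. A random dense matrix stands in for the TF-IDF features
# (a sparse TF-IDF matrix would first be densified or SVD-reduced).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_dense = rng.random((20, 50))  # stand-in for the 946 x n_features matrix

coords = PCA(n_components=2).fit_transform(X_dense)
print(coords.shape)
```

The resulting `coords` array gives one (x, y) point per prompt, typically colored by cluster label in the scatter plot.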
Success Rate by Cluster
Bar chart showing merge success rates for each cluster. Cluster 5 (Functions) leads with 83.3% success.
Detailed Cluster Analysis
Cluster 5: Function Refactoring & File Operations (83.3% Success) ✅
Profile: 78 tasks (8.2%) | 3.4 avg commits | 7.3 avg files | +442 avg lines
Characteristics: Tasks focused on refactoring functions, file operations, and validation logic. These tasks show the highest success rate, likely due to their focused scope and clear objectives.
Top Keywords: functions, function, file, files, validation
Representative Examples:
Cluster 1: Documentation & Command Updates (80.5% Success) ✅
Profile: 77 tasks (8.1%) | 3.9 avg commits | 14.7 avg files | +663 avg lines
Characteristics: Tasks involving documentation updates, command improvements, and markdown file changes. High success rate indicates these are well-understood task types.
Top Keywords: update, md, command, use, documentation, workflow, agent, add
Representative Examples:
Cluster 7: Code Analysis & Deduplication (80.0% Success) ✅
Profile: 60 tasks (6.3%) | 3.7 avg commits | 18.8 avg files | +724 avg lines
Characteristics: Tasks focused on code analysis, duplicate code elimination, and shared logic extraction. Despite touching many files, these tasks maintain high success rates.
Top Keywords: code, duplicate, analysis, generated, shared, logic, files, lines
Representative Examples:
Cluster 4: Agentic Workflow Development (77.0% Success)
Profile: 139 tasks (14.7%) | 3.7 avg commits | 10.2 avg files | +1644 avg lines
Characteristics: The most complex tasks in terms of code volume, involving new agentic workflows and major feature additions. Success rate is solid despite high complexity.
Top Keywords: agentic, workflow, workflows, update, add, create, file, github
Representative Examples:
Cluster 2: Workflow Management (76.6% Success)
Profile: 201 tasks (21.2%) | 3.3 avg commits | 11.2 avg files | +349 avg lines
Characteristics: Second-largest cluster, focused on workflow configuration, gh-aw tooling, and repository management tasks.
Top Keywords: workflow, workflows, gh, gh aw, aw, githubnext
Representative Examples:
Cluster 6: Version Updates & CLI Issues (75.0% Success)
Profile: 116 tasks (12.3%) | 3.3 avg commits | 18.5 avg files | +381 avg lines
Characteristics: Tasks involving version updates, CLI improvements, and issue resolution. Despite touching many files, these tasks maintain a reasonable success rate.
Top Keywords: version, cli, issue, section, copilot, update, changes, resolve
Representative Examples:
Cluster 3: Agent Configuration & JSON Management (74.5% Success)
Profile: 275 tasks (29.1%) | 4.0 avg commits | 16.3 avg files | +580 avg lines
Characteristics: The largest cluster, representing core agent functionality additions, JSON configuration, and error handling. Lower success rate may reflect the broad scope and complexity of these tasks.
Top Keywords: add, agent, run, command, copilot, error, json, file
Representative Examples:
Statistical Analysis
Cluster Performance Metrics
Success Correlations
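The commit-count relationship discussed in the summary can be quantified with a point-biserial correlation (Pearson r between a numeric metric and the binary merged/unmerged outcome). The data below is illustrative, not the report's dataset:

```python
# Sketch: point-biserial correlation between commit count and merge
# outcome. Values are illustrative stand-ins for the real PR data.
import numpy as np

commits = np.array([2, 3, 3, 4, 5, 6, 3, 4])  # commits per PR
merged = np.array([1, 1, 1, 1, 0, 0, 1, 0])   # 1 = merged, 0 = closed

# Pearson r with one binary variable is the point-biserial correlation.
r = np.corrcoef(commits, merged)[0, 1]
print(f"point-biserial r = {r:.3f}")
```

The same computation applies to files changed or lines added; a value near zero, as the report's near-equal commit averages suggest, indicates the metric is a weak standalone predictor.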
Task Complexity Analysis
Low Complexity (< 10 files, < 500 lines):
Medium Complexity (10-20 files, 300-800 lines):
High Complexity (> 20 files, > 1000 lines):
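One plausible encoding of the complexity tiers above, as a small classifier. The bucket boundaries in the report overlap, so the exact cutoffs and the high-to-low resolution order here are assumptions:

```python
# Sketch: assign a task to a complexity tier from its file and line
# counts. Thresholds mirror the buckets above; overlap handling is an
# assumption, resolved by checking tiers from high to low.
def complexity_tier(files_changed: int, lines_added: int) -> str:
    if files_changed > 20 or lines_added > 1000:
        return "high"
    if files_changed >= 10 or lines_added >= 300:
        return "medium"
    return "low"

print(complexity_tier(7, 442))    # e.g. Cluster 5 averages -> "medium"
print(complexity_tier(25, 1644))  # e.g. Cluster 4 averages -> "high"
```

Tiering each of the 946 tasks this way is what allows success rates to be compared across complexity bands.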
Key Insights & Recommendations
✅ What Works Well
1. Focused Refactoring Tasks (83.3% success)
Tasks with clear, focused objectives like function refactoring and file operations show the highest success rates. These tasks typically:
Recommendation: Frame prompts with specific refactoring goals rather than broad improvement requests.
2. Documentation Updates (80.5% success)
Documentation and command improvement tasks perform exceptionally well, suggesting:
Recommendation: Continue prioritizing documentation tasks for the agent.
3. Code Analysis & Deduplication (80.0% success)
Despite touching many files, deduplication tasks succeed at high rates because:
Recommendation: Leverage the agent for code quality and refactoring workflows.
⚠️ Where the Agent Struggles
1. Large-Scale Agent Configuration Tasks (74.5% success)
The largest cluster (29.1% of tasks) has the lowest success rate. Challenges include:
Recommendation: Break large configuration tasks into smaller, focused subtasks.
2. High Code Volume Tasks
Tasks with 1000+ line changes show slightly lower success rates, though Cluster 4 (agentic workflows) performs surprisingly well at 77.0% despite 1644 avg lines.
Recommendation: For high-volume tasks, provide detailed context and examples.
💡 General Patterns
Success Factors:
Risk Factors:
🎯 Prompt Engineering Recommendations
For High Success Rates:
DO:
DON'T:
Example Transformations:
Low Success Prompt:
High Success Prompt:
Complete Task Data (Most Recent 100 PRs)
[Table truncated - Full data includes all 946 analyzed tasks]
Methodology
Analysis Pipeline
1. Data Collection
2. Text Processing
3. Feature Extraction
4. Clustering
5. Cluster Analysis
6. Visualization
Limitations
Reproducibility
All analysis code and data are available:
/tmp/gh-aw/analyze-prompts.py
/tmp/gh-aw/prompt-cache/pr-full-data/
/tmp/gh-aw/pr-data/clustering-report.md
/tmp/gh-aw/cache-memory/trending/prompt-clustering/history.jsonl
References:
Analysis generated on 2025-12-08 19:20 UTC, analyzing 946 Copilot agent tasks from October-November 2025