📊 Agentic Workflow Lock File Statistics - December 3, 2025 #5351

2025-12-03T03:36:38Z

github-actions[bot]
bot Dec 3, 2025

This comprehensive statistical analysis examines all 100 .lock.yml files in the githubnext/gh-aw repository, revealing usage patterns, architectural decisions, and structural characteristics of agentic workflows.

Key Findings

Repository lockfiles show strong standardization: 82% of workflows use issues as a trigger, 77% support manual dispatch, and 62% run on schedules. The GitHub MCP server dominates with 3,695 tool references across workflows. Safe outputs favor discussions (37 workflows) over direct issue creation (23 workflows), indicating a preference for threaded conversations. The most common workflow pattern combines issues, schedule, and workflow_dispatch triggers (51 workflows), suggesting a robust multi-activation strategy for automation.

Complete Statistical Analysis

Executive Summary

Total Lock Files: 100
Total Size: 29.37 MB
Average File Size: 300 KB
Median File Size: 307 KB
Analysis Date: December 3, 2025
Repository: githubnext/gh-aw

File Size Distribution

Size Range	Count	Percentage
< 10 KB	0	0%
10-50 KB	0	0%
50-100 KB	3	3%
100-200 KB	6	6%
200-300 KB	37	37%
300-400 KB	53	53%
> 400 KB	1	1%

Size Statistics:

Smallest: arxiv.lock.yml (81 KB) - Minimal MCP server configuration
2nd Smallest: context7.lock.yml (81 KB) - Another minimal MCP server config
3rd Smallest: test-skip-if-match-object.lock.yml (86 KB) - Test workflow
Largest: poem-bot.lock.yml (608 KB) - Complex creative content generator
2nd Largest: pr-nitpick-reviewer.lock.yml (398 KB) - Detailed PR review agent

Distribution Insights:

53% of workflows fall in the 300-400 KB range, indicating strong standardization
Only 1 workflow exceeds 400 KB (poem-bot at 608 KB)
90% of workflows are between 200-400 KB, suggesting consistent complexity
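
The bucketing behind the distribution table can be sketched as follows. This is a minimal illustration, not the agent's actual script: the function names are hypothetical, and the handling of exact boundary values (for example a file of exactly 300 KB) is an assumption, since the report does not say which side of a range is inclusive.

```python
from collections import Counter

# Bucket edges in KB, mirroring the ranges in the table above.
# Assumption: lower bounds are inclusive, upper bounds exclusive.
BUCKETS = [
    (0, 10, "< 10 KB"),
    (10, 50, "10-50 KB"),
    (50, 100, "50-100 KB"),
    (100, 200, "100-200 KB"),
    (200, 300, "200-300 KB"),
    (300, 400, "300-400 KB"),
    (400, float("inf"), "> 400 KB"),
]


def size_bucket(size_kb: float) -> str:
    """Return the label of the size range that contains size_kb."""
    for low, high, label in BUCKETS:
        if low <= size_kb < high:
            return label
    raise ValueError(f"size must be non-negative, got {size_kb}")


def size_histogram(sizes_kb):
    """Count files per bucket, keeping empty buckets in the result."""
    counts = Counter(size_bucket(s) for s in sizes_kb)
    return {label: counts.get(label, 0) for _, _, label in BUCKETS}


# The five file sizes named in the report, used only as sample input.
sample_sizes_kb = [81, 81, 86, 608, 398]
histogram = size_histogram(sample_sizes_kb)
```

Running the same function over all 100 file sizes would reproduce the full table, provided the boundary convention matches the one the agent used.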

Trigger Analysis

Primary Trigger Distribution

Trigger Type	Count	Percentage	Description
issues	82	82%	Activated by issue events
workflow_dispatch	77	77%	Supports manual triggering
schedule	62	62%	Runs on cron schedules
pull_request	9	9%	PR-related triggers
push	3	3%	Push to repository

Most Common Trigger Combinations

Combination	Count	Use Case
issues + schedule + workflow_dispatch	51	Multi-modal activation: automated, manual, and issue-driven
issues + workflow_dispatch	12	Issue-focused with manual override
issues (only)	8	Pure issue-driven workflows
issues + pull_request + schedule + workflow_dispatch	5	Comprehensive automation covering all events
schedule + workflow_dispatch	4	Periodic tasks with manual option

Trigger Insights

82% issue-driven: The vast majority of workflows respond to GitHub issues, indicating gh-aw is primarily used for issue management and triage
77% manually dispatchable: High flexibility with most workflows supporting on-demand execution
62% scheduled: Strong emphasis on automated, time-based execution
Multi-trigger strategy: 51 workflows (51%) use the triple combination of issues + schedule + workflow_dispatch, providing maximum flexibility
Limited PR focus: Only 9% of workflows trigger on pull requests, suggesting issue-centric workflow design
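
Per-trigger counts and trigger combinations are two different tallies over the same data, which is why the 82 issue-triggered workflows are spread across several combinations. A minimal sketch, assuming each workflow has already been reduced to a set of trigger names (the workflow names and trigger sets below are hypothetical sample input, not taken from the repository):

```python
from collections import Counter


def trigger_stats(workflows):
    """Tally individual triggers and whole trigger combinations.

    workflows maps a workflow name to the set of trigger names it declares.
    """
    per_trigger = Counter()
    per_combo = Counter()
    for triggers in workflows.values():
        per_trigger.update(triggers)
        # Sort so the same set always produces the same combination label.
        per_combo[" + ".join(sorted(triggers))] += 1
    return per_trigger, per_combo


# Hypothetical sample input for illustration only.
sample_workflows = {
    "workflow-a": {"issues", "schedule", "workflow_dispatch"},
    "workflow-b": {"issues", "schedule", "workflow_dispatch"},
    "workflow-c": {"issues"},
    "workflow-d": {"schedule", "workflow_dispatch"},
}
per_trigger, per_combo = trigger_stats(sample_workflows)
```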

Safe Outputs Analysis

Safe Output Types Distribution

Type	Count	Percentage	Description
create-discussion	37	37%	Creates GitHub Discussions
add-comment	24	24%	Adds comments to issues/PRs/discussions
create-issue	23	23%	Creates new GitHub issues
create-pull-request	14	14%	Creates pull requests
noop	1	1%	No output (silent completion)

Example Workflows:

create-discussion: daily-news, weekly-issue-summary, audit-workflows
add-comment: daily-fact, issue-triage-agent, grumpy-reviewer
create-issue: ci-doctor, breaking-change-checker, security-fix-pr
create-pull-request: tidy, repository-quality-improver, semantic-function-refactor

Discussion Categories Used

Category	Count	Notes
audits	13	Security and workflow audits
General	8	General discussions
Audits (capitalized)	4	Alternative spelling
artifacts	2	Build artifacts discussions
dev	2	Development discussions
daily-news	1	Repository activity summaries
reports	1	Various agent analysis reports
security	1	Security-related findings
research	1	Research and investigation
Ideas	1	Feature ideas and proposals

Key Insight: The "audits" category (combined case variations: 17 total) is the most popular destination for workflow outputs, indicating gh-aw is heavily used for automated auditing and quality checks.
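
The combined total of 17 comes from merging category names that differ only in case. A small sketch of that merge, using the counts from the table above (the function name is hypothetical):

```python
from collections import Counter

# Category counts as listed in the table above.
raw_counts = {
    "audits": 13, "General": 8, "Audits": 4, "artifacts": 2, "dev": 2,
    "daily-news": 1, "reports": 1, "security": 1, "research": 1, "Ideas": 1,
}


def merge_case_variants(counts):
    """Sum counts of category names that are equal ignoring case."""
    merged = Counter()
    for name, n in counts.items():
        merged[name.casefold()] += n
    return merged


merged = merge_case_variants(raw_counts)
top_category, top_count = merged.most_common(1)[0]
```

The merge only changes the tally; the two spellings still point at different category names in the workflow configuration until one of them is renamed.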

Safe Output Patterns

Threaded conversations preferred: 37% use create-discussion vs 23% create-issue, showing preference for discussion threads
Commentary workflows: 24% add comments to existing threads, enabling continuous engagement
Automated PRs: 14% create pull requests, demonstrating code modification capabilities
Minimal silent workflows: Only 1 workflow uses noop, indicating most agents produce visible outputs

Structural Characteristics

Job Complexity

Based on statistical analysis across all 100 workflows:

Average Jobs per Workflow: 7.23 jobs
Maximum Jobs: 19 jobs (in a single workflow)
Minimum Jobs: 2 jobs
Standard Pattern: activation → agent → detection → add_comment/create_discussion → conclusion

Job Distribution Insights:

Most workflows follow a five-stage pipeline (activation, agent execution, detection, output generation, and conclusion); the average of roughly 7 jobs suggests some stages are split across more than one job
High job count (19 max) indicates complex orchestration with multiple conditional paths
Minimum of 2 jobs suggests even simple workflows maintain activation + agent pattern

Typical Lock File Structure

Based on median and average values, a typical .lock.yml file has:

Size: ~307 KB (median)
Jobs: ~7 jobs
Triggers: issues + schedule + workflow_dispatch (51% of workflows)
Permissions: contents:read, issues:read, pull-requests:read, actions:read
Timeout: 10-20 minutes (most common: 10 minutes with 290 occurrences)
Safe Output: create-discussion or add-comment
Engine: Claude (based on available data)
MCP Server: GitHub MCP server with default + specialized toolsets

Timeout Distribution

Timeout (minutes)	Occurrences	Use Case
5	25	Quick checks and simple operations
10	290	Default for most workflows
12	2	Slightly extended operations
15	39	Medium complexity tasks
20	117	Complex analysis and research
30	24	Deep analysis and heavy processing
45	7	Very complex operations
60	2	Maximum timeout for intensive tasks

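Insight: 10 minutes is by far the most common timeout (290 of 506 occurrences, about 57%), indicating most workflows are expected to complete quickly. Only 33 occurrences (about 7%) set a timeout of 30 minutes or more. These counts are timeout occurrences in the lock files, not distinct workflows.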
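
The share of long timeouts can be recomputed from the occurrence counts in the table. Note that the table counts timeout settings, which total far more than the 100 workflows, so the denominator is occurrences rather than workflows. A minimal sketch (the function name is hypothetical):

```python
# Timeout (minutes) -> occurrences, copied from the table above.
timeout_occurrences = {5: 25, 10: 290, 12: 2, 15: 39, 20: 117, 30: 24, 45: 7, 60: 2}


def share_at_or_above(occurrences, threshold_minutes):
    """Return (matching occurrences, total occurrences, percentage)."""
    total = sum(occurrences.values())
    selected = sum(
        n for minutes, n in occurrences.items() if minutes >= threshold_minutes
    )
    return selected, total, 100 * selected / total


long_count, total_count, long_pct = share_at_or_above(timeout_occurrences, 30)
```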

Permission Patterns

Most Common Permissions

Permission	Count	Type	Purpose
contents: read	94	Read	Access repository files
pull-requests: read	80	Read	View PR information
issues: read	79	Read	Access issue data
actions: read	44	Read	View workflow run data
discussions: read	11	Read	Access discussions
security-events: read	6	Read	View security alerts
issues: write	3	Write	Create/modify issues
discussions: write	2	Write	Create/modify discussions
pull-requests: write	1	Write	Create/modify PRs
contents: write	1	Write	Modify repository files

Permission Analysis

Read-heavy: 94% of workflows use read-only permissions for contents
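Minimal write permissions: Write scopes appear only 7 times in total (3 issues, 2 discussions, 1 pull-requests, 1 contents), so at most 7 workflows hold any write permission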
Security-conscious: Low write permission usage indicates safe, non-invasive automation
GitHub API focused: High usage of issues:read (79) and pull-requests:read (80) shows GitHub-centric operations

Security Posture: The repository demonstrates excellent security practices with minimal write permissions and predominant use of read-only access patterns.
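
Permission counts like those above can be approximated with a line-based scan. This is a sketch under stated assumptions, not the agent's actual parser: it treats any line of the form `scope: read` or `scope: write` as a permission entry without checking that it sits inside a `permissions:` block, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "  contents: read" or "issues: write".
# Assumption: such lines only occur under a permissions: key.
PERMISSION_LINE = re.compile(
    r"^[ \t]*([a-z-]+):[ \t]*(read|write)[ \t]*$", re.MULTILINE
)


def count_permissions(yaml_text):
    """Count each 'scope: level' pair found in the text."""
    return Counter(
        f"{scope}: {level}" for scope, level in PERMISSION_LINE.findall(yaml_text)
    )


def has_write_permission(yaml_text):
    """Report whether any scope is granted write access."""
    return any(level == "write" for _, level in PERMISSION_LINE.findall(yaml_text))


# Hypothetical workflow fragment for illustration only.
sample_yaml = """
permissions:
  contents: read
  issues: write
jobs:
  agent:
    permissions:
      contents: read
"""
permission_counts = count_permissions(sample_yaml)
```

Because a workflow can declare permissions at both the workflow and the job level, a count of entries is an upper bound on the number of workflows holding that permission.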

Tool & MCP Server Usage

MCP Server Distribution

MCP Server	Tool References	Percentage	Purpose
github	3,695	94.2%	GitHub API operations
playwright	210	5.4%	Browser automation and web scraping
arxiv	6	0.2%	Academic paper research
deepwiki	6	0.2%	AI-generated documentation for code repositories
context7	4	0.1%	Up-to-date library documentation lookup

Dominance of GitHub MCP: The GitHub MCP server accounts for 94.2% of all MCP tool references (3,695 of 3,921), showing workflows are heavily GitHub-focused.

Top 20 GitHub MCP Tools (Each: 66 uses)

All of the following tools appear exactly 66 times across workflows, indicating they are part of a standard toolset configuration:

download_workflow_run_artifact
get_code_scanning_alert
get_commit
get_dependabot_alert
get_discussion
get_discussion_comments
get_file_contents
get_job_logs
get_label
get_latest_release
get_me
get_notification_details
get_pull_request
get_pull_request_comments
get_pull_request_diff
get_pull_request_files
get_pull_request_review_comments
get_pull_request_reviews
get_pull_request_status
get_release_by_tag

Standardization Insight: The uniform count of 66 uses across all these tools suggests they are configured as a comprehensive toolset bundle that 66 workflows include by default, rather than being individually selected.
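
One way to spot such a bundle is to group tools by their reference count and look for large groups sharing the same number. A minimal sketch, with hypothetical function names and a `min_tools` threshold chosen only for illustration:

```python
from collections import defaultdict


def group_by_count(tool_counts):
    """Map each reference count to the sorted list of tools that have it."""
    groups = defaultdict(list)
    for tool, n in tool_counts.items():
        groups[n].append(tool)
    return {n: sorted(tools) for n, tools in groups.items()}


def likely_bundles(tool_counts, min_tools=3):
    """Keep only counts shared by at least min_tools tools."""
    return {
        n: tools
        for n, tools in group_by_count(tool_counts).items()
        if len(tools) >= min_tools
    }


# Three tools and their counts from the list above, plus one hypothetical outlier.
sample_counts = {
    "get_commit": 66,
    "get_label": 66,
    "get_me": 66,
    "hypothetical_tool": 4,
}
bundles = likely_bundles(sample_counts)
```

Identical counts are consistent with a shared bundle but do not prove it, since the counts are references rather than per-workflow usage.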

MCP Server Specialization

Playwright (210 references): Used for workflows requiring web browsing, visual testing, or external website analysis
Research Tools: arxiv and deepwiki provide knowledge augmentation for research-oriented workflows
Minimal External Dependencies: Only 5 MCP servers total, showing focused, well-curated tool selection

Engine Distribution

Engine	Count	Usage
claude	5	Claude AI (Anthropic)
copilot	4	GitHub Copilot
codex	2	OpenAI Codex
Other	6	Various specialized engines

Note: Engine data is extracted from workflow comments and may not represent all workflows. Most workflows likely use a default engine not explicitly specified in the lockfile.

Interesting Findings

1. Standardized Toolset Pattern

66 workflows appear to share an identical GitHub MCP toolset: each of the 20 tools listed above occurs exactly 66 times, indicating a well-established template or base configuration for new workflows.

2. Size-Complexity Correlation

The poem-bot workflow (608 KB) is roughly twice the 300 KB average and the only file above 400 KB. A single outlier cannot establish a correlation, but it hints that creative/generative workflows may need more configuration than analytical ones.

3. Multi-Trigger Flexibility

51 workflows (51%) use the triple combination of issues + schedule + workflow_dispatch, providing three independent activation paths and maximizing workflow accessibility.

4. Audit-Centric Architecture

With "audits" as the top discussion category (17 combined occurrences) and high usage of read permissions, gh-aw workflows are primarily designed for monitoring, auditing, and reporting rather than direct code modification.

5. Conservative Timeout Strategy

The 10-minute timeout occurs 290 times and the 20-minute timeout 117 times (these are timeout occurrences in the lock files, not distinct workflows), suggesting two distinct classes of work: quick operations and in-depth analysis.

6. Minimal External Dependencies

Despite having access to various MCP servers, 94.2% of tool references (3,695 of 3,921) target the GitHub MCP server, showing workflows stay within the GitHub ecosystem.

7. Discussion > Issue Creation

37 workflows create discussions vs 23 that create issues, indicating a preference for conversational, threaded outputs over actionable issue tracking.

8. Test Workflows Are Minimal

The smallest workflows are in tests/ and shared/mcp/ directories (81-86 KB), showing test workflows are intentionally simplified for focused validation.

Recommendations

1. Standardize Engine Configuration

Only 17 workflows explicitly specify engines in analyzed data. Consider documenting the default engine and when to specify alternatives.

2. Consolidate Discussion Categories

Multiple variations of "audits" (audits, Audits) exist. Standardize on a single category name to improve organization.

3. Optimize Large Workflows

Investigate poem-bot.lock.yml (608 KB) to identify opportunities for refactoring or template reuse that could reduce size.

4. Document Timeout Guidelines

With clear clustering at 10 and 20 minutes, establish guidelines for when to use each timeout tier.

5. Expand PR Workflows

Only 9% of workflows trigger on pull requests. Consider expanding PR-triggered automation for code review and quality checks.

6. MCP Server Documentation

Document the standard 66-workflow toolset pattern so new workflow authors understand what's included by default.

7. Safe Output Best Practices

With 37 discussion creators vs 24 comment adders, establish guidelines for when to create new discussions vs comment on existing ones.

Methodology

Data Collection

Source: 100 .lock.yml files in .github/workflows/ and subdirectories
Analysis Tools:
- Bash scripts for file parsing and pattern extraction
- Python 3 script for detailed YAML analysis and statistics
Cache Memory: Scripts and results stored in /tmp/gh-aw/cache-memory/ for reproducibility

Analysis Approach

File discovery and size distribution analysis
Trigger extraction from frontmatter YAML comments
Safe output detection from workflow configuration
MCP tool reference counting via regex pattern matching
Permission parsing from GitHub Actions YAML
Statistical aggregation and correlation analysis
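
The regex-based tool counting step might look like the sketch below. The reference format `mcp__<server>__<tool>` is an assumption made for illustration; the report does not state the exact pattern the agent matched, and the function name and sample text are hypothetical.

```python
import re
from collections import Counter

# Assumed reference format: mcp__<server>__<tool>.
# The pattern actually used by the analysis scripts is not given in the report.
TOOL_REF = re.compile(r"\bmcp__([A-Za-z0-9-]+)__([A-Za-z0-9_]+)")


def count_tool_refs(text):
    """Count tool references per server and per (server, tool) pair."""
    per_server = Counter()
    per_tool = Counter()
    for server, tool in TOOL_REF.findall(text):
        per_server[server] += 1
        per_tool[(server, tool)] += 1
    return per_server, per_tool


# Hypothetical lock file excerpt for illustration only.
sample_text = """
allowed: mcp__github__get_commit,mcp__github__get_me
allowed: mcp__github__get_commit,mcp__playwright__browser_click
"""
per_server, per_tool = count_tool_refs(sample_text)
```

Counting this way tallies every textual reference, so a tool listed twice in one file is counted twice, which matches the limitation noted below that tool counts are references rather than unique usage per workflow.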

Limitations

Engine data extracted from comments may not cover all workflows
Some categories have variations in naming (audits vs Audits)
Tool counts represent references, not unique usage per workflow
Step-level analysis not included (focused on workflow-level patterns)

Historical Context

This is the first comprehensive statistical analysis of gh-aw lockfiles. Future analyses can compare against this baseline to track:

Growth in total workflow count
Evolution of trigger patterns
Changes in safe output preferences
MCP server adoption trends
File size trends as workflows mature

Analysis Date: December 3, 2025
Generated by: Lockfile Statistics Analysis Agent
Cache Location: /tmp/gh-aw/cache-memory/
Reproducible: Run python3 /tmp/gh-aw/cache-memory/scripts/detailed_analysis.py from workflows directory

AI generated by Lockfile Statistics Analysis Agent

2025-12-07T00:24:10Z

github-actions[bot]
bot Dec 7, 2025

This discussion was automatically closed because it was created by an agentic workflow more than 3 days ago.