📊 Agentic Workflow Lock File Statistics - November 2025 #3024

2025-11-03T03:37:46Z

github-actions[bot]
bot Nov 3, 2025

📊 Agentic Workflow Lock File Statistics - 2025-11-03

This comprehensive analysis examines 66 agentic workflow lock files (.lock.yml) in the githubnext/gh-aw repository to identify structural patterns, usage trends, and characteristics of agentic workflows at scale.

Executive Summary

Dataset Overview:

Total Lock Files Analyzed: 66 workflows
Total Size: 13.36 MB (14,008,239 bytes)
Average File Size: 207.3 KB
Analysis Date: November 3, 2025

Key Findings:

85% of workflows support manual triggering via workflow_dispatch
50% of workflows run on automated schedules
Create-discussion is the most popular safe output (24 workflows, 36%)
Average workflow has ~7 jobs and ~9 steps per job
Most common timeout is 10 minutes (42% of workflows)
GitHub MCP server is used universally across all workflows (1,888 tool invocations detected)

File Size Distribution

The majority of lock files fall in the 200-300 KB range, indicating consistent workflow complexity across the repository.

Size Range	Count	Percentage
< 100 KB	8	12.1%
100-200 KB	9	13.6%
200-300 KB	47	71.2%
> 300 KB	2	3.0%

Size Statistics:

Smallest: example-permissions-warning.lock.yml (82.3 KB)
Largest: poem-bot.lock.yml (371.7 KB)
Mean: 207.3 KB
Median: 211.3 KB
Standard Deviation: 56.8 KB

The tight clustering around 200 KB suggests standardized workflow patterns and consistent instruction complexity.
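The bucket counts and summary statistics above could be reproduced along the following lines. This is a minimal sketch, not the agent's actual script: `size_bucket` and `summarize_sizes` are hypothetical names, the boundary handling (for example, exactly 200 KB landing in the 200-300 KB bucket) is an assumption, and 1 KB is taken as 1024 bytes, which is consistent with the reported total (14,008,239 bytes ≈ 13.36 MB).

```python
import statistics
from collections import Counter


def size_bucket(size_bytes: int) -> str:
    """Map a file size in bytes to one of the four ranges used in the table."""
    kb = size_bytes / 1024  # assumption: 1 KB = 1024 bytes
    if kb < 100:
        return "< 100 KB"
    if kb < 200:
        return "100-200 KB"
    if kb <= 300:
        return "200-300 KB"
    return "> 300 KB"


def summarize_sizes(sizes_bytes: list[int]) -> dict:
    """Return bucket counts plus mean, median, and standard deviation in KB."""
    sizes_kb = [s / 1024 for s in sizes_bytes]
    return {
        "buckets": Counter(size_bucket(s) for s in sizes_bytes),
        "mean_kb": statistics.mean(sizes_kb),
        "median_kb": statistics.median(sizes_kb),
        # Sample standard deviation; the report does not say which variant it used.
        "stdev_kb": statistics.stdev(sizes_kb),
    }
```

Feeding the function the byte sizes of all 66 lock files would yield the four bucket counts and the three summary figures in one pass.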

Trigger Analysis

Most Popular Triggers

Workflows in this repository use a diverse set of triggers, with strong preference for manual invocation combined with scheduled automation.

Trigger Type	Count	Percentage	Use Case
`workflow_dispatch`	56	84.8%	Manual execution, testing, on-demand runs
`schedule`	33	50.0%	Daily/periodic automation
`pull_request`	8	12.1%	PR-triggered workflows
`issue_comment`	8	12.1%	Comment-driven workflows
`issues`	6	9.1%	Issue event triggers
`workflow_run`	3	4.5%	Chained workflow execution
`discussion_comment`	3	4.5%	Discussion interaction
`discussion`	2	3.0%	Discussion creation/update
`pull_request_review_comment`	2	3.0%	PR review comments
`push`	2	3.0%	Push to repository

Key Insights:

84.8% of workflows support manual triggering, enabling flexible execution and testing
50% run on automated schedules for periodic maintenance, reporting, or monitoring
Event-driven workflows (PR, issues, comments) make up ~25% of triggers

Common Trigger Combinations

Combination	Count	Notes
`schedule+workflow_dispatch`	27	Standard pattern: automated + manual
`workflow_dispatch` only	17	Pure on-demand workflows
`pull_request+schedule+workflow_dispatch`	5	Multi-modal PR workflows
`workflow_run`	3	Dependent/chained workflows
`issue_comment`	2	Comment-only triggered
`issues`	2	Issue-only triggered

The dominance of schedule+workflow_dispatch (40.9%) shows a preference for workflows that run periodically but can also be invoked on-demand.
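A combination table like the one above can be derived from the parsed `on:` section of each workflow. The sketch below is illustrative only: `trigger_combination` and `count_combinations` are hypothetical helpers, and the input is assumed to be the dictionaries produced by a YAML loader. One detail worth handling is that PyYAML follows YAML 1.1, so an unquoted `on:` key is loaded as the boolean `True` rather than the string `"on"`.

```python
from collections import Counter


def trigger_combination(workflow: dict) -> str:
    """Return the workflow's trigger names as a sorted, '+'-joined string."""
    # PyYAML (YAML 1.1) loads an unquoted `on:` key as True, so check both.
    triggers = workflow.get("on", workflow.get(True, {}))
    if isinstance(triggers, str):
        names = [triggers]            # on: push
    elif isinstance(triggers, list):
        names = list(triggers)        # on: [push, pull_request]
    else:
        names = list(triggers or {})  # on: {schedule: [...], workflow_dispatch: null}
    return "+".join(sorted(names))


def count_combinations(workflows: list[dict]) -> Counter:
    """Tally how many workflows share each trigger combination."""
    return Counter(trigger_combination(w) for w in workflows)
```

Sorting the names before joining keeps `schedule+workflow_dispatch` and `workflow_dispatch+schedule` from being counted as two different combinations.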

Schedule Patterns

Most Common Cron Schedules:

Schedule (Cron)	Count	Description	UTC Time
`0 0,6,12,18 * * *`	5	Four times daily	00:00, 06:00, 12:00, 18:00
`0 9 * * *`	3	Daily morning	09:00
`0 6 * * 0`	2	Weekly Sunday	06:00 Sunday
`0 2 * * 1-5`	2	Weekday nights	02:00 Mon-Fri
`0 0 * * *`	2	Daily midnight	00:00
All others	1 each	Various custom schedules	-

Schedule Frequency Distribution:

Daily: 16 workflows (48.5%)
Multiple times per day: 5 workflows (15.2%)
Weekdays only: 5 workflows (15.2%)
Weekly: 4 workflows (12.1%)
Other patterns: 3 workflows (9.1%)

Observation: The 4x daily schedule (0 0,6,12,18 * * *) is popular for monitoring and reporting workflows that need regular but not continuous updates.
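The frequency categories above can be approximated with a small classifier over five-field cron expressions. The rules below are an assumption about how such a grouping might work, not the agent's actual logic, and `classify_cron` is a hypothetical name.

```python
def classify_cron(expr: str) -> str:
    """Assign a five-field cron expression to a coarse frequency category."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    if day_of_month != "*" or month != "*":
        return "other"
    # A list, step, range, or wildcard in the minute or hour field means
    # more than one run per day.
    runs_many_times = any(ch in hour for ch in ",/-*") or any(ch in minute for ch in ",/-*")
    if day_of_week == "*":
        return "multiple times per day" if runs_many_times else "daily"
    if runs_many_times:
        return "other"
    if day_of_week.lower() in {"1-5", "mon-fri"}:
        return "weekdays only"
    if day_of_week.isdigit() or day_of_week.isalpha():
        return "weekly"  # a single day of the week, e.g. 0 or SUN
    return "other"
```

Applied to the schedules in the table, this sketch labels `0 0,6,12,18 * * *` as multiple times per day, `0 9 * * *` and `0 0 * * *` as daily, `0 6 * * 0` as weekly, and `0 2 * * 1-5` as weekdays only.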

Safe Outputs Analysis

Safe outputs are the approved mechanisms for agentic workflows to communicate results back to GitHub.

Safe Output Types Distribution

Type	Count	Percentage	Use Case
`create-discussion`	24	36.4%	Reports, summaries, analysis results
`create-issue`	15	22.7%	Bug reports, tasks, action items
`add-comment`	15	22.7%	Responses, updates, feedback
`create-pull-request`	13	19.7%	Code changes, documentation updates
`update-issue`	1	1.5%	Issue modifications

Total safe output job instances: 68 across the 66 workflows analyzed (some workflows use more than one type). Percentages in the table are relative to the 66 workflows.

Key Findings:

Create-discussion dominates: 36% of workflows use create-discussion, making it the preferred method for sharing analysis, reports, and summaries
Balanced action types: Issues (22.7%), comments (22.7%), and PRs (19.7%) are roughly evenly distributed
Multiple outputs: Some workflows use multiple safe output types (e.g., poem-bot.lock.yml uses 5 different types)

Workflows with Multiple Safe Outputs

Workflow	Safe Output Count	Types Used
`poem-bot.lock.yml`	5	issue, comment, PR, PR review comment, update-issue
`craft.lock.yml`	3	issue, comment, PR
`technical-doc-writer.lock.yml`	2	comment, PR
`unbloat-docs.lock.yml`	2	comment, PR
`q.lock.yml`	2	comment, PR
`smoke-detector.lock.yml`	2	issue, comment
`ci-doctor.lock.yml`	2	issue, comment

Insight: Workflows that interact with PRs or issues often need multiple output types to provide comprehensive feedback (e.g., create issue + add comment to explain).

Structural Characteristics

Job Complexity

Metric	Value	Range
Average Jobs per Workflow	6.6	2 - 16 jobs
Median Jobs	6	-
Most Complex Workflow	16 jobs	`poem-bot.lock.yml`
Simplest Workflows	2 jobs	8 test/firewall workflows

Job Count Distribution:

2 jobs: 8 workflows (12.1%) - mostly test/firewall workflows
5-7 jobs: 41 workflows (62.1%) - standard pattern
8-10 jobs: 13 workflows (19.7%) - complex workflows
11+ jobs: 4 workflows (6.1%) - highly complex workflows

Steps per Job

Metric	Value	Range
Average Steps per Job	8.8	5 - 16 steps
Median Steps	9	-
Maximum Steps	16	Some complex agent jobs
Minimum Steps	5	Minimal workflows

Steps Distribution:

5-7 steps: 18 instances (27.3%)
8-10 steps: 41 instances (62.1%)
11+ steps: 7 instances (10.6%)

Average Lock File Structure

Based on median values, a typical agentic workflow in this repository has:

File Size: ~211 KB
Jobs: 6 jobs (usually: activation, agent, detection, safe outputs, missing tool, upload assets)
Steps per Job: 9 steps
Timeout: 10-15 minutes
Permissions: Read-only (contents, issues, pull-requests)
Triggers: schedule+workflow_dispatch
Safe Output: 1 safe output type (most commonly create-discussion)
Concurrency Group: gh-aw-${{ github.workflow }}

Permission Patterns

Most Common Permissions

Permission	Count	Typical Access Level
`contents`	65	read
`pull-requests`	59	read/write
`issues`	58	read/write
`actions`	23	read
`discussions`	7	write
`security-events`	4	read
`repository-projects`	3	read
`attestations`	2	read
`checks`	2	read
`deployments`	2	read
`models`	2	read
`packages`	2	read
`pages`	2	read
`statuses`	2	read

Total Permission Grants: 233 read permissions, 0 explicit write permissions in analyzed data

Key Observations:

Universal read access: Nearly all workflows (65/66) request contents: read to access repository code
Issue/PR management: ~90% request issues and pull-requests permissions for safe outputs
Security-conscious: All detected permissions are read-only at the global level; write permissions are granted via safe output mechanisms
Actions access: 35% request actions: read to inspect workflow runs and artifacts

Permission Distribution

Based on the number of permissions requested per workflow:

Minimal permissions (1-5): ~40% of workflows
Standard permissions (6-10): ~50% of workflows
Comprehensive permissions (11+): ~10% of workflows

Security Posture: The repository follows the principle of least privilege, with workflows requesting only the permissions they need. Write operations are funneled through safe output mechanisms rather than broad write permissions.
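The permission counts in this section could be tallied from parsed workflows roughly as follows. This is a sketch under stated assumptions: `tally_permissions` is a hypothetical helper, and it inspects only the workflow-level `permissions` key. Whether the original analysis also counted job-level permissions is not stated in the report.

```python
from collections import Counter


def tally_permissions(workflows: list[dict]) -> tuple[Counter, Counter]:
    """Count permission scopes and access levels declared at the workflow level."""
    scopes, levels = Counter(), Counter()
    for wf in workflows:
        perms = wf.get("permissions") or {}
        if isinstance(perms, str):
            # Shorthand such as `permissions: read-all` has no per-scope entries.
            levels[perms] += 1
            continue
        for scope, level in perms.items():
            scopes[scope] += 1
            levels[level] += 1
    return scopes, levels
```

The first counter gives per-scope totals such as the `contents` row of the table; the second separates read grants from write grants, which is the comparison behind the read-only observation above.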

Timeout Configuration

Timeout Distribution

Timeout	Count	Percentage	Use Case
10 min	28	42.4%	Standard workflows
20 min	17	25.8%	Complex analysis
15 min	11	16.7%	Medium complexity
5 min	5	7.6%	Quick checks
30 min	5	7.6%	Heavy processing

Timeout Statistics:

Average: 14.5 minutes
Median: 12.5 minutes
Most Common: 10 minutes (42.4%)
Range: 5 - 30 minutes
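These summary figures follow directly from the distribution table. As a worked check, the short script below expands the table into one value per workflow and recomputes them with Python's `statistics` module; the variable names are illustrative.

```python
import statistics

# Timeout (minutes) -> number of workflows, copied from the distribution table.
TIMEOUT_COUNTS = {5: 5, 10: 28, 15: 11, 20: 17, 30: 5}

# Expand the table into one value per workflow.
timeouts = [minutes for minutes, count in TIMEOUT_COUNTS.items() for _ in range(count)]

average = statistics.mean(timeouts)      # 960 / 66
median = statistics.median(timeouts)     # midpoint of the 33rd and 34th sorted values
most_common = statistics.mode(timeouts)

print(f"workflows={len(timeouts)} average={average:.1f} median={median} mode={most_common}")
```

The median falls between two buckets because the 33rd sorted value is the last 10-minute workflow and the 34th is the first 15-minute workflow, which gives 12.5.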

Insights:

10-minute standard: The 10-minute timeout is the most popular, balancing execution time with cost control
Conservative approach: 92% of workflows (61 of 66) set a timeout of 20 minutes or less
Heavy workflows: Only 7.6% are configured with a 30-minute timeout, suggesting most agentic tasks are expected to finish quickly

Tool & MCP Patterns

MCP (Model Context Protocol) Server Usage

MCP Tool Invocations Detected:

MCP Server	Tool Invocations	Workflows Using
`github`	1,888	66 (100%)
`playwright`	84	~6-8 workflows
`deepwiki`	6	~1-2 workflows
`arxiv`	6	~1-2 workflows

Universal GitHub MCP Usage: Every single workflow (100%) uses the GitHub MCP server, demonstrating its critical role in agentic workflows. With 1,888 tool invocations across 66 workflows, that's an average of ~28.6 GitHub MCP tool calls per workflow.
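Per-server invocation counts like these can be extracted from the raw lock file text with a regular expression. The sketch below assumes tool references take the form `mcp__<server>__<tool>` and that server names contain no double underscore; `MCP_TOOL_RE` and `count_mcp_invocations` are hypothetical names, and the exact pattern used by the analysis scripts is not shown in the report.

```python
import re
from collections import Counter

# Assumption: tool references look like mcp__<server>__<tool>, and server
# names contain no double underscore.
MCP_TOOL_RE = re.compile(r"mcp__(\w+?)__(\w+)")


def count_mcp_invocations(lock_file_text: str) -> Counter:
    """Count tool references per MCP server in the raw text of one lock file."""
    return Counter(server for server, _tool in MCP_TOOL_RE.findall(lock_file_text))
```

Summing the per-file counters across all lock files gives repository-wide totals, and dividing the `github` total by the number of workflows gives the per-workflow average (1,888 / 66 ≈ 28.6).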

Specialized MCP Servers:

Playwright MCP: Used for web automation and browser interactions (84 invocations)
DeepWiki MCP: Queries against AI-generated documentation for GitHub repositories (6 invocations)
ArXiv MCP: Academic paper research and citations (6 invocations)

Common Tool Patterns

Based on the GitHub MCP server dominance and file analysis:

Core Tool Categories:

Repository Operations: Reading files, searching code, listing issues/PRs
Workflow Management: Checking run status, downloading artifacts
Issue/PR Interaction: Creating issues, commenting, reading discussions
Metadata Access: User info, commit details, branch information

Observation: The GitHub MCP server provides the essential toolkit for agentic workflows, handling ~95% of all detected MCP tool invocations (1,888 of 1,984).

Concurrency Patterns

Concurrency groups prevent multiple instances of the same workflow from running simultaneously.

Concurrency Group Distribution

Concurrency Group Pattern	Count	Percentage
`gh-aw-${{ github.workflow }}`	46	69.7%
`gh-aw-${{ github.workflow }}-${{ github.event.issue.number || github.event.pull_request.number }}`	9	13.6%
`gh-aw-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}`	6	9.1%
`dev-workflow-${{ github.ref }}`	2	3.0%
Other patterns	3	4.5%

Concurrency Strategies:

Workflow-level (69.7%): Only one instance of the workflow runs at a time
Event-scoped (22.7%): Allows parallel runs for different issues/PRs but serializes per issue/PR
Ref-scoped (3.0%): Allows parallel runs per branch/ref

Insight: The dominant pattern (workflow-level concurrency) prevents race conditions and resource conflicts for scheduled/on-demand workflows.
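The three strategies can be told apart by which context values appear in the group expression. The classifier below is a sketch of that idea, with `classify_concurrency_group` as a hypothetical name. It treats any group keyed on an issue or pull request number as event-scoped, which matches how the 22.7% figure above combines the two number-based patterns.

```python
def classify_concurrency_group(group: str) -> str:
    """Label a concurrency group expression by the scope it serializes on."""
    if "github.event.issue.number" in group or "github.event.pull_request.number" in group:
        return "event-scoped"      # one run at a time per issue or pull request
    if "github.ref" in group:
        return "ref-scoped"        # one run at a time per branch or ref
    if "github.workflow" in group:
        return "workflow-level"    # one run at a time for the whole workflow
    return "other"
```

The order of the checks matters: every event-scoped pattern in the table also contains `github.workflow`, so the most specific key has to be tested first.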

Interesting Findings

1. Standardized Workflow Architecture

Analysis reveals a highly consistent workflow structure:

6 common jobs: activation → agent → detection → safe outputs → missing tool → upload assets
Standard size: 71% of workflows fall within 200-300 KB
Consistent naming: Workflows follow clear naming conventions (e.g., daily-*, smoke-*, test-*)

This standardization suggests the repository uses a template or framework for generating workflows.

2. Safe Output Preferences

Create-discussion is the clear winner (36% of outputs), likely because:

Persistent: Discussions are long-lived and searchable
Organized: Can be categorized (e.g., "audits", "reports")
Non-invasive: Don't clutter issues or PRs
Rich formatting: Support full markdown, images, charts

3. Multi-Purpose Workflows

Some workflows support extensive trigger combinations:

poem-bot.lock.yml: Handles discussions, discussion comments, issues, issue comments, PRs, and PR review comments
Shows flexibility of the agentic workflow system to respond to diverse events

4. The "4x Daily" Pattern

Five workflows use the 0 0,6,12,18 * * * schedule (00:00, 06:00, 12:00, 18:00 UTC). This appears to be a standard for:

Monitoring workflows that need regular updates
Reporting workflows that generate time-series data
Detection workflows for catching issues quickly but not continuously

5. Conservative Timeout Philosophy

With 92% of workflows configured to time out at ≤20 minutes:

Cost control: Shorter timeouts limit compute costs
Fail-fast: Helps identify stuck or inefficient agents
Practical: Two-thirds of workflows (44 of 66) set a timeout of 15 minutes or less

6. Read-Only Permission Model

Despite workflows creating issues, PRs, and discussions, the base permissions are read-only. Write operations go through:

Safe output mechanisms: Vetted, controlled write paths
GitHub Actions: Using github-script or direct API calls with job tokens

This architecture provides security through forced code review of write operations.

7. Test/Firewall Workflows

8 workflows (12%) are minimal "test" or "firewall" workflows with only 2 jobs and <100 KB size:

test-post-steps.lock.yml
test-svelte.lock.yml
test-secret-masking.lock.yml
test-jqschema.lock.yml
test-manual-approval.lock.yml
dev.firewall.lock.yml
firewall.lock.yml
example-permissions-warning.lock.yml

These likely serve as integration tests or safety checks for the agentic workflow system.

Recommendations

Based on this statistical analysis, here are recommendations for optimizing and evolving agentic workflows:

1. Standardize on Common Patterns

Finding: 71% of workflows are 200-300 KB with 6 jobs and 9 steps/job.

Recommendation:

Formalize the 6-job pattern (activation, agent, detection, outputs, missing tool, assets) as a best practice
Create workflow generation templates to maintain consistency
Document deviations from the standard (e.g., poem-bot.lock.yml with 16 jobs)

2. Optimize Timeout Configurations

Finding: 42% use 10-minute timeouts, but average is 14.5 minutes.

Recommendation:

Profile actual execution times for each workflow
Consider dynamic timeouts based on workflow complexity
For the 5 workflows with 30-minute timeouts, investigate if they can be optimized

3. Expand Safe Output Usage

Finding: Create-discussion is most popular (36%), but some workflows have no safe outputs (10%).

Recommendation:

Audit workflows without safe outputs to determine if they should have them
Consider adding discussion categories for better organization
Explore create-pr-review-comment for more PR-focused workflows (currently only 1 workflow uses this)

4. Leverage Multi-Trigger Patterns

Finding: Only 5 workflows use pull_request+schedule+workflow_dispatch.

Recommendation:

More workflows could benefit from multi-modal triggers
Consider enabling PR triggers for analysis workflows (code review, testing)
Add issue_comment triggers for interactive "command" workflows

5. Investigate Outliers

High-complexity workflows:

poem-bot.lock.yml (16 jobs, 371 KB, 5 safe output types)
q.lock.yml (14 jobs, 308 KB)
scout.lock.yml (14 jobs)

Recommendation: Review these for potential optimization or splitting into multiple workflows.

6. Expand MCP Server Usage

Finding: GitHub MCP is universal, but specialized servers (playwright, deepwiki, arxiv) are rare.

Recommendation:

Document successful use cases for playwright, deepwiki, arxiv
Explore additional MCP servers for specialized tasks
Create examples for integrating new MCP servers

7. Schedule Optimization

Finding: 5 workflows run 4x daily, but most run 1x daily.

Recommendation:

Review if 4x daily workflows truly need that frequency
Consider event-driven triggers (e.g., push, workflow_run) instead of frequent schedules
For monitoring workflows, explore webhook-based alternatives

Methodology

Analysis Approach

This statistical analysis was performed using automated scripts stored in /tmp/gh-aw/cache-memory/scripts/:

Data Collection:
- Used find and glob to locate all .lock.yml files in .github/workflows/
- Parsed YAML files using Python's yaml.safe_load()
- Extracted file sizes using du -b
Statistical Analysis:
- Calculated descriptive statistics (mean, median, std dev) using Python's statistics module
- Counted frequencies using collections.Counter
- Categorized data into distributions (file size ranges, timeout buckets, etc.)
Pattern Detection:
- Used regex to extract MCP tool invocations (mcp__*__)
- Parsed YAML sections for triggers, permissions, safe outputs
- Identified trigger combinations and concurrency patterns
Data Validation:
- Cross-referenced counts with manual file inspection
- Verified YAML parsing with sample files
- Checked for parsing errors or malformed files
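The data collection step described above might look like the following. This is a minimal sketch, not the contents of `analyze_lockfiles.py` or `file_sizes.sh`, which are not reproduced in this report. `collect_lock_files` is a hypothetical helper that uses `pathlib` in place of `find` and `du -b`.

```python
from pathlib import Path


def collect_lock_files(workflows_dir: str) -> dict[str, int]:
    """Return {file name: size in bytes} for every .lock.yml file in a directory."""
    return {
        path.name: path.stat().st_size  # apparent size in bytes, like `du -b`
        for path in sorted(Path(workflows_dir).glob("*.lock.yml"))
        if path.is_file()
    }
```

Calling `collect_lock_files(".github/workflows")` from the repository root would produce the per-file sizes, and `sum(sizes.values()) / 1024**2` would give the total in MB as reported under Data Sources.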

Data Sources

Lock Files Analyzed: 66 .lock.yml files in .github/workflows/
Total Size: 13.36 MB
Date Range: Analysis performed on 2025-11-03
Repository: githubnext/gh-aw
Branch: main

Limitations

Static Analysis: This analysis examines lock file structure, not runtime behavior
No Historical Data: First-time analysis with no trend comparison
Model/Engine Detection: Could not extract AI model or engine information from current lock file structure
Discussion Categories: Category extraction failed (may require reading job outputs rather than inputs)

Scripts Used

All analysis scripts are stored in /tmp/gh-aw/cache-memory/scripts/:

analyze_lockfiles.py: Main analysis script for file sizes, jobs, steps, permissions
file_sizes.sh: Shell script for file size collection
Python inline scripts for trigger analysis, safe outputs, and statistics

Generated by Lockfile Statistics Analysis Agent
Date: 2025-11-03
Repository: githubnext/gh-aw
Analysis Version: 1.0
Total Lock Files: 66
Total Size Analyzed: 13.36 MB

AI generated by Lockfile Statistics Analysis Agent

2025-11-28T23:06:14Z

github-actions[bot]
bot Nov 28, 2025
Author

This discussion was automatically closed because it was created by an agentic workflow more than 1 week ago.
