📊 Lockfile Statistics Report - November 23, 2025 #4580

2025-11-23T03:41:19Z

github-actions[bot]
bot Nov 23, 2025

📊 Agentic Workflow Lock File Statistics - 2025-11-23

This report provides a comprehensive statistical analysis of all .lock.yml files in the githubnext/gh-aw repository, offering insights into the structure, patterns, and characteristics of agentic workflows.

Executive Summary

Key Findings:

Total Lock Files: 95 workflow lock files analyzed
Total Size: 23.6 MB across all files
Average File Size: 248.45 KB per lock file
Most Common Trigger: issues (89 workflows, 93.7%)
Most Popular Safe Output: create-discussion (30 workflows, 31.6%)
Dominant Permission: contents: read (93 workflows, 97.9%)
MCP Server Usage: GitHub MCP server used in 100% of workflows

The analysis reveals a mature ecosystem of agentic workflows with strong consistency in structure, comprehensive permission models, and heavy reliance on GitHub issue tracking and discussion capabilities for safe outputs.

Full Report Details

File Size Distribution

Size Range	Count	Percentage
< 100 KB	11	11.6%
100-200 KB	4	4.2%
200-300 KB	57	60.0%
300-400 KB	22	23.2%
>= 400 KB	1	1.1%

Size Statistics:

Smallest: .github/workflows/shared/mcp/arxiv.lock.yml (80.23 KB)
Largest: .github/workflows/poem-bot.lock.yml (424.43 KB)
Average: 248.45 KB
Median: 257.54 KB
Standard Deviation: 71.59 KB

Insight: The majority (60%) of lock files fall within the 200-300 KB range, indicating a consistent workflow complexity level across the repository. The single outlier at 424 KB (poem-bot) suggests unique functionality requiring additional instructions or tool configurations.
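
The bucketing and summary statistics above can be reproduced with a short script. The following is a minimal sketch, not the analyzer's actual code: the function names and the sample list are illustrative, only 80.23, 257.54, and 424.43 are taken from this report, and the use of the sample standard deviation (statistics.stdev) is an assumption because the report does not say which variant was used.

```python
import statistics

# (low, high, label) in KB; the upper bound is exclusive, matching the table.
BUCKETS = [
    (0, 100, "< 100 KB"),
    (100, 200, "100-200 KB"),
    (200, 300, "200-300 KB"),
    (300, 400, "300-400 KB"),
    (400, float("inf"), ">= 400 KB"),
]


def bucket_label(size_kb):
    """Return the size-range label for a file size given in KB."""
    for low, high, label in BUCKETS:
        if low <= size_kb < high:
            return label
    raise ValueError(f"size must be a non-negative number, got {size_kb}")


def summarize(sizes_kb):
    """Count files per bucket and compute mean, median, and sample stdev."""
    counts = {label: 0 for _, _, label in BUCKETS}
    for size in sizes_kb:
        counts[bucket_label(size)] += 1
    return {
        "counts": counts,
        "mean": statistics.mean(sizes_kb),
        "median": statistics.median(sizes_kb),
        "stdev": statistics.stdev(sizes_kb),  # assumption: sample stdev
    }


# Illustrative input only: 150.0 and 310.0 are made-up values.
SAMPLE_SIZES_KB = [80.23, 150.0, 257.54, 310.0, 424.43]
summary = summarize(SAMPLE_SIZES_KB)
print(summary["counts"], summary["median"])
```

Keeping the bucket edges in one table makes it easy to change the ranges in a later run without touching the counting logic.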

Workflow Naming Patterns

Analysis of workflow filenames reveals clear organizational patterns:

Pattern	Count	Examples
`test-*`	12	test-claude-assign-milestone, test-secret-masking
`daily-*`	8	daily-news, daily-doc-updater, daily-firewall-report
`copilot-*`	6	copilot-pr-nlp-analysis, copilot-session-insights
`smoke-*`	4	smoke-claude, smoke-copilot, smoke-codex, smoke-detector
`tests/` directory	7	Workflows in the tests subdirectory
`shared/` directory	2	Shared MCP configurations

Common Theme Words (appearing in workflow names):

test (12 occurrences) - Testing and validation workflows
daily (8 occurrences) - Scheduled daily automation
copilot (6 occurrences) - Copilot-specific analysis
analysis/summary/report (10 combined) - Analytical workflows
checker (4 occurrences) - Code quality and consistency checks
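
Prefix counts like these can be derived directly from the workflow names. The sketch below is illustrative rather than the analyzer's own code: the helper name prefix_counts is hypothetical, and the name list contains only the examples quoted in this report, so its counts are smaller than the repository-wide totals.

```python
from collections import Counter

# Prefixes taken from the naming-pattern table.
PREFIXES = ("test-", "daily-", "smoke-", "copilot-")


def prefix_counts(names):
    """Count workflow names by the first matching prefix."""
    counts = Counter()
    for name in names:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                counts[prefix] += 1
                break  # a name is attributed to at most one prefix
    return counts


# Only the example names quoted in this report, not the full set of 95.
NAMES = [
    "test-claude-assign-milestone", "test-secret-masking",
    "daily-news", "daily-doc-updater", "daily-firewall-report",
    "smoke-claude", "smoke-copilot", "smoke-codex", "smoke-detector",
    "copilot-pr-nlp-analysis", "copilot-session-insights",
    "poem-bot",
]
counts = prefix_counts(NAMES)
print(dict(counts))
```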

Trigger Analysis

Most Popular Triggers

Trigger Type	Count	Percentage	Description
`issues`	89	93.7%	Triggered by issue events
`workflow_dispatch`	72	75.8%	Manual workflow execution
`schedule`	45	47.4%	Time-based scheduled runs
`issue_comment`	12	12.6%	Comments on issues
`pull_request`	10	10.5%	PR events
`discussion_comment`	5	5.3%	Discussion comments
`discussion`	4	4.2%	Discussion events
`push`	2	2.1%	Push events
`workflow_run`	2	2.1%	Triggered by other workflows

Key Insights:

Issues-first approach: 93.7% of workflows are triggered by issue events, demonstrating that issues are the primary interface for agentic workflows in this repository
Manual override capability: 75.8% include workflow_dispatch, allowing maintainers to manually trigger workflows when needed
Scheduled automation: Nearly half (47.4%) run on schedules, providing regular automated maintenance and reporting
Limited PR automation: Only 10.5% trigger on pull requests, suggesting most PR workflows may be handled through issue comments or manual dispatch
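
Trigger counts of this kind depend on how the trigger names are pulled out of each file. The sketch below shows one line-based approach, in the spirit of the regular-expression extraction described under Methodology. It is an assumption-laden illustration: extract_triggers is a hypothetical helper, the sample text is made up, and it only handles a block-style top-level on: mapping (optionally quoted), not inline forms such as on: push.

```python
import re

ON_KEY = re.compile(r'^["\']?on["\']?:\s*$')
CHILD_KEY = re.compile(r'\s*([A-Za-z_]+):')


def extract_triggers(lock_text):
    """Return the event names listed directly under the top-level on: key."""
    triggers = []
    inside = False
    child_indent = None
    for line in lock_text.splitlines():
        if not inside:
            inside = bool(ON_KEY.match(line))
            continue
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            break  # reached the next top-level key
        if child_indent is None:
            child_indent = indent
        if indent == child_indent:
            match = CHILD_KEY.match(line)
            if match:
                triggers.append(match.group(1))
    return triggers


# Made-up sample, shaped like a workflow header.
SAMPLE = '''\
name: demo
"on":
  issues:
    types: [opened]
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:
permissions:
  contents: read
'''
print(extract_triggers(SAMPLE))
```

A line-based scan avoids a YAML dependency, at the cost of missing less common layouts. A full YAML parser would be more robust if one is available.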

Schedule Patterns

45 workflows use scheduled triggers. Sample cron patterns include:

Schedule	Frequency (times in UTC)	Example Workflow
`0 0 * * *`	Daily at midnight	audit-workflows
`0 15 * * *`	Daily at 3 PM	cli-version-checker
`0 6 * * 0`	Weekly on Sunday at 6 AM	artifacts-summary
`0 12 * * 3`	Weekly on Wednesday at noon	blog-auditor
`0 13 * * 1-5`	Weekdays at 1 PM	cli-consistency-checker

Scheduling Strategy: Workflows are distributed across different times of day, with several patterns:

Daily maintenance tasks typically run at midnight or early morning
Regular checks run in the early afternoon (1-3 PM UTC)
Weekly summaries run on Sundays or mid-week
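
The frequency labels in the schedule table can be derived mechanically from the five cron fields. The following sketch covers only the simple shapes seen in the samples above (fixed minute and hour, optional day-of-week). The function name describe_cron and its output format are illustrative choices, and anything more complex is reported as "other". GitHub Actions evaluates schedule expressions in UTC.

```python
def describe_cron(expr):
    """Classify a 5-field cron expression as daily, weekdays, weekly, or other."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    minute, hour, day_of_month, month, day_of_week = fields

    # Only fixed times on unrestricted dates are classified here.
    if not (minute.isdigit() and hour.isdigit()):
        return "other"
    if day_of_month != "*" or month != "*":
        return "other"

    if day_of_week == "*":
        cadence = "daily"
    elif day_of_week == "1-5":
        cadence = "weekdays"
    elif day_of_week.isdigit():
        cadence = "weekly"
    else:
        return "other"
    return f"{cadence} at {int(hour):02d}:{int(minute):02d} UTC"


for sample in ("0 0 * * *", "0 15 * * *", "0 6 * * 0", "0 13 * * 1-5"):
    print(sample, "->", describe_cron(sample))
```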

Safe Outputs Analysis

Safe outputs are the primary mechanism for agentic workflows to communicate results to users.

Safe Output Types Distribution

Type	Count	Percentage	Use Cases
`create-discussion`	30	31.6%	Publishing reports, analyses, insights
`add-comment`	21	22.1%	Responding to issues/PRs with findings
`create-issue`	19	20.0%	Creating tracked work items
`create-pull-request`	17	17.9%	Proposing code changes
`create-pull-request-review-comment`	4	4.2%	In-line code review comments
`close-pull-request`	3	3.2%	Closing PRs programmatically
`update-issue`	2	2.1%	Modifying existing issues

Safe Output Insights:

Discussions dominate (31.6%): The create-discussion output type is most popular, indicating workflows frequently generate comprehensive reports and analyses that benefit from the discussion format's rich formatting and organization capabilities.
Comment-based interactions (22.1%): The second most common output is adding comments, showing workflows often provide contextual feedback directly on issues or PRs.
Balanced creation patterns: The relatively even distribution between creating issues (20%), creating PRs (17.9%), and adding comments (22.1%) suggests workflows serve diverse purposes - reporting, proposing changes, and interactive feedback.
Limited destructive actions: Only 3 workflows (3.2%) close PRs, showing conservative use of workflow automation for potentially disruptive actions.

Discussion Categories

Discussion categories could not be extracted comprehensively because workflows resolve them dynamically. The analysis does show:

Most workflows using create-discussion dynamically select categories based on workflow context
Category selection often uses variables like ${discussionCategories[0].name} for flexibility
The "audits" category is explicitly mentioned in several workflows for publishing analysis results

Structural Characteristics

Job Complexity

Finding: Every workflow (100%) defines exactly 1 job.

This demonstrates a consistent architectural pattern:

Single-job workflows are simpler to maintain and debug
Each workflow has a clear, focused purpose
Job orchestration happens at the workflow level, not within workflows

Average Lock File Structure

Based on statistical analysis, a typical .lock.yml file has:

Characteristic	Value
File Size	~248 KB (257 KB median)
Jobs	1 job
Steps (sample)	~62 steps
Triggers	2-3 triggers (commonly issues + workflow_dispatch + schedule)
Safe Outputs	1-2 output types
Permissions	4-5 permissions (typically read content, write issues/PRs)
Timeout	10-20 minutes (most common)

Timeout Configuration

Timeout (minutes)	Occurrences	Percentage
5	10	2.2%
10	301	66.7%
12	1	0.2%
15	18	4.0%
20	106	23.5%
30	9	2.0%
45	3	0.7%

Timeout Strategy:

10 minutes accounts for 66.7% of timeout configurations (likely step-level timeouts, since the counts far exceed the 95 workflows analyzed)
20 minutes is the second most common at 23.5%
Only 3 timeout configurations use the extended 45-minute value
Conservative timeout values prevent runaway workflows while allowing complex analysis
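
The timeout counts sum to more than 95, so the percentages are not per-workflow shares. A quick arithmetic check recovers the denominator they were computed against. The sketch below uses only the rows from the table; the function name consistent_totals and the search window of 60 are illustrative choices. It looks for totals at which every count rounds to its reported percentage.

```python
# timeout in minutes -> (count, reported percentage), copied from the table.
REPORTED = {
    5: (10, 2.2),
    10: (301, 66.7),
    12: (1, 0.2),
    15: (18, 4.0),
    20: (106, 23.5),
    30: (9, 2.0),
    45: (3, 0.7),
}


def consistent_totals(rows, max_extra=60):
    """Return the sum of listed counts and every candidate total that
    reproduces all reported percentages when rounded to one decimal."""
    listed = sum(count for count, _ in rows.values())
    matches = []
    for total in range(listed, listed + max_extra + 1):
        if all(
            abs(round(100 * count / total, 1) - pct) < 1e-9
            for count, pct in rows.values()
        ):
            matches.append(total)
    return listed, matches


listed, matches = consistent_totals(REPORTED)
print(listed, matches)  # prints: 448 [451]
```

The listed rows sum to 448, while the percentages are only consistent with a total of 451. This suggests that three timeout entries with other values were counted but are not shown in the table. That reading is an inference from the arithmetic, not something the report states.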

Permission Patterns

Most Common Permissions

Permission	Count	Percentage	Purpose
`contents: read`	93	97.9%	Reading repository files
`issues: write`	84	88.4%	Creating/updating issues
`pull-requests: write`	83	87.4%	Creating/updating PRs
`pull-requests: read`	75	78.9%	Reading PR data
`issues: read`	71	74.7%	Reading issue data
`contents: write`	28	29.5%	Modifying repository content

Permission Distribution Analysis

Read vs. Write Patterns:

Near-universal read access: 97.9% require contents: read for code analysis
High write access to metadata: 88.4% have issues: write and 87.4% have pull-requests: write
Limited write access to code: Only 29.5% have contents: write

Security Posture:

Workflows largely follow the principle of least privilege for repository contents
Most workflows can read code but not modify it
Write permissions are primarily for creating discussions, issues, and PRs, not direct code changes
Only specialized workflows (like automated refactoring or documentation updates) have contents: write
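
A least-privilege review of this kind can be automated once each workflow's permissions are available as a mapping. The sketch below is a hypothetical policy check, not part of the analyzer: the set of scopes where write access is treated as routine is an assumption based on the pattern described above, and excessive_writes is an illustrative name.

```python
# Assumption: write access to these scopes is routine for agentic workflows.
ROUTINE_WRITE_SCOPES = frozenset({"issues", "pull-requests", "discussions"})


def excessive_writes(permissions, routine=ROUTINE_WRITE_SCOPES):
    """Return the scopes that grant write access outside the routine set."""
    return sorted(
        scope
        for scope, level in permissions.items()
        if level == "write" and scope not in routine
    )


typical = {"contents": "read", "issues": "write", "pull-requests": "write"}
refactoring = {"contents": "write", "pull-requests": "write"}

print(excessive_writes(typical))      # prints: []
print(excessive_writes(refactoring))  # prints: ['contents']
```

Workflows flagged this way are not necessarily wrong; the result is a list of candidates whose contents: write grant merits a second look.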

Tool & MCP Configuration

Tool Allowlist Analysis

Tool	Workflow Count	Percentage
`Bash`	92	96.8%
`WebFetch`	33	34.7%
`WebSearch`	32	33.7%

Tool Usage Insights:

Bash dominates: 96.8% of workflows use Bash, indicating shell commands are fundamental to workflow operations
Web capabilities: Approximately one-third of workflows can fetch web content or search the web, enabling research and external data gathering
Specialized tools: The analysis shows tool allowlists are customized per workflow, with many workflows having carefully scoped tool access

MCP Server Usage

MCP Server	Count	Percentage
`github`	95	100%

Universal GitHub MCP: Every workflow uses the GitHub MCP server, providing standardized access to GitHub APIs for:

Reading repository data
Managing issues and pull requests
Accessing commits, releases, and tags
Searching code and metadata

Engine Detection

Based on filename and content analysis:

Engine Reference	Workflow Count	Examples
Mentioning "copilot"	67	copilot-pr-nlp-analysis, test-copilot-assign-milestone
Mentioning "claude"	37	test-claude-assign-milestone, smoke-claude
Mentioning "codex"	20	smoke-codex, test-codex-assign-milestone

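Note: A single workflow can mention more than one engine, so these counts overlap (they sum to 124 across 95 files). Engine references often appear in test workflows or engine-specific analysis workflows. Most production workflows are engine-agnostic.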
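
Mention-based detection counts a file once for every engine name it contains, which is why the per-engine totals can exceed the number of files. The sketch below illustrates that effect with made-up file contents. The helper names are hypothetical, and a plain case-insensitive substring match is an assumption about how mentions were detected.

```python
from collections import Counter

ENGINES = ("copilot", "claude", "codex")


def engines_mentioned(filename, content):
    """Return the set of engine names found in the filename or content."""
    haystack = f"{filename}\n{content}".lower()
    return {engine for engine in ENGINES if engine in haystack}


def mention_counts(files):
    """Count mentions per engine and the files mentioning several engines.

    files maps a filename to that file's text.
    """
    counts = Counter()
    multi_engine_files = 0
    for filename, content in files.items():
        found = engines_mentioned(filename, content)
        counts.update(found)
        if len(found) > 1:
            multi_engine_files += 1
    return counts, multi_engine_files


# Made-up contents, for illustration only.
FILES = {
    "smoke-claude.lock.yml": "engine: claude",
    "smoke-copilot.lock.yml": "# compared against Claude output",
    "daily-news.lock.yml": "steps: []",
}
counts, multi = mention_counts(FILES)
print(dict(counts), multi)
```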

Interesting Findings

1. Homogeneous Structure, Diverse Purposes

Despite 95 different workflows, there's remarkable structural consistency:

All use exactly 1 job
97.9% read repository contents
93.7% trigger on issues
100% use GitHub MCP server

Yet workflow purposes are highly diverse, covering code analysis, documentation, testing, reporting, maintenance, and more. This demonstrates excellent architectural standardization while supporting varied use cases.

2. The "Daily" Automation Pattern

8 workflows follow the daily-* naming pattern, representing automated maintenance tasks:

daily-news: Aggregates repository news
daily-doc-updater: Keeps documentation fresh
daily-firewall-report: Security monitoring
daily-file-diet: Repository cleanup
daily-team-status: Team activity summaries

This shows systematic automation of routine tasks that would otherwise require manual effort.

3. Comprehensive Testing Infrastructure

19 workflows (20%) are dedicated to testing:

12 with test-* prefix
7 in tests/ directory
Testing covers multiple engines (Claude, Copilot, Codex)
Tests validate safe outputs, permissions, secret masking, and more

The investment in testing infrastructure ensures workflow reliability and safety.

4. Size Outliers Tell Stories

Smallest (80 KB): Shared MCP configurations are minimal, containing just configuration data
Largest (424 KB): poem-bot.lock.yml is about 1.65x the median size (424.43 KB vs. 257.54 KB), likely containing extensive creative writing instructions or examples
Tight distribution: 60% of workflows fall within a 100 KB range (200-300 KB), showing consistent instruction complexity

5. Conservative Safety Posture

The data reveals a security-conscious design:

Only 29.5% can write to repository contents
Close operations (closing PRs) are rare (3.2%)
Delete operations are not present in safe outputs
Most workflows communicate through read-only analysis and creating new items rather than modifying existing state

6. Discussion-First Communication

With 31.6% using create-discussion, discussions are the preferred method for:

Publishing comprehensive reports
Sharing analysis results
Organizing findings by category
Enabling community engagement

This aligns with the repository's collaborative, transparent approach to agentic workflow development.

Recommendations

Based on this analysis, here are suggestions for workflow developers:

1. Maintain Structural Consistency

The current 1-job-per-workflow pattern works well. Continue this pattern for:

Simpler debugging
Clear separation of concerns
Easier workflow composition

2. Optimize File Sizes

Most workflows are 200-300 KB. If a workflow exceeds 350 KB:

Consider splitting into multiple workflows
Extract common instructions to shared configurations
Review if all instructions are necessary
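
A size check like this is easy to run before committing a new lock file. The sketch below is one possible implementation under stated assumptions: oversized_lockfiles is a hypothetical helper, 1 KB is taken as 1024 bytes (the report does not say which convention it uses), and the 350 KB default mirrors the threshold suggested above.

```python
from pathlib import Path

BYTES_PER_KB = 1024  # assumption: binary kilobytes


def oversized_lockfiles(root, threshold_kb=350):
    """Return (relative path, size in KB) for each .lock.yml above the threshold."""
    results = []
    for path in sorted(Path(root).rglob("*.lock.yml")):
        size_kb = path.stat().st_size / BYTES_PER_KB
        if size_kb > threshold_kb:
            relative = path.relative_to(root).as_posix()
            results.append((relative, round(size_kb, 2)))
    return results


# Example call (path as used in this repository's layout):
# for name, size_kb in oversized_lockfiles(".github/workflows"):
#     print(f"{name}: {size_kb} KB")
```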

3. Leverage Existing Patterns

For new workflows:

Use issues as the primary trigger for user-facing workflows
Include workflow_dispatch for manual testing
Default to 10-20 minute timeouts
Follow naming patterns: test-*, daily-*, etc.

4. Balance Safe Outputs

Consider combining output types strategically:

Use create-discussion for comprehensive reports
Use add-comment for immediate, contextual feedback
Reserve create-issue for actionable work items
Use create-pull-request for automated fixes

5. Apply Least Privilege Principle

Only request contents: write when necessary. Most workflows can function with:

contents: read
issues: write
pull-requests: write

6. Standardize Timeout Values

Use consistent timeouts:

10 minutes: Standard operations
20 minutes: Complex analysis
30 minutes: Large-scale processing
45 minutes: Reserved for exceptional cases

7. Document Engine Compatibility

If a workflow is engine-specific:

Use clear naming (e.g., copilot-*, claude-*)
Document engine requirements
Consider creating engine-agnostic alternatives

Historical Trends

This is the baseline analysis for lockfile statistics. Future analyses can compare against these metrics:

Baseline Metrics (2025-11-23):

Total workflows: 95
Average size: 248.45 KB
Issues trigger adoption: 93.7%
Create-discussion adoption: 31.6%
Contents write permission: 29.5%

Recommended tracking metrics:

Growth rate of lockfile count
Average file size trends (indicator of complexity growth)
Adoption rates of new safe output types
Permission pattern evolution
Test coverage ratio (test workflows / total workflows)
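
Comparing a later run against this baseline is a matter of subtracting matching metrics. The sketch below shows the idea: the baseline values are the ones listed above, while the dictionary keys, the function names, and the "current" snapshot are made up for illustration.

```python
# Baseline values from this report (2025-11-23); key names are illustrative.
BASELINE = {
    "total_workflows": 95,
    "average_size_kb": 248.45,
    "issues_trigger_pct": 93.7,
    "create_discussion_pct": 31.6,
    "contents_write_pct": 29.5,
}


def metric_deltas(baseline, current):
    """Return current minus baseline for every metric present in both."""
    return {
        key: round(current[key] - baseline[key], 2)
        for key in baseline
        if key in current
    }


def coverage_ratio(test_workflows, total_workflows):
    """Share of workflows dedicated to testing."""
    return test_workflows / total_workflows


# Hypothetical later snapshot, not real data.
current = {"total_workflows": 100, "average_size_kb": 250.0}
print(metric_deltas(BASELINE, current))
print(coverage_ratio(19, 95))  # 19 test workflows out of 95, per this report
```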

Methodology

Analysis Approach:

Data Collection: Python and Bash scripts parsed all 95 .lock.yml files
Statistical Analysis: Calculated distributions, averages, medians, and standard deviations
Pattern Recognition: Regular expressions extracted triggers, outputs, permissions, and tools
Cross-validation: Multiple analysis passes verified data accuracy

Tools Used:

Python 3 for structured data parsing and JSON generation
Bash scripts for pattern matching and file system operations
grep, awk, sed for text processing
Statistical analysis using Python statistics module

Data Sources:

All .lock.yml files in .github/workflows/ and subdirectories
Workflow YAML structure (triggers, jobs, steps, permissions)
Safe output configurations
Tool allowlists and MCP server configurations

Cache Memory:
Analysis scripts and data are stored in /tmp/gh-aw/cache-memory/ for:

Reproducibility
Historical comparison
Future analysis runs
Pattern library development

Limitations:

Discussion categories are often dynamically resolved and couldn't be fully extracted statically
Step counts were sampled rather than comprehensively analyzed due to YAML complexity
Engine detection relies on file content mentions, not explicit configuration
Concurrency group analysis was limited by the data format

Analysis Date: 2025-11-23
Repository: githubnext/gh-aw
Analyzer: Lockfile Statistics Analysis Agent
Lockfiles Analyzed: 95
Total Data Size: 23.6 MB

AI generated by Lockfile Statistics Analysis Agent

2025-12-01T00:26:06Z

github-actions[bot]
bot Dec 1, 2025

This discussion was automatically closed because it was created by an agentic workflow more than 1 week ago.
