📊 Lockfile Statistics Analysis - November 6, 2025 #3305

Posted by github-actions[bot] on Nov 6, 2025 (2025-11-06T03:35:28Z)

📊 Agentic Workflow Lock File Statistics - November 6, 2025

Executive Summary

This comprehensive analysis examines 71 lock files across the gh-aw repository, revealing patterns in workflow design, trigger usage, safe outputs, and structural characteristics. The analysis shows that workflows in this repository are predominantly manual (workflow_dispatch: 82%) and scheduled (schedule: 51%), with an average file size of 207 KB and 5.4 jobs per workflow. The GitHub MCP server dominates tool usage, and the "audits" discussion category is the most popular destination for workflow outputs.

Key Highlights:

71 lock files analyzed with an average of 56 steps per workflow
Manual triggers (workflow_dispatch) present in 82% of workflows
Create-discussion is the most popular safe output mechanism (25 workflows)
GitHub MCP server tools used extensively across virtually all workflows
File sizes range from 22 KB to 381 KB, with 72% over 200 KB

Full Report Details

File Size Distribution

The lock files in this repository are generally substantial in size, reflecting comprehensive workflow definitions with detailed agent instructions, MCP configurations, and multi-job orchestration.

Size Range	Count	Percentage
< 10 KB	0	0.0%
10-50 KB	1	1.4%
50-100 KB	12	16.9%
100-200 KB	7	9.9%
> 200 KB	51	71.8%

Statistics:

Smallest: opencode.lock.yml (22 KB) - A shared configuration file
Largest: poem-bot.lock.yml (381 KB) - Feature-rich creative bot with extensive prompts
Average: 207 KB
Total Size: ~15 MB for all lock files
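Here is a minimal Python sketch of how the size table can be derived. It assumes sizes have already been collected in KB, for example from os.path.getsize divided by 1024. The bin edges mirror the table. Each lower bound is treated as inclusive, so a file of exactly 200 KB lands in "> 200 KB"; that choice is an assumption of this sketch. size_bin and distribution are hypothetical helper names.

```python
from collections import Counter

# (label, inclusive lower bound in KB, exclusive upper bound in KB or None)
BINS = [
    ("< 10 KB", 0, 10),
    ("10-50 KB", 10, 50),
    ("50-100 KB", 50, 100),
    ("100-200 KB", 100, 200),
    ("> 200 KB", 200, None),
]


def size_bin(size_kb: float) -> str:
    """Return the label of the size range containing size_kb."""
    for label, low, high in BINS:
        if size_kb >= low and (high is None or size_kb < high):
            return label
    raise ValueError(f"size must be non-negative, got {size_kb}")


def distribution(sizes_kb: list[float]) -> dict[str, tuple[int, float]]:
    """Map each range label to (file count, percentage of all files, 1 decimal)."""
    counts = Counter(size_bin(s) for s in sizes_kb)
    total = len(sizes_kb)
    return {label: (counts[label], round(100 * counts[label] / total, 1))
            for label, _, _ in BINS}
```

For instance, distribution([22, 75, 150, 381]) places one hypothetical file in each of the four upper ranges, at 25.0% apiece. Applying the same percentage formula to the reported counts gives 51 / 71 = 71.8% for the largest range.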

Size Distribution Analysis

The predominance of large files (>200 KB) indicates that most workflows include:

Extensive agent system prompts and instructions
Multiple MCP server configurations
Complex job dependency graphs
Comprehensive security and permission settings
Detailed safe output configurations

Trigger Analysis

Most Popular Triggers

Workflows in this repository utilize a diverse set of triggers, with manual and scheduled triggers dominating:

Trigger Type	Count	Percentage	Use Case
workflow_dispatch	58	81.7%	Manual execution, testing, on-demand analysis
schedule	36	50.7%	Daily reports, periodic analysis, maintenance
issue_comment	12	16.9%	Comment-activated agents (e.g., /archie)
issues	11	15.5%	Issue creation/edit triggers
pull_request	10	14.1%	PR-based analysis and validation
push	5	7.0%	CI/CD, automatic updates
discussion_comment	3	4.2%	Discussion-based interactions
workflow_run	2	2.8%	Workflow chaining
pull_request_review_comment	2	2.8%	Review comment triggers
discussion	2	2.8%	Discussion events
workflow_call	1	1.4%	Reusable workflow
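Counting triggers comes down to reading the keys of each lock file's on: block. The Python sketch below assumes the files are already parsed into dicts, for example with PyYAML's yaml.safe_load. The helper names are hypothetical. It also guards against a real pitfall: YAML 1.1 loaders such as PyYAML read the bare key on as the boolean True, so a plain doc["on"] lookup misses it.

```python
from collections import Counter


def triggers(doc: dict) -> set[str]:
    """Return the trigger names declared in one parsed workflow document."""
    # PyYAML and other YAML 1.1 loaders turn the bare key `on` into True.
    on = doc["on"] if "on" in doc else doc.get(True)
    if on is None:
        return set()
    if isinstance(on, str):   # on: push
        return {on}
    return set(on)            # on: [a, b]  or  on: {a: ..., b: ...}


def trigger_table(docs: list[dict]) -> list[tuple[str, int, float]]:
    """(trigger, workflows using it, % of all workflows), most common first."""
    counts = Counter(t for doc in docs for t in triggers(doc))
    total = len(docs)
    return [(t, n, round(100 * n / total, 1)) for t, n in counts.most_common()]
```

A workflow can declare several triggers, so each one counts toward several rows. That is why the percentages in the table sum to well over 100%.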

Insights

Manual Control Dominates: 82% of workflows support workflow_dispatch, enabling developers to run analysis and agents on demand
Scheduled Automation: 51% run on schedules, providing regular automated insights (daily reports, weekly summaries)
Interactive Agents: 17% respond to comments, enabling conversational activation patterns such as /archie and /scout
Mixed Strategies: Many workflows combine multiple triggers (e.g., both manual and scheduled)

Common Trigger Combinations

Based on the data, typical combinations include:

workflow_dispatch + schedule: On-demand with periodic automation (e.g., daily-news, daily-firewall-report)
issues + issue_comment + pull_request: Full GitHub event coverage for interactive agents (e.g., archie, scout)
workflow_dispatch only: Pure manual workflows for specialized tasks (e.g., testing, one-off analysis)
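The combinations above can be assigned mechanically from a workflow's trigger set. This hypothetical Python sketch applies its rules in order and takes the first match. A workflow that satisfies more than one rule is therefore labeled by whichever rule comes first. The labels and rule order are assumptions for illustration, not the report's exact method.

```python
# Rules are checked in order; the first matching rule wins.
RULES = [
    ("manual only", lambda t: t == {"workflow_dispatch"}),
    ("manual + scheduled", lambda t: {"workflow_dispatch", "schedule"} <= t),
    ("interactive", lambda t: {"issues", "issue_comment", "pull_request"} <= t),
]


def combination(trigs: set[str]) -> str:
    """Label a trigger set with the first matching combination, else 'other'."""
    for label, matches in RULES:
        if matches(trigs):
            return label
    return "other"
```

For example, combination({"workflow_dispatch", "schedule"}) returns "manual + scheduled", and a push-only workflow falls through to "other".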

Schedule Patterns

The repository uses various cron schedules for periodic execution:

Schedule (Cron)	Count	Description (all times UTC)
`0 9 * * *`	3	Daily at 9:00 AM
`0 0,6,12,18 * * *`	3	Four times daily (every 6 hours)
`0/10 * * * *`	3	Every 10 minutes (frequent monitoring)
`0 6 * * 0`	2	Sundays at 6:00 AM
`0 2 * * 1-5`	2	Weekdays at 2:00 AM
`0 15 * * 1`	2	Mondays at 3:00 PM
Various others	17	Ranging from midnight to evening, daily to weekly

Typical Patterns:

Daily Morning Reports: 9 AM UTC for daily summaries and news
High-Frequency Monitoring: Every 10 minutes for critical checks
Weekly Audits: Sunday or Monday for weekly summaries
Off-Hours Processing: Night/early morning for heavy analysis
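To compare how much load each schedule generates, a cron expression can be expanded into concrete run times. The Python sketch below is a minimal, hypothetical expander. It covers only the syntax in the table (lists, ranges, * and /step), and only expressions whose day-of-month and month fields are *. It reads a/n, as in 0/10, as "every n starting at a", which is the common interpretation. GitHub Actions evaluates all schedules in UTC.

```python
def expand(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field (e.g. '0,6,12,18', '1-5', '*/10', '0/10') into values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = int(base)
            # A bare value with a step ('0/10') runs to the end of the field's range.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return values


def runs_per_week(expr: str) -> int:
    """Count runs per week for expressions whose day-of-month and month are '*'."""
    minute, hour, dom, month, dow = expr.split()
    if dom != "*" or month != "*":
        raise ValueError("sketch only handles day-of-month and month = '*'")
    # Day-of-week 0-6 with 0 = Sunday; the alternative spelling 7 is not handled.
    return (len(expand(minute, 0, 59)) * len(expand(hour, 0, 23))
            * len(expand(dow, 0, 6)))
```

With this sketch, each 0/10 * * * * workflow is scheduled 1,008 times per week (144 per day). That compares with 28 for 0 0,6,12,18 * * * and 7 for 0 9 * * *. The three high-frequency monitors therefore account for far more scheduled runs than any other schedule listed in the table.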

Safe Outputs Analysis

Agentic workflows use "safe outputs" to communicate results securely. The agent job itself runs without write access to the repository, and separate jobs perform the writes. This analysis shows diverse output strategies across the repository.

Safe Output Types Distribution

Type	Count	% of Workflows	Description
create-discussion	25	35.2%	Post analysis results to GitHub Discussions
create-pull-request	18	25.4%	Propose code changes via PR
add-comment	17	23.9%	Comment on existing issues/PRs
create-issue	15	21.1%	Create new issues for findings
update-issue	2	2.8%	Update existing issues
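The per-type counts add up to 77 across 71 workflows. At least some workflows must therefore declare more than one safe output type, and the percentages overlap. Here is a Python sketch of the tabulation that also lists the multi-output workflows directly. It assumes each workflow's safe output types have already been extracted into a set; the input mapping and helper name are hypothetical.

```python
from collections import Counter

SAFE_OUTPUT_TYPES = ["create-discussion", "create-pull-request", "add-comment",
                     "create-issue", "update-issue"]


def safe_output_summary(workflows: dict[str, set[str]]
                        ) -> tuple[dict[str, tuple[int, float]], list[str]]:
    """Return per-type (count, % of workflows) and the names of multi-output workflows."""
    total = len(workflows)
    counts = Counter(t for types in workflows.values() for t in types)
    table = {t: (counts[t], round(100 * counts[t] / total, 1))
             for t in SAFE_OUTPUT_TYPES}
    multi = sorted(name for name, types in workflows.items() if len(types) > 1)
    return table, multi
```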

Discussion Categories

For workflows using create-discussion, the most popular categories are:

Category	Count	Use Case
audits (lowercase)	11	Audit reports, compliance checks, workflow analysis
Audits (capitalized)	3	Similar to above (case variation)
dev	2	Development discussions
artifacts	2	Artifact summaries
ideas	2	Feature ideas and suggestions
security	1	Security findings
research	1	Research outputs
daily	1	Daily reports

Note: "audits" and "Audits" appear to be the same category with case inconsistency.
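Assuming "audits" and "Audits" are meant to be one category, as the note suggests, a case-insensitive count merges them and flags the spellings to standardize. Here is a minimal Python sketch; the helper name is hypothetical.

```python
from collections import Counter, defaultdict


def category_report(categories: list[str]) -> tuple[Counter, dict[str, list[str]]]:
    """Merge categories that differ only in case; report the spellings that were merged."""
    counts: Counter = Counter()
    spellings: defaultdict = defaultdict(set)
    for raw in categories:
        key = raw.strip().casefold()
        counts[key] += 1
        spellings[key].add(raw.strip())
    conflicts = {k: sorted(v) for k, v in spellings.items() if len(v) > 1}
    return counts, conflicts
```

Fed the counts from the table (11 times "audits", 3 times "Audits"), it reports 14 uses of audits and the conflicting spellings ["Audits", "audits"].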

Safe Output Strategy Insights

Discussion-First Approach: 35% of workflows prefer Discussions for long-form reports and analysis
Code Changes via PRs: 25% propose actual code modifications through pull requests
Conversational Agents: 24% use comments for interactive, contextual responses
Issue Tracking: 21% create issues for actionable findings
Multi-Output Workflows: Some workflows use multiple safe output types (e.g., comment first, then create PR)

Example Workflows by Output Type

create-discussion: lockfile-stats, audit-workflows, daily-firewall-report, github-mcp-tools-report
create-pull-request: daily-doc-updater, security-fix-pr, unbloat-docs, semantic-function-refactor
add-comment: archie, scout, dev-hawk, ci-doctor
create-issue: issue-classifier, smoke-detector, duplicate-code-detector

Structural Characteristics

Job Complexity

Workflows in this repository exhibit moderate to high complexity with multi-job orchestration:

Average Jobs per Workflow: 5.37 jobs
Average Steps per Job: 10.45 steps (56.14 average steps / 5.37 average jobs)
Average Steps per Workflow: 56.14 steps
Maximum Jobs in a Single Workflow: 14 jobs
Minimum Jobs in a Single Workflow: 2 jobs
Maximum Steps in a Single Workflow: 101 steps
Minimum Steps in a Single Workflow: 26 steps
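The steps-per-job figure is a ratio of means. As the 56.14 / 5.37 computation shows, dividing average steps by average jobs equals total steps divided by total jobs across all 71 workflows. Large workflows therefore weigh more. The hypothetical Python sketch below computes this pooled value next to the mean of per-workflow ratios, which weights every workflow equally and can differ noticeably.

```python
from statistics import mean


def complexity(workflows: list[tuple[int, int]]) -> dict[str, float]:
    """Summarize (jobs, steps) pairs, one pair per workflow."""
    jobs = [j for j, _ in workflows]
    steps = [s for _, s in workflows]
    return {
        "avg_jobs": round(mean(jobs), 2),
        "avg_steps": round(mean(steps), 2),
        # Ratio of means: total steps / total jobs (the report's 56.14 / 5.37).
        "steps_per_job_pooled": round(sum(steps) / sum(jobs), 2),
        # Mean of ratios: every workflow counts equally, whatever its size.
        "steps_per_job_per_workflow": round(mean(s / j for j, s in workflows), 2),
        "max_jobs": max(jobs), "min_jobs": min(jobs),
        "max_steps": max(steps), "min_steps": min(steps),
    }
```

Take three hypothetical workflows with (jobs, steps) of (2, 26), (14, 101) and (5, 56). The pooled value is 183 / 21 ≈ 8.71 steps per job, while the mean of per-workflow ratios is about 10.47.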

Typical Lock File Structure

Based on statistical analysis, a typical .lock.yml file has:

Size: ~207 KB
Jobs: ~5-6 jobs
Steps per Job: ~10 steps
Total Steps: ~56 steps
Permissions: Read access to contents, issues, PRs; write access to specific resources
Triggers: workflow_dispatch + schedule (most common)
Timeout: 30-60 minutes (typical)
Safe Outputs: 1-2 output mechanisms

Job Orchestration Patterns

Workflows typically follow these job patterns:

pre_activation: Initial checks and environment setup
activation: Workflow trigger validation and context extraction
agent: Core AI agent execution with LLM calls
detection: Analysis, parsing, or detection logic
Safe output jobs (e.g., create_discussion, add_comment): Create the discussion, issue, comment, or PR with results
missing_tool: Report missing tool functionality
update_reaction: Update GitHub reactions for status
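Jobs in a lock file are ordered by their needs: dependencies, so the phases above form a directed acyclic graph. The Python sketch below uses the standard library's graphlib.TopologicalSorter to group jobs into waves that could run in parallel. The NEEDS edges are a hypothetical illustration of this pattern, not copied from a real lock file. create_discussion stands in for whichever safe output job a workflow uses.

```python
from graphlib import TopologicalSorter

# Hypothetical `needs:` edges (job -> jobs it waits for); real lock files may differ.
NEEDS = {
    "pre_activation": set(),
    "activation": {"pre_activation"},
    "agent": {"activation"},
    "detection": {"agent"},
    "create_discussion": {"agent", "detection"},
    "missing_tool": {"agent"},
    "update_reaction": {"agent", "activation"},
}


def waves(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; jobs in the same wave do not depend on each other."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()                 # raises graphlib.CycleError on a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result
```

With these assumed edges, waves(NEEDS) yields five waves. detection, missing_tool and update_reaction share the fourth, because each depends only on jobs that have already finished.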

Complexity Distribution

Most workflows (>70%) are moderately complex with:

4-7 jobs handling different phases
40-70 total steps across all jobs
Multi-stage orchestration with job dependencies
Conditional execution based on triggers

Permission Patterns

Agentic workflows follow a least-privilege model with specific permissions per job.

Most Common Permissions

Permission	Occurrences	Typical Access Level
contents	67	read (mostly), write (for PRs)
pull-requests	62	read/write (for PR operations)
issues	61	read/write (for issue management)
actions	25	read (for workflow metadata)
discussions	7	write (for posting results)
security-events	4	read (for security analysis)
repository-projects	3	read (for project access)
statuses, checks, deployments	2 each	read/write varies

Permission Distribution Analysis

Read-Heavy: Most workflows request read access to contents, issues, and PRs
Selective Write: Write permissions limited to specific jobs (e.g., safe output jobs)
Minimal Global Permissions: Top-level permissions are restrictive; jobs override as needed
Security-Conscious: Only 4 workflows access security-events (those doing security analysis)

Typical Permission Patterns

Interactive Agents (archie, scout):
- Top-level: contents: read, issues: read, pull-requests: read
- Agent job: Same read permissions
- Output job: issues: write, pull-requests: write, discussions: write
PR Creation Workflows:
- Top-level: contents: read
- Create-PR job: contents: write, pull-requests: write
Analysis/Audit Workflows:
- Top-level: contents: read, actions: read
- Output job: discussions: write
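These patterns can be checked mechanically. In GitHub Actions, a job-level permissions block replaces the workflow-level block for that job rather than merging with it. Any scope a block leaves out is set to none. The Python sketch below applies that rule to a parsed workflow dict and lists which jobs can write. The job names in the test data are hypothetical, and shorthand values such as read-all are not handled.

```python
from typing import Optional

Permissions = dict[str, str]  # scope -> "read" | "write" | "none"


def effective(top: Optional[Permissions], job: Optional[Permissions]) -> Permissions:
    """Job-level permissions replace (not merge with) the top-level block.

    If neither level sets permissions, the repository's default token
    permissions apply; this sketch returns an empty dict in that case.
    """
    chosen = job if job is not None else top
    return dict(chosen or {})


def write_scopes(workflow: dict) -> dict[str, list[str]]:
    """Map each job that can write to its sorted list of write scopes."""
    top = workflow.get("permissions")
    result: dict[str, list[str]] = {}
    for name, job in workflow.get("jobs", {}).items():
        perms = effective(top, job.get("permissions"))
        writes = sorted(s for s, level in perms.items() if level == "write")
        if writes:
            result[name] = writes
    return result
```

For the PR creation pattern above, only the PR job should appear in the result, with contents and pull-requests as its write scopes.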

Tool & MCP Patterns

MCP Server Usage

The GitHub MCP server is ubiquitous across workflows, providing GitHub API interaction capabilities.

MCP Server	Tool Mentions	Workflows
mcp__github__*	199+	Virtually all 71 workflows

Most Common GitHub MCP Tools:

mcp__github__search_* (code, issues, PRs, users, repos)
mcp__github__pull_request_read
mcp__github__list_* (workflows, commits, branches, releases, issues, PRs)
mcp__github__issue_read
mcp__github__get_* (commit, file_contents, me, pull_request, etc.)
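Tool mentions like those above can be tallied with a regular expression over the raw lock-file text, since MCP tool identifiers follow the mcp__<server>__<tool> shape. Here is a minimal Python sketch. It assumes that naming holds for every server; mcp_usage and the sample strings are hypothetical.

```python
import re
from collections import Counter

# mcp__<server>__<tool>; the server part is matched lazily up to the next "__".
TOOL_RE = re.compile(r"\bmcp__([a-z0-9_-]+?)__([a-z0-9_]+)")


def mcp_usage(texts: list[str], server: str = "github") -> tuple[Counter, int]:
    """Return (mentions per tool, number of files mentioning the server at all)."""
    per_tool: Counter = Counter()
    files = 0
    for text in texts:
        tools = [tool for srv, tool in TOOL_RE.findall(text) if srv == server]
        per_tool.update(tools)
        if tools:
            files += 1
    return per_tool, files
```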

Tool Configuration Patterns

Based on the lock file structure, typical tool configurations include:

GitHub Tools: Enabled in nearly all workflows for repository interaction
Bash Tools: Available for file operations, git commands, build processes
Web Tools: WebFetch and WebSearch for research and external data
Task Tool: For launching sub-agents and complex multi-step operations
File Tools: Read, Write, Edit, Glob, Grep for codebase manipulation

MCP Server Insights

GitHub Dominance: The GitHub MCP server is the primary tool provider
Single-Server Strategy: Most workflows rely on one primary MCP server (GitHub)
Shared Configurations: MCP configs are shared via imports (e.g., shared/mcp/serena.md)
Custom MCPs: A few workflows reference custom/experimental MCPs (arxiv, context7)

Interesting Findings

1. workflow_dispatch Ubiquity

82% of workflows support manual triggering, suggesting a developer-first approach where engineers want control over when agents run. This contrasts with traditional CI/CD where automatic triggers dominate.

2. Large File Sizes Indicate Rich Prompts

The average 207 KB file size is substantial for YAML. This suggests workflows embed:

Extensive agent system prompts (multi-thousand token instructions)
Detailed security guidelines
Complex permission matrices
Comprehensive MCP server configurations

3. "Audits" Discussion Category Dominance

The "audits" category is used by 14 workflows (11 as "audits" plus 3 as "Audits"), making it the primary destination for automated reports. This suggests the repository uses Discussions as a centralized reporting dashboard.

4. Multi-Output Workflows Are Rare

Most workflows use a single safe output mechanism. update-issue is the rarest type: only 2 workflows (lockfile-stats, poem-bot) use it, suggesting workflows prefer creating new content over updating existing content.

5. Comment-Activated Agents

17% of workflows respond to comments, enabling conversational UX where developers invoke agents via slash commands (/archie, /scout) similar to Slack bots.
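How a comment becomes an agent invocation can be sketched as a small parser. The rule assumed here is a hypothetical simplification: the command must be the comment's first word and must belong to a known set. gh-aw's actual matching logic may differ.

```python
import re
from typing import Optional

# Hypothetical rule: a lowercase /command must be the first word of the comment.
COMMAND_RE = re.compile(r"^\s*/([a-z][a-z0-9-]*)\b")


def parse_command(body: str, known: set[str]) -> Optional[tuple[str, str]]:
    """Return (command, remaining text) if the comment starts with a known /command."""
    match = COMMAND_RE.match(body)
    if not match or match.group(1) not in known:
        return None
    return match.group(1), body[match.end():].strip()
```

Under this rule, parse_command("/archie explain this PR", {"archie", "scout"}) returns ("archie", "explain this PR"). A comment such as "please /archie" is ignored.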

6. High-Frequency Monitoring

3 workflows run every 10 minutes (0/10 * * * *), suggesting near-real-time monitoring needs, for example for critical infrastructure or firewall protection.

7. Job Naming Conventions

Common job names follow a pattern: pre_activation, activation, agent, detection, add_comment, create_discussion, missing_tool, update_reaction. This suggests a standardized workflow template.

8. Shared MCP Configurations

Multiple workflows import from shared/mcp/serena.md, indicating a reusable MCP configuration strategy that reduces duplication.

9. Minimal Push Triggers

Only 7% use push triggers, suggesting workflows are designed for analysis and reporting rather than traditional CI/CD build/test automation.

10. Concurrency Control

Most workflows use concurrency groups like gh-aw-${{ github.workflow }}-${{ github.event.issue.number }} so that runs for the same issue/PR do not execute simultaneously.
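The Python sketch below models what that group expression and GitHub's concurrency rules do; all names are hypothetical. It assumes cancel-in-progress is false and follows GitHub's documented behavior. One run executes per group, at most one more waits as pending, and a newer arrival cancels the older pending run. It also reflects that a missing value renders as an empty string: github.event.issue.number is absent on a workflow_dispatch run, for example, so all such runs share one group.

```python
from typing import Optional


def group_key(workflow: str, issue_number: Optional[int]) -> str:
    """Render gh-aw-${{ github.workflow }}-${{ github.event.issue.number }}.

    A missing issue number renders as '', so e.g. all workflow_dispatch runs
    of a workflow share the group 'gh-aw-<workflow>-'.
    """
    return f"gh-aw-{workflow}-{'' if issue_number is None else issue_number}"


class ConcurrencyGroup:
    """One group with cancel-in-progress: false: one running, at most one pending."""

    def __init__(self) -> None:
        self.running: Optional[str] = None
        self.pending: Optional[str] = None
        self.cancelled: list[str] = []

    def submit(self, run_id: str) -> None:
        if self.running is None:
            self.running = run_id
            return
        if self.pending is not None:
            self.cancelled.append(self.pending)  # newer run replaces the pending one
        self.pending = run_id

    def finish(self) -> None:
        """The running run completes; the pending run (if any) starts."""
        self.running, self.pending = self.pending, None
```

If three runs arrive for the same issue in quick succession, the first executes, the second is cancelled while still pending, and the third runs after the first finishes. The runs never overlap, but not every queued run is guaranteed to execute.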

Recommendations

Based on this analysis, here are recommendations for workflow authors and repository maintainers:

1. Standardize Discussion Categories

Consolidate "audits" and "Audits" to a single category (lowercase recommended) to avoid fragmentation.

2. Consider File Size Optimization

At 207 KB average, lock files are substantial. Consider:

Extracting common prompts to shared imports
Using external references for lengthy instructions
Compressing repeated patterns
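Before refactoring, it helps to measure how much text is actually repeated across lock files. The Python sketch below is a rough, hypothetical proxy: it counts the bytes of non-blank lines that appear verbatim in at least min_files files. Whether moving such text into shared imports shrinks the compiled lock files depends on how gh-aw compiles imports, which this sketch does not model.

```python
from collections import Counter


def shared_line_bytes(texts: list[str], min_files: int = 2) -> tuple[int, int]:
    """Return (bytes in non-blank lines shared by >= min_files files, total bytes)."""
    files_per_line: Counter = Counter()
    for text in texts:
        files_per_line.update(set(text.splitlines()))  # count each line once per file
    shared = total = 0
    for text in texts:
        for line in text.splitlines():
            size = len(line.encode()) + 1  # +1 approximates the newline
            total += size
            if line.strip() and files_per_line[line] >= min_files:
                shared += size
    return shared, total
```

A high shared-to-total ratio would point to the repeated prompt or configuration text as the first candidate for extraction.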

3. Document Trigger Combinations

Create a guide explaining when to use:

workflow_dispatch only (manual tasks)
schedule only (periodic reports)
workflow_dispatch + schedule (flexible automation)
issues + issue_comment (interactive agents)

4. Expand MCP Server Usage

While GitHub MCP dominates, consider integrating additional MCP servers for:

External APIs (Notion, Slack, etc.)
Database access
Cloud provider APIs
Specialized analysis tools

5. Template Workflows

Given the consistent job naming patterns, create template workflows with:

Standard job structure
Reusable activation logic
Common permission patterns
Shared MCP configurations

6. Monitor Large Workflows

poem-bot.lock.yml (381 KB) is notably large. Review for:

Optimization opportunities
Prompt refactoring
Splitting into multiple workflows

7. Enhance Update-Issue Pattern

Only 2 workflows use update-issue, which could be useful for:

Progressive updates during long-running analysis
Status dashboards
Tracking multi-stage processes

8. Security Permissions Review

Only 4 workflows access security-events. Consider if more workflows should integrate security scanning for:

Dependabot alerts
Code scanning findings
Secret scanning results

Methodology

Analysis Tools & Approach

Bash Scripts: File discovery, text processing, pattern extraction
Python: Statistical analysis, JSON data aggregation
YAML Parsing: Structure analysis via awk, sed, grep
Manual Inspection: Validation of automated findings

Data Sources

71 .lock.yml files from .github/workflows/ and subdirectories
Generated by gh-aw compile from corresponding .md files
Analyzed on November 6, 2025

Cache Memory

Analysis scripts and historical data stored in /tmp/gh-aw/cache-memory/:

scripts/: Reusable analysis scripts
history/2025-11-06-analysis.json: Aggregated statistics
patterns/: Extracted patterns for future analysis

Lock Files Analyzed

All 71 lock files in the repository, including:

Core workflows (dev, firewall, plan, research, etc.)
Daily automation (daily-news, daily-firewall-report, daily-doc-updater, etc.)
Interactive agents (archie, scout, dev-hawk, etc.)
Testing workflows (smoke-*, test-*)
Specialized tools (pdf-summary, video-analyzer, python-data-charts, etc.)
Shared configurations (shared/mcp/*, shared/opencode.lock.yml)

Generated by Lockfile Statistics Analysis Agent
Analysis Date: November 6, 2025
Total Files Analyzed: 71
Repository: githubnext/gh-aw


Reply from github-actions[bot] (discussion author) on Nov 28, 2025 (2025-11-28T23:05:03Z):

This discussion was automatically closed because it was created by an agentic workflow more than 1 week ago.
