[Schema Consistency] Schema Constraint Enforcement Gap Analysis - December 15, 2025 #6460
Closed
⚓ Avast! This discussion be marked as outdated by Schema Consistency Checker.
🔍 Schema Consistency Check - December 15, 2025
This analysis reveals a systematic under-constraint problem: the schema accepts values that will fail at runtime or violate external API limits. With zero maxLength constraints on 1,888 string fields and zero maxItems on 1,608 arrays, users can inadvertently create configurations that pass schema validation but fail during execution. This creates a validation gap where external tools cannot catch errors that gh-aw will encounter at runtime.
The findings complement previous analyses (Strategy-023 found schema too strict on 'required' fields, Strategy-024 finds schema too permissive on value bounds) and reveal opportunities to prevent resource exhaustion, DOS attacks, and API rejections through stricter schema validation.
Full Report Details
Summary
Critical Issues
1. ⚠️ ZERO maxLength Constraints on ANY String Field
Severity: CRITICAL - 100% of strings unbounded
Issue: The schema has 1,888 string fields, and EXACTLY ZERO have maxLength constraints. This allows unbounded strings that violate GitHub API limits.
Evidence:
Total string fields: 1,888
String fields with maxLength: 0 (ZERO!)
Unconstrained strings: 1,074 (57% have NO constraints at all)
GitHub API Limits NOT Enforced:
Real Impact Example:
Files Affected:
pkg/parser/schemas/main_workflow_schema.json - All string field definitions
pkg/parser/schemas/included_file_schema.json - All string fields
pkg/parser/schemas/mcp_config_schema.json - All string fields
Impact:
Recommendation:
Add maxLength to all user-facing string fields:
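As a minimal sketch of what a `maxLength` bound buys, the fragment below pairs a schema definition (written as a Python dict) with a hand-rolled check. The field it stands for and the limit of 256 are illustrative assumptions, not values taken from the gh-aw schema or confirmed against the GitHub API, and `check_string` is a hypothetical helper rather than part of any validator library.

```python
# Schema fragment written as a Python dict. The limit of 256 is an
# illustrative assumption, not a value taken from the gh-aw schema.
TITLE_SCHEMA = {"type": "string", "minLength": 1, "maxLength": 256}


def check_string(value, schema):
    """Return a list of minLength/maxLength violations for one string value."""
    if not isinstance(value, str):
        return ["expected a string"]
    errors = []
    # JSON Schema counts Unicode code points, which is what len() returns for str.
    if "minLength" in schema and len(value) < schema["minLength"]:
        errors.append(f"shorter than minLength {schema['minLength']}")
    if "maxLength" in schema and len(value) > schema["maxLength"]:
        errors.append(f"longer than maxLength {schema['maxLength']}")
    return errors
```

With the bound in place, an over-long value is reported at validation time instead of surfacing later as an API rejection.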
2. ⚠️ ZERO maxItems Constraints on ANY Array
Severity: CRITICAL - DOS attack risk
Issue: The schema has 1,608 array fields, and EXACTLY ZERO have maxItems constraints. This creates DOS vulnerability through unbounded arrays.
Evidence:
Total arrays: 1,608
Arrays with maxItems: 0 (ZERO!)
Arrays without minItems: 1,366 (85% could be empty)
Critical Unbounded Arrays:
network: Domain allowlist (unlimited domains = slow validation)
labels: Workflow labels (unlimited labels = parsing overhead)
@import: Import specifications (unlimited imports = import explosion)
branches: Branch filters (unlimited = regex overhead)
paths: Path filters (unlimited = filesystem traversal overhead)
DOS Attack Scenario:
Impact:
Recommendation:
Add maxItems to resource-intensive arrays:
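A sketch of a bounded array, assuming a `labels` field with a cap of 20 (the figure suggested for labels under Next Steps). The schema fragment is written as a Python dict and `check_max_items` is a hypothetical helper that evaluates only the `maxItems` keyword.

```python
# The cap of 20 mirrors the "labels: 20" suggestion in this report; the
# exact shape of the real field definition is an assumption.
LABELS_SCHEMA = {"type": "array", "items": {"type": "string"}, "maxItems": 20}


def check_max_items(value, schema):
    """Return a violation message if the array is longer than maxItems."""
    if not isinstance(value, list):
        return ["expected an array"]
    limit = schema.get("maxItems")
    if limit is not None and len(value) > limit:
        return [f"{len(value)} items exceeds maxItems {limit}"]
    return []
```

The check costs a single length comparison, so an oversized list is rejected before any per-item processing starts.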
3. ⚠️ 32 Numeric Fields Without Maximum Bounds
Severity: HIGH - Resource exhaustion risk
Issue: 57% of numeric fields (32 out of 56) lack maximum constraints, allowing absurd values like timeout: 999999999 (31 years).
Critical Unbounded Fields:
timeout
startup-timeout
max-turns
timeout-minutes
max (safe-outputs)
expires (artifacts)
Real Examples from Schema:
Impact Scenarios:
Timeout Overflow:
API Spam:
Typo Amplification:
GitHub Actions Limits Not Enforced:
timeout-minutes: max 360 (6 hours) ❌ Schema allows unlimited
Recommendation:
Add sensible maximum constraints aligned with platform limits:
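For illustration, a bounded `timeout-minutes` definition might look like the fragment below. The 360-minute ceiling is the GitHub Actions limit quoted above; declaring the field as `integer` and the helper `check_integer_range` are assumptions made for this sketch.

```python
# Bounds as a schema fragment. The ceiling of 360 is the platform limit
# quoted in this report; the integer type is an assumption.
TIMEOUT_MINUTES_SCHEMA = {"type": "integer", "minimum": 1, "maximum": 360}


def check_integer_range(value, schema):
    """Return violations of type/minimum/maximum for one numeric value."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    # Simplification: JSON Schema also accepts 30.0 as an integer; this sketch does not.
    if isinstance(value, bool) or not isinstance(value, int):
        return ["expected an integer"]
    errors = []
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"below minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"above maximum {schema['maximum']}")
    return errors
```

A typo such as `999999999` then fails validation immediately, and a fractional value like `30.5` is rejected rather than silently truncated.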
4. ⚠️ Empty Arrays Accepted Where Invalid
Severity: MEDIUM - Undefined behavior
Issue: 109 arrays lack a minItems: 1 constraint, allowing empty arrays that cause runtime errors or undefined behavior.
Evidence:
Arrays without minItems: 109
Arrays that SHOULD require at least one item but don't: Multiple
Problematic Examples:
Good Counter-Example (schedule has minItems: 1):
Impact:
branches: [] - does it filter nothing? allow all?
types: [] - which events trigger? none?
Recommendation:
Add minItems: 1 to arrays that must have values:
Documentation Gaps
5. campaign Field: minLength but NO maxLength
Issue: The campaign field requires a minimum of 8 characters but has no maximum limit.
Schema Definition:
Impact:
Recommendation: Add "maxLength": 64 (sensible identifier length)
6. Pattern Constraints Underutilized
Issue: Only 5% of string fields use pattern constraints for format validation.
Evidence:
Fields That SHOULD Have Patterns:
name: Workflow names (should match ^[a-zA-Z0-9_-]+$)
command: Command names (should match ^[a-zA-Z][a-zA-Z0-9_-]*$)
Good Examples Where Patterns ARE Used:
Recommendation: Add patterns to user-facing identifier and name fields.
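The two patterns proposed above can be exercised directly. This is a small sketch using Python's `re` module; the sample values and the helper name `matches` are made up for illustration.

```python
import re

# Patterns quoted from the recommendation above.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")
COMMAND_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9_-]*$")


def matches(pattern, value):
    """Mimic the JSON Schema "pattern" keyword, which is an unanchored search.

    The ^ and $ anchors therefore have to be part of the pattern itself.
    Note: Python's $ also matches just before a trailing newline, which
    ECMA-262 regexes (the dialect JSON Schema recommends) do not allow.
    """
    return pattern.search(value) is not None
```

For example, `matches(NAME_PATTERN, "daily-report_2")` is true, while `matches(NAME_PATTERN, "daily report")` and `matches(COMMAND_PATTERN, "1deploy")` are both false.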
7. Numeric Type Permissiveness
Issue: Many fields accept number (float) when they should be integer, causing silent truncation.
Examples:
Impact:
Recommendation: Use "type": "integer" for countable values (timeout, max, expires) with clear units in description.
8. Mutually Exclusive Arrays Not Enforced
Issue: Schema comments say branches and branches-ignore are mutually exclusive, but the schema doesn't enforce it.
Schema $comment:
Problem: Schema allows BOTH simultaneously:
Impact: Runtime behavior undefined if both specified.
Recommendation: Add oneOf constraint:
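One point worth weighing when choosing the keyword: a `oneOf` over two `required` clauses accepts a config only when exactly one of the keys is present, so it also rejects configs that use neither. Where both keys may be absent, `not` combined with `required` expresses just the exclusion. The sketch below shows that second form as a Python dict; the surrounding object shape and the helper `satisfies_exclusion` are assumptions for illustration.

```python
# Hypothetical fragment for a trigger object that may carry branch filters.
BRANCH_FILTER_SCHEMA = {
    "type": "object",
    "properties": {
        "branches": {"type": "array", "items": {"type": "string"}},
        "branches-ignore": {"type": "array", "items": {"type": "string"}},
    },
    # Invalid only when BOTH keys are present; either one alone, or neither, passes.
    "not": {"required": ["branches", "branches-ignore"]},
}


def satisfies_exclusion(instance, schema):
    """Evaluate only the `not: {required: [...]}` part of the schema."""
    forbidden_together = schema.get("not", {}).get("required", [])
    if not forbidden_together:
        return True
    return not all(key in instance for key in forbidden_together)
```

Under this form, `{"branches": ["main"]}` and `{}` both pass, while an object that sets both `branches` and `branches-ignore` fails.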
9. Format Constraints Completely Unused
Issue: JSON Schema supports format constraints (uri, email, date-time), but gh-aw schemas use ZERO format constraints.
Evidence:
String fields with format constraint: 0 (ZERO!)
JSON Schema Formats Available:
"format": "uri" - URL validation
"format": "email" - Email validation
"format": "date-time" - RFC 3339 date-times (a profile of ISO 8601)
"format": "regex" - Regular expressions
Fields That Should Use format:
Impact: Format validation is deferred to parser/compiler instead of being declarative in schema.
Recommendation: Add format constraints to URL and email fields for better IDE integration.
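One caveat when adopting `format`: in recent JSON Schema drafts (2019-09 and later) it is an annotation by default, so a validator may ignore it unless format assertion is switched on. The sketch below approximates what an asserting validator would do for `"format": "uri"` using only the standard library. It is deliberately stricter than the specification (it requires a network location, which suits HTTP-style URLs), and `looks_like_uri` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def looks_like_uri(value):
    """Rough stand-in for a "format": "uri" assertion on URL-like fields.

    Requires both a scheme and a network location, so "mailto:" style URIs
    are rejected even though the uri format itself would allow them.
    """
    if not isinstance(value, str):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return bool(parts.scheme) and bool(parts.netloc)
```

Because validators differ on whether `format` is enforced, a `pattern` or a parser-side check can remain as a backstop for fields where a malformed URL would be costly.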
Schema Improvements Needed
Constraint Gap Summary Table
Key Insight: Three constraint types have a 100% gap (maxLength, maxItems, format) and three more have a gap above 50% (maximum 57%, minItems 85%, pattern 95%). This is a systematic under-constraint problem, not a set of isolated cases.
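Gap counts of this kind can be reproduced by walking the schema tree and tallying typed nodes against the bounds they declare. The function below is a rough sketch of that idea, not the tooling used for this report: it only handles `"type"` given as a single string, and it would miscount schema-like dicts that appear inside `default` or `examples` values.

```python
def count_constraints(schema, counts=None):
    """Walk a JSON Schema (nested dicts/lists) and tally strings and arrays."""
    if counts is None:
        counts = {
            "string": 0,
            "string_with_maxLength": 0,
            "array": 0,
            "array_with_maxItems": 0,
        }
    if isinstance(schema, dict):
        node_type = schema.get("type")
        if node_type == "string":
            counts["string"] += 1
            counts["string_with_maxLength"] += int("maxLength" in schema)
        elif node_type == "array":
            counts["array"] += 1
            counts["array_with_maxItems"] += int("maxItems" in schema)
        # Recurse into every value: properties, items, oneOf branches, and so on.
        for child in schema.values():
            count_constraints(child, counts)
    elif isinstance(schema, list):
        for child in schema:
            count_constraints(child, counts)
    return counts
```

Loading a schema file with `json.load` and passing the result to `count_constraints` yields the totals from which a gap percentage such as `1 - string_with_maxLength / string` can be derived.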
Parser Updates Required
Finding: Parser does NOT enforce constraints that schema lacks (which is correct behavior).
Validation Analysis:
Key Files:
pkg/parser/frontmatter.go - Frontmatter parsing (no extra length checks)
pkg/workflow/compiler.go - Main compilation (no extra bound checks)
pkg/workflow/safe_outputs.go - Safe-outputs (no max limit validation)
pkg/workflow/tools.go - Tools config (no array size validation)
Evidence: No length/bound validation in parser code beyond schema:
Conclusion: Parser correctly trusts schema. Schema improvements will automatically improve runtime validation.
Workflow Violations
Analysis: Checked 78 production workflows in .github/workflows/*.md for constraint violations.
Findings: NO violations found (workflows stay within sensible bounds)
Actual Values in Production:
Insight: Real workflows are conservative and reasonable. The schema constraint gaps are not currently exploited, but they CREATE RISK for future workflows and external contributions.
No Problematic Patterns Detected:
Risk: Absence of violations doesn't mean absence of risk. Schema should prevent invalid values proactively.
Recommendations
Priority 1: Add Maximum Constraints (32 numeric fields)
Critical Fields:
Impact: Prevents resource exhaustion and catches typos before runtime.
Priority 2: Add maxLength to Strings (1,888 fields)
GitHub API-Facing Fields (highest priority):
Impact: Prevents GitHub API rejections at validation time instead of runtime.
Priority 3: Add maxItems to Arrays (1,608 arrays)
Resource-Intensive Arrays:
Impact: Prevents DOS attacks and performance degradation.
Priority 4: Add minItems: 1 (109 arrays)
Arrays That Must Have Values:
Impact: Prevents undefined behavior from empty arrays.
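To see which values a `minItems: 1` rule would start rejecting, one can scan parsed frontmatter for empty lists. The sketch below assumes the frontmatter has already been parsed into plain dicts and lists; `find_empty_arrays` and the sample config are hypothetical.

```python
def find_empty_arrays(node, path="$"):
    """Return the paths of all empty lists inside a parsed config."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_empty_arrays(value, f"{path}.{key}"))
    elif isinstance(node, list):
        if not node:
            found.append(path)
        for index, item in enumerate(node):
            found.extend(find_empty_arrays(item, f"{path}[{index}]"))
    return found


# Example config with one ambiguous empty filter.
sample = {"on": {"push": {"branches": []}, "issues": {"types": ["opened"]}}}
empty_paths = find_empty_arrays(sample)
```

Here `empty_paths` contains only `$.on.push.branches`, the kind of value whose meaning is ambiguous without an explicit minimum.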
Priority 5: Add format Constraints
URL and Email Fields:
Impact: Better IDE integration and validation tooling support.
Positive Findings
✅ minItems Used Correctly for Critical Arrays
Good Examples:
The schedule array correctly requires at least one cron expression. This prevents the common mistake of empty schedule arrays.
✅ Pattern Constraints Are Excellent Where Used
High-Quality Patterns:
The patterns that DO exist are well-crafted, RFC-compliant, and precise. The issue is not pattern quality but pattern coverage (only 5% of strings).
✅ Minimum Constraints Used Consistently
Good Examples:
All timeout and count fields correctly enforce minimum: 1 to prevent obviously invalid values (zero or negative).
Strategy Performance
Unique Value:
Key Metrics:
Methodology Strengths:
Next Steps
Immediate Actions
Add maximum constraints to 32 numeric fields (Priority 1)
Add maxLength to GitHub API-facing string fields (Priority 2)
Add maxItems to network and import arrays (Priority 3)
@import: 50, labels: 20
Schema Enhancement Process
Total Effort: ~10-15 hours to address all findings
Testing Requirements
After schema changes:
Comparison with Previous Findings
Strategy-023 (Dec 14): Schema too STRICT
Strategy-024 (Dec 15): Schema too PERMISSIVE
Combined Insight: Schema diverges from runtime reality in BOTH directions:
Resolution Path: Both findings require schema updates to match runtime behavior.
References:
pkg/parser/schemas/main_workflow_schema.json - Main schema file (3,552 fields analyzed)
pkg/parser/schemas/included_file_schema.json - Included file schema
pkg/parser/schemas/mcp_config_schema.json - MCP configuration schema