[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1583

2026-04-01T22:25:51Z

github-actions[bot]
bot Apr 1, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and comprehensive CI/CD pipeline with 40+ workflows spanning build verification, security scanning, integration testing, and AI-powered quality checks. Recent runs on main show a healthy baseline: Build Verification ✅, Lint ✅, TypeScript Type Check ✅, Integration Tests ✅, Chroot Integration Tests ✅, Dependency Vulnerability Audit ✅, CodeQL ✅, Test Coverage ✅, Examples Test ✅, and Test Setup Action ✅. The only consistent failure observed is Daily Token Usage Analyzer.

✅ Existing Quality Gates

On Every PR

| Check | Workflow | Scope |
|---|---|---|
| ESLint + Markdownlint | `lint.yml` | TypeScript & Markdown |
| TypeScript type check | `build.yml` (via `npm run type-check`) | Full `src/` |
| Build verification | `build.yml` | Node 20 & 22 matrix |
| API proxy unit tests | `build.yml` | `containers/api-proxy/` |
| Unit test coverage | `test-coverage.yml` | ~38% statement coverage enforced |
| Coverage regression gate | `test-coverage.yml` | Fails if coverage drops |
| Integration tests | `test-integration.yml` | 34 test suites in Docker |
| Chroot integration tests | `test-chroot.yml` | Multi-language chroot support |
| Examples tests | `test-examples.yml` | Shell script examples |
| Setup action test | `test-action.yml` | `action.yml` install flow |
| Semantic PR title | `pr-title.yml` | Conventional commits enforcement |
| CodeQL analysis | `codeql.yml` | JavaScript/TypeScript + Actions |
| Dependency audit | `dependency-audit.yml` | npm audit (high/critical → fail) |
| AI security review | `security-guard.md` (Claude) | Domain boundary / security posture |
| AI build test suite | `build-test.md` (Copilot) | Cross-runtime builds (Bun, Deno, Hono…) |
| Smoke tests | `smoke-claude.md`, `smoke-copilot.md`, `smoke-codex.md`, `smoke-chroot.md` | End-to-end agent validation |
| Link check | `link-check.yml` | Triggered on `*.md` changes only |

On Schedule

Weekly performance benchmarks (performance-monitor.yml)
Daily security review + dependency monitoring (agentic workflows)
Hourly secret scanning (Claude, Codex, Copilot)
Weekly test coverage improvement suggestions

🔍 Identified Gaps

🔴 High Priority

1. Critically Low Test Coverage on Core Files

The two most important source files have dangerously low unit test coverage:

docker-manager.ts: 18% statements, 4% functions (250 statements, 25 functions) — this is the primary orchestration layer
cli.ts: 0% coverage entirely (69 statements, 10 functions) — the CLI entry point

The global coverage threshold is set at only 38% (statements/lines) and 30% (branches), well below the 70–80% range commonly used as a target. Because the thresholds sit just under the current values (38.39% statements, 31.78% branches), they lock in low coverage rather than push it upward.

2. Coverage Regression Check Does Not Block PRs on Low Absolute Coverage

In test-coverage.yml, the comparison step uses continue-on-error: true and only fires a failure if coverage regresses from the PR base. There is no gate that prevents merging code with < N% absolute coverage. A PR adding all new code to cli.ts with 0% coverage will never fail the coverage check because it starts from 0%.
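A minimal sketch of a gate that combines the existing regression check with an absolute floor. It assumes the coverage data is available in the shape of Istanbul's `json-summary` report (`coverage-summary.json`); the function name `checkCoverage` and the 10% floor are hypothetical, not taken from the repository.

```typescript
// One entry of Istanbul's json-summary report (assumed input format).
interface Metric { total: number; covered: number; skipped: number; pct: number }
interface FileSummary { statements: Metric; branches: Metric; functions: Metric; lines: Metric }
type CoverageSummary = Record<string, FileSummary>;

// Hypothetical gate: report a failure on regression OR when a file is below the floor.
function checkCoverage(base: CoverageSummary, head: CoverageSummary, minStatementsPct: number): string[] {
  const failures: string[] = [];
  for (const [file, summary] of Object.entries(head)) {
    const pct = summary.statements.pct;
    const basePct = base[file]?.statements.pct;
    if (basePct !== undefined && pct < basePct) {
      failures.push(`${file}: statements regressed from ${basePct}% to ${pct}%`);
    }
    if (pct < minStatementsPct) {
      failures.push(`${file}: statements at ${pct}% is below the ${minStatementsPct}% floor`);
    }
  }
  return failures;
}

// Helpers that build a summary entry with every metric at the same percentage.
const metric = (pct: number): Metric => ({ total: 100, covered: pct, skipped: 0, pct });
const fileAt = (pct: number): FileSummary => ({
  statements: metric(pct), branches: metric(pct), functions: metric(pct), lines: metric(pct),
});

// cli.ts stays at 0%: there is no regression, but the absolute floor still fails.
const base: CoverageSummary = { "src/cli.ts": fileAt(0) };
const head: CoverageSummary = { "src/cli.ts": fileAt(0) };
const failures = checkCoverage(base, head, 10);
// failures is ["src/cli.ts: statements at 0% is below the 10% floor"]
```

A regression-only check returns no failure for this input, which is the blind spot described above.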

3. No Container Image Security Scanning on PRs

dependency-audit.yml audits Node.js package manifests, and CodeQL scans TypeScript. However, there is no Trivy or Grype scan of the Docker images (containers/squid/, containers/agent/, containers/api-proxy/) on PRs. Container OS-level CVEs (e.g., in ubuntu:22.04, ubuntu/squid:latest) are never caught before merge.

🟡 Medium Priority

4. Performance Benchmarks Not Gated on PRs

performance-monitor.yml runs benchmarks only weekly (Monday 06:00 UTC), so a performance regression introduced in a PR can go undetected for up to a week after merge. There is no PR-time baseline comparison.

5. No Mutation Testing

Test coverage percentages measure line execution but not test quality. A test suite that never asserts anything would still show 100% coverage. Adding mutation testing (e.g., Stryker for TypeScript) would reveal whether tests actually catch regressions.
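To illustrate the point, here is a small self-contained sketch. The function `isAllowed`, its mutant, and both tests are hypothetical and not taken from the repository. The mutant is the kind of change a mutation-testing tool would generate automatically.

```typescript
// Hypothetical function under test.
function isAllowed(domain: string, allowlist: string[]): boolean {
  return allowlist.indexOf(domain) !== -1;
}

// A mutant: the comparison operator is flipped.
function isAllowedMutant(domain: string, allowlist: string[]): boolean {
  return allowlist.indexOf(domain) === -1;
}

type Impl = (domain: string, allowlist: string[]) => boolean;

// Weak test: executes every line (full coverage) but asserts nothing.
function weakTest(impl: Impl): boolean {
  impl("example.com", ["example.com"]);
  return true; // always passes
}

// Strong test: checks the result, so it fails for the mutant.
function strongTest(impl: Impl): boolean {
  return impl("example.com", ["example.com"]) === true
    && impl("evil.test", ["example.com"]) === false;
}

// A mutant is "killed" when the test passes on the original and fails on the mutant.
const weakKills = weakTest(isAllowed) && !weakTest(isAllowedMutant);       // false
const strongKills = strongTest(isAllowed) && !strongTest(isAllowedMutant); // true
```

Both tests produce the same coverage number, but only the second one detects the flipped comparison.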

6. Smoke Tests Are Role-Gated and Reaction-Based, Not Mandatory

smoke-claude.md, smoke-copilot.md, and smoke-codex.md run on every PR open/sync/reopen event with roles: all, and can also be triggered by a reaction (heart / eyes / hooray). Because each run consumes AI credits, it is worth verifying whether the smoke tests are configured as required status checks in branch protection rules; if they are not, a failing smoke test does not block a merge.

7. Link Check Only Triggers on Markdown File Changes

link-check.yml has a paths: ['**/*.md', '.github/lychee.toml'] filter. A PR that adds a new broken URL in a code comment or TypeScript source file will never trigger a link check.

8. No Enforced Test File Naming/Co-location Convention

jest.config.js roots tests at src/, but the 34 integration test suites live under tests/integration/, which is excluded from unit test coverage collection. There is no CI guard ensuring that each new source file in src/ has a corresponding .test.ts file.
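One way such a guard could work is sketched below: given the list of files in the repository, report source files under src/ that have no sibling .test.ts file. The function name and the sample file list are hypothetical. A real check would obtain the list from the file system or from the PR diff.

```typescript
// Return .ts source files under src/ that have no sibling .test.ts file.
function findUntestedSources(files: string[]): string[] {
  const all = new Set(files);
  return files.filter((f) =>
    f.startsWith("src/") &&
    f.endsWith(".ts") &&
    !f.endsWith(".test.ts") &&  // test files themselves are not sources
    !f.endsWith(".d.ts") &&     // type declarations need no tests
    !all.has(f.replace(/\.ts$/, ".test.ts"))
  );
}

// Hypothetical file list for illustration only.
const missing = findUntestedSources([
  "src/cli.ts",
  "src/docker-manager.ts",
  "src/docker-manager.test.ts",
  "tests/integration/basic.test.ts",
]);
// missing is ["src/cli.ts"]
```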

🟢 Low Priority

9. No dist/ Artifact Size Monitoring on PRs

There is no check that warns or blocks when dist/ bundle size increases significantly. A PR that accidentally bundles a large dependency would be silently merged.
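A size check of this kind reduces to a comparison between the base-branch and PR-head totals. The sketch below assumes the two byte counts have already been measured; the function name and the 10% threshold are hypothetical.

```typescript
interface SizeCheck { deltaBytes: number; deltaPct: number; exceeds: boolean }

// Compare the total dist/ size on the PR head against the base branch.
function compareBundleSize(baseBytes: number, headBytes: number, maxIncreasePct: number): SizeCheck {
  const deltaBytes = headBytes - baseBytes;
  // With no baseline, any non-empty output counts as an unbounded increase.
  const deltaPct = baseBytes === 0
    ? (headBytes > 0 ? Infinity : 0)
    : (deltaBytes / baseBytes) * 100;
  return { deltaBytes, deltaPct, exceeds: deltaPct > maxIncreasePct };
}

const result = compareBundleSize(1000000, 1250000, 10);
// result.deltaPct is 25, so result.exceeds is true
```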

10. No Automated License Compatibility Check

There is no license-checker step that validates newly added npm dependencies are compatible with the project's license (MIT). A PR introducing a GPL dependency would pass all CI checks.
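The logic of such a check can be sketched as follows. It assumes a report keyed by `name@version` with a `licenses` field per package, which is the general shape of license-checker's JSON output; the function name, the package names, and the prefix-matching rule are hypothetical.

```typescript
// Subset of a per-package entry in the assumed license report.
interface PackageLicense { licenses?: string | string[] }

// Flag packages whose license identifier starts with a denied family prefix.
function findDeniedLicenses(report: Record<string, PackageLicense>, deniedPrefixes: string[]): string[] {
  const hits: string[] = [];
  for (const [pkg, info] of Object.entries(report)) {
    const licenses = Array.isArray(info.licenses) ? info.licenses : [info.licenses ?? "UNKNOWN"];
    for (const lic of licenses) {
      if (deniedPrefixes.some((p) => lic.toUpperCase().startsWith(p))) {
        hits.push(`${pkg}: ${lic}`);
      }
    }
  }
  return hits;
}

// Hypothetical packages for illustration only.
const hits = findDeniedLicenses(
  { "left@1.0.0": { licenses: "MIT" }, "right@2.1.0": { licenses: "GPL-3.0" } },
  ["GPL", "AGPL", "LGPL"],
);
// hits is ["right@2.1.0: GPL-3.0"]
```

Prefix matching is a simplification: compound SPDX expressions such as `(MIT OR GPL-3.0)` would need to be parsed rather than matched by prefix.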

11. Performance Monitor Uses Unpinned Actions

performance-monitor.yml uses actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4, actions/github-script@v7 — all unpinned mutable tag references. All other workflows in this repo use SHA-pinned action references (e.g., actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd). This is a supply chain security inconsistency.

12. No Automated CHANGELOG / Release Notes Verification

There is no check on PRs that determines whether a change requires a CHANGELOG entry, or that validates that release notes are updated for user-visible changes.

📋 Actionable Recommendations

Gap 1 & 2: Raise and Enforce Absolute Coverage Thresholds

Issue: Core files have very low coverage (docker-manager.ts at 18% statements, cli.ts at 0%), and thresholds are too permissive.

Solution:

Raise global thresholds in jest.config.js incrementally (target: 60% statements, 50% branches within 3 months)
Add per-file coverageThreshold overrides to enforce minimums on critical files:

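coverageThreshold: {
  // Global floor (current values; raise incrementally)
  global: { branches: 30, functions: 35, lines: 38, statements: 38 },
  // Per-file floors; both exceed current coverage (18% and 0% statements),
  // so the missing tests have to land together with these thresholds
  './src/docker-manager.ts': { statements: 25, functions: 20 },
  './src/cli.ts': { statements: 10 },
}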

Remove continue-on-error: true from the comparison step in test-coverage.yml

Complexity: Low | Impact: High

Gap 3: Add Container Image Vulnerability Scanning

Issue: No Docker image CVE scanning on PRs.

Solution: Add a new workflow step (or standalone workflow) using Trivy:

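# Replace (sha) with the full commit SHA of the trivy-action release to use
- name: Scan agent container image
  uses: aquasecurity/trivy-action@(sha)
  with:
    # Scans the published :latest tag; to cover a PR's own changes, build the image in the job and scan that tag instead
    image-ref: 'ghcr.io/github/gh-aw-firewall/agent:latest'
    format: 'sarif'
    output: 'trivy-agent.sarif'
    severity: 'HIGH,CRITICAL'
    exit-code: '1'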

Complexity: Low | Impact: High

Gap 4: Add PR-Time Performance Regression Check

Issue: Performance only measured weekly.

Solution: Add a lightweight benchmark step to build.yml that runs a subset of benchmarks (e.g., startup time only) and comments on the PR if it exceeds a threshold. The existing scripts/ci/benchmark-performance.ts infrastructure can be reused.
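The comparison step could look roughly like the sketch below. The result shape, the function name, and the 20% tolerance are assumptions for illustration; the actual output format of scripts/ci/benchmark-performance.ts is not shown in this assessment.

```typescript
// Assumed benchmark result shape (hypothetical).
interface BenchmarkResult { name: string; meanMs: number }

// Flag benchmarks whose mean time grew by more than tolerancePct over the baseline.
function findRegressions(baseline: BenchmarkResult[], current: BenchmarkResult[], tolerancePct: number): string[] {
  const baseByName = new Map<string, number>();
  for (const b of baseline) baseByName.set(b.name, b.meanMs);

  const messages: string[] = [];
  for (const c of current) {
    const base = baseByName.get(c.name);
    if (base === undefined || base <= 0) continue; // no usable baseline to compare against
    const changePct = ((c.meanMs - base) / base) * 100;
    if (changePct > tolerancePct) {
      messages.push(`${c.name}: ${base}ms -> ${c.meanMs}ms (+${changePct.toFixed(1)}%)`);
    }
  }
  return messages;
}

const regressions = findRegressions(
  [{ name: "startup", meanMs: 200 }],
  [{ name: "startup", meanMs: 260 }],
  20,
);
// regressions is ["startup: 200ms -> 260ms (+30.0%)"]
```

The returned messages can be posted as the PR comment. A generous tolerance helps absorb run-to-run noise on shared CI runners.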

Complexity: Medium | Impact: Medium

Gap 5: Add Mutation Testing

Issue: Coverage metrics don't validate test quality.

Solution: Integrate [Stryker Mutator](strykermutator.io/redacted) for TypeScript. Run on a weekly schedule rather than every PR to manage CI time.

Complexity: Medium | Impact: Medium

Gap 11: Pin Actions in `performance-monitor.yml`

Issue: Unpinned action references create supply chain risk.

Solution: Replace @v4/@v7 tags with SHA digests, matching the pattern used by all other workflows in the repo.
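To keep the pinning from drifting again, a small lint could flag any `uses:` reference that is not a full 40-character commit SHA. The sketch below works on the raw workflow text with a regular expression instead of a YAML parser, which is a simplification; the function name is hypothetical.

```typescript
// Find `uses:` references that are not pinned to a full 40-character commit SHA.
function findUnpinnedActions(workflowYaml: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowYaml.split("\n")) {
    // Matches both "- uses: x" and "uses: x"; stops at whitespace or a trailing comment.
    const match = /^\s*(?:-\s*)?uses:\s*([^\s#]+)/.exec(line);
    if (!match) continue;
    const ref = match[1];
    if (ref.startsWith("./") || ref.startsWith("docker://")) continue; // local actions and images are out of scope
    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    if (!/^[0-9a-f]{40}$/.test(version)) unpinned.push(ref);
  }
  return unpinned;
}

const sample = [
  "steps:",
  "  - uses: actions/checkout@v4",
  "  - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # pinned",
  "  - name: Node",
  "    uses: actions/setup-node@v4",
].join("\n");
const unpinned = findUnpinnedActions(sample);
// unpinned is ["actions/checkout@v4", "actions/setup-node@v4"]
```

Quoted references (for example `uses: "owner/repo@sha"`) are not handled by this sketch and would be reported as unpinned.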

Complexity: Low | Impact: Medium (security best practice consistency)

Gap 10: Add License Compatibility Check

Issue: GPL or incompatible dependencies could be silently introduced.

Solution: Add to dependency-audit.yml:

- name: Check license compatibility
  run: npx license-checker --production --failOn "GPL;AGPL;LGPL" --summary

Complexity: Low | Impact: Low-Medium

📈 Metrics Summary

| Metric | Value |
|---|---|
| Total workflows | 40+ (24 agentic `.md` + ~18 YAML) |
| Workflows running on PRs | 15+ |
| Recent main-branch success rate | ~95% (1 known failing: token-usage-analyzer) |
| Unit test coverage (statements) | 38.39% (threshold: 38%) |
| Unit test coverage (branches) | 31.78% (threshold: 30%) |
| Unit test coverage (functions) | 37.03% (threshold: 35%) |
| `docker-manager.ts` coverage | 18% statements / 4% functions 🔴 |
| `cli.ts` coverage | 0% 🔴 |
| Integration test suites | 34 |
| Languages tested in chroot | Python, Go, Java, .NET, Ruby, Rust |
| Security scans | CodeQL, npm audit (→ SARIF), AI security guard, hourly secret diggers |
| Performance testing | Weekly only (not PR-gated) |
| Container image scanning | ❌ None |

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 8, 2026, 10:25 PM UTC

2026-04-02T03:38:29Z

github-actions[bot]
bot Apr 2, 2026
Author

🔮 The ancient spirits stir in these halls.
By moonlit runes, the smoke-test agent has walked this thread.
The warding circle holds, and the signal is recorded.
So witnessed under starlit logs.

🔮 The oracle has spoken through Smoke Codex
