[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1497

2026-03-29T22:21:32Z

github-actions[bot]
bot Mar 29, 2026

📊 Current CI/CD Pipeline Status

The repository has a well-structured, multi-layered CI/CD pipeline with 40 workflow files (18 standard, 22 agentic). The pipeline is actively maintained and covers builds, linting, security, integration tests, and AI-assisted PR review. All compiled agentic workflows (.lock.yml) are up-to-date.

Workflow inventory summary:

| Category | Count | PR-triggered |
| --- | --- | --- |
| Standard GitHub Actions (`.yml`) | 18 | Yes (most) |
| Agentic workflows (`.lock.yml`) | 22 | Varies |

✅ Existing Quality Gates

The following checks currently run on pull requests:

| Check | Workflow | Scope |
| --- | --- | --- |
| TypeScript build (Node 20 & 22 matrix) | `build.yml` | All PRs |
| ESLint | `lint.yml` + `build.yml` | All PRs |
| Markdown lint | `lint.yml` | All PRs |
| TypeScript strict type check | `test-integration.yml` (alias) | All PRs |
| Unit tests + coverage report | `test-coverage.yml` | All PRs (non-MD) |
| Coverage regression detection | `test-coverage.yml` | PRs vs. base |
| Integration tests (5 jobs: domain, network, protocol, container, API proxy) | `test-integration-suite.yml` | All PRs |
| Chroot integration tests | `test-chroot.yml` | All PRs |
| Examples tests | `test-examples.yml` | All PRs (non-MD) |
| Setup action test | `test-action.yml` | All PRs |
| CodeQL (JS/TS + Actions) | `codeql.yml` | All PRs |
| npm audit (high/critical) | `dependency-audit.yml` | All PRs (non-MD) |
| PR title semantic convention | `pr-title.yml` | All PRs |
| Documentation link check | `link-check.yml` | PRs touching `*.md` |
| AI security review | `security-guard.lock.yml` | All PRs |
| AI build/test suite | `build-test.lock.yml` | All PRs |
| Documentation build preview | `docs-preview.yml` | PRs touching docs |

Scheduled/non-PR checks: weekly performance benchmarks, daily dependency monitoring, daily security review, weekly CLI flag consistency checker, hourly secret digger workflows.

🔍 Identified Gaps

🔴 High Priority

1. Critically low unit test coverage with dangerously low thresholds

The two most security-critical files in the codebase have little or no unit test coverage:

cli.ts (entry point): 0% coverage — zero tests
docker-manager.ts (container orchestration): 18% statement coverage, 4% function coverage

The global thresholds in jest.config.js are set to 30–38%, which formally allows 0% on individual files. This means entire CLI code paths (argument parsing, signal handling, cleanup logic) and all container lifecycle code can silently regress with no unit-level detection.
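To illustrate why a global threshold alone lets an untested file through, here is a minimal TypeScript sketch. The file names come from this assessment, but the statement counts are hypothetical, and the `FileCoverage` shape and both helper functions are illustrative rather than Jest internals.

```typescript
// Hypothetical per-file statement counts; only the file names come from this report.
interface FileCoverage {
  file: string;
  covered: number;
  total: number;
}

// Aggregate coverage across all files, which is what a global threshold evaluates.
function globalPct(files: FileCoverage[]): number {
  const covered = files.reduce((sum, f) => sum + f.covered, 0);
  const total = files.reduce((sum, f) => sum + f.total, 0);
  return total === 0 ? 100 : (covered / total) * 100;
}

// Files below a per-file minimum, which a global-only check never reports.
function belowPerFileMin(files: FileCoverage[], minPct: number): string[] {
  return files
    .filter((f) => f.total > 0 && (f.covered / f.total) * 100 < minPct)
    .map((f) => f.file);
}

const coverageSample: FileCoverage[] = [
  { file: "src/cli.ts", covered: 0, total: 200 },
  { file: "src/squid-config.ts", covered: 150, total: 150 },
  { file: "src/logger.ts", covered: 50, total: 50 },
];

const passesGlobal = globalPct(coverageSample) >= 38; // 200 / 400 = 50%, so the global gate passes
const offenders = belowPerFileMin(coverageSample, 10); // ["src/cli.ts"]
console.log({ passesGlobal, offenders });
```

In this made-up sample the global gate passes at 50% even though one file is completely untested, which is the situation a per-file minimum is meant to surface.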

2. Performance benchmarks do not run on PRs

performance-monitor.yml runs only on a weekly schedule (Mondays at 06:00 UTC). There is no per-PR performance comparison, meaning a PR could double container startup time or introduce a memory leak that only surfaces the following week. The benchmark script exists (scripts/ci/benchmark-performance.ts) — it just isn't wired to PRs.

3. No container/Docker image security scanning

dependency-audit.yml audits npm packages, but no workflow scans Docker container images for vulnerabilities in the base OS layers (ubuntu:22.04, ubuntu/squid). A CVE in the base image or a system package would go undetected until a user reports it. Tools like Trivy or Grype can scan built images.

4. No shell script linting (ShellCheck)

The containers/agent/ directory contains 6 security-critical shell scripts (entrypoint.sh, setup-iptables.sh, docker-stub.sh, etc.) that handle iptables rules, chroot, and credential isolation. No ShellCheck or equivalent linter runs in CI. Shell bugs in these files are a direct attack surface for container escapes or credential leaks.

🟡 Medium Priority

5. Smoke tests for real agents are not automatically required on every PR

smoke-claude.lock.yml, smoke-codex.lock.yml, and smoke-copilot.lock.yml require a specific emoji reaction to trigger on PRs (❤️, 🎉, 👀 respectively). Only smoke-chroot.lock.yml runs automatically on PRs touching src/** or containers/**. End-to-end agent smoke tests are therefore skipped on most PRs, meaning a breaking change in proxy configuration or entrypoint could ship without being caught by the real-agent path.

6. No per-file coverage thresholds for high-risk modules

jest.config.js only enforces global thresholds. There are no per-file or per-module minimums. The security-critical modules (docker-manager.ts, host-iptables.ts, cli.ts) could remain at 0% indefinitely as long as the global average stays above the low watermark.

7. Integration tests not confirmed as required branch protection status checks

The test-integration-suite.yml and test-chroot.yml workflows run on PRs, but there is no evidence they are configured as required status checks in branch protection rules. If they're optional, a PR author (or Copilot agent) could merge despite integration test failures.

8. No visible unit test coverage for `redact-secrets.ts` / DLP code paths

src/redact-secrets.ts and src/dlp.ts are security-sensitive modules handling secret redaction and DLP URL scanning. dlp.ts has no corresponding test file at all, and redact-secrets.ts lacks visible coverage. Regressions here would directly impact the security promise of the firewall.

9. Documentation preview build only runs for documentation changes

docs-preview.yml only triggers when docs-site/**, docs/**, *.md, or the workflow file itself changes. Changes to TypeScript source that affect documented CLI behavior (e.g., new flags) do not validate the docs build.

🟢 Low Priority

10. No multi-architecture (arm64) container build testing

All CI runs on ubuntu-latest (x86_64). The Dockerfile-based containers are not built or tested for linux/arm64. Users running AWF on arm64 GitHub-hosted runners or self-hosted ARM machines may encounter silent architecture incompatibilities.

11. No SBOM (Software Bill of Materials) generation

No workflow generates an SBOM for container images or the npm package. SBOM generation is increasingly expected for security-sensitive tooling distributed as container images, and some enterprise consumers require it alongside build provenance (for example, SLSA Build Level 2+).

12. Performance benchmarks use unpinned action versions

performance-monitor.yml uses actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4, and actions/github-script@v7 without SHA pinning — inconsistent with the rest of the repository where all actions are pinned to full commit SHAs.

13. No mutation testing

The high reported coverage numbers for squid-config.ts (100%) and logger.ts (100%) are not validated by mutation testing. Mutation testing (e.g., Stryker) would verify that tests actually detect logic regressions rather than just execute the code path.
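The difference between executing a code path and detecting a regression can be shown with a small, self-contained TypeScript sketch. The function, its mutant, and both "suites" are hypothetical and are not taken from the repository; a tool such as Stryker generates and runs mutants like this automatically.

```typescript
// Hypothetical function under test (not from the repository).
function isPrivilegedPort(port: number): boolean {
  return port < 1024;
}

// A mutant of the kind a mutation tester generates: "<" replaced by "<=".
function isPrivilegedPortMutant(port: number): boolean {
  return port <= 1024;
}

// A weak suite: executes every line (100% coverage) but never probes the boundary.
function weakSuite(fn: (port: number) => boolean): boolean {
  return fn(80) === true && fn(8080) === false;
}

// A stronger suite adds the boundary value, which is what kills the mutant.
function strongSuite(fn: (port: number) => boolean): boolean {
  return weakSuite(fn) && fn(1024) === false;
}

// true true: the mutant survives the weak suite
console.log(weakSuite(isPrivilegedPort), weakSuite(isPrivilegedPortMutant));
// true false: the mutant is killed by the stronger suite
console.log(strongSuite(isPrivilegedPort), strongSuite(isPrivilegedPortMutant));
```

Both suites report full line coverage for the one-line function, yet only the second would fail if the comparison operator changed.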

📋 Actionable Recommendations

Rec 1 — Add per-file coverage thresholds for high-risk modules

Issue: cli.ts at 0%, docker-manager.ts at 18%.
Solution: Add coverageThreshold entries in jest.config.js for individual files:

"./src/cli.ts": { lines: 50, functions: 50, branches: 30 },
"./src/docker-manager.ts": { lines: 40, functions: 40, branches: 25 },

Complexity: Low
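Impact: High (forces incremental coverage improvement). Jest fails the run when a listed file is below its threshold, so the values above would block merges until tests are added; starting at current coverage and raising the values over time avoids blocking current work.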

Rec 2 — Add container image scanning with Trivy

Issue: No Docker image vulnerability scanning.
Solution: Add a step to build.yml (or a dedicated container-scan.yml) after container builds:

- uses: aquasecurity/trivy-action@(sha)
  with:
    image-ref: 'ghcr.io/github/gh-aw-firewall/agent:latest'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'HIGH,CRITICAL'
- uses: github/codeql-action/upload-sarif@(sha)
  with:
    sarif_file: 'trivy-results.sarif'

Complexity: Low
Impact: High — fills a significant gap in container supply chain security
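If the scan results also need to be summarised in a script (for example, to post a short comment), a sketch along the following lines could work on Trivy's JSON output instead of SARIF. The interface below covers only a subset of the report and reflects the field names as commonly emitted by Trivy; treat the shape as an assumption and verify it against the Trivy version in use. The sample report and its IDs are made up.

```typescript
// Subset of the Trivy JSON report shape (assumed; check against your Trivy version).
interface TrivyVulnerability {
  VulnerabilityID: string;
  PkgName: string;
  Severity: string;
}
interface TrivyResult {
  Target: string;
  Vulnerabilities?: TrivyVulnerability[] | null;
}
interface TrivyReport {
  Results?: TrivyResult[];
}

// Collect findings whose severity is in the blocking set.
function blockingFindings(report: TrivyReport, blocking: string[]): TrivyVulnerability[] {
  const found: TrivyVulnerability[] = [];
  for (const result of report.Results || []) {
    for (const vuln of result.Vulnerabilities || []) {
      if (blocking.indexOf(vuln.Severity) !== -1) found.push(vuln);
    }
  }
  return found;
}

// Made-up report for illustration; the IDs are placeholders, not real CVEs.
const exampleReport: TrivyReport = {
  Results: [
    {
      Target: "agent (ubuntu 22.04)",
      Vulnerabilities: [
        { VulnerabilityID: "EXAMPLE-0001", PkgName: "libexample", Severity: "HIGH" },
        { VulnerabilityID: "EXAMPLE-0002", PkgName: "libother", Severity: "LOW" },
      ],
    },
    { Target: "package-lock.json", Vulnerabilities: null },
  ],
};

const findings = blockingFindings(exampleReport, ["HIGH", "CRITICAL"]);
console.log(findings.map((v) => `${v.VulnerabilityID} (${v.PkgName}): ${v.Severity}`));
```

The function tolerates targets with no `Vulnerabilities` entry, since a clean target may omit the field or set it to null.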

Rec 3 — Add ShellCheck to CI

Issue: No shell script linting on security-critical container scripts.
Solution: Add a job to lint.yml:

shellcheck:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@(sha)
    - run: shellcheck containers/agent/*.sh scripts/ci/*.sh

Complexity: Low
Impact: High — catches common shell bugs in iptables and chroot scripts

Rec 4 — Wire performance benchmarks to PRs

Issue: No performance comparison between PR and base.
Solution: Add a pull_request trigger to performance-monitor.yml (or a separate performance-pr.yml). Run benchmarks on both the PR branch and base, then post a comparison comment. Even running without regression gating is valuable for visibility.
Complexity: Medium
Impact: Medium — prevents silent performance regressions shipping in minor PRs
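The comparison step could be sketched roughly as follows. The output format of scripts/ci/benchmark-performance.ts is not shown in this assessment, so the `BenchmarkResult` shape, the metric names, and the numbers below are all hypothetical.

```typescript
// Hypothetical benchmark result shape; the real script's output is not shown here.
interface BenchmarkResult {
  name: string;
  meanMs: number;
}

interface Comparison {
  name: string;
  baseMs: number;
  prMs: number;
  changePct: number;
  regressed: boolean;
}

// Compare PR results against base and flag metrics slower than the tolerance.
function compareBenchmarks(
  base: BenchmarkResult[],
  pr: BenchmarkResult[],
  tolerancePct: number
): Comparison[] {
  const out: Comparison[] = [];
  for (const b of base) {
    const match = pr.filter((p) => p.name === b.name)[0];
    if (!match || b.meanMs <= 0) continue; // skip metrics missing from the PR run
    const changePct = ((match.meanMs - b.meanMs) / b.meanMs) * 100;
    out.push({
      name: b.name,
      baseMs: b.meanMs,
      prMs: match.meanMs,
      changePct: changePct,
      regressed: changePct > tolerancePct,
    });
  }
  return out;
}

// Render a Markdown table suitable for a PR comment.
function toMarkdown(rows: Comparison[]): string {
  const header = "| Metric | Base (ms) | PR (ms) | Change |\n| --- | --- | --- | --- |";
  const body = rows.map(
    (r) =>
      `| ${r.name} | ${r.baseMs} | ${r.prMs} | ${r.changePct.toFixed(1)}%${r.regressed ? " (regression)" : ""} |`
  );
  return [header].concat(body).join("\n");
}

// Illustrative numbers only.
const baseRun: BenchmarkResult[] = [
  { name: "container_startup", meanMs: 1000 },
  { name: "proxy_latency", meanMs: 50 },
];
const prRun: BenchmarkResult[] = [
  { name: "container_startup", meanMs: 1200 },
  { name: "proxy_latency", meanMs: 50 },
];

const perfRows = compareBenchmarks(baseRun, prRun, 10);
console.log(toMarkdown(perfRows));
```

Keeping the comparison as a pure function makes it possible to start with a comment-only report and add gating later by failing the job when any row has `regressed` set.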

Rec 5 — Enforce required status checks for integration tests

Issue: Integration tests may not be required for merging.
Solution: In repository Settings → Branches → Branch protection for main, ensure Integration Tests / Domain Tests, Integration Tests / Network Tests, Chroot Integration Tests / ... etc. are listed as required status checks.
Complexity: Low (configuration, no code change)
Impact: High — ensures integration failures block merges
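The configuration can also be audited from a script. GitHub's REST API exposes `GET /repos/{owner}/{repo}/branches/{branch}/protection/required_status_checks`, whose response includes a `contexts` array of required check names (repository rulesets, if used, are configured separately). The sketch below shows only the comparison step; the check names come from this assessment, and the configured list is a made-up example.

```typescript
// Return the expected check names that are absent from the configured required list.
function missingRequiredChecks(expected: string[], configured: string[]): string[] {
  return expected.filter((name) => configured.indexOf(name) === -1);
}

// Names taken from this assessment.
const expectedChecks = [
  "Integration Tests / Domain Tests",
  "Integration Tests / Network Tests",
];

// Made-up example of what the `contexts` array might contain.
const configuredContexts = ["Integration Tests / Domain Tests"];

const missingChecks = missingRequiredChecks(expectedChecks, configuredContexts);
console.log(missingChecks); // ["Integration Tests / Network Tests"]
```

Running a check like this on a schedule would flag drift if a required check is renamed or removed from branch protection.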

Rec 6 — Automatically trigger one smoke test on every PR

Issue: Smoke tests require manual emoji reactions.
Solution: smoke-chroot.lock.yml already runs automatically on PRs touching src/** or containers/**, so its path filter can stay as is. In addition, configure one of the agent smoke tests to run automatically on PRs touching the same paths. Consider smoke-copilot (the lightest agent) as the default automatic smoke test.
Complexity: Medium
Impact: Medium — catches proxy/entrypoint breakage before merge

Rec 7 — Add `dlp.ts` and `redact-secrets.ts` unit tests

Issue: Zero visible test coverage for DLP and secret redaction modules.
Solution: Create src/dlp.test.ts and src/redact-secrets.test.ts testing key public APIs. These are security-invariant modules where regressions are most dangerous.
Complexity: Medium
Impact: High — DLP and redaction are core security features
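The public APIs of these modules are not shown in this assessment, so the sketch below uses a stand-in `redactSecrets` function (hypothetical, as is its token pattern) purely to illustrate the kind of invariants such tests might assert: secrets never survive, non-secret text is untouched, and redaction is idempotent.

```typescript
// Stand-in for the module under test. The real src/redact-secrets.ts API is not
// shown in this assessment, so this function and its pattern are hypothetical.
function redactSecrets(input: string): string {
  // Mask anything shaped like "ghp_" followed by 36 alphanumerics (assumed pattern).
  return input.replace(/ghp_[A-Za-z0-9]{36}/g, "[REDACTED]");
}

// "ghp_" plus 36 "a" characters; a fake value, not a real token.
const fakeToken = "ghp_" + new Array(37).join("a");

// Table-driven cases: each pairs an input with the property the output must satisfy.
const redactionCases: { name: string; input: string; check: (out: string) => boolean }[] = [
  {
    name: "token is removed",
    input: `auth=${fakeToken}`,
    check: (out) => out.indexOf(fakeToken) === -1,
  },
  {
    name: "non-secret text is unchanged",
    input: "GET /status 200",
    check: (out) => out === "GET /status 200",
  },
  {
    name: "redaction is idempotent",
    input: `x ${fakeToken} y`,
    check: (out) => redactSecrets(out) === out,
  },
];

const redactionFailures = redactionCases
  .filter((c) => !c.check(redactSecrets(c.input)))
  .map((c) => c.name);
console.log(redactionFailures.length === 0 ? "all cases pass" : redactionFailures);
```

In the actual test files, each table row would typically become a Jest `test()` case run against the real exported functions.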

Rec 8 — Pin action versions in `performance-monitor.yml`

Issue: Action references pinned to tags rather than commit SHAs break supply chain consistency.
Solution: Replace @v4/@v7 references with full SHA pins, matching the pattern used in all other workflows.
Complexity: Low
Impact: Low — consistency and supply chain hardening
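To keep this from regressing, a small check could scan workflow files for `uses:` references that are not pinned to a full 40-character commit SHA. The following is a minimal line-based sketch, not a full YAML parser; the function name is hypothetical and the SHA in the sample is a placeholder.

```typescript
// Find `uses:` references that are not pinned to a full 40-character commit SHA.
function findUnpinnedActions(workflowYaml: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowYaml.split("\n")) {
    const m = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/.exec(line);
    const ref = m ? m[1] : undefined;
    if (!ref) continue;
    // Local actions and Docker image references are not pinned by commit SHA.
    if (/^\.\//.test(ref) || /^docker:\/\//.test(ref)) continue;
    if (!/@[0-9a-f]{40}$/.test(ref)) unpinned.push(ref);
  }
  return unpinned;
}

const workflowSample = [
  "steps:",
  "  - uses: actions/checkout@v4",
  "  - uses: actions/setup-node@0123456789abcdef0123456789abcdef01234567 # placeholder SHA",
  "  - uses: ./.github/actions/local",
].join("\n");

console.log(findUnpinnedActions(workflowSample)); // ["actions/checkout@v4"]
```

A check like this only verifies the shape of the reference; it does not confirm that the SHA corresponds to the intended release.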

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflows | 40 (18 standard + 22 agentic) |
| Workflows running on every PR | ~12 standard workflows |
| Agentic workflows running on every PR | 2 (Security Guard, Build Test Suite) |
| Unit test files | 14 |
| Integration test files | 34 |
| Overall unit test statement coverage | 38.4% |
| Coverage thresholds | Branches: 30%, Functions: 35%, Lines/Statements: 38% |
| `cli.ts` coverage | 0% |
| `docker-manager.ts` coverage | 18% |
| Container image scanning | ❌ None |
| Shell script linting | ❌ None |
| Per-PR performance benchmarking | ❌ None (weekly only) |
| Per-file coverage thresholds | ❌ None (global only) |
| CodeQL scanning | ✅ JS/TS + GitHub Actions |
| npm dependency audit | ✅ High/critical blocking |
| PR title enforcement | ✅ Conventional commits |
| AI-assisted PR security review | ✅ Claude (Security Guard) |

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 5, 2026, 10:21 PM UTC

2026-03-30T02:53:48Z

github-actions[bot]
bot Mar 30, 2026
Author

🔮 The ancient spirits stir in the wires.
The smoke-test oracle has passed through this chamber,
and the runes of validation were cast at workflow run 23725710566.
May the firewall wards hold and the signals remain true.

🔮 The oracle has spoken through Smoke Codex
