[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1396

2026-03-22T22:19:27Z

github-actions[bot]
bot Mar 22, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and multi-layered CI/CD pipeline with 39 workflow files covering static analysis, unit/integration tests, container security, AI-agent smoke tests, and agentic maintenance workflows. Core PR gates are healthy and comprehensive, but several meaningful gaps exist that could allow quality regressions to slip through.

✅ Existing Quality Gates

PR-Triggered Checks (run automatically on every PR)

| Workflow | File | What It Does |
| --- | --- | --- |
| Build Verification | `build.yml` | TypeScript build, ESLint, API proxy unit tests (Node 20 & 22 matrix) |
| Lint | `lint.yml` | ESLint + Markdownlint |
| TypeScript Type Check | `test-integration.yml` | `tsc --noEmit` strict type check |
| Test Coverage | `test-coverage.yml` | Jest with coverage, PR comment with diff vs base, regression gate |
| CodeQL | `codeql.yml` | SAST on JavaScript/TypeScript + GitHub Actions |
| Dependency Audit | `dependency-audit.yml` | `npm audit` → SARIF upload; blocks on high/critical CVEs |
| PR Title Check | `pr-title.yml` | Semantic commit title enforced by `action-semantic-pull-request` |
| Integration Tests | `test-integration-suite.yml` | 4 parallel jobs: domain/network, protocol/security, container/ops, API proxy |
| Chroot Integration Tests | `test-chroot.yml` | Language support, package managers, procfs, edge cases |
| Examples Test | `test-examples.yml` | Runs live example shell scripts with local container builds |
| Test Setup Action | `test-action.yml` | Validates the `action.yml` across latest, specific version, and invalid-version cases |
| Link Check | `link-check.yml` | Lychee dead-link check on Markdown files |
| Security Guard | `security-guard.md` | AI (Claude) reviews security-relevant diffs and posts findings |
| Build Test Suite | `build-test.md` | AI (Copilot) exercises multi-language build compatibility (Bun, Deno, etc.) |

PR-Triggered Smoke Tests (opt-in via reaction)

| Workflow | Trigger | What It Does |
| --- | --- | --- |
| `smoke-claude` | `:heart:` reaction | Full Claude agent smoke test inside AWF firewall |
| `smoke-codex` | `:hooray:` reaction | Full Codex agent smoke test inside AWF firewall |
| `smoke-copilot` | `:eyes:` reaction | Full Copilot agent smoke test inside AWF firewall |
| `smoke-chroot` | `:rocket:` reaction | Chroot-specific path-filtered smoke test |

Scheduled / Background Checks

Performance Monitor — Weekly benchmarks with regression issue creation
Secret Diggers (Claude, Codex, Copilot) — Hourly secret scanning
Dependency Security Monitor — Daily CVE monitoring
Documentation Maintainer — Daily doc quality patrol
CodeQL — Weekly scheduled SAST scan

🔍 Identified Gaps

🔴 High Priority

1. Critically Low Unit Test Coverage on Core Files

Current state: Overall coverage is ~38% statements, but the two most critical files have near-zero coverage:

cli.ts → 0% (0/69 statements, 0/10 functions)
docker-manager.ts → 18% (45/250 statements, 1/25 functions)

These files contain the entire orchestration logic — container lifecycle, compose generation, API proxy enablement, env var injection, cleanup. A regression in these files would not be caught by unit tests.

2. Coverage Thresholds Are Too Low

Current state: Coverage gates enforce 38% statements, 30% branches, 35% functions — set to match the current (low) baseline rather than a meaningful quality bar. The enforcement stops regressions but does not drive improvement.

3. No Container Image Security Scanning (Trivy/Grype)

Current state: dependency-audit.yml scans npm packages only. The container images (ubuntu/squid, ubuntu:22.04, Node.js base) are never scanned for OS-level CVEs. As a security-focused firewall product, this is a significant gap.

4. Performance Benchmarks Not Run on PRs

Current state: performance-monitor.yml runs only on a weekly schedule. A PR that regresses startup time or proxy latency would not be caught until the next Monday.

5. Several Integration Test Files Have No CI Coverage

The following test files exist under tests/integration/ but are not referenced in any workflow's --testPathPatterns:

gh-host-injection.test.ts
ghes-auto-populate.test.ts
skip-pull.test.ts
api-proxy-observability.test.ts
api-proxy-rate-limit.test.ts
chroot-capsh-chain.test.ts
chroot-copilot-home.test.ts

These tests exist but are never invoked in CI, meaning regressions in these areas go undetected.
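
A check like this can be automated so that new test files cannot silently fall outside CI. The sketch below assumes a hypothetical helper, `findUncoveredTests`, that receives the integration test file paths and the `--testPathPatterns` values collected from the workflows (how those are collected is left out). It treats each pattern as a regular expression matched against the file path, which is how Jest interprets them. The pattern in the example is illustrative only.

```typescript
// Hypothetical helper, not part of the repository.
// Returns the test files that no pattern matches, i.e. files no CI job would run.
function findUncoveredTests(testFiles: string[], patterns: string[]): string[] {
  const regexes = patterns.map((pattern) => new RegExp(pattern));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Example with two files named in this assessment and one illustrative pattern.
const uncovered = findUncoveredTests(
  ["tests/integration/skip-pull.test.ts", "tests/integration/chroot-capsh-chain.test.ts"],
  ["(skip-pull|ghes-auto-populate)"]
);
// `uncovered` now holds only "tests/integration/chroot-capsh-chain.test.ts".
```

Run as a CI step that fails when the returned list is non-empty, a helper of this kind would turn the gap into a gate rather than a periodic audit finding.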

🟡 Medium Priority

6. No Dockerfile Linting (hadolint)

Current state: Container Dockerfiles (containers/squid/Dockerfile, containers/agent/Dockerfile, containers/api-proxy/Dockerfile) are never linted. Hadolint detects anti-patterns, deprecated instructions, and security issues in Dockerfiles.

7. Smoke Tests Are Opt-In, Not Automatic

Current state: Claude/Codex/Copilot smoke tests require a human to apply a specific emoji reaction. An AI agent regression in an otherwise green PR would not surface automatically.

8. Security Guard Is Advisory, Not Blocking

Current state: security-guard.md posts a comment, but the check never blocks merging. Security findings are surfaced but not enforced. For a firewall product, high-severity security findings should be blocking.

9. No DLP Integration Test Coverage

Current state: src/dlp.ts implements opt-in DLP URL scanning via Squid url_regex ACLs, but no integration test file exercises this feature path. The DLP feature could silently break without detection.

10. No SBOM Generation

Current state: No Software Bill of Materials is generated or attested for releases. Modern supply-chain security practices (SLSA, NTIA guidance) recommend SBOMs for distributed binaries and container images.

11. No Node.js Version Matrix in Integration Tests

Current state: Integration tests run only on Node 22. The build workflow tests Node 20 and 22, but the integration test suite does not. A Node.js compatibility regression in integration-tested code would not be caught for Node 20.

🟢 Low Priority

12. No Unused Dependency Check

Current state: Neither depcheck nor knip is run to identify unused or extraneous dependencies. As the project grows, dependency hygiene can degrade silently.

13. No Shell Script Linting (ShellCheck)

Current state: Several shell scripts are security-critical (setup-iptables.sh, entrypoint.sh, cleanup.sh, and the example scripts), yet ShellCheck is not run on any of them.

14. No PR Size / Diff Metrics

Current state: No check flags oversized PRs. Large PRs in a security-sensitive codebase are harder to review thoroughly.
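
As a rough illustration of what such a check could compute, the sketch below sums added and deleted lines per PR while skipping generated files. Everything here is an assumption for illustration: the `FileChange` shape, the helper names, the ignore list, and any size limit a caller passes in are hypothetical and would need to be tuned for this repository.

```typescript
// Hypothetical shapes and helpers; not taken from the repository.
interface FileChange {
  path: string;
  additions: number;
  deletions: number;
}

// Sum changed lines, skipping files whose path matches an ignore pattern
// (for example lockfiles, which inflate the diff without needing review).
function countReviewableLines(changes: FileChange[], ignored: RegExp[]): number {
  return changes
    .filter((change) => !ignored.some((re) => re.test(change.path)))
    .reduce((total, change) => total + change.additions + change.deletions, 0);
}

// A PR is flagged when its reviewable diff exceeds the caller-supplied limit.
function isOversized(changes: FileChange[], ignored: RegExp[], maxLines: number): boolean {
  return countReviewableLines(changes, ignored) > maxLines;
}
```

A check built on this would more plausibly post a warning label than block the merge, since some large PRs are legitimate.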

15. No Mutation Testing

Current state: No mutation testing (e.g., Stryker) validates that the existing tests are effective at catching bugs, not just exercising code paths.

📋 Actionable Recommendations

Rec 1: Raise Coverage Thresholds Incrementally and Add Per-File Gates

Complexity: Low | Impact: High
Add per-file coverage minimums in jest.config.js for cli.ts and docker-manager.ts, starting at 30% and 40% statements respectively and targeting at least 50% within 2 sprints. Raise global thresholds to 50% statements / 40% branches on the same timeline.

// jest.config.js - add coverageThreshold
coverageThreshold: {
  global: { statements: 50, branches: 40, functions: 45, lines: 50 },
  './src/cli.ts': { statements: 30 },
  './src/docker-manager.ts': { statements: 40 },
}
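
One way to keep the increase incremental is to ratchet the gates on a fixed cadence rather than jumping straight to the target. The following is a minimal sketch, assuming a hypothetical helper `nextThresholds` (not part of the repository) that moves each gate one step from its current value toward the target and never lowers a gate that is already above target.

```typescript
// Hypothetical helper, not part of the repository.
type Thresholds = { statements: number; branches: number; functions: number };

// Move each threshold one step from `current` toward `target`,
// spreading the remaining gap evenly over `stepsRemaining` increments.
function nextThresholds(current: Thresholds, target: Thresholds, stepsRemaining: number): Thresholds {
  const step = (from: number, to: number): number => {
    // On the last step, or when already at or above target, settle on the higher value.
    if (stepsRemaining <= 1 || to <= from) return Math.max(from, to);
    return from + Math.ceil((to - from) / stepsRemaining);
  };
  return {
    statements: step(current.statements, target.statements),
    branches: step(current.branches, target.branches),
    functions: step(current.functions, target.functions),
  };
}
```

With the current gates (38 / 30 / 35) and the global targets above (50 / 40 / 45), a two-step plan would yield 44 / 35 / 40 after the first step and the full targets after the second.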

Rec 2: Add Trivy Container Scanning

Complexity: Low | Impact: High
Add a new workflow step (or new workflow) that runs aquasecurity/trivy-action against containers/squid/, containers/agent/, and containers/api-proxy/ on every PR that touches those paths, uploading results to GitHub Security tab as SARIF.

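# Build the image locally so Trivy scans its OS packages; the tag is illustrative.
# Repeat for containers/agent and containers/api-proxy.
- run: docker build -t awf-squid:scan containers/squid
- uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA
  with:
    image-ref: 'awf-squid:scan'
    format: 'sarif'
    output: 'trivy.sarif'
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: 'trivy.sarif'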

Rec 3: Add Performance Smoke Test on PRs

Complexity: Medium | Impact: High
Extract a lightweight performance check from benchmark-performance.ts (e.g., just startup time / basic proxy latency) and run it on PRs touching src/** or containers/**. Compare against a stored baseline artifact from the last main run.
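
The comparison against the stored baseline could be as simple as a per-metric tolerance check. The sketch below is an assumption about how that might look, not the logic in benchmark-performance.ts: the helper name `findRegressions`, the metric names, and the 10% tolerance in the usage note are all hypothetical.

```typescript
// Hypothetical helper; metric names and units are illustrative.
type Metrics = Record<string, number>;

// Return the names of metrics whose current value exceeds the baseline
// by more than `tolerance` (0.1 means 10% slower is still accepted).
function findRegressions(baseline: Metrics, current: Metrics, tolerance: number): string[] {
  const regressions: string[] = [];
  for (const name of Object.keys(baseline)) {
    const before = baseline[name];
    const after = current[name];
    // Metrics missing from either run are skipped rather than treated as failures.
    if (before === undefined || after === undefined) continue;
    if (after > before * (1 + tolerance)) regressions.push(name);
  }
  return regressions;
}
```

For example, with a baseline of `{ startupMs: 1000, proxyLatencyMs: 20 }`, a PR run of `{ startupMs: 1300, proxyLatencyMs: 21 }`, and a tolerance of `0.1`, only `startupMs` would be reported. The tolerance needs to be wide enough to absorb runner noise, which is worth measuring on main before the check is made blocking.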

Rec 4: Register Missing Integration Tests in CI

Complexity: Low | Impact: High
Add the uncovered test patterns to the appropriate integration workflow jobs:

# In test-integration-suite.yml, expand testPathPatterns to include:
"(gh-host-injection|ghes-auto-populate|skip-pull|api-proxy-observability|api-proxy-rate-limit|chroot-capsh-chain|chroot-copilot-home)"

Rec 5: Add hadolint Dockerfile Linting

Complexity: Low | Impact: Medium
Add a job to lint.yml that runs hadolint/hadolint-action against all three Dockerfiles, enforcing at minimum DL3008 (pinned apt packages) and DL3009 (clean apt cache).

Rec 6: Make Smoke Tests Automatic on Relevant File Paths

Complexity: Low | Impact: Medium
Extend smoke workflow triggers to automatically run when containers/** or src/** changes, in addition to the reaction trigger. This eliminates the human-in-the-loop gap for core logic changes.
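
In GitHub Actions this would normally be expressed as a `paths` filter on the workflow trigger. The decision it encodes is sketched below with a hypothetical helper, `touchesWatchedPaths`, which uses simple prefix matching as an approximation of the `src/**` and `containers/**` globs.

```typescript
// Hypothetical helper, not part of the repository.
// True when at least one changed file lives under a watched directory prefix.
function touchesWatchedPaths(changedFiles: string[], watchedPrefixes: string[]): boolean {
  return changedFiles.some((file) =>
    watchedPrefixes.some((prefix) => file.indexOf(prefix) === 0)
  );
}

// Prefixes corresponding to the paths named in this recommendation.
const smokePrefixes = ["src/", "containers/"];
```

A docs-only PR such as `["docs/README.md"]` would skip the smoke tests, while any PR touching `src/cli.ts` would run them without a reaction.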

Rec 7: Add DLP Integration Tests

Complexity: Medium | Impact: Medium
Create tests/integration/dlp.test.ts that exercises --enable-dlp flag with known patterns. Add it to the protocol/security integration job.

Rec 8: Add ShellCheck to Lint Workflow

Complexity: Low | Impact: Medium
Add a shellcheck job to lint.yml targeting containers/**/*.sh and scripts/**/*.sh. Available as ludeeus/action-shellcheck.

Rec 9: Add Security Guard as a Required Check

Complexity: Low | Impact: Medium
Configure the security-guard workflow to fail its check when it identifies high-severity findings, and mark that check as required so that it blocks merging, rather than only posting advisory comments.
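
The blocking decision itself can be kept small and deterministic, separate from the AI review that produces the findings. The sketch below assumes a hypothetical structured output from the reviewer (a list of findings, each with a severity); the `Finding` shape and the helper name `shouldBlockMerge` are illustrative and do not describe what security-guard.md emits today.

```typescript
// Hypothetical finding format; the real security-guard output may differ.
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  severity: Severity;
  summary: string;
}

// Numeric rank so severities can be compared.
const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// True when any finding is at or above the minimum blocking severity.
function shouldBlockMerge(findings: Finding[], minBlocking: Severity): boolean {
  return findings.some((finding) => SEVERITY_RANK[finding.severity] >= SEVERITY_RANK[minBlocking]);
}
```

A workflow step would exit non-zero when this returns `true`, and lower-severity findings would continue to appear as advisory comments only.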

Rec 10: Add SBOM Generation to Release Workflow

Complexity: Low | Impact: Low-Medium
Use anchore/sbom-action in release.yml to generate and attach an SPDX SBOM to each GitHub Release. This improves supply chain transparency with minimal effort.

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflow files | 39 (18 standard YAML + 21 agentic `.md` compiled to `.lock.yml`) |
| PR-triggered workflows | 14 |
| Scheduled workflows | 8 |
| Agentic workflows | 21 |
| Unit test coverage (statements) | ~38% |
| Unit test coverage (functions) | ~37% |
| `cli.ts` coverage | 0% |
| `docker-manager.ts` coverage | 18% |
| Integration test files | 30 |
| Integration test files with CI coverage | ~23 (77%) |
| Container image CVE scanning | ❌ None |
| Dockerfile linting | ❌ None |
| Performance regression on PRs | ❌ None |
| Dependency CVE scanning (npm) | ✅ On PRs + weekly |
| SAST (CodeQL) | ✅ On PRs + weekly |
| Secret scanning | ✅ Hourly (agentic) |
| SBOM | ❌ None |

Top 3 quick wins: Register missing integration tests (Rec 4), add Trivy container scanning (Rec 2), add hadolint + ShellCheck linting (Recs 5, 8).

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 29, 2026, 10:19 PM UTC

2026-03-29T22:49:48Z

github-actions[bot]
bot Mar 29, 2026
Author

This discussion was automatically closed because it expired on 2026-03-29T22:19:27.528Z.

Closed by Workflow
