[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1471

2026-03-26T22:20:57Z

github-actions[bot] · Mar 26, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and well-structured CI/CD pipeline with both standard GitHub Actions workflows and an AI-powered agentic layer. As of March 2026:

- 44 total workflows: 22 standard YAML workflows + 22 agentic Markdown workflows (compiled to `.lock.yml`)
- 16 checks run on PRs (some conditional on file paths)
- 9 parallel integration test groups covering domain filtering, network security, protocol support, container ops, API proxy, and chroot
- Actions are consistently SHA-pinned across the standard workflows, except in `performance-monitor.yml`
- Agentic workflows include AI-based security review, build verification, and smoke tests with all three supported AI engines (Claude, Codex, Copilot)

✅ Existing Quality Gates

| Workflow | Trigger | What It Checks |
|---|---|---|
| Build Verification | PR + push/main | Build on Node 20 & 22, ESLint, dist verification, API proxy unit tests |
| Lint | PR + push/main | ESLint (TypeScript) + markdownlint |
| TypeScript Type Check | PR + push/main | `tsc --noEmit` strict type checking |
| Test Coverage | PR + push/main | Unit tests (135 tests), coverage comparison vs. base branch, regression detection |
| Integration Tests | PR + push/main | Domain filtering, network security, protocol support, container ops, API proxy (5 parallel jobs) |
| Chroot Integration Tests | PR + push/main | Language support, package managers, `/proc` filesystem, edge cases (4 parallel jobs) |
| Examples Test | PR + push/main | End-to-end shell example scripts (basic-curl, domains-file, debugging, blocked-domains) |
| Test Setup Action | PR + push/main | GitHub Action installation (latest, specific version, with images, invalid version rejection) |
| CodeQL | PR + push/main + weekly | SAST scanning for JavaScript/TypeScript and GitHub Actions workflow files |
| Dependency Vulnerability Audit | PR + push/main + weekly | `npm audit` for main package and docs-site, SARIF upload to Security tab, fails on `high`/`critical` |
| PR Title Check | PR | Conventional commits format enforcement (`feat:`, `fix:`, `docs:`, etc.) |
| Link Check | PR (`.md` changes only) + weekly | External documentation link validation via lychee |
| Documentation Preview | PR (docs changes only) | Docs site build verification, artifact upload |
| Security Guard (agentic) | PR | AI security review of code changes |
| Build Test (agentic) | PR | AI-assisted build verification and test execution |
| Smoke Tests (agentic) | PR reaction + every 12h | Real end-to-end runs with Claude, Codex, Copilot, and Chroot |
| Performance Monitor | Weekly | Startup time, container spin-up benchmarks, P95/P99 metrics |
| Dependency Security Monitor | Daily | AI-powered dependency vulnerability monitoring |
| Security Review | Daily | Comprehensive AI security audit |

🔍 Identified Gaps

🔴 High Priority

1. Critically Low Test Coverage on Core Modules

The two most important source files have negligible unit test coverage:

- `cli.ts`: 0% coverage (none of its 69 statements, 17 branches, or 10 functions are exercised)
- `docker-manager.ts`: 18% statement coverage (45/250 statements), 22% branch coverage, 4% function coverage

These files implement CLI argument parsing, container orchestration, environment variable handling, and the cleanup lifecycle, i.e., the tool's core functionality. The global statement threshold of 38% is only barely met (38.39%), so a passing coverage check gives false confidence.

2. No Container Image Security Scanning (CVE Scanning)

The three Docker images (containers/squid/, containers/agent/, containers/api-proxy/) are built and used in integration tests but never scanned for OS-level CVEs. The dependency audit only covers npm packages, not:

- OS packages installed in Ubuntu 22.04 base images
- Python/system packages in the Squid container
- Node.js runtime vulnerabilities in the API proxy container

Tools like Trivy, Grype, or Docker Scout can catch this class of vulnerabilities.

3. Performance Regression Not Detected on PRs

The performance-monitor.yml workflow runs weekly only. A PR could introduce a 10x startup time regression and it would not be caught until the weekly run, by which time the change would already be merged into main.
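A minimal sketch of how the existing benchmark workflow could also run on PRs that touch runtime code. The cron expression and the path list are assumptions, not values taken from `performance-monitor.yml`, and actually flagging a regression would still require comparing the PR's P95/P99 numbers against a baseline inside the benchmark job.

```yaml
# Sketch only: the cron value and the path list are illustrative assumptions.
on:
  schedule:
    - cron: '0 6 * * 1'   # keep a weekly baseline run (exact schedule assumed)
  pull_request:
    branches: [main]
    paths:
      - 'src/**'          # CLI and orchestration code
      - 'containers/**'   # Squid, agent, and API proxy images
```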

4. Smoke Tests Cannot Block PR Merges

The smoke tests (real AI agent runs with Claude, Codex, and Copilot) are triggered by emoji reactions or a 12-hour schedule rather than running as required status checks. A change to the firewall or agent container that breaks real-world agent runs can therefore be merged to main before the smoke tests catch it.

🟡 Medium Priority

5. Shell Script Linting Absent

The project has a substantial amount of shell code that receives no static analysis:

- containers/agent/entrypoint.sh (~600 lines)
- containers/agent/setup-iptables.sh
- containers/squid/entrypoint.sh
- scripts/ci/*.sh (cleanup, test scripts)

[shellcheck](www.shellcheck.net/redacted) would catch common bugs (unquoted variables, missing exit-code checks, portability issues), some of which could create security vulnerabilities in the container entrypoints.

6. Coverage Thresholds Are Too Low for a Security-Critical Tool

Current thresholds: 38% statements, 30% branches, 35% functions, 38% lines. For a network security firewall whose correctness is critical to isolating AI agents, these thresholds are very weak: they still pass with 62% of statements and 70% of branches never executed by any test, and a security bug in those uncovered paths would go undetected.

7. No Dockerfile Linting

Three Dockerfiles exist (containers/*/Dockerfile), but no hadolint or equivalent linter runs in CI. Common Dockerfile issues (running as root unnecessarily, using the `latest` base tag, missing `--no-cache` flags, improper layer ordering) can affect security and reproducibility.

8. Documentation Build Failures Don't Fail PRs

In `docs-preview.yml`, the docs build step uses `continue-on-error: true`. Documentation build failures are reported as PR comments but don't fail the check, so a broken docs site can be merged silently.
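One way to keep the PR comment while still failing the check is sketched below: leave `continue-on-error: true` on the build step so later steps still run, then fail the job explicitly at the end. This works because `steps.<id>.outcome` holds the step's result before `continue-on-error` is applied. The step id, working directory, and build command are assumptions, not taken from `docs-preview.yml`.

```yaml
# Sketch: step ids, the working directory, and the build command are assumptions.
steps:
  - name: Build docs site
    id: docs-build
    continue-on-error: true          # let the later comment step still run
    working-directory: docs-site
    run: npm ci && npm run build

  # ... existing steps that post the build result as a PR comment ...

  - name: Fail the check if the docs build failed
    if: steps.docs-build.outcome == 'failure'   # outcome is recorded before continue-on-error
    run: |
      echo "::error::Documentation build failed"
      exit 1
```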

9. Integration Tests Have No Path Filtering

Unlike smoke-chroot (which is path-filtered to src/**, containers/**), the main integration test suite (test-integration-suite.yml) runs on all PRs regardless of what changed. Documentation-only PRs run a full 45-minute integration test suite unnecessarily.

🟢 Low Priority

10. Action SHA Pinning Inconsistency in `performance-monitor.yml`

`performance-monitor.yml` uses floating action tags (`actions/checkout@v4`, `actions/setup-node@v4`, `actions/github-script@v7`, `actions/upload-artifact@v4`), while all other workflows pin actions to specific commit SHAs. This creates an inconsistent security posture: because tags are mutable, a compromised upstream repository could repoint a tag and inject code into the performance benchmark runs.

11. No SBOM (Software Bill of Materials) Generation

Releases don't include an SBOM artifact. For a security tool distributed as a GitHub Action and CLI, SBOM generation would aid users in auditing the supply chain of the tool they're using to secure their own workflows.

12. Smoke Test Results Are Hard to Correlate with Commits

Smoke tests run on main every 12 hours. There's no easy way to know if the smoke test that ran was against the latest commit or a significantly older one, making it harder to correlate smoke failures with specific changes.
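A low-cost mitigation, sketched under the assumption that each smoke job can take one extra step: write the tested commit to the job summary so a failure can be matched to a change at a glance. `GITHUB_SHA`, `GITHUB_REF`, `GITHUB_EVENT_NAME`, and `GITHUB_STEP_SUMMARY` are standard Actions environment variables; the step name is illustrative.

```yaml
# Sketch: add as an early step in each smoke-test job (step name is illustrative).
- name: Record tested commit
  run: |
    {
      echo "### Smoke test target"
      echo "- Commit: ${GITHUB_SHA}"
      echo "- Ref: ${GITHUB_REF}"
      echo "- Trigger: ${GITHUB_EVENT_NAME}"
    } >> "$GITHUB_STEP_SUMMARY"
```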

13. API Proxy Container Not Built in Dependency Audit

dependency-audit.yml audits containers/api-proxy npm deps, but doesn't build and scan the api-proxy Docker image itself. The container includes a Node.js runtime that may have separate OS-level vulnerabilities beyond npm packages.

📋 Actionable Recommendations

Recommendation 1: Add Trivy Container Scanning to Build Verification

Complexity: Low | Impact: High

Add a container scanning step to build.yml using Trivy (already available as a GitHub Action, no secrets needed):

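```yaml
- name: Scan container images for CVEs
  uses: aquasecurity/trivy-action@(SHA)
  with:
    # Note: this scans the published :latest tag. To catch CVEs introduced by a PR,
    # point image-ref at the image built earlier in the same job instead.
    image-ref: 'ghcr.io/github/gh-aw-firewall/agent:latest'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'
    exit-code: '1'   # fail the job on CRITICAL/HIGH findings
- name: Upload Trivy results to the Security tab
  if: always()       # still upload when the scan step failed because of findings
  uses: github/codeql-action/upload-sarif@(SHA)   # job needs security-events: write
  with:
    sarif_file: 'trivy-results.sarif'
    category: container-scan
```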

Run this on all three images (squid, agent, api-proxy). Results flow to the GitHub Security tab.
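A sketch of that matrix, which also covers gap 13 by scanning the api-proxy image. Unlike the single-image snippet above, it scans images built from the PR's own commit rather than the published `:latest` tag. It assumes each Dockerfile builds with its own directory as the build context; the job name and the local tag `awf-<image>:scan` are hypothetical.

```yaml
# Sketch: job name, local image tags, and build contexts are assumptions.
container-scan:
  name: Container CVE scan (${{ matrix.image }})
  runs-on: ubuntu-latest
  permissions:
    contents: read
    security-events: write         # required by upload-sarif
  strategy:
    fail-fast: false               # report all three images even if one fails
    matrix:
      image: [squid, agent, api-proxy]
  steps:
    - uses: actions/checkout@(SHA)
    - name: Build image from this commit
      run: docker build -t awf-${{ matrix.image }}:scan containers/${{ matrix.image }}
    - name: Scan image for CVEs
      uses: aquasecurity/trivy-action@(SHA)
      with:
        image-ref: awf-${{ matrix.image }}:scan
        format: sarif
        output: trivy-${{ matrix.image }}.sarif
        severity: CRITICAL,HIGH
        exit-code: '1'
    - name: Upload results to the Security tab
      if: always()
      uses: github/codeql-action/upload-sarif@(SHA)
      with:
        sarif_file: trivy-${{ matrix.image }}.sarif
        category: container-scan-${{ matrix.image }}   # one category per image
```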

Recommendation 2: Increase Test Coverage Thresholds Incrementally

Complexity: Medium | Impact: High

Add unit tests for cli.ts (focus on argument parsing, flag combinations, error paths) and docker-manager.ts (focus on generateDockerCompose, env var merging, cleanup logic). Then raise thresholds to 60%+. The cli-workflow.test.ts pattern (mocking execa and docker calls) is already established.

Target milestones:

- 3 months: 50% statements/lines, 45% branches
- 6 months: 65% statements/lines, 55% branches

Recommendation 3: Add shellcheck to Lint Workflow

Complexity: Low | Impact: Medium

Add a shellcheck job to lint.yml:

```yaml
shellcheck:
  name: Shell Script Lint
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@(SHA)
    - name: Run shellcheck
      uses: ludeeus/action-shellcheck@(SHA)
      with:
        # Scan from the repository root so both containers/ and scripts/ci/ are
        # covered; additional_files is meant for extra file names, not path globs.
        scandir: '.'
        severity: warning
```

Recommendation 4: Add Hadolint Dockerfile Linting

Complexity: Low | Impact: Medium

Add hadolint scanning for the three Dockerfiles. Can be integrated as a job in build.yml or lint.yml.
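A minimal sketch of such a job, assuming each Dockerfile lives at `containers/<image>/Dockerfile` as described in gap 7. The job name and the `warning` failure threshold are assumptions; `(SHA)` follows the pinning placeholder used in the other snippets.

```yaml
# Sketch: job name and failure threshold are assumptions.
hadolint:
  name: Dockerfile Lint (${{ matrix.image }})
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      image: [squid, agent, api-proxy]
  steps:
    - uses: actions/checkout@(SHA)
    - uses: hadolint/hadolint-action@(SHA)
      with:
        dockerfile: containers/${{ matrix.image }}/Dockerfile
        failure-threshold: warning   # fail on warnings and errors
```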

Recommendation 5: Add Path Filtering to Integration Test Suite

Complexity: Low | Impact: Medium

Add paths-ignore to test-integration-suite.yml to skip integration tests on documentation-only PRs:

```yaml
# Under the existing `on:` key of test-integration-suite.yml
pull_request:
  branches: [main]
  paths-ignore:
    - '**/*.md'
    - 'docs/**'
    - 'docs-site/**'
    - '.github/workflows/release.yml'
```

This mirrors the pattern already used by other workflows and reduces unnecessary CI time.
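One caveat applies if the integration suite is a required status check: when a workflow is skipped because of `paths`/`paths-ignore` filtering, its checks stay Pending and block the PR from merging. A common workaround, sketched here with the third-party `dorny/paths-filter` action (a substitution, not something the project uses today), is to decide inside the workflow and skip jobs with `if:`, because a job skipped that way reports success. Job names, the `code` filter, and the path list are illustrative, and how matrix job names surface as checks should be confirmed before relying on this.

```yaml
# Sketch: the `changes` job, the `code` filter, and the path list are illustrative.
jobs:
  changes:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read          # the filter reads the PR's changed files via the API
    outputs:
      code: ${{ steps.filter.outputs.code }}
    steps:
      - uses: dorny/paths-filter@(SHA)
        id: filter
        with:
          filters: |
            code:
              - 'src/**'
              - 'containers/**'

  integration:
    needs: changes
    if: needs.changes.outputs.code == 'true'   # skipped jobs report success to branch protection
    runs-on: ubuntu-latest
    steps:
      - run: echo "existing integration test steps go here"
```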

Recommendation 6: Pin Actions in `performance-monitor.yml`

Complexity: Low | Impact: Low

Replace floating tags with SHA-pinned versions (consistent with all other workflows). This is a one-time fix.
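For reference, the pinned form replaces the mutable tag with a full 40-character commit SHA and keeps the tag as a trailing comment for readability. `<full-commit-sha>` is a placeholder; the real value should be taken from the action's release for the intended tag.

```yaml
# Before: mutable tag
- uses: actions/checkout@v4

# After: immutable commit SHA, tag kept as a comment
- uses: actions/checkout@<full-commit-sha> # v4
```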

Recommendation 7: Make Smoke Tests a Required Merge Queue Gate

Complexity: High | Impact: High

Consider adding the smoke tests to the merge queue, or requiring at least one passing smoke test before merging to main. This would catch real-world integration breakages before they reach main.

Alternatively, add a lightweight smoke test that runs on every PR (not a full AI agent run, but a basic awf --allow-domains example.com curl (example.com/redacted) validation) as a required check.
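A hedged sketch of that lightweight check as a PR job. It assumes the CLI is built to `dist/cli.js` by `npm run build`, that it must run as root (for iptables), and that the wrapped command follows a `--` separator; none of these details come from this assessment, so adjust them to the project's actual usage. The second step checks the inverse case: a request to a domain outside the allowlist must fail.

```yaml
# Sketch: build output path, sudo usage, and the awf argument syntax are assumptions.
smoke-basic:
  name: Basic Firewall Smoke Test
  runs-on: ubuntu-latest
  timeout-minutes: 15
  steps:
    - uses: actions/checkout@(SHA)
    - uses: actions/setup-node@(SHA)
      with:
        node-version: 22
    - run: npm ci && npm run build
    - name: Allowed domain is reachable
      # sudo resets PATH, so resolve the node binary before elevating
      run: sudo "$(command -v node)" dist/cli.js --allow-domains example.com -- curl -fsS https://example.com
    - name: Non-allowed domain is blocked
      run: |
        if sudo "$(command -v node)" dist/cli.js --allow-domains example.com -- curl -fsS --max-time 20 https://github.com; then
          echo "::error::Request to a non-allowed domain succeeded"
          exit 1
        fi
```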

Recommendation 8: Generate SBOM in Release Workflow

Complexity: Low | Impact: Low

Add anchore/sbom-action to release.yml to generate and attach an SBOM to each release. Useful for enterprise users who need supply chain transparency.
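A minimal sketch of the SBOM step for the existing release job. The output file name and SPDX format are assumptions (the action also supports CycloneDX output). The file can be attached by whatever step `release.yml` already uses to upload release assets; when the workflow runs on a `release` event, the action can also attach SBOMs to the release itself, which requires `contents: write`.

```yaml
# Sketch: a step for the existing release job; file name and format are assumptions.
- name: Generate SBOM
  uses: anchore/sbom-action@(SHA)
  with:
    path: .                           # scan the checked-out repository
    format: spdx-json
    output-file: awf-sbom.spdx.json   # attach with the existing release upload step
```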

📈 Metrics Summary

| Metric | Value |
|---|---|
| Total workflows | 44 (22 standard + 22 agentic) |
| PR-blocking checks | ~16 |
| Integration test groups | 9 parallel jobs |
| Unit test count | 135 tests in 6 files |
| Statement coverage | 38.39% (threshold: 38%) |
| Branch coverage | 31.78% (threshold: 30%) |
| Function coverage | 37.03% (threshold: 35%) |
| Line coverage | 38.31% (threshold: 38%) |
| `cli.ts` coverage | 0% ⚠️ |
| `docker-manager.ts` coverage | 18% ⚠️ |
| `squid-config.ts` coverage | 100% ✅ |
| `logger.ts` coverage | 100% ✅ |
| Recent PR Title Check failures | 1 of 4 recent runs ❌ |
| Container images scanned for CVEs | 0 of 3 ⚠️ |
| Dockerfiles linted | 0 of 3 ⚠️ |
| Shell scripts linted | 0 ⚠️ |

Assessment generated on 2026-03-26 by the CI/CD Gaps Assessment workflow.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 2, 2026, 10:20 PM UTC

2026-04-02T22:49:27Z

github-actions[bot] (Author) · Apr 2, 2026

This discussion was automatically closed because it expired on 2026-04-02T22:20:57.353Z.

Closed by Workflow
