[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1319

2026-03-15T22:20:43Z

github-actions[bot]
bot Mar 15, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and well-layered CI/CD pipeline with 21 agentic workflow definitions (.md files compiled to .lock.yml) and 17 traditional GitHub Actions YAML workflows. The critical workflows (Build Verification: 1,216 total runs; Integration Tests: 324 total runs) each succeeded in all 10 of their most recent runs, indicating a stable baseline. The pipeline covers the full development lifecycle: build verification, unit/integration testing, security scanning, documentation, and release automation.

✅ Existing Quality Gates

The following checks run on every pull request targeting main:

Code Quality

Build Verification (build.yml) — TypeScript compilation + ESLint on Node 20 and 22 matrix
Lint (lint.yml) — ESLint on TypeScript sources + markdownlint on all .md files
TypeScript Type Check (test-integration.yml) — tsc --noEmit with strict config
PR Title Check (pr-title.yml) — Conventional Commits enforcement with allowed scopes

Testing

Unit Tests with Coverage (test-coverage.yml) — Jest coverage comparison vs. base branch; posts PR comment; fails on regression
Integration Tests (test-integration-suite.yml) — Four parallelized job groups: domain/network, protocol security, container operations, API proxy (33 integration test files)
Chroot Integration Tests (test-chroot.yml) — Multi-language chroot support (Python, Go, Java, .NET)
Examples Test (test-examples.yml) — End-to-end smoke tests of shell examples
Test Setup Action (test-action.yml) — Validates the action.yml setup action
API Proxy Unit Tests — Run as part of build.yml

Security

CodeQL (codeql.yml) — SAST for JavaScript/TypeScript and GitHub Actions workflows
Dependency Vulnerability Audit (dependency-audit.yml) — npm audit with SARIF upload, fails on high/critical
Container Security Scan (container-scan.yml) — Trivy scanning of agent and squid containers (triggered on container path changes)
AI Security Guard (security-guard.md) — Claude-based AI review of security-sensitive diffs on every PR

Documentation

Link Check (link-check.yml) — Lychee link validation on Markdown file changes

Release / Agentic Workflows on PRs

Smoke tests (smoke-claude.md, smoke-codex.md, smoke-copilot.md, smoke-chroot.md) — End-to-end firewall tests using each AI engine (reaction-opt-in on PRs, scheduled every 12h)
Build Test (build-test.md) — Agentic build verification on PRs

🔍 Identified Gaps

🔴 High Priority

H1 — Critically Low Coverage Thresholds

The coverage thresholds in jest.config.js are set very low: 38% statements, 30% branches, 35% functions, and 38% lines, each sitting just below current coverage (38.39%, 31.78%, 37.03%, and 38.31% respectively). The two most critical files, cli.ts (0% coverage) and docker-manager.ts (18% coverage), are the core orchestrators of the entire tool. Because the thresholds are global and low, the coverage gate provides almost no protection against regressions in these files.
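As a rough illustration of how a stricter gate could be expressed, Jest's `coverageThreshold` option accepts per-path entries alongside `global`. The sketch below assumes the global threshold values listed in the Metrics Summary; the per-file numbers for `cli.ts` and `docker-manager.ts` are hypothetical targets, not values from the repository, and would have to land together with the tests that satisfy them.

```typescript
// Sketch of a Jest coverage configuration with per-file floors.
// Global numbers mirror the thresholds reported in the Metrics Summary.
// Per-file numbers are hypothetical and only illustrate the shape.
type Threshold = {
  statements: number;
  branches: number;
  functions: number;
  lines: number;
};

const coverageThreshold: Record<string, Partial<Threshold>> = {
  global: { statements: 38, branches: 30, functions: 35, lines: 38 },
  // Path keys apply a separate floor to the matching file (hypothetical values).
  "./src/cli.ts": { statements: 20, lines: 20 },
  "./src/docker-manager.ts": { statements: 30, lines: 30 },
};

// In jest.config.js this object would be exported under `coverageThreshold`.
const config = { collectCoverage: true, coverageThreshold };
```

With per-path entries in place, a drop in coverage for one of the two orchestrator files fails the gate even when the global average is unchanged.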

H2 — Container Security Scan Not Triggered on Every PR

container-scan.yml uses paths: filtering limited to containers/** changes. PRs that modify src/docker-manager.ts (which controls container configuration, capabilities, and seccomp) bypass Trivy scanning entirely, even though such changes directly affect the security posture of the containers.
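The effect of the path filter can be modeled as a predicate over the files changed in a PR. This is a minimal sketch, not the workflow's actual logic: the helper name and the prefix list are assumptions taken from the description above, and a real fix would be made in the `paths:` filter of `container-scan.yml` itself.

```typescript
// Hypothetical model of a widened trigger for the container scan.
// "containers/" is the existing filter; "src/docker-manager.ts" is the
// file the gap description says should also trigger a scan.
const SCAN_TRIGGER_PREFIXES: readonly string[] = [
  "containers/",
  "src/docker-manager.ts",
];

// Returns true when at least one changed file falls under a trigger prefix.
function shouldRunContainerScan(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    SCAN_TRIGGER_PREFIXES.some((prefix) => file.startsWith(prefix)),
  );
}
```

Under the current filter, only the first prefix exists, so a PR touching only `src/docker-manager.ts` returns false and the scan is skipped.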

H3 — Smoke Tests Are Opt-In and Non-Blocking on PRs

Smoke tests (smoke-claude, smoke-codex, smoke-copilot) require an emoji reaction to trigger on PRs and are not required status checks. The full end-to-end firewall validation (actual network egress control with a real AI agent) is therefore never a blocking gate on merge. A PR that breaks the core proxy flow can be merged if no reaction is added.

H4 — Performance Benchmarks Run Weekly Only

performance-monitor.yml is scheduled weekly and never runs on PRs. Startup latency and container boot time are important UX properties of a firewall tool, and regressions introduced in docker-manager.ts can go undetected until the next weekly run.
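A PR-level gate could compare measured startup times against a fixed budget. The following is a sketch under stated assumptions: the function names are hypothetical, the 30-second default is the example budget from the recommendations table, and the median is used here only because it is less sensitive to a single slow runner than the mean.

```typescript
// Median of a non-empty list of samples, in milliseconds.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples provided");
  const sorted = [...samples].sort((a, b) => a - b);
  // For odd lengths both indices coincide; for even lengths they are the two middle items.
  const lower = sorted[Math.floor((sorted.length - 1) / 2)] as number;
  const upper = sorted[Math.floor(sorted.length / 2)] as number;
  return (lower + upper) / 2;
}

interface BudgetResult {
  medianMs: number;
  ok: boolean;
}

// Hypothetical gate: passes when the median startup time is within budget.
function checkStartupBudget(samplesMs: number[], budgetMs = 30_000): BudgetResult {
  const medianMs = median(samplesMs);
  return { medianMs, ok: medianMs <= budgetMs };
}
```

A CI step would exit non-zero when `ok` is false, turning the weekly observation into a blocking check.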

🟡 Medium Priority

M1 — Coverage Thresholds Are Not Ratcheted Up Over Time

While the test-coverage-improver.md agentic workflow exists to open PRs improving coverage, the static thresholds in jest.config.js don't automatically rise as coverage improves. There is no mechanism to prevent coverage from drifting back down to the minimum threshold after it has been raised.
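One way to picture a ratchet is as a pure function from the current thresholds and the latest measured coverage to the next thresholds. This is a minimal sketch with hypothetical names; rounding down to whole percentages is an assumption made here to keep small run-to-run fluctuations from tripping the gate.

```typescript
type CoverageMetric = "statements" | "branches" | "functions" | "lines";
type CoverageNumbers = Record<CoverageMetric, number>;

const METRICS: CoverageMetric[] = ["statements", "branches", "functions", "lines"];

// Raise each threshold to the floor of measured coverage, never lowering it.
function ratchetThresholds(
  current: CoverageNumbers,
  measured: CoverageNumbers,
): CoverageNumbers {
  const next: CoverageNumbers = { ...current };
  for (const metric of METRICS) {
    next[metric] = Math.max(current[metric], Math.floor(measured[metric]));
  }
  return next;
}

// Values taken from the Metrics Summary in this report.
const thresholds: CoverageNumbers = { statements: 38, branches: 30, functions: 35, lines: 38 };
const measured: CoverageNumbers = { statements: 38.39, branches: 31.78, functions: 37.03, lines: 38.31 };

const ratcheted = ratchetThresholds(thresholds, measured);
// ratcheted is { statements: 38, branches: 31, functions: 37, lines: 38 }
```

A post-merge job could write the returned values back to `jest.config.js`, so that coverage gained once cannot silently be lost again.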

M2 — No License Compliance Checking

There is no FOSSA, LicenseChecker, or license-checker step to validate that dependencies comply with the project's license policy. For a security tool distributed as open source, unexpected copyleft or restrictive licenses in dependencies could create legal risk.
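The core of such a check is an allow-list comparison over each dependency's declared license. The sketch below is illustrative only: the `PackageLicense` shape and function name are hypothetical, the allow-list is the one proposed in the recommendations table, and the exact-match rule means compound SPDX expressions such as `(MIT OR Apache-2.0)` would be flagged for manual review rather than parsed.

```typescript
// Allow-list taken from the license-checker command in the recommendations table.
const ALLOWED_LICENSES: ReadonlySet<string> = new Set([
  "MIT",
  "Apache-2.0",
  "BSD-2-Clause",
  "BSD-3-Clause",
  "ISC",
]);

// Hypothetical shape for one dependency's license metadata.
interface PackageLicense {
  name: string;
  license: string;
}

// Returns every package whose declared license is not exactly on the allow-list.
function findDisallowed(
  packages: PackageLicense[],
  allowed: ReadonlySet<string> = ALLOWED_LICENSES,
): PackageLicense[] {
  return packages.filter((pkg) => !allowed.has(pkg.license));
}
```

A build step would fail when the returned list is non-empty and print the offending package names.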

M3 — No Mutation Testing

The test suite validates that tests pass, but does not verify test effectiveness. Mutation testing (e.g., Stryker) would reveal tests that pass even when the source code is intentionally broken, which is particularly important for security-critical logic like domain pattern validation and iptables rule generation.

M4 — Docs Site Not Tested in PRs

docs-preview.yml exists but does not appear to run on all PRs. The Astro/Starlight documentation site (docs-site/) has no build validation on code PRs, meaning a documentation build break could go undetected until the deploy workflow runs post-merge.

M5 — No Structured Fuzz / Property-Based Testing

The domain parsing logic (src/domain-patterns.ts), Squid config generation (src/squid-config.ts), and iptables rule construction are security-critical surfaces. Property-based testing (e.g., fast-check) would provide stronger guarantees than example-based unit tests alone.
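To show what a property looks like in this setting, here is a dependency-free sketch. Everything in it is an assumption for illustration: `matchesDomain` is a hypothetical stand-in for the real logic in `src/domain-patterns.ts`, and the hand-rolled generator takes the place of fast-check, which would additionally provide richer generators and shrinking of counterexamples.

```typescript
// Hypothetical matcher: a host matches if it equals the allowed domain
// or is a subdomain of it (case-insensitive).
function matchesDomain(allowed: string, host: string): boolean {
  const a = allowed.toLowerCase();
  const h = host.toLowerCase();
  return h === a || h.endsWith("." + a);
}

// Small seeded PRNG (mulberry32-style) so any failure is reproducible.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generates a random DNS-like label of 1 to 8 alphanumeric characters.
function randomLabel(rand: () => number): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  const length = 1 + Math.floor(rand() * 8);
  let label = "";
  for (let i = 0; i < length; i++) {
    label += alphabet.charAt(Math.floor(rand() * alphabet.length));
  }
  return label;
}

// Checks two properties over many random labels and returns counterexamples.
function checkDomainProperties(runs: number, seed: number): string[] {
  const rand = seededRandom(seed);
  const allowed = "example.com";
  const failures: string[] = [];
  for (let i = 0; i < runs; i++) {
    const label = randomLabel(rand);
    // Property 1: every subdomain of the allowed domain is accepted.
    if (!matchesDomain(allowed, `${label}.${allowed}`)) {
      failures.push(`rejected subdomain ${label}.${allowed}`);
    }
    // Property 2: a host that merely ends with the same characters is rejected.
    if (matchesDomain(allowed, `${label}${allowed}`)) {
      failures.push(`accepted look-alike ${label}${allowed}`);
    }
  }
  return failures;
}
```

The second property is the security-relevant one: a matcher that used a plain suffix comparison without the leading dot would accept look-alike hosts, and the random search would surface that on the first run.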

M6 — Container Scan Only Covers HIGH/CRITICAL — No MEDIUM Tracking

container-scan.yml is configured with severity: 'CRITICAL,HIGH', which is appropriate for blocking but provides no visibility into accumulating MEDIUM vulnerabilities that can become high-risk over time.

🟢 Low Priority

L1 — No macOS / Windows Testing

All CI runs on ubuntu-latest. The tool uses Docker and iptables, which are Linux-specific, but the CLI itself could be installed on macOS. There is no validation that npm install or the action setup step works on macOS runners.

L2 — No Dependabot Auto-Merge for Minor/Patch Dependencies

No .github/dependabot.yml was found in the directory listing, so Dependabot updates appear not to be configured. Dependency freshness relies on the dependency-security-monitor.md agentic workflow rather than automated PRs, which means patch updates may be delayed.

L3 — Agentic Workflow Compilation Not Validated in PRs

Changes to .md workflow files require manual compilation (gh aw compile) to produce .lock.yml files. There is no CI check that validates the compiled .lock.yml matches the .md source, allowing drift between the two.
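A partial version of this check can be expressed over a plain file listing. The sketch below rests on assumptions: the function name is hypothetical, workflows are assumed to live under `.github/workflows/`, and `name.md` is assumed to compile to `name.lock.yml` alongside it. It only detects a missing lock file; detecting stale content would still need a recompile followed by a diff.

```typescript
// Hypothetical check: list the lock files that should exist but do not.
// Assumed convention: `name.md` compiles to `name.lock.yml` in the same directory.
function findMissingLockFiles(
  paths: string[],
  workflowDir = ".github/workflows/",
): string[] {
  const present = new Set(paths);
  return paths
    .filter((p) => p.startsWith(workflowDir) && p.endsWith(".md"))
    .map((p) => p.slice(0, -".md".length) + ".lock.yml")
    .filter((lockPath) => !present.has(lockPath));
}
```

Fed with the output of a repository file listing, a non-empty result would fail the job and name the workflows that were never compiled.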

L4 — No SBOM (Software Bill of Materials) Generation

The release workflow does not produce an SBOM artifact (CycloneDX or SPDX format). For a security-focused tool, publishing an SBOM alongside each release would improve supply chain transparency.

L5 — Link Check Only Triggers on Markdown Changes

link-check.yml uses paths: ['**/*.md'], so it only runs when a Markdown file is modified. A broken external URL in existing docs can persist indefinitely if no PR touches Markdown. The weekly scheduled run provides a backstop, but it is non-blocking.

📋 Actionable Recommendations

Gap	Recommendation	Complexity	Impact
H1 — Low coverage thresholds	Raise thresholds incrementally per quarter: target 60% statements, 50% branches by end of year. Prioritize `cli.ts` and `docker-manager.ts` with dedicated test suites.	Medium	🔴 High
H2 — Container scan path filter	Add `src/**` to `container-scan.yml` `paths:` trigger, OR remove path restriction and run Trivy on every PR (use caching to keep it fast).	Low	🔴 High
H3 — Smoke tests opt-in	Add at least one smoke test variant as a required status check (e.g., `smoke-copilot` runs automatically on all PRs without needing a reaction). Alternatively, run a lightweight `awf --allow-domains example.com curl (example.com/redacted)` integration check as a required gate.	Low	🔴 High
H4 — No PR performance gate	Add a lightweight benchmark step to the build workflow that measures container startup time against a threshold (e.g., fail if > 30s). Reuse `scripts/ci/benchmark-performance.ts`.	Medium	🟡 Medium
M1 — Static coverage thresholds	Implement a coverage ratchet: after each merge to main, update `jest.config.js` thresholds to current coverage if they are higher than existing minimums.	Medium	🟡 Medium
M2 — License compliance	Add `npx license-checker --onlyAllow 'MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC'` to the build workflow.	Low	🟡 Medium
M3 — No mutation testing	Integrate Stryker Mutator as a scheduled weekly job targeting `src/domain-patterns.ts` and `src/squid-config.ts`.	High	🟡 Medium
M4 — Docs site build	Add `cd docs-site && npm ci && npm run build` as a job in `build.yml` triggered on `docs-site/**` path changes.	Low	🟡 Medium
M5 — No fuzz/property tests	Add `fast-check` property-based tests for `domain-patterns.ts` and `squid-config.ts`.	Medium	🟡 Medium
M6 — MEDIUM vuln visibility	Add a separate non-blocking Trivy scan step with `severity: MEDIUM` that posts results to GitHub Security tab without failing the check.	Low	🟢 Low
L1 — No macOS testing	Add a macOS job to `test-action.yml` to validate the setup action.	Low	🟢 Low
L2 — No Dependabot	Add `.github/dependabot.yml` with monthly npm update schedule and auto-merge for patch versions via a GitHub Actions auto-merge workflow.	Low	🟢 Low
L3 — Lock file drift	Add a CI check: `gh aw compile` all `.md` files and `git diff --exit-code` to detect uncommitted lock file changes.	Low	🟢 Low
L4 — No SBOM	Add `anchore/sbom-action` to `release.yml` to attach a CycloneDX SBOM to each GitHub release.	Low	🟢 Low

📈 Metrics Summary

Metric	Value
Total workflows (YAML)	17
Total agentic workflows (.md)	21
Workflows running on every PR	12+
Unit test files	14
Integration test files	33
Total unit tests	~135
Statement coverage	38.39% (threshold: 38%)
Branch coverage	31.78% (threshold: 30%)
Function coverage	37.03% (threshold: 35%)
Line coverage	38.31% (threshold: 38%)
`cli.ts` coverage	0% 🔴
`docker-manager.ts` coverage	18% 🔴
Build Verification success rate (recent)	100% (10/10 recent runs)
Integration Tests success rate (recent)	100% (10/10 recent runs)
Security workflows	CodeQL + Trivy + npm audit + AI Security Guard

Key Observation

The pipeline infrastructure is comprehensive and well-designed. The most impactful improvement area is test coverage depth — the thresholds are intentionally set low to pass a starting baseline, and the two most important files (cli.ts and docker-manager.ts) remain almost entirely untested. Increasing coverage on these files would directly improve confidence in every PR and catch regressions in the core firewall orchestration logic.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 22, 2026, 10:20 PM UTC

2026-03-22T22:46:12Z

github-actions[bot]
bot Mar 22, 2026
Author

This discussion was automatically closed because it expired on 2026-03-22T22:20:43.206Z.

Closed by Workflow

0 replies
