[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1406

2026-03-23T22:22:45Z

github-actions[bot]
bot Mar 23, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, multi-tier CI/CD system with 19 workflows running on pull requests and an additional 17 scheduled/event-triggered workflows. All compiled agentic workflow .lock.yml files are up-to-date. Recent PR workflow runs show a high success rate (~82% on the most recent PR batch), with two notable failures: Build Test Suite (agentic, external dependency) and Smoke Codex (agentic smoke test).

Workflow Inventory

| Tier | Count | Examples |
| --- | --- | --- |
| Standard CI (always-on PR) | 13 | Build, Lint, TypeScript, Coverage, Integration, Chroot, Examples, CodeQL, Audit |
| Agentic PR workflows | 6 | Security Guard (Claude), Build Test Suite (Copilot), Smoke Claude/Copilot/Codex/Chroot |
| Scheduled/push-only | 17 | Performance Monitor, Secret Digger, Security Review, Doc Maintainer |
| Release | 1 | Manual dispatch only |

✅ Existing Quality Gates

Code Quality

ESLint — enforces TypeScript code style on every PR (lint.yml)
markdownlint — lints all documentation (lint.yml)
TypeScript strict type checking — tsc --noEmit via test-integration.yml
Semantic PR titles — conventional commits enforced via pr-title.yml

Testing

Unit test coverage — Jest with PR comparison comment and regression gate (test-coverage.yml)
Integration tests — 5 parallel job groups: domain filtering, network security, protocol/security, container ops, API proxy (test-integration-suite.yml)
Chroot integration tests — 4 parallel jobs: languages, package managers, procfs, edge cases (test-chroot.yml)
Examples validation — 4 shell script examples exercised end-to-end (test-examples.yml)
Setup action test — action.yml tested with multiple version scenarios (test-action.yml)
Multi-language build test — Copilot-powered test across 8 ecosystems: Bun, C++, Deno, .NET, Go, Java, Node.js, Rust (build-test.md)
Smoke tests — Claude, Copilot, Codex, and Chroot agents run real-world tasks through AWF (smoke-*.md)

Security

CodeQL — JavaScript/TypeScript and Actions analysis with security-extended + security-and-quality queries
Dependency audit — npm audit for main package and docs-site with SARIF upload to Security tab
AI security review — Claude-powered review checks for iptables/Squid/container security weakening (security-guard.md)

Documentation

Link check — Lychee checks for broken links on markdown-changing PRs
Docs preview — Astro site built and artifact uploaded for docs-changing PRs

Build

Multi-version build matrix — Node 20 and 22 (build.yml)
API proxy unit tests — Separate npm test in containers/api-proxy/ (build.yml)

🔍 Identified Gaps

🔴 High Priority

1. Coverage Thresholds Are Dangerously Low

Current thresholds in jest.config.js: branches: 30%, functions: 35%, lines: 38%, statements: 38%. A project this security-sensitive should have much higher minimums. Additionally, the coverage regression comparison step uses continue-on-error: true, meaning even a significant coverage drop may not fail the PR.
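As a rough illustration of a gate that actually blocks, the sketch below compares two coverage summaries and reports every metric that is under its minimum or that dropped relative to the base branch. It assumes the `total` entry of Jest's JSON summary report as input; `coverageFailures` and the `maxDrop` tolerance are hypothetical names and values, not part of the repository.

```typescript
// Shape of the `total` entry in Jest's JSON summary report (assumed input).
type Metric = "lines" | "statements" | "functions" | "branches";
type CoverageTotals = Record<Metric, { pct: number }>;
type Thresholds = Record<Metric, number>;

// Thresholds quoted above from jest.config.js.
const currentThresholds: Thresholds = {
  branches: 30,
  functions: 35,
  lines: 38,
  statements: 38,
};

// Returns one message per metric that is below its minimum, or that dropped
// by more than `maxDrop` percentage points relative to the base branch.
// `maxDrop` is a hypothetical tolerance for measurement noise.
function coverageFailures(
  base: CoverageTotals,
  head: CoverageTotals,
  minimums: Thresholds,
  maxDrop: number = 0.5
): string[] {
  const metrics: Metric[] = ["lines", "statements", "functions", "branches"];
  const failures: string[] = [];
  for (const m of metrics) {
    const before = base[m].pct;
    const after = head[m].pct;
    if (after < minimums[m]) {
      failures.push(`${m}: ${after}% is below the ${minimums[m]}% minimum`);
    }
    if (before - after > maxDrop) {
      failures.push(`${m}: dropped from ${before}% to ${after}%`);
    }
  }
  return failures;
}
```

A CI step built on this would exit non-zero whenever the returned list is non-empty. That only gates the PR if the step is not marked continue-on-error: true.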

2. No Container Image Security Scanning on PRs

No workflow scans the Docker images (Squid, agent, api-proxy) for known CVEs using Trivy, Grype, or Anchore. The coverage heat map in docs/INTEGRATION-TESTS.md lists "Container security scan" as covered only in CI, but no such workflow exists: ci-doctor references it, yet it was never created. For a firewall tool, unpatched CVEs in base images are a critical risk.
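One way such a gate could decide pass or fail is sketched below. It assumes the scanner emits a JSON report shaped like Trivy's (`Results[].Vulnerabilities[].Severity`). The interfaces cover only the fields used here, and `blockingFindings` is a hypothetical helper rather than an existing script in the repository.

```typescript
// Minimal subset of a Trivy-style JSON report (assumed shape).
interface Vulnerability {
  VulnerabilityID: string;
  PkgName: string;
  Severity: string;
}
interface ScanResult {
  Target: string;
  Vulnerabilities?: Vulnerability[] | null;
}
interface ScanReport {
  Results?: ScanResult[] | null;
}

// Collects findings whose severity should fail the PR.
// The default severity list is an assumption, not a project policy.
function blockingFindings(
  report: ScanReport,
  blocking: string[] = ["HIGH", "CRITICAL"]
): string[] {
  const findings: string[] = [];
  for (const result of report.Results ?? []) {
    // Targets with no findings may omit the list or set it to null.
    for (const v of result.Vulnerabilities ?? []) {
      if (blocking.indexOf(v.Severity) !== -1) {
        findings.push(`${result.Target}: ${v.VulnerabilityID} (${v.PkgName}, ${v.Severity})`);
      }
    }
  }
  return findings;
}
```

Running this once per image (Squid, agent, api-proxy) and failing the job on a non-empty result would cover the gap described above.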

3. Performance Regression Testing Not on PRs

performance-monitor.yml runs weekly only (Mondays at 06:00 UTC). A PR that degrades container startup time by 50% would merge undetected and only surface in the next weekly report. Startup latency is a user-facing metric for this tool.
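A PR-level check does not need the full weekly benchmark. The sketch below compares a handful of startup timings against a stored baseline. How the timings are collected is left open, and `isStartupRegression` and its default ratio are illustrative assumptions.

```typescript
// Median is less sensitive than the mean to a single slow run on a shared CI runner.
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Flags a regression when the candidate median exceeds the baseline median
// by more than `maxRatio`. The default of 2 is an assumed starting point.
function isStartupRegression(
  baselineMs: number[],
  candidateMs: number[],
  maxRatio: number = 2
): boolean {
  return median(candidateMs) > median(baselineMs) * maxRatio;
}
```

Note that a 2x limit would not flag the 50% slowdown described above. Catching that case needs a ratio below 1.5, at the cost of more noise-induced failures.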

4. `--block-domains` Feature Has Zero Test Coverage

Per docs/INTEGRATION-TESTS.md coverage heat map: "Domain deny-list (--block-domains) ❌ across all tiers — unit, integration, CI, smoke, build-test". This is a core security feature with no automated validation at any level.
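To make the gap concrete, the sketch below lists the kind of cases a unit test for a deny-list might assert. The matcher is hypothetical and is not the project's implementation. It assumes that a blocked domain also blocks its subdomains and that the deny-list takes precedence over the allow-list, and both assumptions would need checking against the real behavior of --block-domains.

```typescript
// Hypothetical matcher: exact match, or subdomain match on a label boundary.
function matchesDomain(host: string, domain: string): boolean {
  const h = host.toLowerCase().replace(/\.$/, "");
  const d = domain.toLowerCase().replace(/^\./, "");
  return h === d || h.slice(-(d.length + 1)) === "." + d;
}

// Assumed precedence: a deny-list hit always wins over an allow-list hit.
function isAllowed(host: string, allow: string[], block: string[]): boolean {
  if (block.some((d) => matchesDomain(host, d))) {
    return false;
  }
  return allow.some((d) => matchesDomain(host, d));
}

const allow = ["example.com"];
const block = ["tracker.example.com"];

// [host, expected result] pairs a unit test might cover.
const cases: Array<[string, boolean]> = [
  ["api.example.com", true], // allowed subdomain
  ["tracker.example.com", false], // exact deny-list entry
  ["a.tracker.example.com", false], // subdomain of a blocked domain
  ["nottracker.example.com", true], // suffix match must respect label boundaries
  ["other.org", false], // not on the allow-list at all
];

// Any entry left in `mismatches` is a failing case.
const mismatches = cases.filter(
  ([host, expected]) => isAllowed(host, allow, block) !== expected
);
```

The label-boundary case is the one most likely to expose a naive suffix comparison, which is why it is worth including even in a small test table.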

5. `--env-all` Flag Has Zero Test Coverage

Also confirmed by coverage heat map: zero coverage at all levels. The --env-all flag copies the host environment into the container, which has direct security implications (credential exposure risk).

🟡 Medium Priority

6. No Dockerfile Linting (hadolint)

None of the three Dockerfiles (containers/squid/Dockerfile, containers/agent/Dockerfile, containers/api-proxy/Dockerfile) are linted with hadolint or a similar tool. Hadolint catches RUN apt-get without --no-install-recommends, missing USER instructions, shell form vs. exec form, and other best practices.

7. Smoke Test Role-Filtering Creates PR Coverage Gaps

Recent workflow runs show "Smoke Claude: skipped" and "Smoke Copilot: skipped" on PRs. Agentic smoke workflows use roles: all but may be skipped for external contributors or bot-authored PRs. This means the end-to-end AI agent pipeline isn't validated for all PR types.

8. Build Test Suite Fragility

build-test.md clones external repositories (Mossaka/gh-aw-firewall-test-*) for each PR. The most recent run showed a failure. External dependency on third-party repos creates flaky CI; if those repos are unavailable or have breaking changes, every PR gets a spurious failure.

9. No Structured Test Result Reports (JUnit XML)

Jest currently outputs text, LCOV, HTML, and a JSON summary, but no JUnit XML. With a reporter action, GitHub Actions can turn JUnit XML into inline annotations on the failing test in the PR interface and track test trends over time. This would make test failures much quicker to diagnose.

10. Integration Tests Don't Contribute to Coverage Metrics

test-coverage.yml only runs npm run test:coverage targeting src/**/*.ts unit tests. The extensive integration test suite (tests/integration/) exercises the same code paths but is not included in coverage reporting. True coverage is likely much higher than reported.

11. No Docker Image Size Budget

No workflow tracks the size of built Docker images. An accidental dependency addition could bloat the agent or squid images, impacting pull times for users. A size regression gate (e.g., fail if image grows >20%) would catch this.
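A size gate of this kind reduces to comparing two maps of image name to size in bytes, for example as reported by `docker image inspect`. The sketch below is a minimal version under that assumption. The `checkImageSizes` name and the shape of the threshold data are hypothetical.

```typescript
interface SizeCheck {
  image: string;
  baselineBytes: number;
  currentBytes: number;
  growthPct: number;
  overBudget: boolean;
}

// Compares current image sizes with a stored baseline.
// Images without a baseline entry are skipped, since they have no budget yet.
function checkImageSizes(
  baseline: Record<string, number>,
  current: Record<string, number>,
  maxGrowthPct: number = 20
): SizeCheck[] {
  const checks: SizeCheck[] = [];
  for (const image of Object.keys(current)) {
    const baselineBytes = baseline[image];
    const currentBytes = current[image];
    if (baselineBytes === undefined || baselineBytes <= 0) {
      continue;
    }
    const growthPct = ((currentBytes - baselineBytes) / baselineBytes) * 100;
    checks.push({
      image,
      baselineBytes,
      currentBytes,
      growthPct,
      overBudget: growthPct > maxGrowthPct,
    });
  }
  return checks;
}
```

The workflow step would fail when any entry has `overBudget` set, and the baseline file would be updated deliberately in the PR that justifies the growth.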

🟢 Low Priority

12. Link Check Only Triggered on Markdown Changes

link-check.yml uses paths: ['**/*.md', '.github/lychee.toml']. Code-only PRs that move or rename files linked from the documentation will break those links without triggering the link check.

13. No Changelog/Release Notes Enforcement

No workflow validates that a CHANGELOG.md entry or release notes update accompanies non-trivial changes. The update-release-notes.md workflow only runs on release publication, not on PRs.

14. Performance Monitor Uses Unpinned Action References

performance-monitor.yml uses actions/checkout@v4, actions/setup-node@v4, etc. (tag references, not pinned SHAs). Other workflows are correctly pinned. This inconsistency creates a supply chain risk specifically in the performance monitoring workflow.
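The inconsistency can be detected mechanically. The sketch below scans workflow text line by line and reports every `uses:` reference that is not pinned to a 40-character commit SHA. It is a plain text scan rather than a YAML parser, and `findUnpinnedActions` is a hypothetical helper.

```typescript
// Returns action references that are not pinned to a full commit SHA.
// Local actions (./path) and docker:// references are skipped.
function findUnpinnedActions(workflowYaml: string): string[] {
  const unpinned: string[] = [];
  // Matches `uses: owner/repo@ref`, with or without a leading list dash.
  const usesLine = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/;
  for (const line of workflowYaml.split("\n")) {
    const match = usesLine.exec(line);
    if (!match) {
      continue;
    }
    const ref = match[1];
    if (ref.indexOf("./") === 0 || ref.indexOf("docker://") === 0) {
      continue;
    }
    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    // A pinned reference ends in exactly 40 lowercase hex characters.
    if (!/^[0-9a-f]{40}$/.test(version)) {
      unpinned.push(ref);
    }
  }
  return unpinned;
}
```

Run over every file in the workflows directory, a check like this would keep new tag references from slipping in after the existing ones are pinned.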

15. No Mutation Testing

The codebase has low coverage thresholds and security-critical logic. Mutation testing (e.g., with Stryker) would reveal whether tests actually verify correctness or just achieve line coverage through execution without assertions.
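The difference between executing code and verifying it can be shown with a hand-written mutant. The example below is hypothetical and not taken from the project. It mimics the kind of operator change a mutation tool generates and shows one test that lets the mutant survive and one that kills it.

```typescript
type PortCheck = (port: number) => boolean;

// Original logic (illustrative only).
const isAllowedPort: PortCheck = (port) => port >= 1 && port <= 65535;

// A mutant of the kind a mutation tool might generate: `<=` changed to `<`.
const mutant: PortCheck = (port) => port >= 1 && port < 65535;

// Weak test: runs the function, so its lines count as covered,
// but only checks a mid-range value.
const weakTest = (fn: PortCheck): boolean => fn(443) === true;

// Stronger test: also checks the boundaries, where the mutant differs.
const strongTest = (fn: PortCheck): boolean =>
  fn(443) && fn(65535) && !fn(0) && !fn(65536);

// A mutant "survives" when the test still passes against it.
const survives = (test: (fn: PortCheck) => boolean, m: PortCheck): boolean =>
  test(m);

const weakLetsMutantSurvive = survives(weakTest, mutant);
const strongLetsMutantSurvive = survives(strongTest, mutant);
```

Both tests give the same line coverage for `isAllowedPort`, yet only the second one detects the changed operator. A mutation score measures this difference, which a coverage percentage cannot.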

📋 Actionable Recommendations

| # | Issue | Recommendation | Complexity | Impact |
| --- | --- | --- | --- | --- |
| 1 | Low coverage thresholds | Raise Jest thresholds incrementally toward `lines: 60%, branches: 50%, functions: 65%`; remove `continue-on-error` from the comparison step | Low | 🔴 High |
| 2 | No container image scanning | Add a `trivy-action` step to `build.yml` that scans all three built images; upload SARIF to the Security tab | Low | 🔴 High |
| 3 | Performance not on PRs | Add a fast startup-time check (~30s) to `build.yml` using the existing `benchmark-performance.ts`; fail if >2x regression | Medium | 🟡 Medium |
| 4 | `--block-domains` untested | Add unit tests in `src/squid-config.test.ts` and an integration test in `tests/integration/` | Medium | 🔴 High |
| 5 | `--env-all` untested | Add an integration test covering credential exposure and env propagation with `--env-all` | Medium | 🔴 High |
| 6 | No hadolint | Add a `hadolint/hadolint-action` step to `build.yml` that lints all three Dockerfiles | Low | 🟡 Medium |
| 7 | Smoke test skipping | Investigate role-filter behavior; consider adding a `workflow_dispatch` fallback test that always runs | Medium | 🟡 Medium |
| 8 | Build test fragility | Mirror or fork the external test repos into the organization; add `continue-on-error` with explicit issue creation for external failures | Medium | 🟡 Medium |
| 9 | No JUnit reports | Add the `jest-junit` reporter; configure a JUnit report action (e.g., `mikepenz/action-junit-report` or `dorny/test-reporter`) to annotate PRs with test failures | Low | 🟡 Medium |
| 10 | Integration tests not in coverage | Add a combined coverage run (`npm run test:all -- --coverage`) to the test-coverage workflow | Medium | 🟡 Medium |
| 11 | No image size budget | Add a step to `build.yml` that builds the images and checks the `docker image inspect` size against a threshold file | Low | 🟢 Low |
| 12 | Link check gap | Change the `link-check.yml` trigger to run on all PRs (not just markdown changes), with a 5-minute timeout | Low | 🟢 Low |
| 13 | No changelog enforcement | Add a PR check using `dangoslen/changelog-enforcer` or a custom script | Low | 🟢 Low |
| 14 | Unpinned action references | Pin the actions in `performance-monitor.yml` to commit SHAs using `pinact` or similar | Low | 🟢 Low |
| 15 | No mutation testing | Evaluate Stryker.js for high-value security modules (`src/squid-config.ts`, `src/domain-patterns.ts`) | High | 🟡 Medium |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflows | 39 (21 standard YAML + 21 agentic .md with compiled .lock.yml) |
| PR-triggered workflows | 19 |
| Scheduled workflows | ~10 |
| Recent PR run success rate | ~82% (15 success / 2 failure / 5 skipped out of 22 runs) |
| Current line coverage threshold | 38% (very low for security-critical code) |
| Integration test count | ~265 tests across 26 files |
| Unit test count | ~200 tests across 19 files |
| Integration test CI coverage gap | Domain/Network, Protocol/Security, Container/Ops tests have no dedicated CI (only chroot tests have their own workflow) |
| Zero-coverage features | `--block-domains`, `--env-all`, `--docker-warning-stub` |
| Missing workflow | "Container Security Scan" referenced in ci-doctor.md but does not exist |

Key Finding

The most significant structural gap is the absence of a container image vulnerability scanner. For a tool whose core value proposition is security isolation, shipping Docker images with unscanned CVEs would be a trust-breaking issue. This should be the first gap addressed.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 30, 2026, 10:22 PM UTC

2026-03-30T22:52:02Z

github-actions[bot]
bot Mar 30, 2026
Author

This discussion was automatically closed because it expired on 2026-03-30T22:22:45.667Z.

Closed by Workflow

0 replies
