[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1475

2026-03-27T22:23:50Z

github-actions[bot]
bot Mar 27, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, well-structured CI/CD pipeline: more than 20 active workflows cover build verification, unit and integration testing, security scanning, documentation, and AI-assisted review. Most PR-blocking checks pass reliably.

Workflow	Trigger	Recent Success Rate / Status
Build Verification	PR + push	~90% (27/30)
Integration Tests	PR + push	~90% (27/30)
Test Coverage	PR + push	~73% (22/30) — coverage regressions caught
TypeScript Type Check	PR + push	Active
Lint (ESLint + markdownlint)	PR + push	Active
CodeQL	PR + push + weekly	Active
Dependency Vulnerability Audit	PR + push + weekly	Active
PR Title Check	PR	Active
Chroot Integration Tests	PR + push	Active
Security Guard (Claude AI)	PR	Active

Total active workflows: 52 (GitHub Actions API count), including 22 on-disk YAML workflow files plus dynamic Copilot and CodeQL workflows.

✅ Existing Quality Gates

The following checks run on every PR targeting main:

Build & Compilation

build.yml — TypeScript compilation, Node.js 20 + 22 matrix
test-integration.yml — tsc --noEmit strict type checking

Code Quality

lint.yml — ESLint on src/ + markdownlint on all .md files
pr-title.yml — Conventional commit format enforcement (amannn/action-semantic-pull-request)

Testing

test-coverage.yml — Unit tests with coverage; PR comment with comparison to base branch; fails on regression
test-integration-suite.yml — 5 parallel integration test jobs: Domain, Network, Protocol & Security, Container Ops, API Proxy
test-chroot.yml — Chroot language support (Python, Go, Java, .NET, Ruby, Node)
test-examples.yml — Validates example scripts
test-action.yml — Tests the setup GitHub Action

Security

codeql.yml — CodeQL analysis (javascript-typescript + actions)
dependency-audit.yml — npm audit with SARIF upload to Security tab
security-guard.lock.yml — AI-powered PR review (Claude) for security boundary changes

Documentation

docs-preview.yml — Builds documentation site for doc-touching PRs
link-check.yml — Lychee link validation on .md file changes

Automated AI Review

build-test.lock.yml — Copilot agent runs builds/tests and comments on PRs
Smoke tests (Claude, Codex, Copilot) — Opt-in end-to-end agent execution tests

🔍 Identified Gaps

🔴 High Priority

1. Coverage thresholds are dangerously low
Current global thresholds: branches 30%, functions 35%, lines/statements 38%. The two most critical files have alarmingly low coverage:

cli.ts — 0% coverage (0/69 statements)
docker-manager.ts — 18% coverage (45/250 statements, 4% functions)

Because the gate checks only the global figure, neither file's near-zero coverage fails CI. A PR could delete every test for these files and, as long as the global figure stays above the threshold, CI would still be green.
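
The masking effect can be illustrated with a small calculation. This is a minimal sketch, not the project's coverage tooling: the `cli.ts` and `docker-manager.ts` counts come from the assessment, while the "(all other files)" row and the helper names are hypothetical placeholders chosen only to show the arithmetic.

```typescript
interface FileCoverage {
  file: string;
  covered: number; // statements executed by tests
  total: number; // statements in the file
}

function percent(covered: number, total: number): number {
  return total === 0 ? 100 : (covered / total) * 100;
}

// Global gate: a single percentage over the summed counts of all files.
function passesGlobalGate(files: FileCoverage[], thresholdPct: number): boolean {
  const covered = files.reduce((sum, f) => sum + f.covered, 0);
  const total = files.reduce((sum, f) => sum + f.total, 0);
  return percent(covered, total) >= thresholdPct;
}

// Per-file view: which files fall below the same threshold on their own.
function filesBelow(files: FileCoverage[], thresholdPct: number): string[] {
  return files
    .filter((f) => percent(f.covered, f.total) < thresholdPct)
    .map((f) => f.file);
}

const report: FileCoverage[] = [
  { file: "cli.ts", covered: 0, total: 69 }, // from the assessment
  { file: "docker-manager.ts", covered: 45, total: 250 }, // from the assessment
  { file: "(all other files)", covered: 900, total: 2000 }, // hypothetical
];

const gatePasses = passesGlobalGate(report, 38); // true: the aggregate clears 38%
const belowFloor = filesBelow(report, 38); // ["cli.ts", "docker-manager.ts"]
```

With these assumed numbers the global gate passes even though both critical files sit far below the threshold individually.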

2. No container image security scanning
Docker images (containers/squid/, containers/agent/, containers/api-proxy/) are built and published without any image vulnerability scanning (no Trivy, Grype, or Snyk container scan). These images run with elevated privileges and process untrusted agent traffic — a compromised base image is a critical attack vector.

3. Performance benchmarks do not run on PRs
performance-monitor.yml runs only on a weekly schedule, so startup latency and memory regressions introduced by a PR can go undetected for up to a week. The benchmark script (scripts/ci/benchmark-performance.ts) already exists and works; it just isn't triggered on PRs.

4. Smoke tests are fully opt-in (reaction-based)
End-to-end smoke tests (smoke-claude, smoke-codex, smoke-copilot) require a specific emoji reaction on the PR to run — they are not required checks. A PR that breaks real agent execution (the primary use case of this firewall) can be merged without any end-to-end validation.

🟡 Medium Priority

5. No per-file minimum coverage enforcement for critical security files
The coverageThreshold configuration currently sets global minimums only, although Jest also accepts per-file and per-glob entries. Critical files like host-iptables.ts (iptables management), squid-config.ts (proxy config generation), and containers/agent/setup-iptables.sh (network isolation) have no individual floor. A PR that introduces a new security function without tests would not be caught by coverage gating.
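
A per-file floor could look roughly like the following. This is a sketch under assumptions: the `global` values are the thresholds quoted in this assessment, but the `./src/...` paths and the per-file percentages are hypothetical placeholders, and the local `Threshold` type only stands in for Jest's own config typing.

```typescript
// Local stand-in for the shape Jest expects under `coverageThreshold`.
type Threshold = {
  branches?: number;
  functions?: number;
  lines?: number;
  statements?: number;
};

const coverageThreshold: Record<string, Threshold> = {
  // Global minimums as quoted in the assessment.
  global: { branches: 30, functions: 35, lines: 38, statements: 38 },
  // Hypothetical per-file floors; paths and percentages are placeholders.
  "./src/host-iptables.ts": { branches: 70, statements: 80 },
  "./src/squid-config.ts": { branches: 70, statements: 80 },
};

// In jest.config.js this object would be assigned to the `coverageThreshold` key.
const jestConfig = { coverageThreshold };
```

Jest checks path and glob entries independently of `global`. Note that Jest only instruments JavaScript and TypeScript, so a shell script such as containers/agent/setup-iptables.sh would need a different kind of check.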

6. performance-monitor.yml uses unpinned action references
While most workflows pin actions to full SHAs (security best practice), performance-monitor.yml uses floating tags:

uses: actions/checkout@v4         # should be SHA-pinned
uses: actions/setup-node@v4       # should be SHA-pinned
uses: actions/upload-artifact@v4  # should be SHA-pinned
uses: actions/github-script@v7    # should be SHA-pinned

This is inconsistent with the SHA pinning used elsewhere in the repo and exposes CI to supply chain attacks, because a floating tag can be moved to point at different code.
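
A check for floating references could be sketched as below. This is an illustrative helper, not an existing script in the repository: `findUnpinnedActions` and `sampleWorkflow` are hypothetical names, and the 40-character value in the last sample line is a placeholder, not a real commit SHA.

```typescript
// A full commit SHA is 40 hexadecimal characters.
const FULL_SHA = /^[0-9a-f]{40}$/i;

// Returns every `uses:` reference in the workflow text that is not pinned to a full SHA.
function findUnpinnedActions(workflowText: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowText.split("\n")) {
    const match = /^\s*(?:-\s*)?uses:\s*([^\s#]+)/.exec(line);
    if (!match) continue;
    const reference = (match[1] ?? "").replace(/^["']|["']$/g, "");
    // Local actions and Docker images are not pinned by commit SHA.
    if (reference.startsWith("./") || reference.startsWith("docker://")) continue;
    const at = reference.lastIndexOf("@");
    const ref = at === -1 ? "" : reference.slice(at + 1);
    if (!FULL_SHA.test(ref)) unpinned.push(reference);
  }
  return unpinned;
}

const sampleWorkflow = [
  "uses: actions/checkout@v4        # should be SHA-pinned",
  "uses: actions/setup-node@v4      # should be SHA-pinned",
  "uses: actions/upload-artifact@v4 # should be SHA-pinned",
  "uses: actions/github-script@v7   # should be SHA-pinned",
  "uses: actions/checkout@" + "a".repeat(40) + " # placeholder SHA, treated as pinned",
].join("\n");

const unpinned = findUnpinnedActions(sampleWorkflow);
// unpinned holds the four tag-based references; the SHA-style line is skipped.
```

Run against every workflow file in CI, a check like this would flag a new floating tag before it is merged.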

7. No SBOM (Software Bill of Materials) generation
No workflow generates or attests an SBOM for published releases or container images. For a security-focused tool distributed to AI agent pipelines, this leaves consumers without a machine-readable inventory to verify the supply chain against.

8. No concurrency group on integration test workflow
test-integration-suite.yml lacks a concurrency group. Rapid pushes to a PR can start multiple overlapping runs that compete for Docker networks and container names, causing flaky failures. (The build.yml workflow has the same gap.)

9. No mutation testing
With overall coverage at ~38%, it's unclear whether existing tests actually assert correct behavior or just execute code paths. Mutation testing (e.g., Stryker) would reveal whether tests fail when logic is broken — especially important for security-critical logic.
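
The difference between executing code and asserting on it can be shown with a toy case. Everything here is hypothetical: `isAllowed` is not the project's domain-matching code, and the mutant is written by hand to resemble what a string-literal mutation would produce.

```typescript
type DomainCheck = (host: string, allowed: string[]) => boolean;

// Hypothetical function under test: exact match or subdomain of an allowed domain.
const isAllowed: DomainCheck = (host, allowed) =>
  allowed.some((d) => host === d || host.endsWith("." + d));

// Hand-written mutant: the "." literal is replaced by an empty string, so
// "evilexample.com" is now treated as a subdomain of "example.com".
const mutant: DomainCheck = (host, allowed) =>
  allowed.some((d) => host === d || host.endsWith("" + d));

// Weak test: executes every statement but only checks a case both versions agree on.
const weakTest = (fn: DomainCheck): boolean =>
  fn("api.example.com", ["example.com"]) === true;

// Strong test: also asserts the boundary case, so it fails when the logic is broken.
const strongTest = (fn: DomainCheck): boolean =>
  weakTest(fn) && fn("evilexample.com", ["example.com"]) === false;

const mutantSurvivesWeak = weakTest(mutant); // true: the mutant goes unnoticed
const mutantKilledByStrong = !strongTest(mutant); // true: the strong test fails on the mutant
```

Both tests give the same statement coverage for `isAllowed`, but only the strong one detects the broken check, which is the signal a mutation score adds on top of coverage.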

🟢 Low Priority

10. No bundle/artifact size tracking
The compiled dist/ output is not tracked for size regressions. Accidental dependency imports (e.g., bundling all of node_modules) could bloat the published package undetected.
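
A size budget check can be reduced to a small pure function. This is a sketch under assumptions: the file names, byte counts, and the 250000-byte limit are hypothetical, and a real check would collect the sizes from `dist/` on disk.

```typescript
interface SizeEntry {
  path: string;
  bytes: number;
}

interface BudgetResult {
  totalBytes: number;
  limitBytes: number;
  withinBudget: boolean;
  largest: SizeEntry[]; // biggest files first, to help explain a failure
}

function checkSizeBudget(entries: SizeEntry[], limitBytes: number, topN = 3): BudgetResult {
  const totalBytes = entries.reduce((sum, e) => sum + e.bytes, 0);
  const largest = [...entries].sort((a, b) => b.bytes - a.bytes).slice(0, topN);
  return { totalBytes, limitBytes, withinBudget: totalBytes <= limitBytes, largest };
}

// Hypothetical sizes for illustration only.
const distFiles: SizeEntry[] = [
  { path: "dist/cli.js", bytes: 40000 },
  { path: "dist/index.js", bytes: 120000 },
  { path: "dist/docker-manager.js", bytes: 60000 },
];

const budget = checkSizeBudget(distFiles, 250000);
// budget.totalBytes is 220000 and budget.withinBudget is true
```

Reporting the largest files alongside the total makes it easier to see which import caused a regression when the budget is exceeded.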

11. No spell check on documentation
Despite extensive documentation (README, docs-site, many .md files), no cspell or similar check runs. lint.yml runs markdownlint for formatting but does not check spelling.

12. Coverage threshold history not tracked
Coverage thresholds are hardcoded in jest.config.js at low values and have only been raised in small increments. No automated PR raises them as coverage improves: the test-coverage-improver.lock.yml agentic workflow exists, but it runs weekly and may not keep thresholds current.

13. No changelog enforcement
There is no check that CHANGELOG.md or release notes are updated with user-facing changes. The update-release-notes.lock.yml only runs at release time.

📋 Actionable Recommendations

#	Issue	Solution	Complexity	Impact
1	Low coverage thresholds	Add per-file thresholds for `cli.ts` and `docker-manager.ts` (e.g., statements: 40%); raise global thresholds incrementally to 60%	Low	High
2	No container scanning	Add Trivy or Grype scan step in `test-integration-suite.yml` after `docker build`; fail on CRITICAL vulnerabilities	Low	High
3	Performance not on PRs	Add `pull_request` trigger to `performance-monitor.yml` with fewer iterations (e.g., 3 vs 5)	Low	Medium
4	Opt-in smoke tests	Add a dedicated `smoke-required.yml` that runs a lightweight sanity check (e.g., `awf --allow-domains example.com curl (example.com/redacted)`) without needing a live LLM, triggered on every PR	Medium	High
5	Per-file coverage floor	Add `coverageThreshold` per-file entries for the 5 most critical source files in `jest.config.js`	Low	Medium
6	Unpinned actions in perf-monitor	Pin all action refs to full SHAs (run `gh api` to get SHA for each tag)	Low	Medium
7	No SBOM	Add `actions/attest-sbom` or `anchore/sbom-action` step to `release.yml`; upload SBOM as release artifact	Low	Medium
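8	No concurrency group	Add `concurrency` with `group: ${{ github.workflow }}-${{ github.ref }}` and `cancel-in-progress: true` to `test-integration-suite.yml` and `build.yml`; including the workflow name keeps the two workflows from cancelling each other's runs	Low	Low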
9	No mutation testing	Add Stryker Mutator for the `src/squid-config.ts` and `src/domain-patterns.ts` modules as a weekly scheduled check	High	Medium
10	Bundle size	Add `bundlesize` or `size-limit` check comparing `dist/` total size against a threshold	Low	Low

📈 Metrics Summary

Metric	Value
Total active workflows	52
On-disk workflow definition files	40 (22 YAML + 18 Markdown/agentic)
PR-blocking workflow count	~12 mandatory + smoke (opt-in)
Unit test files	25 (`src/**/*.test.ts`)
Integration test files	34 (`tests/integration/**/*.test.ts`)
Current statement coverage	38.39% (threshold: 38%)
Current branch coverage	31.78% (threshold: 30%)
`cli.ts` coverage	0%
`docker-manager.ts` coverage	18%
Build Verification success rate (last 30 runs)	90%
Integration Tests success rate (last 30 runs)	90%
Test Coverage success rate (last 30 runs)	73% (failures = coverage regression caught correctly)
Secret Digger (Copilot) success rate (last 8 runs)	75% (2 failures)

The pipeline is generally healthy with high pass rates for core checks. The primary risks are the critically low coverage on the most important files, absence of container image scanning despite the security-focused nature of the project, and no mandatory end-to-end smoke testing on PRs.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 3, 2026, 10:23 PM UTC

2026-04-03T22:51:07Z

github-actions[bot]
bot Apr 3, 2026
Author

This discussion was automatically closed because it expired on 2026-04-03T22:23:49.764Z.

Closed by Workflow

0 replies
