[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1364

Posted by github-actions[bot] (bot) on Mar 18, 2026 (2026-03-18T22:24:10Z)

This is an automated analysis of the CI/CD pipeline and integration test coverage in this repository, with actionable recommendations for improving PR quality measurement.

📊 Current CI/CD Pipeline Status

The repository has a well-structured, multi-layered CI/CD pipeline with 40 YAML workflows and 21 agentic (.md) workflows — 61 total. The pipeline covers build verification, linting, type checking, unit tests, integration tests, security scanning, documentation, and end-to-end smoke testing.

Workflows running on pull_request events:

Workflow	Type	Purpose
`build.yml`	Static	Build verification (Node 20 + 22 matrix) + API proxy unit tests
`lint.yml`	Static	ESLint + Markdownlint
`test-integration.yml`	Static	Integration tests (4 parallel jobs: domain/network, protocol/security, container-ops, API proxy)
`test-integration-suite.yml`	Static	Integration tests (duplicate of above — same content, same name)
`test-chroot.yml`	Static	Chroot integration tests (languages, package managers, procfs, edge cases)
`test-examples.yml`	Static	Runs all `examples/*.sh` scripts end-to-end
`test-action.yml`	Static	Tests the GitHub Action (setup, versioning, image pull)
`test-coverage.yml`	Static	Unit test coverage with PR comment comparison
`codeql.yml`	Static	CodeQL SAST for JS/TS and Actions
`dependency-audit.yml`	Static	npm audit (fails on high/critical CVEs)
`container-scan.yml`	Static	Trivy scan (only on `containers/**` path changes)
`pr-title.yml`	Static	Semantic PR title enforcement
`docs-preview.yml`	Static	Documentation build preview (only on doc path changes)
`link-check.yml`	Static	Broken link check (only on `*.md` path changes)
`build-test.md`	Agentic (Copilot)	Multi-ecosystem build test (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust)
`security-guard.md`	Agentic (Claude)	AI-powered security review of changes
`smoke-claude.md`	Agentic (Claude)	End-to-end smoke test with Claude agent
`smoke-codex.md`	Agentic (Codex)	End-to-end smoke test with Codex agent
`smoke-copilot.md`	Agentic (Copilot)	End-to-end smoke test with Copilot agent
`smoke-chroot.md`	Agentic (Copilot)	Smoke test for chroot behavior

✅ Existing Quality Gates

Code Quality

✅ ESLint — TypeScript linting on every PR
✅ Markdownlint — Markdown formatting validation
✅ TypeScript type check — Strict type checking via tsc --noEmit
✅ Build verification — Compiles TypeScript on Node 20 + 22 matrix
✅ Semantic PR titles — Enforced via amannn/action-semantic-pull-request

Testing

✅ Unit tests with coverage — Jest with 38% statement threshold; reports coverage delta in PR comments
✅ Integration tests (30 test files, of which 23 currently run in CI): Domain filtering, DNS, protocols, container ops, API proxy, chroot languages, package managers
✅ Examples test — All example shell scripts verified end-to-end
✅ Setup action test — GitHub Action versioning and image pull tested

Security

✅ CodeQL SAST — JavaScript/TypeScript and Actions language analysis
✅ Dependency audit — npm audit --audit-level=high for main + docs-site
✅ Container scanning — Trivy (HIGH/CRITICAL) on agent and squid containers
✅ AI security guard — Claude reviews every PR for security boundary changes
✅ Secret diggers — Three hourly agentic workflows scanning for leaked secrets

Documentation

✅ Docs preview — Astro/Starlight site builds verified on doc changes
✅ Link checker — Lychee checks broken links on markdown changes

Smoke Tests

✅ Multi-agent smoke tests — Smoke tests for Claude, Codex, Copilot, and chroot (run on PRs but gated by emoji reactions)

🔍 Identified Gaps

🔴 High Priority

1. 7 Integration Test Files Not Executed in CI

Seven integration test files exist in tests/integration/ but do not match any --testPathPatterns in any CI workflow:

Test File Not Run in CI	What It Covers
`api-target-allowlist.test.ts`	Validates API targets are auto-added to domain allowlist
`chroot-capsh-chain.test.ts`	Validates capability dropping in chroot
`chroot-copilot-home.test.ts`	Validates whitelisted home directory isolation
`gh-host-injection.test.ts`	Tests GH_HOST injection prevention
`ghes-auto-populate.test.ts`	Tests GHES domain auto-population
`skip-pull.test.ts`	Tests `--skip-pull` flag behavior
`workdir-tmpfs-hiding.test.ts`	Tests workdir tmpfs isolation

Several of these (chroot-capsh-chain, gh-host-injection, chroot-copilot-home) are security-critical tests that verify the firewall's isolation guarantees are not silently broken by code changes.
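One way to keep this gap from reopening is a CI check that fails when a test file is selected by no `--testPathPatterns` value. The sketch below is hypothetical (neither function exists in the repository). It pulls double-quoted `--testPathPatterns` arguments out of workflow text and treats each one as a JavaScript regex tested against the file path, which is how Jest applies the option. Reading the real workflow files and the `tests/integration/` listing is left to the caller, and the file names in the example are made up.

```typescript
// Hypothetical CI guard: list integration test files that no workflow's
// --testPathPatterns regex selects. Pure functions; the caller supplies
// the workflow text and the test file paths.

// Collect the value of every --testPathPatterns="..." (or --testPathPatterns "...")
// argument. Assumes patterns are wrapped in double quotes.
function extractPatterns(workflowText: string): string[] {
  const re = /--testPathPatterns[= ]"([^"]+)"/g;
  const patterns: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(workflowText)) !== null) {
    patterns.push(match[1]);
  }
  return patterns;
}

// Jest treats each pattern as a regex matched against the test file path,
// so a file is selected if any pattern matches anywhere in its path.
function findUnmatchedTests(testFiles: string[], patterns: string[]): string[] {
  const regexes = patterns.map((p) => new RegExp(p));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Example with made-up inputs.
const workflow = `
  run: npm run test:integration -- --testPathPatterns="(domain|dns)" --verbose
  run: npm run test:integration -- --testPathPatterns="protocol" --verbose
`;
const files = [
  "tests/integration/domain-filtering.test.ts",
  "tests/integration/protocol-support.test.ts",
  "tests/integration/skip-pull.test.ts",
];

// Logs an array containing only tests/integration/skip-pull.test.ts
console.log(findUnmatchedTests(files, extractPatterns(workflow)));
```

A CI step built on this would exit non-zero whenever the returned list is non-empty, so a newly added test file cannot silently fall outside every job.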

2. Critically Low Unit Test Coverage on Core Files

From COVERAGE_SUMMARY.md:

File	Statement Coverage	Priority
`cli.ts`	0%	🔴 Critical
`docker-manager.ts`	18%	🔴 Critical
`host-iptables.ts`	83%	🟡 Good

cli.ts (entry point, signal handling, orchestration) and docker-manager.ts (all container lifecycle logic, compose generation, bind mount config) are the two most important files, yet cli.ts has no unit coverage at all and docker-manager.ts covers fewer than one in five statements. A refactor in either file could introduce regressions that no unit test would catch.

3. Coverage Thresholds Are Too Low to Be Meaningful

Current thresholds: Statements 38%, Branches 30%, Functions 35%, Lines 38%. Given that cli.ts is 0% and docker-manager.ts is 18%, these thresholds can pass while the most important code paths have no coverage at all. The thresholds do not enforce coverage on security-critical paths.

4. Container Security Scan Has a Path Filter Gap

container-scan.yml only triggers on containers/** path changes. Changes to src/docker-manager.ts or src/squid-config.ts that alter container configuration, mount points, or capabilities do not retrigger the container scan — even though those source changes directly affect runtime security posture.
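The path-filter gap can be reproduced with a small model of `paths:` matching. This is a simplified sketch, not GitHub's implementation. It supports only `*` (any characters except `/`) and `**` (any characters, including `/`), and it ignores `?`, `+`, character ranges, and `!` negation. The filter lists in the example are illustrative.

```typescript
// Simplified model of GitHub Actions `paths:` filters. Only `*` and `**`
// are treated as wildcards; every other character is matched literally.
function globToRegExp(glob: string): RegExp {
  const special = "\\^$.|?+()[]{}";
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*"; // `**` crosses directory boundaries
        i++;
      } else {
        source += "[^/]*"; // `*` stays within one path segment
      }
    } else if (special.indexOf(ch) >= 0) {
      source += "\\" + ch; // escape regex metacharacters
    } else {
      source += ch;
    }
  }
  return new RegExp("^" + source + "$");
}

// A workflow with a `paths:` filter runs if at least one changed file matches.
function wouldTrigger(changedFiles: string[], paths: string[]): boolean {
  const regexes = paths.map(globToRegExp);
  return changedFiles.some((file) => regexes.some((re) => re.test(file)));
}

const changed = ["src/docker-manager.ts"];
console.log(wouldTrigger(changed, ["containers/**"])); // false: the scan is skipped
console.log(wouldTrigger(changed, ["containers/**", "src/**"])); // true
```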

🟡 Medium Priority

5. Duplicate Workflow Definition (`test-integration.yml` = `test-integration-suite.yml`)

Both files use the workflow name `Integration Tests` and have identical content (4 parallel jobs: domain/network, protocol/security, container-ops, API proxy). Every PR therefore runs the same suite twice, doubling its CI cost with no added value, and the PR check list shows confusing duplicate entries. One file should be removed or differentiated.

6. Smoke Tests Are Not Automatic — Require Emoji Reaction

The agentic smoke tests (smoke-claude.md, smoke-codex.md, smoke-copilot.md, smoke-chroot.md) run on PRs but only when a maintainer adds a specific emoji reaction (❤️, 🎉, 👀, 🚀). They do not run automatically. This means a PR that breaks the actual Claude/Copilot/Codex agent execution can merge without the smoke tests ever firing.

7. Performance Benchmarks Never Run on PRs

performance-monitor.yml only runs on a weekly schedule. A PR that introduces a 2× container startup regression would not be caught until the following week. No performance gate exists on the PR merge path.

8. `api-proxy` Container Not Scanned by Trivy

container-scan.yml scans awf-agent and awf-squid, but not the API proxy sidecar (containers/api-proxy/). The API proxy handles real API credentials (OpenAI, Anthropic, Copilot tokens) and runs as a network-accessible service, so an unpatched CVE in its image would be especially costly.

9. No SBOM (Software Bill of Materials) Generation

No workflow generates or attaches an SBOM to releases. For a security tool distributed as a Docker image and an npm package, SBOM attestation is increasingly expected for supply chain transparency, especially since the project publishes images to GHCR.

10. No Coverage Enforcement Per File or Per Module

Coverage is enforced only globally (38% of statements project-wide), not per module. A contributor could add 1,000 new uncovered lines to docker-manager.ts and still pass the check, as long as covered code elsewhere keeps the project-wide percentage above the threshold.
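The arithmetic behind gaps 3 and 10 is a statement-weighted average: the global percentage is total covered statements divided by total statements, so a large, well-tested file can offset an untested one. The per-file counts below (including the hypothetical `src/well-tested.ts`) are invented for illustration and are not taken from COVERAGE_SUMMARY.md.

```typescript
// Statement counts for one file. All numbers used below are hypothetical.
interface FileCoverage {
  file: string;
  covered: number;
  total: number;
}

// Percentage of covered statements; zero statements is reported as 0% here.
function percent(covered: number, total: number): number {
  return total === 0 ? 0 : (covered * 100) / total;
}

function totals(files: FileCoverage[]): { covered: number; total: number } {
  let covered = 0;
  let total = 0;
  for (const f of files) {
    covered += f.covered;
    total += f.total;
  }
  return { covered, total };
}

// A global threshold sees one number: summed covered statements over summed
// statements, i.e. an average weighted by file size.
function globalPercent(files: FileCoverage[]): number {
  const t = totals(files);
  return percent(t.covered, t.total);
}

// Files that would fail if the same threshold were applied to each file.
function belowThreshold(files: FileCoverage[], threshold: number): string[] {
  return files.filter((f) => percent(f.covered, f.total) < threshold).map((f) => f.file);
}

// Uncovered statements that could be added anywhere before the global
// percentage drops below the threshold.
function uncoveredHeadroom(files: FileCoverage[], threshold: number): number {
  const t = totals(files);
  return Math.max(0, Math.floor((t.covered * 100) / threshold - t.total));
}

const files: FileCoverage[] = [
  { file: "src/well-tested.ts", covered: 760, total: 1000 }, // 76%
  { file: "src/cli.ts", covered: 0, total: 400 }, // 0%
  { file: "src/docker-manager.ts", covered: 140, total: 600 }, // about 23%
];

console.log(globalPercent(files)); // 45: passes a 38% global threshold
console.log(belowThreshold(files, 38)); // cli.ts and docker-manager.ts
console.log(uncoveredHeadroom(files, 38)); // 368
```

With these numbers the project clears a 38% global gate even though one file has no coverage at all, and up to 368 further uncovered statements could be added to any file before the gate fails.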

🟢 Low Priority

11. No License Compliance Check

No workflow scans dependencies for license compatibility. As a tool used in enterprise/CI environments and distributed on npm/GHCR, license drift (a dependency changing from MIT to GPL/AGPL) should be automatically detected.

12. No Spell Check on Documentation

The link checker (link-check.yml) validates URLs but there is no spell check or prose style linting on documentation. The docs site (docs-site/) targets enterprise users and engineers who may file issues for documentation errors.

13. Documentation Build Not Triggered by Code Changes

docs-preview.yml only builds the docs when docs-site/**, docs/**, or *.md files change. A change to src/ that adds a new CLI flag would not trigger a docs preview build. Manual verification is needed to confirm docs remain accurate after code changes.

14. No Commit Message Validation in CI

commitlint is configured (via commitlint.config.js + husky) as a local Git hook, but nothing enforces it in CI. Commits made in the GitHub UI, squash-merge commits created when PRs are merged, and commits from automated tools all bypass the hook.

📋 Actionable Recommendations

R1: Add Missing Integration Tests to CI Matrix [High | Low Complexity]

Issue: 7 integration test files never run in CI.
Fix: Add the missing test patterns to test-integration.yml:

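```yaml
# New step for test-integration.yml; assumes it runs in a job with the same
# checkout, install, and build steps as the existing integration jobs.
# The chroot-* files may instead need test-chroot.yml's environment.
- name: Run security isolation tests
  run: |
    npm run test:integration -- \
      --testPathPatterns="(api-target-allowlist|chroot-capsh-chain|chroot-copilot-home|gh-host-injection|ghes-auto-populate|skip-pull|workdir-tmpfs-hiding)" \
      --verbose
```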

Impact: Catches regressions in security-critical isolation paths that are currently invisible to CI.

R2: Increase Coverage Thresholds and Add Per-File Minimums [High | Medium Complexity]

Issue: 38% global threshold allows critical files to have 0% coverage.
Fix: Raise global thresholds incrementally and add per-file overrides in jest.config.js:

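```js
// jest.config.js (excerpt; assumes the file uses CommonJS module.exports)
module.exports = {
  // ...existing options...
  coverageThreshold: {
    // Applies to all files not matched by one of the path keys below
    global: { branches: 50, functions: 60, lines: 55, statements: 55 },
    './src/docker-manager.ts': { statements: 40 },
    './src/cli.ts': { statements: 30 },
  },
};
```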

Impact: Forces test investment in the highest-risk files.
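One Jest behavior shapes how R2 plays out: according to Jest's `coverageThreshold` documentation, files matched by a path key are checked against their own thresholds and excluded from the global calculation, and a path key that matches no file is reported as an error. The sketch below models that rule for exact file-path keys and statement thresholds only (Jest also supports globs, directories, and the other metrics). The `failingThresholds` helper and all coverage numbers are hypothetical.

```typescript
interface FileCoverage {
  file: string;
  covered: number;
  total: number;
}

interface StatementThreshold {
  statements?: number;
}

// "global" plus exact file-path keys, mirroring the shape of Jest's option.
type ThresholdConfig = Record<string, StatementThreshold>;

function percent(covered: number, total: number): number {
  return total === 0 ? 0 : (covered * 100) / total;
}

// Returns the config keys whose statement threshold is not met.
function failingThresholds(files: FileCoverage[], config: ThresholdConfig): string[] {
  const failures: string[] = [];
  const claimed: string[] = []; // files consumed by a per-path key

  for (const key of Object.keys(config)) {
    if (key === "global") continue;
    const match = files.filter((f) => f.file === key)[0];
    if (match === undefined) {
      failures.push(key); // Jest errors when a path key matches no file
      continue;
    }
    claimed.push(match.file);
    const min = config[key].statements;
    if (min !== undefined && percent(match.covered, match.total) < min) failures.push(key);
  }

  // The global threshold only sees files that no path key claimed.
  let covered = 0;
  let total = 0;
  for (const f of files) {
    if (claimed.indexOf(f.file) >= 0) continue;
    covered += f.covered;
    total += f.total;
  }
  const globalRule = config["global"];
  const globalMin = globalRule === undefined ? undefined : globalRule.statements;
  if (globalMin !== undefined && percent(covered, total) < globalMin) failures.push("global");
  return failures;
}

const files: FileCoverage[] = [
  { file: "./src/other.ts", covered: 760, total: 1000 },
  { file: "./src/cli.ts", covered: 0, total: 400 },
  { file: "./src/docker-manager.ts", covered: 140, total: 600 },
];

// Statement thresholds from R2.
const config: ThresholdConfig = {
  global: { statements: 55 },
  "./src/docker-manager.ts": { statements: 40 },
  "./src/cli.ts": { statements: 30 },
};

// Both per-file keys fail; global passes because it now covers only other.ts (76%).
console.log(failingThresholds(files, config));
```

The practical consequence is that once the weakest files get their own keys, they stop dragging the global figure down, so a raised global threshold becomes easier to meet while the per-file keys carry the real enforcement.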

R3: Expand Container Security Scan Trigger Paths [High | Low Complexity]

Issue: Container scan skips PRs that change container config in src/.
Fix: Add src/** to the paths: filter in container-scan.yml trigger.
Impact: Ensures every code change that could affect container security posture triggers a Trivy scan.

R4: Add Trivy Scan for API Proxy Container [Medium | Low Complexity]

Issue: API proxy container is excluded from security scanning.
Fix: Add a third scan-api-proxy job to container-scan.yml mirroring the existing scan-agent job with ./containers/api-proxy.
Impact: Closes a CVE blind spot on the component that holds real API credentials.

R5: Remove Duplicate Integration Test Workflow [Medium | Low Complexity]

Issue: test-integration.yml and test-integration-suite.yml are identical.
Fix: Delete one of the two files; because their contents are identical, either can go.
Impact: Halves the integration-test CI cost and removes the duplicate entries from the check list.

R6: Make Smoke Tests Automatically Run on PRs (Opt-Out Model) [Medium | Medium Complexity]

Issue: Smoke tests only run when maintainer adds emoji reaction.
Fix: Run the smoke tests automatically on every PR, restricted with roles: maintainer so that external contributor PRs do not burn runner minutes. Alternatively, make a single agent's smoke test (e.g., smoke-copilot.md) a required check that blocks merges.
Impact: Prevents merging PRs that silently break the end-to-end agent execution flow.

R7: Add Performance Gate on PRs [Medium | Medium Complexity]

Issue: Performance regressions only detected weekly.
Fix: Add a lightweight startup-time benchmark step (container up + simple command) to build.yml or a new PR-targeted workflow. Fail if time exceeds a 2× threshold vs. a stored baseline.
Impact: Catches startup regressions before they reach users.
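A minimal sketch of the comparison step, assuming the benchmark produces several startup timings in milliseconds and that a baseline was stored from an earlier main-branch run on comparable runners. Taking the median of several samples is a choice made here, not part of the original proposal; it keeps one slow run on a noisy shared runner from failing the gate. The 2× factor is the one R7 proposes, and `startupGate` is a hypothetical name.

```typescript
// Median of the samples, so a single outlier does not decide the result.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("need at least one sample");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface GateResult {
  ok: boolean;
  ratio: number; // median sample divided by baseline
}

// Fails when the median startup time exceeds maxRatio times the baseline.
function startupGate(samplesMs: number[], baselineMs: number, maxRatio = 2): GateResult {
  if (baselineMs <= 0) throw new Error("baseline must be positive");
  const ratio = median(samplesMs) / baselineMs;
  return { ok: ratio <= maxRatio, ratio };
}

// Hypothetical timings: baseline 1000 ms, three runs from the PR branch.
const result = startupGate([2100, 1900, 2000], 1000);
console.log(result.ok ? "pass" : "fail", result.ratio.toFixed(2)); // pass 2.00
```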

R8: Add SBOM Generation to Release Workflow [Medium | Low Complexity]

Issue: No supply chain transparency for releases.
Fix: Add anchore/sbom-action to release.yml and attach the SBOM to the GitHub Release assets.
Impact: Meets enterprise compliance requirements and improves supply chain security posture.

R9: Add License Compliance Scanning [Low | Low Complexity]

Issue: No license drift detection.
Fix: Add license-checker or licensee as a CI step in dependency-audit.yml:

```shell
# Exits non-zero if any dependency's license is outside the allowlist
npx license-checker --onlyAllow 'MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC;CC0-1.0'
```

Impact: Prevents accidental introduction of copyleft dependencies.
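If the team ever writes its own check instead of relying on license-checker's `--onlyAllow`, the allowlist logic has to cope with SPDX expressions such as `(MIT OR Apache-2.0)`. The hypothetical helper below accepts an `OR` expression if any alternative is fully allowed, requires every term of an `AND` to be allowed, and conservatively rejects nested parentheses (and anything else it does not recognize) for manual review. It is a sketch, not a full SPDX parser.

```typescript
// Allowlist from R9's license-checker command.
const ALLOWED_LICENSES = ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "CC0-1.0"];

// Hypothetical helper: is a dependency's SPDX license expression acceptable?
// Handles flat OR / AND expressions with at most one outer pair of parentheses.
function isLicenseAllowed(expression: string, allowed: string[] = ALLOWED_LICENSES): boolean {
  let expr = expression.trim();
  if (expr.charAt(0) === "(" && expr.charAt(expr.length - 1) === ")") {
    expr = expr.slice(1, -1).trim();
  }
  // Nested grouping needs a real SPDX parser; flag it for manual review.
  if (/[()]/.test(expr)) return false;

  // AND binds tighter than OR in SPDX, so split on OR first.
  return expr.split(/\s+OR\s+/).some((alternative) =>
    alternative.split(/\s+AND\s+/).every((term) => allowed.indexOf(term.trim()) >= 0)
  );
}

console.log(isLicenseAllowed("(MIT OR GPL-3.0)")); // true: MIT can be chosen
console.log(isLicenseAllowed("MIT AND GPL-3.0")); // false: GPL-3.0 is required
```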

R10: Enforce Commitlint in CI [Low | Low Complexity]

Issue: Commit message convention only enforced locally via husky.
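Fix: Add a step to lint.yml that runs npx commitlint --from origin/main --to HEAD over the PR's commits. The checkout must include the base branch history (for example, fetch-depth: 0 on actions/checkout), because the default shallow clone has no origin/main ref.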
Impact: Ensures consistent commit history regardless of how commits are created.
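For reference, the core header check that commitlint's conventional preset applies can be sketched in a few lines. This assumes the repository uses the `@commitlint/config-conventional` defaults (the type list below and a 100-character header limit); the actual commitlint.config.js may differ, and running commitlint itself remains the recommended fix. `checkHeader` is a hypothetical name.

```typescript
// Types allowed by @commitlint/config-conventional (assumed, not read from this repo).
const ALLOWED_TYPES = ["build", "chore", "ci", "docs", "feat", "fix", "perf", "refactor", "revert", "style", "test"];

// type, optional (scope), optional "!" for breaking changes, then ": subject"
const HEADER_PATTERN = /^([a-z]+)(\([^()\r\n]+\))?(!)?: (\S.*)$/;

// Returns a list of problems; an empty list means the header looks valid.
function checkHeader(header: string, maxLength = 100): string[] {
  const problems: string[] = [];
  if (header.length > maxLength) problems.push(`header exceeds ${maxLength} characters`);
  const match = HEADER_PATTERN.exec(header);
  if (match === null) {
    problems.push('header must look like "type(scope): subject"');
    return problems;
  }
  if (ALLOWED_TYPES.indexOf(match[1]) < 0) problems.push(`unknown type "${match[1]}"`);
  return problems;
}

console.log(checkHeader("feat(cli): add --skip-pull flag")); // no problems
console.log(checkHeader("feature: support GHES")); // unknown type "feature"
```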

📈 Metrics Summary

Metric	Value
Total workflow files	61 (40 YAML + 21 agentic)
Workflows running on PRs	~20
Unit test files	6
Unit test count	135
Statement coverage	38.39% (threshold: 38%)
Branch coverage	31.78% (threshold: 30%)
Integration test files	30
Integration test files not in CI	7 (23%)
Security scanning tools	CodeQL, Trivy, npm audit, AI security guard
`cli.ts` coverage	0%
`docker-manager.ts` coverage	18%
Recent PR Title Check failure rate	~20% (non-conforming PR titles)
Containers scanned by Trivy	2 of 3 (API proxy missing)

Generated by the automated CI/CD Pipelines and Integration Tests Gap Assessment workflow (AI generated) on 2026-03-18.

This discussion expires on Mar 25, 2026, 10:24 PM UTC.

Reply from github-actions[bot] (author) on Mar 25, 2026 (2026-03-25T22:51:31Z):

This discussion was automatically closed by the workflow because it expired on 2026-03-25T22:24:09.774Z.
