[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1393

2026-03-21T22:20:37Z

github-actions[bot]
bot Mar 21, 2026

📊 Current CI/CD Pipeline Status

The repository has a well-structured, multi-layered CI/CD pipeline with 18+ workflows running on pull requests. All agentic workflow files are compiled, and the pipeline covers linting, type checking, unit tests, integration tests, SAST, dependency auditing, and semantic PR title enforcement. The overall health is good, with most PR checks consistently passing.

Pipeline Architecture:

┌──────────────────────────────────────────────────────────────┐
│  Agentic Tests (PR + schedule)                               │
│  Smoke (Claude, Copilot, Codex, Chroot) · Build Test Suite  │
│  Security Guard (Claude AI review)                           │
├──────────────────────────────────────────────────────────────┤
│  Integration Tests (every PR)                                │
│  Domain/Network · Protocol/Security · Container/Ops ·        │
│  API Proxy · Chroot Languages · Chroot Package Managers      │
├──────────────────────────────────────────────────────────────┤
│  Unit Tests + Static Analysis (every PR)                     │
│  Jest (38% threshold) · TypeScript · ESLint · markdownlint  │
├──────────────────────────────────────────────────────────────┤
│  Security Scanning (every PR)                                │
│  CodeQL (JS/TS + Actions) · npm audit · Security Guard AI   │
└──────────────────────────────────────────────────────────────┘

✅ Existing Quality Gates

Check	Workflow	Trigger	Blocking
ESLint + markdownlint	`lint.yml`	Every PR	✅ Yes
TypeScript type check	`test-integration.yml`	Every PR	✅ Yes
Build verification (Node 20 + 22)	`build.yml`	Every PR	✅ Yes
API proxy unit tests	`build.yml`	Every PR	✅ Yes
Unit test coverage (≥38% threshold)	`test-coverage.yml`	Every PR	✅ Yes
Coverage regression detection	`test-coverage.yml`	Every PR	✅ Yes
TypeScript strict types	`test-integration.yml`	Every PR	✅ Yes
Domain & network integration tests	`test-integration-suite.yml`	Every PR	✅ Yes
Protocol & security integration tests	`test-integration-suite.yml`	Every PR	✅ Yes
Container & ops integration tests	`test-integration-suite.yml`	Every PR	✅ Yes
API proxy integration tests	`test-integration-suite.yml`	Every PR	✅ Yes
Chroot language support tests	`test-chroot.yml`	Every PR	✅ Yes
Chroot package manager tests	`test-chroot.yml`	Every PR	✅ Yes
CodeQL SAST (JS/TS + Actions)	`codeql.yml`	Every PR + weekly	✅ Yes
npm audit (fail on high/critical)	`dependency-audit.yml`	Every PR + weekly	✅ Yes
Semantic PR title	`pr-title.yml`	Every PR	✅ Yes
Example script smoke tests	`test-examples.yml`	Every PR	✅ Yes
Setup action tests	`test-action.yml`	Every PR	✅ Yes
Markdown link check	`link-check.yml`	PR (docs changes) + weekly	⚠️ Conditional
AI security review (Claude)	`security-guard.md`	Every PR	⚠️ Advisory
Multi-language build test suite	`build-test.md`	Every PR	⚠️ Advisory
Smoke tests (Claude/Copilot/Codex)	`smoke-*.md`	Reaction-gated + scheduled	⚠️ Optional
Container image signing + SBOM	`release.yml`	Release only	N/A
Performance benchmarks	`performance-monitor.yml`	Weekly only	❌ Not on PRs

🔍 Identified Gaps

🔴 High Priority

1. Five integration test files have no CI workflow coverage

The following test files exist in tests/integration/ but are not included in any CI job's --testPathPatterns:

api-target-allowlist.test.ts — tests that --copilot-api-target, --openai-api-target, and --anthropic-api-target values are auto-added to the firewall allowlist
gh-host-injection.test.ts — tests GitHub host injection behavior
ghes-auto-populate.test.ts — tests GHES auto-population logic
skip-pull.test.ts — tests --skip-pull flag behavior
workdir-tmpfs-hiding.test.ts — tests tmpfs hiding of working directories

These tests never run in CI. A PR could break any of these features without failing any check.
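
A gap of this kind can be detected mechanically. The sketch below is one possible approach, not the repository's tooling: `findUncoveredTests` is a hypothetical helper, and the pattern list is an illustrative assumption rather than the real contents of any workflow file. It relies only on the fact that Jest treats path patterns as regular expressions matched against test file paths.

```typescript
// Hypothetical helper: report test files that no CI job's path pattern selects.
// Each pattern is compiled as a regular expression and tested against the path.
function findUncoveredTests(testFiles: string[], ciPatterns: string[]): string[] {
  const regexes = ciPatterns.map((p) => new RegExp(p));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Illustrative inputs. The pattern below is an assumption, not the real workflow config.
const sampleTestFiles = [
  "tests/integration/api-proxy.test.ts", // hypothetical file name
  "tests/integration/skip-pull.test.ts",
  "tests/integration/workdir-tmpfs-hiding.test.ts",
];
const sampleCiPatterns = ["(api-proxy|domain|network)"];

const uncoveredFiles = findUncoveredTests(sampleTestFiles, sampleCiPatterns);
if (uncoveredFiles.length > 0) {
  console.log("Not selected by any CI job:\n" + uncoveredFiles.join("\n"));
}
```

Run as a CI step that exits non-zero when the list is non-empty, a check like this would flag a new test file that nobody wired into a job.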

2. Critical source files have near-zero unit test coverage

The two largest, most critical files are essentially untested at the unit level:

cli.ts: 0% coverage (0/69 statements, 0/10 functions)
docker-manager.ts: 18% statement coverage (45/250 statements; only 1 of 25 functions tested)

The overall coverage thresholds (38% statements, 30% branches) are global aggregates, set low enough that one file at 0% and another at 18% still pass. A regression in the core CLI parsing or Docker orchestration logic would pass all coverage checks.
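
The arithmetic behind this can be sketched as follows. The `cli.ts` and `docker-manager.ts` counts come from this report; the third entry is a made-up placeholder for the rest of the codebase, chosen only to show how pooled statements hide a zero. The function names are hypothetical and do not mirror Jest internals.

```typescript
interface FileCoverage {
  file: string;
  covered: number; // statements executed by tests
  total: number;   // statements in the file
}

function percentCovered(covered: number, total: number): number {
  return total === 0 ? 100 : (covered / total) * 100;
}

// Global gate: pools statements across all files before comparing.
function passesGlobalGate(files: FileCoverage[], threshold: number): boolean {
  const covered = files.reduce((sum, f) => sum + f.covered, 0);
  const total = files.reduce((sum, f) => sum + f.total, 0);
  return percentCovered(covered, total) >= threshold;
}

// Per-file gate: lists every file that is individually below the threshold.
function filesBelowGate(files: FileCoverage[], threshold: number): string[] {
  return files
    .filter((f) => percentCovered(f.covered, f.total) < threshold)
    .map((f) => f.file);
}

const coverageSample: FileCoverage[] = [
  { file: "cli.ts", covered: 0, total: 69 },               // from the report
  { file: "docker-manager.ts", covered: 45, total: 250 },  // from the report
  { file: "everything-else", covered: 400, total: 600 },   // assumed placeholder
];

const globalOk = passesGlobalGate(coverageSample, 38);  // pooled: 445 of 919 statements
const laggards = filesBelowGate(coverageSample, 38);    // files hidden by the pooled figure
```

With these numbers the pooled figure clears 38% while both critical files sit far below it, which is the situation per-file minimums are meant to expose.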

3. No container image vulnerability scanning on PRs

Container images (containers/agent/, containers/squid/, containers/api-proxy/) are the security-critical execution environment. Container signing and SBOM attestation only happen at release time. There is no Trivy/Grype/Docker Scout scan that gates PR merges when containers/** files change, leaving a window where a Dockerfile change introducing a vulnerable base image layer could be merged.

4. Performance regression testing not on PRs

performance-monitor.yml runs benchmarks weekly but results are never compared against PR changes. Container startup time, time-to-first-request, and cleanup latency are security-relevant (slow startup may mask issues) and user-experience-critical. A PR that doubles AWF startup time would pass all checks.

🟡 Medium Priority

5. --block-domains (domain deny-list) is completely untested

The integration test heat map in docs/INTEGRATION-TESTS.md explicitly identifies this as a gap: --block-domains has no unit tests, integration tests, or CI coverage at any tier. The deny-list is a core security feature — an operator using it to block known-malicious domains gets no regression protection.

6. --env-all flag behavior is untested in integration

--env-all (copy all host environment variables into the container, excluding a blocklist) has no integration test. This is a sensitive behavior: incorrect env passthrough could leak credentials into the container. The heat map confirms this gap.

7. Coverage thresholds too permissive for security-critical code

Global thresholds (branches: 30%, functions: 35%, statements: 38%) do not enforce per-file minimums. This allows cli.ts at 0% and docker-manager.ts at 18% to pass. A threshold that explicitly requires higher coverage for security-critical files (src/host-iptables.ts, src/squid-config.ts, src/docker-manager.ts) would better protect against regressions.

8. Build Test Suite reliability issue

The agentic build-test.md workflow failed in the only PR run in the most recent sample (1 of 1 runs). This is a critical end-to-end check that verifies AWF works as a sandbox for real software builds (Go, Rust, Java, Node.js, Bun, Deno, C++, .NET). An unreliable gate causes alert fatigue, and its failures may be dismissed.

9. Documentation preview silently degrades on build failure

docs-preview.yml uses continue-on-error: true on the build step, meaning a broken docs build still produces a "success" workflow result with a failure comment. Docs build failures are invisible in PR status checks. One failure was observed in recent PR data.

🟢 Low Priority

10. No artifact/bundle size monitoring

There is no check on the size of dist/ output or container image layers. A PR that accidentally includes large debug artifacts or adds a heavy dependency would not be flagged. This is important for an action used in CI/CD pipelines where download time matters.

11. No mutation testing

Current 38% coverage uses line/branch metrics that can be satisfied by tests that execute code without asserting correctness. Mutation testing (e.g., Stryker) would reveal whether tests actually verify behavior, especially for the Squid ACL generation logic in src/squid-config.ts.
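
The difference between executing code and verifying it can be illustrated by hand, without any mutation framework. Everything below is a hypothetical toy: `isBlocked` is not the project's Squid ACL logic, and the "mutant" is written manually instead of being generated by Stryker.

```typescript
type Matcher = (host: string, blocklist: string[]) => boolean;

// Original behavior: a host is blocked when it appears in the blocklist.
const isBlocked: Matcher = (host, blocklist) => blocklist.indexOf(host) !== -1;

// Hand-written mutant: the comparison is inverted, as a mutation tool might do.
const isBlockedMutant: Matcher = (host, blocklist) => blocklist.indexOf(host) === -1;

// Weak test: runs the code (so it counts toward coverage) but asserts nothing.
function weakTest(impl: Matcher): boolean {
  impl("example.com", ["example.com"]);
  return true;
}

// Strong test: checks both the blocked and the non-blocked outcome.
function strongTest(impl: Matcher): boolean {
  return (
    impl("example.com", ["example.com"]) === true &&
    impl("other.org", ["example.com"]) === false
  );
}

const mutantSurvivesWeak = weakTest(isBlockedMutant);     // mutant goes unnoticed
const mutantKilledByStrong = !strongTest(isBlockedMutant); // mutant is detected
```

Both tests yield the same line coverage for `isBlocked`, but only the second one fails when the logic is inverted.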

12. Weekly performance baselines not persisted for PR comparison

performance-monitor.yml generates benchmark results and uploads as artifacts (90-day retention), but there is no mechanism to download the most recent baseline and compare against it in a PR workflow. The infrastructure to detect regressions exists but is not wired to PRs.

📋 Actionable Recommendations

Gap 1: Add missing test files to CI patterns

Solution: Add the five uncovered test files to test-integration-suite.yml jobs.

# In test-integration-suite.yml, add to api-proxy job or a new job:
--testPathPatterns="(api-proxy|api-target-allowlist|gh-host-injection|ghes-auto-populate|skip-pull|workdir-tmpfs-hiding)"

Complexity: Low | Impact: High — immediately starts catching regressions in these features

Gap 2: Raise per-file coverage thresholds for critical files

Solution: Add coverageThreshold per-file entries in jest.config.js for critical modules:

coverageThreshold: {
  global: { branches: 30, functions: 35, lines: 38, statements: 38 },
  './src/squid-config.ts': { statements: 95, functions: 95 },
  './src/host-iptables.ts': { statements: 80, functions: 95 },
  // Incremental ratchet. docker-manager.ts is currently at 18% statements and
  // 1/25 functions, so this gate fails until new tests land alongside it.
  './src/docker-manager.ts': { statements: 40, functions: 30 },
}

Complexity: Low | Impact: High — prevents coverage regression in most-critical files

Gap 3: Add container image scanning on `containers/**` PRs

Solution: Add a new workflow using Trivy (free, no external services):

on:
  pull_request:
    paths: ['containers/**']

Run aquasecurity/trivy-action on each Dockerfile target and upload SARIF to the GitHub Security tab. Fail on CRITICAL vulnerabilities.
Complexity: Low | Impact: High — fills the window between PR merge and release scan

Gap 4: Wire performance baseline comparison to PRs

Solution: In performance-monitor.yml, commit benchmark results to a dedicated perf-baseline branch. In a new performance-pr.yml workflow, fetch the latest baseline from that branch and compare against it on PRs that touch src/** or containers/**.
Complexity: Medium | Impact: Medium — catches startup/latency regressions before merge
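
The comparison step itself could be as small as the sketch below. It assumes benchmark results are available as a flat map from metric name to milliseconds; the metric names, the numbers, and the 20% tolerance are all illustrative assumptions, and `findRegressions` is a hypothetical helper rather than anything in performance-monitor.yml.

```typescript
type Metrics = { [name: string]: number | undefined };

// Returns the names of metrics that are slower than baseline by more than
// the given tolerance (0.2 means "more than 20% slower").
function findRegressions(baseline: Metrics, current: Metrics, tolerance: number): string[] {
  const regressions: string[] = [];
  for (const name of Object.keys(current)) {
    const base = baseline[name];
    const now = current[name];
    if (base === undefined || now === undefined) continue; // new metric, nothing to compare
    if (now > base * (1 + tolerance)) regressions.push(name);
  }
  return regressions;
}

// Made-up numbers: startup time doubles, the other metrics stay within tolerance.
const baselineMs: Metrics = { containerStartup: 4000, timeToFirstRequest: 900, cleanup: 1500 };
const prMs: Metrics = { containerStartup: 8000, timeToFirstRequest: 950, cleanup: 1400 };

const slowMetrics = findRegressions(baselineMs, prMs, 0.2);
```

A PR workflow would fail, or post a warning comment, when the returned list is non-empty. Benchmarks on shared CI runners are noisy, so the tolerance would likely need tuning against real data.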

Gap 5: Add `--block-domains` integration test

Solution: Add blocked-domains-denylist.test.ts to tests/integration/ testing that --block-domains example.com causes requests to example.com to return 403 even when example.com is in --allow-domains. Add to test-integration-suite.yml domain/network job.
Complexity: Low | Impact: High — protects a core security feature
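
The precedence rule that test would pin down (deny-list wins over allow-list) can be stated as a small decision function. This is a sketch of the expected behavior, not AWF's implementation: `isRequestAllowed` and `matchesDomain` are hypothetical names, and treating subdomains as matching their parent rule is an assumption that the report does not confirm.

```typescript
// Assumption: a rule matches the exact host and any of its subdomains.
function matchesDomain(host: string, rule: string): boolean {
  const h = host.toLowerCase();
  const suffix = "." + rule.toLowerCase();
  return h === rule.toLowerCase() ||
    (h.length > suffix.length && h.slice(-suffix.length) === suffix);
}

// Deny-list is evaluated first, so a blocked domain is refused even if it
// also appears in the allow-list.
function isRequestAllowed(host: string, allowDomains: string[], blockDomains: string[]): boolean {
  if (blockDomains.some((rule) => matchesDomain(host, rule))) return false;
  return allowDomains.some((rule) => matchesDomain(host, rule));
}

// The scenario from the recommendation: example.com is both allowed and blocked.
const blockedWins = !isRequestAllowed("example.com", ["example.com"], ["example.com"]);
```

An integration test would assert the same outcome end to end, for instance by checking that the proxied request returns 403.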

Gap 6: Add `--env-all` integration test

Solution: Add env-all.test.ts verifying that (a) host environment variables are present in the container, (b) the blocklist (EXCLUDED_ENV_VARS) is honored, and (c) sensitive variables (GITHUB_TOKEN, etc.) are absent. Add to the container/ops job.
Complexity: Low | Impact: Medium — prevents credential leakage regressions
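
The three properties such a test would check can be shown against a simplified filter. The blocklist contents below are an assumption (the report names EXCLUDED_ENV_VARS but does not list its entries), and `filterHostEnv` is a hypothetical stand-in for whatever the CLI actually does.

```typescript
type Env = { [key: string]: string | undefined };

// Copies every defined host variable except those named in the blocklist.
function filterHostEnv(hostEnv: Env, excluded: string[]): { [key: string]: string } {
  const result: { [key: string]: string } = {};
  for (const key of Object.keys(hostEnv)) {
    const value = hostEnv[key];
    if (value !== undefined && excluded.indexOf(key) === -1) {
      result[key] = value;
    }
  }
  return result;
}

// Assumed blocklist entries, for illustration only.
const excludedSketch = ["GITHUB_TOKEN", "PATH"];
const hostEnvSketch: Env = {
  MY_FEATURE_FLAG: "1",
  GITHUB_TOKEN: "dummy-token-value",
  PATH: "/usr/bin",
};

const containerEnv = filterHostEnv(hostEnvSketch, excludedSketch);
const passthroughWorks = containerEnv["MY_FEATURE_FLAG"] === "1"; // (a) host vars present
const blocklistHonored = !("PATH" in containerEnv);               // (b) blocklist honored
const tokenAbsent = !("GITHUB_TOKEN" in containerEnv);            // (c) sensitive vars absent
```

The real integration test would read the environment from inside the running container instead of from a filter function, but the assertions would have the same shape.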

Gap 8: Stabilize or fix Build Test Suite

Solution: Investigate the recent failure in build-test.md. Check if it's a flaky network dependency, missing Docker prerequisite, or logic bug. Consider adding a retry step or breaking out flaky languages into a separate optional job.
Complexity: Medium | Impact: High — restores confidence in this critical end-to-end check

Gap 9: Make documentation build failures visible

Solution: Remove continue-on-error: true from the build step in docs-preview.yml, or add a separate job that fails on build error:

- name: Build documentation
  run: cd docs-site && npm run build
  # Remove: continue-on-error: true

Complexity: Low | Impact: Low — makes docs build failures visible as PR check failures

📈 Metrics Summary

Metric	Value
Total workflow files	21 standard YAML + 21 agentic MD
Workflows triggered on PRs	15 blocking + 4 advisory
Unit test suites	6 (135 tests passing)
Integration test files	32 files, ~265 tests
Integration test files with CI coverage	27 / 32 (84%)
Overall statement coverage	38.39%
`cli.ts` statement coverage	0%
`docker-manager.ts` statement coverage	18%
Recent PR workflow success rate (blocking checks)	~95%
Recent PR workflow success rate (advisory agentic)	~75%
Performance benchmarks on PRs	0 (weekly only)
Container scanning on PRs	0 (release only)

Summary: The pipeline is comprehensive for an infrastructure security tool — integration tests are extensive and cover most critical paths. The primary risks are (1) 5 integration test files silently skipped in CI, (2) near-zero unit coverage on the two largest files, and (3) no container vulnerability scanning before merge. All three are low-complexity fixes with high impact.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 28, 2026, 10:20 PM UTC

2026-03-28T22:48:02Z

github-actions[bot]
bot Mar 28, 2026
Author

This discussion was automatically closed because it expired on 2026-03-28T22:20:36.874Z.

Closed by Workflow
