[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1439

2026-03-25T22:26:16Z

github-actions[bot]
bot Mar 25, 2026

📊 Current CI/CD Pipeline Status

The repository has a comprehensive CI/CD pipeline with 21 agentic workflows (compiled .md → .lock.yml) plus 14 traditional YAML workflows. All agentic workflows are compiled and operational. Recent PR runs show a mixed success picture with several persistent failures.

Traditional workflows on PRs (always run):

| Workflow | Purpose | Recent Success Rate |
| --- | --- | --- |
| Build Verification | Node.js 20/22 matrix build + ESLint + API proxy unit tests | 100% |
| Lint | ESLint + markdownlint | 100% |
| TypeScript Type Check | `tsc --noEmit` strict type checking | 100% |
| Test Coverage | Jest unit tests with PR branch comparison | 100% |
| CodeQL | SAST for JS/TS and GitHub Actions | 100% |
| Integration Tests | 5 parallel Docker-based test groups | ~90%* |
| Chroot Integration Tests | 4 parallel chroot language/pkg-manager tests | 50% |
| Dependency Vulnerability Audit | `npm audit` → SARIF to Security tab, fails on high/critical | Failing |
| PR Title Check | Conventional commit validation | 100% |
| Examples Test | Real end-to-end shell example execution | 100% |
| Test Setup Action | action.yml installer validation | 100% |
| Documentation Preview | Docs build preview (doc-path-triggered only) | Variable |
| Link Check | Broken link detection (md-path-triggered only) | N/A |

Agentic workflows on PRs:

| Workflow | Engine | Trigger | Recent Status |
| --- | --- | --- | --- |
| Security Guard | Claude | All PRs | Skipped (0/2 ran) |
| Build Test Suite | Copilot | All PRs | Failing (0/2 passed) |
| Smoke Claude | Claude | All PRs + 12h schedule | Skipped (opt-in via reaction) |
| Smoke Copilot | Copilot | All PRs + 12h schedule | Skipped (opt-in via reaction) |
| Smoke Codex | Codex | All PRs + 12h schedule | Failing (0/2 passed) |
| Smoke Chroot | Copilot | Path-filtered PRs | Variable |

*Integration Tests are split across two workflow files; test-integration-suite.yml has 5 parallel jobs covering domain, network, protocol/security, container ops, and API proxy testing.

✅ Existing Quality Gates

Code Quality

ESLint with custom rules (eslint-rules/no-unsafe-execa.ts) — runs on every PR
TypeScript strict type checking (tsc --noEmit) — runs on every PR
Markdownlint — runs on every PR
Conventional commit title enforcement via amannn/action-semantic-pull-request

Testing

Unit tests (Jest, ~474 statements covered) with coverage comparison against base branch
Coverage enforcement — coverage regression blocks PRs; global thresholds enforced
Integration tests — 9 parallel Docker-based test jobs covering domain filtering, network security, protocol support, credential hiding, API proxy, chroot language support, package managers, /proc filesystem, and edge cases
Examples tests — shell scripts exercising actual awf invocations
API proxy unit tests (Node.js) — run within the Build Verification workflow

Security

CodeQL SAST (JS/TS + Actions analysis) — runs on every PR
npm audit (main + docs-site) — fails on high/critical vulnerabilities; uploads SARIF to GitHub Security tab
Security Guard (Claude AI) — designed to review all PRs for security-weakening changes (currently skipping)
Secret Digger (Claude/Codex/Copilot) — hourly scheduled scans

Documentation

Documentation preview build — triggered on doc-path changes
Link checker — triggered on .md file changes

Performance

Performance benchmarks — weekly scheduled run with regression detection and automatic issue creation

Operational

Smoke tests (Claude/Copilot/Codex) — end-to-end with real AI agents, scheduled every 12h and opt-in on PRs
Build Test Suite — multi-ecosystem build tests (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) on PRs

🔍 Identified Gaps

🔴 High Priority

1. Six integration test files are orphaned — never run in CI

The following test files exist in tests/integration/ but do not match any test pattern in either test-integration-suite.yml or test-chroot.yml:

api-target-allowlist.test.ts — API target domain auto-allow feature
chroot-capsh-chain.test.ts — Capability chain security (chroot)
chroot-copilot-home.test.ts — Copilot home directory isolation
gh-host-injection.test.ts — GH_HOST injection security
ghes-auto-populate.test.ts — GHES domain auto-population
workdir-tmpfs-hiding.test.ts — tmpfs workdir hiding

These tests cover security-critical behaviors (capability dropping, home directory isolation, host injection). They are written but provide zero CI protection. Any regression in these areas would silently pass CI.
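
A small guard script could catch this class of gap automatically by comparing the files in tests/integration/ against the patterns the workflows use. The sketch below is hypothetical: it assumes each workflow job selects tests with a regular expression matched against the file path, and the function name `findOrphanedTests`, the sample pattern, and the file `example-covered.test.ts` are illustrative rather than taken from the repository.

```typescript
// Hypothetical helper: report test files that no CI pattern would select.
// Assumption: each workflow job selects tests with a regular expression
// matched against the file path.
function findOrphanedTests(
  testFiles: string[],
  ciPatterns: string[],
): string[] {
  const matchers = ciPatterns.map((p) => new RegExp(p));
  return testFiles
    .filter((file) => !matchers.some((re) => re.test(file)))
    .sort();
}

// Illustrative input only; the real pattern lists live in the workflow files.
const files = [
  "tests/integration/gh-host-injection.test.ts",
  "tests/integration/workdir-tmpfs-hiding.test.ts",
  "tests/integration/example-covered.test.ts",
];
const patterns = ["example-covered"];

// Lists the two files that no pattern selects.
console.log(findOrphanedTests(files, patterns));
```

Run as a CI step that fails when the returned list is non-empty, a check like this would turn a silently skipped test file into a visible failure.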

2. Security Guard is not reliably running on PRs

The security-guard.md agentic workflow (Claude-based AI security review) showed 0 successful runs out of 2 observed PR runs — both were "skipped." This means security-sensitive PRs may merge without AI security review. The workflow fires on all PRs (roles: all), suggesting an authentication or activation issue with the agentic runner.

3. Test coverage thresholds are critically low

Current thresholds and actuals:

| Metric | Threshold | Actual |
| --- | --- | --- |
| Statements | 38% | 38.39% |
| Branches | 30% | 31.78% |
| Functions | 35% | 37.03% |
| Lines | 38% | 38.31% |

The thresholds are nearly identical to actual coverage, providing no safety margin. Most core source files (cli.ts, docker-manager.ts) have less than 50% unit test coverage. Coverage is essentially at the floor — any test deletion would trigger a failure, but no meaningful coverage growth is incentivized.
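
One common way to make thresholds encourage growth is a "ratchet": after each merge, raise every threshold to the whole-number floor of the current actual value and never lower it. The following is a minimal sketch of that idea using the figures from the table above; the `Coverage` type and the `ratchet` function are hypothetical and are not part of the project's tooling.

```typescript
// Coverage percentages per metric, as reported in the table above.
type Coverage = {
  statements: number;
  branches: number;
  functions: number;
  lines: number;
};

const thresholds: Coverage = { statements: 38, branches: 30, functions: 35, lines: 38 };
const actuals: Coverage = { statements: 38.39, branches: 31.78, functions: 37.03, lines: 38.31 };

// Hypothetical ratchet: never lower a threshold, and raise it to the
// whole-number floor of the actual value when coverage has grown.
function ratchet(current: Coverage, actual: Coverage): Coverage {
  const next = { ...current };
  for (const key of Object.keys(current) as (keyof Coverage)[]) {
    next[key] = Math.max(current[key], Math.floor(actual[key]));
  }
  return next;
}

console.log(ratchet(thresholds, actuals));
```

With the current figures, only the branches and functions thresholds would move (to 31 and 37); statements and lines are already at the floor of their actual values.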

4. Build Test Suite failing on every PR

The build-test.md agentic workflow (Copilot) failed in both observed PR runs. This workflow tests 8 ecosystems (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) and is a required PR check, so persistent failure means this gate is effectively non-functional.

5. Chroot Integration Tests showing 50% pass rate

The Chroot Integration Tests failed in 1 of 2 recent PR runs. These tests validate the security-critical chroot isolation mechanism. Flakiness in security tests is particularly concerning and warrants investigation.

🟡 Medium Priority

6. Performance benchmarks not run on PRs

The performance-monitor.yml runs weekly on a schedule. Startup time regressions (awf container spin-up) can only be discovered after merging, not during PR review. For a tool where fast startup is a key UX concern, per-PR benchmarking would catch regressions immediately.
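
A per-PR check does not need a full benchmark suite; comparing the median of a few startup timings against a baseline may be enough to flag large regressions. The sketch below assumes timings are collected in milliseconds by some other step; the function names, the 20% tolerance, and the sample numbers are all placeholders, not values from the existing performance workflow.

```typescript
// Median of a non-empty list of timings in milliseconds.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical gate: flag a regression when the PR median exceeds the
// baseline median by more than `tolerance` (0.2 = 20%).
function isStartupRegression(
  baselineMs: number[],
  prMs: number[],
  tolerance = 0.2,
): boolean {
  return median(prMs) > median(baselineMs) * (1 + tolerance);
}

// Made-up timings for illustration only.
console.log(isStartupRegression([1000, 1100, 1050], [1400, 1350, 1500]));
```

Using the median rather than the mean keeps a single slow run on a noisy CI runner from triggering a false alarm.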

7. No container image security scanning

There is no Trivy, Grype, or similar container vulnerability scanning for the three custom Docker images (containers/squid/, containers/agent/, containers/api-proxy/). The npm audit covers the Node.js CLI but not the Ubuntu-based container base images, which may contain vulnerable system packages.

8. Dependency Vulnerability Audit failing

The dependency-audit.yml workflow shows a recent failure. If this is a real vulnerability (vs. a flaky SARIF upload), it indicates high/critical vulnerabilities in the dependency tree that have not been addressed.
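
To tell a real finding from an upload problem, the severity counts can be read directly from the audit report. The sketch below assumes the report layout produced by `npm audit --json` in npm 7 or later, where counts sit under `metadata.vulnerabilities`; the function name and the sample report are illustrative.

```typescript
// Severity counts as they appear under `metadata.vulnerabilities`
// in `npm audit --json` output (assumption: npm 7 or later).
interface SeverityCounts {
  info: number;
  low: number;
  moderate: number;
  high: number;
  critical: number;
  total: number;
}

interface AuditReport {
  metadata: { vulnerabilities: SeverityCounts };
}

// Number of findings that would fail a "high/critical" gate.
function blockingFindings(reportJson: string): number {
  const report = JSON.parse(reportJson) as AuditReport;
  const { high, critical } = report.metadata.vulnerabilities;
  return high + critical;
}

// Made-up report for illustration; real input would come from `npm audit --json`.
const sample = JSON.stringify({
  metadata: {
    vulnerabilities: { info: 0, low: 2, moderate: 1, high: 1, critical: 0, total: 4 },
  },
});

console.log(blockingFindings(sample));
```

A non-zero result points to a genuine dependency issue; a zero result alongside a failed workflow run would suggest the failure lies in a later step such as the SARIF upload.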

9. Documentation build not validated on code-only PRs

The docs-preview.yml is path-filtered to docs-site/**, docs/**, and *.md. PRs that change TypeScript source (src/**) never validate the docs build. If a doc page references a code snippet or type that changes, the documentation build failure will only be caught after merge (when deploy-docs.yml runs on push to main).

10. Smoke tests are effectively opt-in on PRs

smoke-claude.md, smoke-copilot.md, and smoke-codex.md all technically trigger on PRs but require a specific emoji reaction (❤️, 👀, 🎉) to activate. In practice, they ran 0 times as actual smoke tests in recent PR runs. The smoke-codex.md failures also indicate broken smoke infrastructure for Codex. End-to-end smoke coverage is thus absent from the normal PR review cycle.

🟢 Low Priority

11. No dist/ bundle size tracking

There is no artifact size monitoring for the compiled dist/ output. Accidental inclusion of large files or dependencies in the TypeScript build could go unnoticed.
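
A size check can be as simple as summing the files under dist/ and comparing the total against a budget. The following is a minimal sketch using only Node.js built-ins; the function names and any budget value passed in are assumptions, since the assessment does not specify a threshold.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively sum the size in bytes of every regular file under `dir`.
function directorySizeBytes(dir: string): number {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += directorySizeBytes(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Hypothetical budget check; the limit is a placeholder, not a project value.
function exceedsBudget(dir: string, maxBytes: number): boolean {
  return directorySizeBytes(dir) > maxBytes;
}
```

A CI step could call `exceedsBudget("dist", limit)` after the build and exit non-zero when it returns true.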

12. Markdownlint coverage of .github/workflows/*.md (agentic workflow files) is unverified

The lint:md script runs markdownlint-cli2 '**/*.md' '#node_modules' but the agentic workflow .md files have YAML frontmatter that markdownlint may flag as errors — it's unclear if these are excluded by config. Worth verifying the lint configuration covers or explicitly excludes workflow markdown files.

13. No mutation testing

The project has a security-critical codebase (firewall, proxy config generation) where subtle logic bugs can have serious consequences. Mutation testing (e.g., Stryker) would help identify whether unit tests actually catch logic errors, not just execute code paths.
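
The difference between executing a code path and actually checking it can be shown with a toy example. The allowlist function below is hypothetical and is not code from this project; the "mutant" is the kind of small change a mutation tool might generate (here, dropping the leading dot from a suffix check).

```typescript
type DomainCheck = (domain: string, allowlist: string[]) => boolean;

// Hypothetical allowlist check: exact match or a true subdomain.
const isDomainAllowed: DomainCheck = (domain, allowlist) =>
  allowlist.some(
    (allowed) => domain === allowed || domain.endsWith("." + allowed),
  );

// Mutant: the leading-dot requirement is dropped from the suffix check.
const isDomainAllowedMutant: DomainCheck = (domain, allowlist) =>
  allowlist.some(
    (allowed) => domain === allowed || domain.endsWith(allowed),
  );

// Weak test: covers the line, but passes for the original and the mutant.
const weakTest = (check: DomainCheck): boolean =>
  check("api.github.com", ["github.com"]) === true;

// Stronger test: a look-alike domain must be rejected, which the mutant allows.
const strongTest = (check: DomainCheck): boolean =>
  check("evilgithub.com", ["github.com"]) === false;

console.log(weakTest(isDomainAllowed), weakTest(isDomainAllowedMutant));
console.log(strongTest(isDomainAllowed), strongTest(isDomainAllowedMutant));
```

Both functions pass the weak test, so line coverage alone would report the logic as tested. Only the stronger test fails on the mutant, which is the signal a mutation score is meant to surface.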

14. No Node.js version matrix for integration tests

Integration tests run only on Node.js 22. The build verification matrix tests Node 20 and 22, but integration tests are not validated on Node 20, even though Node 20 is the supported LTS version used in most CI steps.

📋 Actionable Recommendations

High Priority

| Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- |
| Orphaned integration tests | Add the 6 uncovered test files to their respective workflow patterns in `test-integration-suite.yml` or `test-chroot.yml` | Low | High: activates dormant security tests immediately |
| Security Guard not running | Investigate and fix the activation/authentication issue; add monitoring via `ci-doctor.md` | Low-Medium | High: restores AI security review on all PRs |
| Low coverage thresholds | Raise thresholds in steps toward 50% for all four metrics as tests are added (an immediate jump from ~38% would fail every PR), and track coverage trends with badges | Medium | High: incentivizes test coverage growth |
| Build Test Suite failing | Debug the clone failures or network access issues in the Copilot agent; consider adding retry logic | Medium | Medium: restores multi-ecosystem build validation |
| Chroot tests flaky | Add pre-test Docker cleanup to all chroot job steps; investigate failure logs | Low | High: reduces false failures on security-critical tests |

Medium Priority

| Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- |
| Performance not on PRs | Add a lightweight benchmark step to `build.yml` measuring container startup time | Medium | Medium |
| No container scanning | Add Trivy scan of the three Docker images to `dependency-audit.yml` or a dedicated workflow | Low | High: covers OS-level vulnerabilities |
| Dependency audit failing | Triage the specific vulnerability, update or add exceptions, fix the workflow | Low | High: security debt |
| Docs not validated on code PRs | Add a docs build validation step to `build.yml` so that the `docs-site/` build also runs on code-only PRs | Medium | Low-Medium |
| Smoke tests opt-in | Create a separate `smoke-basic.md` workflow that runs a minimal smoke test unconditionally on every PR without requiring real AI agent credentials | Medium | Medium |

Low Priority

| Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- |
| Bundle size tracking | Add `du -sh dist/` and compare against threshold in `build.yml`; use `@pkg-size/action` or similar | Low | Low |
| Mutation testing | Evaluate Stryker for security-critical modules (`squid-config.ts`, `domain-patterns.ts`) | High | Medium |
| Node version matrix for integration tests | Add `node-version: ['20', '22']` matrix to integration test suite | Low | Low |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflow files (`.yml`) | 39 |
| Traditional YAML workflows | ~18 |
| Agentic workflows (`.md` compiled) | 21 |
| Workflows running on every PR | 13 |
| Integration test files | 30 |
| Integration test files not covered by CI | 6 (20%) |
| Unit test coverage (statements) | 38.39% |
| Unit test coverage threshold | 38% |
| Recent PR workflow success rate (Build Verification) | 100% |
| Recent PR workflow success rate (Chroot Integration) | 50% |
| Recent PR workflow success rate (Build Test Suite) | 0% |
| Recent PR workflow success rate (Security Guard) | 0% (all skipped) |

Key takeaway: The foundational quality gates (lint, type-check, unit tests, build, CodeQL) are healthy and passing consistently. The critical gaps are in the integration test layer — 20% of integration tests never run in CI — and in reliability of the agentic workflows (Security Guard, Build Test Suite) that provide the highest-signal quality checks.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 1, 2026, 10:26 PM UTC

2026-04-01T22:52:36Z

github-actions[bot]
bot Apr 1, 2026
Author

This discussion was automatically closed because it expired on 2026-04-01T22:26:15.873Z.

Closed by Workflow

0 replies

2026-04-02T00:21:52Z

github-actions[bot]
bot Apr 2, 2026
Author

🔮 The ancient spirits stir in the firewall halls.
The smoke-test agent has walked this thread, read the omens, and marked the path.
May your workflows pass under watchful stars.

🔮 The oracle has spoken through Smoke Codex

0 replies

2026-04-02T01:01:51Z

github-actions[bot]
bot Apr 2, 2026
Author

🔮 The ancient spirits stir in the firewall halls.
The smoke-test oracle has passed this way and etched its sign.
By moonlit packets and warded domains, this chamber is witnessed.
So it is foretold.

🔮 The oracle has spoken through Smoke Codex

0 replies
