[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1393
This discussion was automatically closed because it expired on 2026-03-28T22:20:36.874Z.
📊 Current CI/CD Pipeline Status
The repository has a well-structured, multi-layered CI/CD pipeline with 18+ workflows running on pull requests. All agentic workflow files are compiled, and the pipeline covers linting, type checking, unit tests, integration tests, SAST, dependency auditing, and semantic PR title enforcement. The overall health is good, with most PR checks consistently passing.
✅ Existing Quality Gates
Workflows providing these gates:
- `lint.yml`
- `test-integration.yml` (2 gates)
- `build.yml` (2 gates)
- `test-coverage.yml` (2 gates)
- `test-integration-suite.yml` (4 gates)
- `test-chroot.yml` (2 gates)
- `codeql.yml`
- `dependency-audit.yml`
- `pr-title.yml`
- `test-examples.yml`
- `test-action.yml`
- `link-check.yml`
- `security-guard.md`
- `build-test.md`
- `smoke-*.md`
- `release.yml`
- `performance-monitor.yml`
🔍 Identified Gaps
🔴 High Priority
1. Five integration test files have no CI workflow coverage
The following test files exist in `tests/integration/` but are not matched by any CI job's `--testPathPatterns`:
- `api-target-allowlist.test.ts`: tests that `--copilot-api-target`, `--openai-api-target`, and `--anthropic-api-target` values are automatically added to the firewall allowlist
- `gh-host-injection.test.ts`: tests GitHub host injection behavior
- `ghes-auto-populate.test.ts`: tests GHES auto-population logic
- `skip-pull.test.ts`: tests `--skip-pull` flag behavior
- `workdir-tmpfs-hiding.test.ts`: tests tmpfs hiding of working directories
These tests never run in CI, so a PR could break any of these features without failing a single check.
2. Critical source files have near-zero unit test coverage
The two largest, most critical files are essentially untested at the unit level:
- `cli.ts`: 0% statement coverage (0/69 statements, 0/10 functions)
- `docker-manager.ts`: 18% statement coverage (45/250 statements; only 1 of 25 functions tested)
The global coverage thresholds (38% statements, 30% branches) are set low enough to tolerate both files, so a regression in core CLI parsing or Docker orchestration logic would pass every coverage check.
3. No container image vulnerability scanning on PRs
Container images (`containers/agent/`, `containers/squid/`, `containers/api-proxy/`) are the security-critical execution environment, yet container signing and SBOM attestation happen only at release time. No Trivy, Grype, or Docker Scout scan gates PR merges when `containers/**` files change, which leaves a window in which a Dockerfile change introducing a vulnerable base image layer could be merged.
4. Performance regression testing not on PRs
`performance-monitor.yml` runs benchmarks weekly, but the results are never compared against PR changes. Container startup time, time-to-first-request, and cleanup latency are security-relevant (slow startup may mask issues) and critical to user experience. A PR that doubles AWF startup time would pass all checks.
🟡 Medium Priority
5. `--block-domains` (domain deny-list) is completely untested
The integration test heat map in `docs/INTEGRATION-TESTS.md` explicitly identifies this gap: `--block-domains` has no unit tests, integration tests, or CI coverage at any tier. The deny-list is a core security feature, and an operator who relies on it to block known-malicious domains gets no regression protection.
6. `--env-all` flag behavior is untested in integration
`--env-all` (copy all host environment variables into the container, except those on a blocklist) has no integration test. This behavior is sensitive: incorrect passthrough could leak credentials into the container. The heat map confirms this gap.
7. Coverage thresholds too permissive for security-critical code
Global thresholds (branches: 30%, functions: 35%, statements: 38%) do not enforce per-file minimums, which is why `cli.ts` at 0% and `docker-manager.ts` at 18% still pass. Explicitly requiring higher coverage for security-critical files (`src/host-iptables.ts`, `src/squid-config.ts`, `src/docker-manager.ts`) would better protect against regressions.
8. Build Test Suite reliability issue
The agentic `build-test.md` workflow failed in its only run in the most recent sample of PR runs (1/1, a 100% failure rate). This is a critical end-to-end check that verifies AWF works as a sandbox for real software builds (Go, Rust, Java, Node.js, Bun, Deno, C++, .NET). A single sample cannot tell a flaky gate from a broken one, but either way an unreliable gate causes alert fatigue and may end up being dismissed.
9. Documentation preview silently degrades on build failure
`docs-preview.yml` sets `continue-on-error: true` on the build step, so a broken docs build still produces a successful workflow result, accompanied only by a failure comment. Docs build failures are therefore invisible in PR status checks. One such failure was observed in recent PR data.
🟢 Low Priority
10. No artifact/bundle size monitoring
There is no check on the size of the `dist/` output or of container image layers. A PR that accidentally includes large debug artifacts or adds a heavy dependency would not be flagged, which matters for an action used in CI/CD pipelines where download time counts.
11. No mutation testing
The current 38% coverage relies on line and branch metrics, which tests can satisfy by executing code without asserting that it behaves correctly. Mutation testing (e.g., Stryker) would reveal whether tests actually verify behavior, especially for the Squid ACL generation logic in `src/squid-config.ts`.
12. Weekly performance baselines not persisted for PR comparison
`performance-monitor.yml` generates benchmark results and uploads them as artifacts (90-day retention), but no mechanism downloads the most recent baseline and compares a PR against it. The infrastructure to detect regressions exists; it is just not wired to PRs.
📋 Actionable Recommendations
Gap 1: Add missing test files to CI patterns
Solution: Add the five uncovered test files to the `test-integration-suite.yml` jobs.
Complexity: Low | Impact: High (immediately starts catching regressions in these features)
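To keep this gap from reopening as new test files are added, a CI step could compare the contents of `tests/integration/` against every `--testPathPatterns` value the workflows use. A minimal sketch of the matching half follows. It assumes the pattern strings have already been extracted from the workflow YAML (Jest treats each value as a regular expression matched against the test path); the function name `findUncoveredTests` and the file `domain-allowlist.test.ts` in the example are hypothetical.

```typescript
// Hypothetical helper: report integration test files that no CI job selects.
// `patterns` is assumed to hold the regex strings passed to Jest's
// --testPathPatterns in each workflow job; parsing the YAML is out of scope here.
function findUncoveredTests(testFiles: string[], patterns: string[]): string[] {
  const regexes = patterns.map((p) => new RegExp(p));
  // A file counts as covered if at least one job's pattern matches its path.
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Example input: two jobs that together select only part of the directory.
const files = [
  "tests/integration/domain-allowlist.test.ts", // hypothetical file name
  "tests/integration/skip-pull.test.ts",
  "tests/integration/gh-host-injection.test.ts",
];
const patterns = ["domain-.*\\.test\\.ts$", "gh-host-injection"];

console.log(findUncoveredTests(files, patterns));
```

Failing the step whenever the returned list is non-empty would turn a silently skipped file into a visible PR failure.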
Gap 2: Raise per-file coverage thresholds for critical files
Solution: Add per-file `coverageThreshold` entries in `jest.config.js` for the critical modules.
Complexity: Low | Impact: High (prevents coverage regressions in the most critical files)
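Jest's `coverageThreshold` accepts file paths and globs as keys next to `global`; according to the Jest documentation, files matched by those keys are subtracted from the global totals and checked against their own thresholds. A sketch of that shape follows, written as a typed object so it could live in a `jest.config.ts` (Jest loads TypeScript configs when `ts-node` is available) or be copied into the existing `jest.config.js`. The global numbers repeat the current thresholds; every per-file number is a placeholder for the team to choose, not a value from the repository.

```typescript
// Shape of one threshold entry (percentages; Jest treats positive numbers as minimums).
type Thresholds = { branches?: number; functions?: number; lines?: number; statements?: number };

const coverageThreshold: Record<string, Thresholds> = {
  // Current global floors, as quoted in the assessment.
  global: { branches: 30, functions: 35, statements: 38 },
  // Per-file floors for security-critical modules (placeholder numbers).
  // Jest checks each path on its own and removes it from the global totals.
  "./src/host-iptables.ts": { statements: 70, branches: 60 },
  "./src/squid-config.ts": { statements: 70, branches: 60 },
  "./src/docker-manager.ts": { statements: 40, functions: 30 },
};

const config = {
  // ...existing options (preset, testMatch, collectCoverageFrom, ...) stay as they are
  coverageThreshold,
};

export default config;
```

A floor above a file's current coverage fails the run immediately, so a practical rollout sets each floor at or just below today's number and ratchets it up as tests land.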
Gap 3: Add container image scanning on `containers/**` PRs
Solution: Add a new workflow using Trivy (free, no external services). Run `aquasecurity/trivy-action` on each Dockerfile target, upload the SARIF results to the GitHub Security tab, and fail on CRITICAL vulnerabilities.
Complexity: Low | Impact: High (fills the window between PR merge and the release scan)
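`aquasecurity/trivy-action` can already fail the job by itself through its `severity` and `exit-code` inputs, which is the simplest route. If the team also wants a readable summary, or a policy those inputs cannot express, the JSON report can be post-processed instead. The sketch below assumes Trivy's JSON output shape (`Results[].Vulnerabilities[]` with `VulnerabilityID`, `PkgName`, and `Severity`); the function name, the image target, and the CVE IDs are made up for illustration.

```typescript
// Minimal types for the parts of Trivy's JSON report this gate reads (assumed shape).
interface TrivyVulnerability {
  VulnerabilityID: string;
  PkgName: string;
  Severity: string; // e.g. "CRITICAL", "HIGH", "MEDIUM", "LOW", "UNKNOWN"
}
interface TrivyResult {
  Target: string;
  Vulnerabilities?: TrivyVulnerability[]; // absent when a target has no findings
}
interface TrivyReport {
  Results?: TrivyResult[];
}

// Hypothetical gate: list every finding whose severity is in the blocking set.
function blockingFindings(report: TrivyReport, blocking: string[] = ["CRITICAL"]): string[] {
  const findings: string[] = [];
  for (const result of report.Results ?? []) {
    for (const v of result.Vulnerabilities ?? []) {
      if (blocking.includes(v.Severity)) {
        findings.push(`${result.Target}: ${v.VulnerabilityID} in ${v.PkgName} (${v.Severity})`);
      }
    }
  }
  return findings;
}

// Made-up sample report: one CRITICAL, one HIGH, and one target with no findings.
const sample: TrivyReport = {
  Results: [
    {
      Target: "awf-squid:pr (debian 12)",
      Vulnerabilities: [
        { VulnerabilityID: "CVE-0000-0001", PkgName: "libexample", Severity: "CRITICAL" },
        { VulnerabilityID: "CVE-0000-0002", PkgName: "libother", Severity: "HIGH" },
      ],
    },
    { Target: "app/package-lock.json" },
  ],
};

console.log(blockingFindings(sample).join("\n"));
```

In CI, the step would read the JSON report written by the scan, call `blockingFindings`, print the list, and exit non-zero when it is not empty.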
Gap 4: Wire performance baseline comparison to PRs
Solution: In `performance-monitor.yml`, commit benchmark results to a dedicated `perf-baseline` branch. In a new `performance-pr.yml` workflow, read the latest baseline from that branch and compare against it on PRs that touch `src/**` or `containers/**`.
Complexity: Medium | Impact: Medium (catches startup and latency regressions before merge)
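Once the PR job can read the baseline, the comparison itself is small. The sketch assumes both the baseline and the PR run are reduced to a map from metric name to milliseconds; the metric names, the sample numbers, and the 20% tolerance are illustrative assumptions, not values from `performance-monitor.yml`.

```typescript
// Benchmark results reduced to metric name -> milliseconds (assumed format).
type Benchmarks = Record<string, number>;

interface Regression {
  metric: string;
  baselineMs: number;
  currentMs: number;
  ratio: number;
}

// Flag metrics that are slower than the baseline by more than `tolerance` (0.2 = 20%).
function findRegressions(baseline: Benchmarks, current: Benchmarks, tolerance = 0.2): Regression[] {
  const regressions: Regression[] = [];
  for (const [metric, baselineMs] of Object.entries(baseline)) {
    const currentMs = current[metric];
    // Skip metrics missing from the PR run or with an unusable baseline.
    if (currentMs === undefined || baselineMs <= 0) continue;
    const ratio = currentMs / baselineMs;
    if (ratio > 1 + tolerance) {
      regressions.push({ metric, baselineMs, currentMs, ratio });
    }
  }
  return regressions;
}

// Illustrative numbers: startup time doubles, the other metrics stay within noise.
const baseline: Benchmarks = { containerStartupMs: 4000, timeToFirstRequestMs: 900, cleanupMs: 1500 };
const pr: Benchmarks = { containerStartupMs: 8000, timeToFirstRequestMs: 950, cleanupMs: 1400 };

for (const r of findRegressions(baseline, pr)) {
  console.log(`${r.metric}: ${r.baselineMs} ms -> ${r.currentMs} ms (x${r.ratio.toFixed(2)})`);
}
```

A relative tolerance keeps ordinary runner noise from failing PRs, while a doubling of startup time, as in the example, is still flagged.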
Gap 5: Add `--block-domains` integration test
Solution: Add `blocked-domains-denylist.test.ts` to `tests/integration/`, testing that `--block-domains example.com` causes requests to `example.com` to return 403 even when `example.com` is also in `--allow-domains`. Add it to the domain/network job in `test-integration-suite.yml`.
Complexity: Low | Impact: High (protects a core security feature)
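The property this test should pin down is that the deny-list wins over the allow-list. A small reference model plus a table of cases makes that expectation explicit and easy to extend. The model is an assumption about the intended semantics (a listed domain matches itself and its subdomains, and the deny-list is checked first), not a copy of AWF's Squid ACL generation; `expectedDecision` is a hypothetical helper. The integration test would send each host through the real firewall and compare the observed result (403 when blocked) with the model's prediction.

```typescript
// Assumed matching rule: a listed domain matches itself and any subdomain,
// but not an unrelated host that merely ends with the same characters.
function matchesDomain(host: string, domain: string): boolean {
  const h = host.toLowerCase();
  const d = domain.toLowerCase();
  return h === d || h.endsWith("." + d);
}

// Reference model (assumption): deny-list first, then allow-list, default deny.
function expectedDecision(host: string, allow: string[], block: string[]): "allowed" | "blocked" {
  if (block.some((d) => matchesDomain(host, d))) return "blocked";
  return allow.some((d) => matchesDomain(host, d)) ? "allowed" : "blocked";
}

// Cases an integration test could iterate over, checking each against real traffic.
const cases: Array<{ host: string; allow: string[]; block: string[] }> = [
  { host: "example.com", allow: ["example.com"], block: ["example.com"] }, // deny overrides allow
  { host: "api.github.com", allow: ["github.com"], block: ["gist.github.com"] },
  { host: "gist.github.com", allow: ["github.com"], block: ["gist.github.com"] },
  { host: "notgithub.com", allow: ["github.com"], block: [] }, // suffix without a dot boundary
];

for (const c of cases) {
  console.log(c.host, expectedDecision(c.host, c.allow, c.block));
}
```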
Gap 6: Add `--env-all` integration test
Solution: Add `env-all.test.ts` verifying that (a) host environment variables are present in the container, (b) the blocklist (`EXCLUDED_ENV_VARS`) is honored, and (c) sensitive variables (`GITHUB_TOKEN`, etc.) are not present. Add it to the container/ops job.
Complexity: Low | Impact: Medium (prevents credential-leakage regressions)
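One way to check all three properties is to seed the host environment with unique canary values before launching the container, capture `env` output from inside it, and then look for where the canaries ended up. The sketch below covers only the checking half; launching the container is left to the existing integration-test helpers. The variable name `AWF_TEST_PASSTHROUGH`, the canary strings, and both function names are hypothetical, and the real blocklist would come from wherever `EXCLUDED_ENV_VARS` is defined.

```typescript
// Parse `env` output captured inside the container: one NAME=value per line.
// (Multi-line values are not handled in this sketch.)
function parseEnvOutput(stdout: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const line of stdout.split("\n")) {
    const eq = line.indexOf("=");
    if (eq > 0) vars.set(line.slice(0, eq), line.slice(eq + 1));
  }
  return vars;
}

// Hypothetical check: passthrough canaries must arrive unchanged,
// and canaries seeded into excluded variables must not appear anywhere.
function checkEnvAll(
  containerEnv: Map<string, string>,
  passthrough: Record<string, string>, // name -> canary expected in the container
  excluded: Record<string, string>, // name -> canary that must never leak
): string[] {
  const problems: string[] = [];
  for (const [name, value] of Object.entries(passthrough)) {
    if (containerEnv.get(name) !== value) problems.push(`missing or altered: ${name}`);
  }
  // Search every value, not only the same name, so a copy under another name is caught too.
  const allValues = Array.from(containerEnv.values());
  for (const [name, canary] of Object.entries(excluded)) {
    if (allValues.some((v) => v.includes(canary))) problems.push(`leaked: ${name}`);
  }
  return problems;
}

const captured = "PATH=/usr/bin\nAWF_TEST_PASSTHROUGH=canary-pass-1\nGITHUB_TOKEN=canary-secret-1\n";
const problems = checkEnvAll(
  parseEnvOutput(captured),
  { AWF_TEST_PASSTHROUGH: "canary-pass-1" },
  { GITHUB_TOKEN: "canary-secret-1" },
);
console.log(problems);
```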
Gap 8: Stabilize or fix Build Test Suite
Solution: Investigate the recent failure in `build-test.md` and determine whether it comes from a flaky network dependency, a missing Docker prerequisite, or a logic bug. Consider adding a retry step or moving flaky languages into a separate optional job.
Complexity: Medium | Impact: High (restores confidence in this critical end-to-end check)
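Telling a flaky gate from a broken one needs more than a single run. One approach is to group the workflow's recent runs by commit: a commit that both failed and passed points to flakiness, while a commit that only ever failed points to a real break. The sketch assumes the runs have already been fetched from the GitHub Actions REST API (its workflow-run objects carry `head_sha` and `conclusion`); `classifyByCommit` and the short SHAs in the example are hypothetical.

```typescript
// Subset of a workflow run object from the GitHub Actions REST API.
interface WorkflowRun {
  head_sha: string;
  conclusion: string | null; // "success", "failure", "cancelled", ..., or null while running
}

type Verdict = "flaky" | "consistently-failing" | "passing";

// Group runs by commit and classify each commit from its set of outcomes.
function classifyByCommit(runs: WorkflowRun[]): Map<string, Verdict> {
  const outcomes = new Map<string, Set<string>>();
  for (const run of runs) {
    const c = run.conclusion;
    // Only completed success/failure runs say anything about reliability.
    if (c !== "success" && c !== "failure") continue;
    const seen = outcomes.get(run.head_sha) ?? new Set<string>();
    seen.add(c);
    outcomes.set(run.head_sha, seen);
  }
  const verdicts = new Map<string, Verdict>();
  outcomes.forEach((seen, sha) => {
    if (seen.has("failure") && seen.has("success")) verdicts.set(sha, "flaky");
    else if (seen.has("failure")) verdicts.set(sha, "consistently-failing");
    else verdicts.set(sha, "passing");
  });
  return verdicts;
}

const runs: WorkflowRun[] = [
  { head_sha: "aaa", conclusion: "failure" },
  { head_sha: "aaa", conclusion: "success" }, // re-run of the same commit passed
  { head_sha: "bbb", conclusion: "failure" },
  { head_sha: "ccc", conclusion: "success" },
  { head_sha: "ccc", conclusion: "cancelled" }, // ignored
];

console.log(classifyByCommit(runs));
```

Commits classified as flaky argue for a retry or for splitting out the unstable languages; consistently failing commits argue for fixing the underlying break first.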
Gap 9: Make documentation build failures visible
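Solution: Remove `continue-on-error: true` from the build step in `docs-preview.yml`. Alternatively, keep it so the failure comment is still posted, and add a later step that runs when the build step's `outcome` is `failure` and exits non-zero (with `continue-on-error`, the step's `conclusion` reports success but its `outcome` still reports the failure).
Complexity: Low | Impact: Low (makes docs build failures visible as failed PR checks)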
📈 Metrics Summary
- `cli.ts` statement coverage: 0% (0/69)
- `docker-manager.ts` statement coverage: 18% (45/250)
Summary: The pipeline is comprehensive for an infrastructure security tool: integration tests are extensive and cover most critical paths. The primary risks are (1) five integration test files silently skipped in CI, (2) near-zero unit coverage on the two largest files, and (3) no container vulnerability scanning before merge. All three are low-complexity fixes with high impact.