[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1497
Replies: 1 comment
🔮 The ancient spirits stir in the wires.
📊 Current CI/CD Pipeline Status
The repository has a well-structured, multi-layered CI/CD pipeline with 40+ workflow files. The pipeline is healthy and actively maintained, covering builds, linting, security, integration tests, and AI-assisted PR review. All compiled agentic workflows (.lock.yml) are up-to-date.
Workflow inventory summary:
- Workflow files (.yml)
- Compiled agentic workflows (.lock.yml)
✅ Existing Quality Gates
The following checks currently run on pull requests:
- build.yml
- lint.yml + build.yml
- lint.yml
- test-integration.yml (alias)
- test-coverage.yml (two checks)
- test-integration-suite.yml
- test-chroot.yml
- test-examples.yml
- test-action.yml
- codeql.yml
- dependency-audit.yml
- pr-title.yml
- link-check.yml (*.md)
- security-guard.lock.yml
- build-test.lock.yml
- docs-preview.yml
Scheduled/non-PR checks: weekly performance benchmarks, daily dependency monitoring, daily security review, weekly CLI flag consistency checker, hourly secret digger workflows.
🔍 Identified Gaps
🔴 High Priority
1. Critically low unit test coverage with dangerously low thresholds
The two most security-critical files in the codebase have near-zero unit test coverage:
- cli.ts (entry point): 0% coverage, zero tests
- docker-manager.ts (container orchestration): 18% statement coverage, 4% function coverage
The global thresholds in jest.config.js are set to 30–38%, which formally allows 0% on individual files. This means entire CLI code paths (argument parsing, signal handling, cleanup logic) and all container lifecycle code can silently regress with no unit-level detection.
2. Performance benchmarks do not run on PRs
performance-monitor.yml runs only on a weekly schedule (Mondays at 06:00 UTC). There is no per-PR performance comparison, meaning a PR could double container startup time or introduce a memory leak that only surfaces the following week. The benchmark script exists (scripts/ci/benchmark-performance.ts) but is not wired to PRs.
3. No container/Docker image security scanning
dependency-audit.yml audits npm packages, but no workflow scans Docker container images for vulnerabilities in the base OS layers (ubuntu:22.04, ubuntu/squid). A CVE in the base image or a system package would go undetected until a user reports it. Tools like Trivy or Grype can scan built images.
4. No shell script linting (ShellCheck)
The containers/agent/ directory contains 6 security-critical shell scripts (entrypoint.sh, setup-iptables.sh, docker-stub.sh, etc.) that handle iptables rules, chroot, and credential isolation. No ShellCheck or equivalent linter runs in CI. Shell bugs in these files are a direct attack surface for container escapes or credential leaks.
🟡 Medium Priority
5. Smoke tests for real agents are not automatically required on every PR
smoke-claude.lock.yml, smoke-codex.lock.yml, and smoke-copilot.lock.yml require a specific emoji reaction to trigger on PRs (❤️, 🎉, 👀 respectively). Only smoke-chroot.lock.yml runs automatically on PRs touching src/** or containers/**. End-to-end agent smoke tests are therefore skipped on most PRs, meaning a breaking change in proxy configuration or entrypoint could ship without being caught by the real-agent path.
6. No per-file coverage thresholds for high-risk modules
jest.config.js only enforces global thresholds. There are no per-file or per-module minimums. The security-critical modules (docker-manager.ts, host-iptables.ts, cli.ts) could remain at 0% indefinitely as long as the global average stays above the low watermark.
7. Integration tests not confirmed as required branch protection status checks
The test-integration-suite.yml and test-chroot.yml workflows run on PRs, but there is no evidence they are configured as required status checks in branch protection rules. If they are optional, a PR author (or Copilot agent) could merge despite integration test failures.
8. No static analysis for redact-secrets.ts / DLP code paths
src/redact-secrets.ts and src/dlp.ts are security-sensitive modules handling secret redaction and DLP URL scanning. dlp.ts has no corresponding test file at all, and redact-secrets.ts lacks visible coverage. Regressions here would directly impact the security promise of the firewall.
9. Documentation preview build only runs for documentation path changes
docs-preview.yml only triggers when docs-site/**, docs/**, *.md, or the workflow file itself changes. Changes to TypeScript source that affect documented CLI behavior (e.g., new flags) do not validate the docs build.
🟢 Low Priority
10. No multi-architecture (arm64) container build testing
All CI runs on ubuntu-latest (x86_64). The Dockerfile-based containers are not built or tested for linux/arm64. Users running AWF on Apple Silicon-based GitHub-hosted runners or self-hosted ARM machines may encounter silent architecture incompatibilities.
11. No SBOM (Software Bill of Materials) generation
No workflow generates an SBOM for container images or the npm package. SBOM generation is increasingly expected for security-sensitive tooling distributed as container images, and some enterprise consumers require it alongside supply chain frameworks such as SLSA.
12. Performance benchmarks use unpinned action versions
performance-monitor.yml uses actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4, and actions/github-script@v7 without SHA pinning. This is inconsistent with the rest of the repository, where all actions are pinned to full commit SHAs.
13. No mutation testing
The high reported coverage numbers for squid-config.ts (100%) and logger.ts (100%) are not validated by mutation testing. Mutation testing (e.g., Stryker) would verify that tests actually detect logic regressions rather than just execute the code path.
📋 Actionable Recommendations
Rec 1 — Add per-file coverage thresholds for high-risk modules
Issue: cli.ts at 0%, docker-manager.ts at 18%.
Solution: Add coverageThreshold entries in jest.config.js for individual files.
Complexity: Low
Impact: High — forces incremental coverage improvement without blocking current work
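The code sample that accompanied Rec 1 did not survive in this post, so the following is only a minimal sketch of what per-file entries could look like. Jest's coverageThreshold accepts path or glob keys next to global; the src/ paths and every percentage below are assumptions chosen for illustration, not values from the repository. The small helper mirrors how such thresholds would flag the coverage numbers reported above.

```typescript
// Hypothetical per-file thresholds. Paths and percentages are illustrative
// assumptions; the same object shape would go under `coverageThreshold`
// in jest.config.js, alongside the existing `global` entry.
type Metrics = { statements: number; functions: number };

const perFileThresholds: Record<string, Partial<Metrics>> = {
  './src/cli.ts': { statements: 20, functions: 20 },
  './src/docker-manager.ts': { statements: 30, functions: 15 },
  './src/host-iptables.ts': { statements: 30, functions: 30 },
};

// Returns one message per metric that falls below its required minimum.
// Files without measured coverage in `actual` are skipped.
function belowThreshold(
  actual: Record<string, Metrics>,
  required: Record<string, Partial<Metrics>>,
): string[] {
  const failures: string[] = [];
  for (const [file, minimums] of Object.entries(required)) {
    const measured = actual[file];
    if (!measured) continue;
    for (const key of Object.keys(minimums) as (keyof Metrics)[]) {
      const min = minimums[key];
      if (min !== undefined && measured[key] < min) {
        failures.push(`${file}: ${key} ${measured[key]}% < ${min}%`);
      }
    }
  }
  return failures;
}
```

Starting with low minimums and raising them as tests land keeps the gate from blocking current work while still preventing a file from sitting at 0%.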
Rec 2 — Add container image scanning with Trivy
Issue: No Docker image vulnerability scanning.
Solution: Add a step to build.yml (or a dedicated container-scan.yml) after container builds.
Complexity: Low
Impact: High — fills a significant gap in container supply chain security
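The workflow step for Rec 2 was also lost from this post. As a rough sketch, a CI script could assemble a trivy image invocation like the one below. The helper name is hypothetical and the severity cut-off is an assumption; --severity, --exit-code, and --ignore-unfixed are existing Trivy options.

```typescript
// Hypothetical helper that builds the argument list for `trivy image`.
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

function trivyImageArgs(
  image: string,
  severities: Severity[] = ['HIGH', 'CRITICAL'], // assumed cut-off
): string[] {
  return [
    'image',
    '--severity', severities.join(','),
    '--exit-code', '1',   // make the CI step fail when findings are reported
    '--ignore-unfixed',   // skip vulnerabilities that have no fixed version yet
    image,
  ];
}
```

For the base image named above, trivyImageArgs('ubuntu:22.04') yields the arguments for `trivy image --severity HIGH,CRITICAL --exit-code 1 --ignore-unfixed ubuntu:22.04`. Scanning the locally built images rather than only the base images would also cover packages added by the Dockerfiles.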
Rec 3 — Add ShellCheck to CI
Issue: No shell script linting on security-critical container scripts.
Solution: Add a job to lint.yml.
Complexity: Low
Impact: High — catches common shell bugs in iptables and chroot scripts
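One possible shape for the Rec 3 job, sketched under assumptions: pick the shell scripts under containers/agent/ from a file list and pass them to ShellCheck. The helper name and the warning-level cut-off are hypothetical; --severity is an existing ShellCheck option.

```typescript
// Hypothetical helper: select shell scripts in the agent container directory
// and build the ShellCheck argument list for them.
function shellcheckArgs(paths: string[], dir = 'containers/agent/'): string[] {
  const scripts = paths
    .filter((p) => p.startsWith(dir) && p.endsWith('.sh'))
    .sort();
  // Return no arguments at all when there is nothing to lint, so the caller
  // can skip the invocation instead of running ShellCheck without files.
  return scripts.length === 0 ? [] : ['--severity=warning', ...scripts];
}
```

Given entrypoint.sh, setup-iptables.sh, and a Dockerfile in that directory, the result contains the severity flag followed by the two .sh paths only.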
Rec 4 — Wire performance benchmarks to PRs
Issue: No performance comparison between PR and base.
Solution: Add a pull_request trigger to performance-monitor.yml (or a separate performance-pr.yml). Run benchmarks on both the PR branch and base, then post a comparison comment. Even running without regression gating is valuable for visibility.
Complexity: Medium
Impact: Medium — prevents silent performance regressions shipping in minor PRs
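The comparison step in Rec 4 could be sketched as follows. The output format of scripts/ci/benchmark-performance.ts is not shown in this post, so the flat metric-to-milliseconds shape, the metric name in the usage note, and the 10% tolerance are all assumptions.

```typescript
// Assumed result shape: metric name -> duration in milliseconds (lower is better).
type Benchmark = Record<string, number>;

interface Delta {
  metric: string;
  base: number;
  pr: number;
  changePct: number;
  regressed: boolean;
}

// Compare every metric present in both runs; flag slowdowns beyond the tolerance.
function compareBenchmarks(base: Benchmark, pr: Benchmark, tolerancePct = 10): Delta[] {
  return Object.keys(base)
    .filter((metric) => metric in pr && base[metric] > 0)
    .map((metric) => {
      const changePct = ((pr[metric] - base[metric]) / base[metric]) * 100;
      return { metric, base: base[metric], pr: pr[metric], changePct, regressed: changePct > tolerancePct };
    });
}

// Render the comparison as a markdown table suitable for a PR comment.
function toMarkdown(deltas: Delta[]): string {
  const rows = deltas.map((d) => {
    const sign = d.changePct >= 0 ? '+' : '';
    const status = d.regressed ? 'regression' : 'ok';
    return `| ${d.metric} | ${d.base} | ${d.pr} | ${sign}${d.changePct.toFixed(1)}% | ${status} |`;
  });
  return ['| Metric | Base | PR | Change | Status |', '| --- | --- | --- | --- | --- |', ...rows].join('\n');
}
```

With a hypothetical container_startup_ms of 1000 on base and 2000 on the PR, the row reports a +100.0% change and is marked as a regression. Posting the table without failing the job matches the visibility-only mode suggested above.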
Rec 5 — Enforce required status checks for integration tests
Issue: Integration tests may not be required for merging.
Solution: In repository Settings → Branches → Branch protection for main, ensure Integration Tests / Domain Tests, Integration Tests / Network Tests, Chroot Integration Tests / ... etc. are listed as required status checks.
Complexity: Low (configuration, no code change)
Impact: High — ensures integration failures block merges
Rec 6 — Automatically trigger one smoke test on every PR
Issue: Smoke tests require manual emoji reactions.
Solution: smoke-chroot.lock.yml already runs automatically on PRs touching src/** or containers/**, so its path filter can stay as is. Optionally configure one of the agent smoke tests to run automatically on the same paths. Consider routing to smoke-copilot (lightest agent) as the default automatic smoke test.
Complexity: Medium
Impact: Medium — catches proxy/entrypoint breakage before merge
Rec 7 — Add dlp.ts and redact-secrets.ts unit tests
Issue: Zero visible test coverage for DLP and secret redaction modules.
Solution: Create src/dlp.test.ts and src/redact-secrets.test.ts testing key public APIs. These are security-invariant modules where regressions are most dangerous.
Complexity: Medium
Impact: High — DLP and redaction are core security features
Rec 8 — Pin action versions in performance-monitor.yml
Issue: Unpinned action versions break supply chain consistency.
Solution: Replace @v4/@v7 references with full SHA pins, matching the pattern used in all other workflows.
Complexity: Low
Impact: Low — consistency and supply chain hardening
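To keep Rec 8 from regressing, a small check could flag any uses: reference that is not pinned to a 40-character commit SHA. This is a line-based sketch rather than a full YAML parse, and the function name is hypothetical.

```typescript
// Matches `uses: owner/repo[/path]@ref`, with or without a leading list dash.
const USES_LINE = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)@([^\s"'#]+)/;
const FULL_SHA = /^[0-9a-f]{40}$/;

// Returns every `action@ref` whose ref is not a full commit SHA.
function findUnpinnedActions(workflowYaml: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowYaml.split('\n')) {
    const match = USES_LINE.exec(line);
    if (!match) continue; // not a `uses:` line, or a local action without @ref
    const [, action, ref] = match;
    if (action.startsWith('docker://')) continue; // container refs use digests, not commit SHAs
    if (!FULL_SHA.test(ref)) unpinned.push(`${action}@${ref}`);
  }
  return unpinned;
}
```

Run against a workflow that still uses actions/checkout@v4 and actions/github-script@v7, the function returns both references, while lines already pinned to a full SHA (optionally followed by a version comment) are left out.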
📈 Metrics Summary
- cli.ts coverage: 0%
- docker-manager.ts coverage: 18% statements, 4% functions