[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1505
🔮 The ancient spirits stir over this repository.
📊 Current CI/CD Pipeline Status
The repository has a mature, multi-layered CI/CD system that combines traditional GitHub Actions YAML workflows with agentic (AI-driven) workflows.
As of March 2026, overall health is good: the pipeline covers a wide range of checks, but coverage thresholds are permissive and 9 of 34 integration test files are not run by any CI workflow.
✅ Existing Quality Gates
On Every PR (against `main`)
- `build.yml`: `tsc`, dist verification
- `build.yml` + `lint.yml`: `eslint-plugin-security`, custom `no-unsafe-execa` rule
- `lint.yml`: `markdownlint-cli2` on all `.md` files
- `test-integration.yml` (named "TypeScript Type Check"): `tsc --noEmit` strict mode
- `test-coverage.yml`
- `test-integration-suite.yml`
- `test-chroot.yml`
- `test-examples.yml`
- `test-action.yml`
- `dependency-audit.yml`: `npm audit --audit-level=high`, SARIF upload to Security tab
- `codeql.yml`: `javascript-typescript` + `actions` languages, security-extended queries
- `pr-title.yml`
- `security-guard.lock.yml`
- `build-test.lock.yml`
- `smoke-claude/codex/copilot/chroot.lock.yml`
Scheduled / Non-PR
🔍 Identified Gaps
🔴 High Priority
1. Very Low Coverage Thresholds
Current thresholds are 38% statements, 30% branches, 35% functions, and 38% lines, and the two most critical files have little or no coverage:
- `cli.ts`: 0% coverage (0/69 statements)
- `docker-manager.ts`: 18% coverage (45/250 statements, 4% function coverage)
These files orchestrate the entire AWF lifecycle. A PR could remove key functionality from either file and still pass the coverage gate.
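One way to close this gap is Jest's per-path `coverageThreshold` entries, which apply a floor to individual files in addition to the global one. The sketch here is a hedged illustration: the `./src/...` paths and the per-file floor values are assumptions rather than repository facts, and `filesBelowFloor` is a hypothetical helper, not a Jest API. Only the global numbers and the measured percentages come from the figures quoted above.

```typescript
// Shape of one Jest coverage threshold entry (percentages).
type Threshold = {
  statements?: number;
  branches?: number;
  functions?: number;
  lines?: number;
};

// Sketch of the `coverageThreshold` section of a Jest config.
// In a real jest.config.ts this object would be part of the exported config.
const coverageThreshold: Record<string, Threshold> = {
  // Global floor: the values currently enforced, per the assessment.
  global: { statements: 38, branches: 30, functions: 35, lines: 38 },
  // Hypothetical per-file floors; paths and numbers are assumptions.
  "./src/cli.ts": { statements: 50 },
  "./src/docker-manager.ts": { statements: 50, functions: 40 },
};

// Hypothetical helper (not part of Jest): list files whose measured
// statement coverage falls below their configured floor.
function filesBelowFloor(
  measured: Record<string, number>,
  thresholds: Record<string, Threshold>,
): string[] {
  return Object.keys(measured).filter((file) => {
    const floor = thresholds[file]?.statements;
    const pct = measured[file];
    return floor !== undefined && pct !== undefined && pct < floor;
  });
}

// Statement coverage reported above: cli.ts 0%, docker-manager.ts 18%.
const failing = filesBelowFloor(
  { "./src/cli.ts": 0, "./src/docker-manager.ts": 18 },
  coverageThreshold,
);
// failing → ["./src/cli.ts", "./src/docker-manager.ts"]
```

With per-file floors in place, a PR that guts either file fails the gate even if the global percentage stays above 38%.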
2. Nine Integration Test Files Have Zero CI Coverage
The following test files exist in `tests/integration/` but are not run by any workflow job:
- `gh-host-injection.test.ts`
- `ghes-auto-populate.test.ts`
- `host-tcp-services.test.ts`
- `workdir-tmpfs-hiding.test.ts`
- `chroot-capsh-chain.test.ts`
- `chroot-copilot-home.test.ts`
- `api-proxy-observability.test.ts`
- `api-proxy-rate-limit.test.ts`
- `api-target-allowlist.test.ts`
Several of these test security-critical behaviors (credential hiding, capability drop, host service isolation).
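A small guard could catch this class of gap automatically by comparing the test files on disk against the patterns the CI jobs use to select tests. The following is only a sketch: it assumes each job selects tests with a regular-expression pattern, and the function name, the pattern, and the third file name are hypothetical. How the real workflows select tests is not shown in this assessment.

```typescript
// Return the test files that no CI job pattern matches.
// `patterns` are regular-expression sources, one or more per workflow job
// (assumption: jobs select tests by pattern).
function findUnmatchedTests(testFiles: string[], patterns: string[]): string[] {
  const regexes = patterns.map((p) => new RegExp(p));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

const testFiles = [
  "tests/integration/gh-host-injection.test.ts",
  "tests/integration/api-proxy-rate-limit.test.ts",
  "tests/integration/example-covered.test.ts", // hypothetical file
];
const jobPatterns = ["example-covered"]; // hypothetical pattern

// The two files from the list above are matched by no pattern.
const orphaned = findUnmatchedTests(testFiles, jobPatterns);
```

Failing the build whenever `orphaned` is non-empty would force every new integration test to be wired into a job.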
3. No Container Image Vulnerability Scanning
The three Docker images (`squid`, `agent`, `api-proxy`) are built from Ubuntu base images plus installed packages. No Trivy, Grype, or similar scanner is integrated into the PR workflow, so a dependency update in a Dockerfile could introduce a high-severity CVE undetected.
4. Performance Monitor Not Gated on PRs
`performance-monitor.yml` runs weekly only. A PR that increases container startup time from 3s to 15s could be merged before the regression is detected. The benchmark infrastructure already exists (`scripts/ci/benchmark-performance.ts`) but is not used in PR checks.
🟡 Medium Priority
5. Coverage Regression Check Uses `continue-on-error: true`
In `test-coverage.yml` (line 85), the coverage comparison step has `continue-on-error: true`. The final failure step (lines 196-203) checks `steps.compare.outcome == 'failure'`, but the outcome is `failure` both when the compare script detects a regression and when the script itself errors, so the two cases cannot be told apart. More critically, if base branch coverage is not available (the condition on line 79), no regression check runs at all: a PR can drop coverage from 80% to 10% without failing.
6. No SBOM (Software Bill of Materials) Generation
No workflow generates or publishes a Software Bill of Materials for the npm packages or container images. This is increasingly expected for security-conscious tooling, especially for a product that wraps AI agents.
7. Smoke Tests Are Reaction-Gated (Not Automatic)
The smoke tests (`smoke-claude`, `smoke-codex`, `smoke-copilot`) require specific emoji reactions to trigger on PRs (❤️, 🎉, 👀 respectively). They also run on a 12h schedule. As a result, a PR that breaks the Claude smoke test path may not be caught until after merge unless a reviewer manually triggers it. `smoke-chroot` does trigger automatically on relevant path changes (`src/**`, `containers/**`), which is better practice.
8. `performance-monitor.yml` Uses Unpinned Action SHAs
Unlike all other workflows, which use pinned SHA references (e.g., `actions/checkout@de0fac2e...`), `performance-monitor.yml` uses floating tags. This is a supply chain security inconsistency.
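A lint-style check could keep this from recurring by flagging any `uses:` reference that is not pinned to a full 40-character commit SHA. This is a minimal sketch under stated assumptions: `findUnpinnedActions` is a hypothetical helper, the sample workflow lines are illustrative and not taken from `performance-monitor.yml`, and the SHA shown is a placeholder.

```typescript
// A full commit SHA is 40 lowercase hex characters.
const FULL_SHA = /^[0-9a-f]{40}$/;

// Return every `uses:` reference in a workflow file that is not pinned
// to a full commit SHA. Local actions (./path) and docker:// images are skipped.
function findUnpinnedActions(workflowYaml: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowYaml.split("\n")) {
    const match = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/.exec(line);
    if (!match) continue;
    const ref = match[1] ?? "";
    if (ref.indexOf("./") === 0 || ref.indexOf("docker://") === 0) continue;
    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    if (!FULL_SHA.test(version)) unpinned.push(ref);
  }
  return unpinned;
}

// Illustrative input only; the SHA is a placeholder, not a real commit.
const sample = [
  "steps:",
  "  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567 # placeholder SHA",
  "  - uses: actions/setup-node@v4",
  "  - uses: ./.github/actions/local",
].join("\n");

const unpinnedRefs = findUnpinnedActions(sample);
// unpinnedRefs → ["actions/setup-node@v4"]
```

The line-based regex is a deliberate simplification; a stricter version would parse the YAML rather than scan lines.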
9. No User-Mode Integration Tests in CI
`tests/user-mode.test.sh` exists, but no CI workflow runs it. The user-mode path (non-sudo execution) may regress silently.
10. Integration Tests Run on All PRs Without Path Filtering
`test-integration-suite.yml` runs 5 parallel jobs (each with a 45 min timeout) on every PR, including docs-only changes. Path filtering (similar to how `test-examples.yml` ignores `*.md` files) would reduce unnecessary CI load and give faster feedback on docs PRs.
🟢 Low Priority
11. No Prettier/Formatting Check
Only ESLint is enforced; no code formatter (Prettier) is configured. TypeScript code style varies across files. This is not a functional gap but increases review friction.
12. No Mutation Testing
With ~38% unit test coverage and low thresholds, the test suite may have low "kill rate" against mutations. Tools like Stryker could identify tests that pass regardless of code changes.
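For reference, the "kill rate" (mutation score) is the share of injected code changes that cause at least one test to fail. The sketch here only illustrates that arithmetic with made-up results; the type and function names are hypothetical, and real tools such as Stryker track more outcome categories than this two-state model.

```typescript
// Outcome of running the test suite against one mutant.
type MutantResult = { id: string; killed: boolean };

// Kill rate: killed mutants divided by all mutants, as a percentage.
function killRate(results: MutantResult[]): number {
  if (results.length === 0) return 0;
  const killed = results.filter((r) => r.killed).length;
  return (killed / results.length) * 100;
}

// Hypothetical run: four mutants, only one caught by the tests.
const rate = killRate([
  { id: "m1", killed: true },
  { id: "m2", killed: false },
  { id: "m3", killed: false },
  { id: "m4", killed: false },
]);
// rate → 25
```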
13. No macOS/Windows Build Verification
`build.yml` only targets `ubuntu-latest`. While AWF is Linux-focused (requires Docker + iptables), the CLI itself could surface installation issues on macOS (a common developer platform). Even a simple `npm ci && npm run build && npm run type-check` on macOS would catch platform-specific issues.
14. No Test Retry Logic for Flaky Integration Tests
Network-dependent integration tests (Docker container startup, Squid proxy health checks) can flake under load. No retry is configured, either in the test runner (for Jest, `jest.retryTimes()`) or at the workflow step level. This increases spurious-failure noise in CI.
15. Documentation Preview Not Deployed (Artifact Only)
`docs-preview.yml` builds the Astro Starlight docs and uploads an artifact, so reviewers must download and unzip it to preview. A deployment to GitHub Pages or a preview service (e.g., Netlify or Cloudflare Pages) would significantly improve the docs PR review experience.
📋 Actionable Recommendations
High Priority
- … `cli.ts` and `docker-manager.ts`
- Add `aquasecurity/trivy-action` to `build.yml` for all three container images; upload SARIF to Security tab
- … `build.yml` (container startup time < N seconds) using existing `benchmark-performance.ts`
Medium Priority
- … `continue-on-error`
- … `performance-monitor.yml`
- … `smoke-claude`/`smoke-codex`/`smoke-copilot` auto-trigger on `src/**` and `containers/**` path changes (like `smoke-chroot`)
- Add `test-user-mode` job to `test-integration-suite.yml` running `tests/user-mode.test.sh`
- Add `paths-ignore: ['**/*.md', 'docs/**', 'docs-site/**']` to `test-integration-suite.yml`
- Add `anchore/sbom-action` to `release.yml`; publish as release asset
Low Priority
- … `lint:format` script; enforce in `lint.yml`
- Add `macos-latest` to `build.yml` matrix for `npm ci && tsc` only
- … `jest.retryTimes(2)` for integration tests, or a retry wrapper around the test step
- … `docs-site/**` PRs
📈 Metrics Summary
- `cli.ts` coverage: 0% (0/69 statements)
- `docker-manager.ts` coverage: 18% (45/250 statements)
The most impactful improvements, in order: (1) add missing integration tests to CI, (2) add container image scanning, (3) raise coverage thresholds for the two core modules. These are all low-to-medium complexity changes that would meaningfully raise the quality bar for every PR.