[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1439
This discussion was automatically closed because it expired on 2026-04-01T22:26:15.873Z.
🔮 The ancient spirits stir in the firewall halls.
📊 Current CI/CD Pipeline Status
The repository has a comprehensive CI/CD pipeline with 21 agentic workflows (compiled .md→.lock.yml) plus 14 traditional YAML workflows. All agentic workflows are compiled and operational. Recent PR runs show mixed results, with several persistent failures.
Traditional workflows on PRs (always run):
- tsc --noEmit strict type checking
- npm audit → SARIF to Security tab, fails on high/critical
Agentic workflows on PRs:
*Integration Tests are split across two workflow files; test-integration-suite.yml has 5 parallel jobs covering domain, network, protocol/security, container ops, and API proxy testing.
✅ Existing Quality Gates
Code Quality
- […] eslint-rules/no-unsafe-execa.ts), runs on every PR
- […] tsc --noEmit), runs on every PR
- […] amannn/action-semantic-pull-request
Testing
- […] awf invocations
Security
Documentation
- […] .md file changes
Performance
Operational
🔍 Identified Gaps
🔴 High Priority
1. Six integration test files are orphaned — never run in CI
The following test files exist in tests/integration/ but do not match any test pattern in either test-integration-suite.yml or test-chroot.yml:
- api-target-allowlist.test.ts: API target domain auto-allow feature
- chroot-capsh-chain.test.ts: Capability chain security (chroot)
- chroot-copilot-home.test.ts: Copilot home directory isolation
- gh-host-injection.test.ts: GH_HOST injection security
- ghes-auto-populate.test.ts: GHES domain auto-population
- workdir-tmpfs-hiding.test.ts: tmpfs workdir hiding
These tests cover security-critical behaviors (capability dropping, home directory isolation, host injection). They are written but provide zero CI protection. Any regression in these areas would silently pass CI.
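A check like the one behind this finding could be automated. The sketch below is a minimal illustration, assuming the workflows select tests with regular-expression path patterns; the pattern strings and the first two file names are hypothetical placeholders, not values taken from test-integration-suite.yml or test-chroot.yml.

```typescript
// Return every test file that no CI pattern selects.
// Assumption: each pattern is a regular expression matched against the file name.
function findOrphanedTests(testFiles: string[], patterns: string[]): string[] {
  const regexes = patterns.map((p) => new RegExp(p));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Hypothetical patterns standing in for the ones used by the two workflows.
const ciPatterns = ["domain", "network", "chroot-(languages|package-managers)"];

const testFiles = [
  "blocked-domains.test.ts", // hypothetical file covered by a pattern
  "network-security.test.ts", // hypothetical file covered by a pattern
  "api-target-allowlist.test.ts",
  "chroot-capsh-chain.test.ts",
  "gh-host-injection.test.ts",
];

const orphans = findOrphanedTests(testFiles, ciPatterns);
console.log("Not matched by any CI pattern:", orphans);
```

Running such a script as a CI step and failing when the orphan list is non-empty would stop new test files from silently falling outside every job.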
2. Security Guard is not reliably running on PRs
The security-guard.md agentic workflow (Claude-based AI security review) showed 0 successful runs out of 2 observed PR runs; both were "skipped." This means security-sensitive PRs may merge without AI security review. The workflow fires on all PRs (roles: all), suggesting an authentication or activation issue with the agentic runner.
3. Test coverage thresholds are critically low
Current thresholds and actuals:
The thresholds are nearly identical to actual coverage, providing no safety margin. Most core source files (cli.ts, docker-manager.ts) have less than 50% unit test coverage. Coverage is essentially at the floor: any test deletion would trigger a failure, but no meaningful coverage growth is incentivized.
4. Build Test Suite failing on every PR
The build-test.md agentic workflow (Copilot) has failed in both observed PR runs. This workflow tests 8 ecosystems (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) and is a required PR check, so persistent failure means this gate is effectively non-functional.
5. Chroot Integration Tests showing 50% pass rate
The Chroot Integration Tests failed in 1 of 2 recent PR runs. These tests validate the security-critical chroot isolation mechanism. Flakiness in security tests is particularly concerning and warrants investigation.
🟡 Medium Priority
6. Performance benchmarks not run on PRs
The performance-monitor.yml workflow runs weekly on a schedule. Startup time regressions (awf container spin-up) can only be discovered after merging, not during PR review. For a tool where fast startup is a key UX concern, per-PR benchmarking would catch regressions immediately.
7. No container image security scanning
There is no Trivy, Grype, or similar container vulnerability scanning for the three custom Docker images (containers/squid/, containers/agent/, containers/api-proxy/). The npm audit covers the Node.js CLI but not the Ubuntu-based container base images, which may contain vulnerable system packages.
8. Dependency Vulnerability Audit failing
The dependency-audit.yml workflow shows a recent failure. If this is a real vulnerability (vs. a flaky SARIF upload), it indicates high/critical vulnerabilities in the dependency tree that have not been addressed.
9. Documentation build not validated on code-only PRs
The docs-preview.yml workflow is path-filtered to docs-site/**, docs/**, and *.md. PRs that change TypeScript source (src/**) never validate the docs build. If a doc page references a code snippet or type that changes, the documentation build failure will only be caught after merge (when deploy-docs.yml runs on push to main).
10. Smoke tests are effectively opt-in on PRs
smoke-claude.md, smoke-copilot.md, and smoke-codex.md all technically trigger on PRs but require a specific emoji reaction (❤️, 👀, 🎉) to activate. In practice, they ran 0 times as actual smoke tests in recent PR runs. The smoke-codex.md failures also indicate broken smoke infrastructure for Codex. End-to-end smoke coverage is thus absent from the normal PR review cycle.
🟢 Low Priority
11. No dist/ bundle size tracking
There is no artifact size monitoring for the compiled dist/ output. Accidental inclusion of large files or dependencies in the TypeScript build could go unnoticed.
12. Markdownlint does not run on .github/workflows/*.md (agentic workflow files)
The lint:md script runs markdownlint-cli2 '**/*.md' '#node_modules', but the agentic workflow .md files have YAML frontmatter that markdownlint may flag as errors, and it is unclear whether these files are excluded by config. Worth verifying that the lint configuration covers or explicitly excludes workflow markdown files.
13. No mutation testing
The project has a security-critical codebase (firewall, proxy config generation) where subtle logic bugs can have serious consequences. Mutation testing (e.g., Stryker) would help identify whether unit tests actually catch logic errors, not just execute code paths.
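To make the idea concrete, here is a small hand-rolled illustration of what a mutation testing tool automates. It does not use Stryker's API, and isDomainAllowed is a hypothetical helper invented for this sketch, not a function from the repository. A single mutation (dropping the "." separator in a suffix check) survives a suite that only exercises the happy paths and is killed once a boundary case is added.

```typescript
type Matcher = (host: string, allowed: string[]) => boolean;

// Hypothetical allowlist check: exact match or a true subdomain.
const isDomainAllowed: Matcher = (host, allowed) =>
  allowed.some((d) => host === d || host.endsWith("." + d));

// Mutant: the "." separator is removed, so any host ending in the
// allowed string is accepted (e.g. "evilgithub.com").
const isDomainAllowedMutant: Matcher = (host, allowed) =>
  allowed.some((d) => host === d || host.endsWith(d));

// Weak suite: executes every code path but never probes the boundary.
function weakSuite(match: Matcher): boolean {
  return (
    match("github.com", ["github.com"]) &&
    match("api.github.com", ["github.com"]) &&
    !match("example.org", ["github.com"])
  );
}

// Stronger suite: adds the look-alike domain that distinguishes the mutant.
function strongSuite(match: Matcher): boolean {
  return weakSuite(match) && !match("evilgithub.com", ["github.com"]);
}

console.log("weak suite, original:", weakSuite(isDomainAllowed));
console.log("weak suite, mutant (survives if true):", weakSuite(isDomainAllowedMutant));
console.log("strong suite, original:", strongSuite(isDomainAllowed));
console.log("strong suite, mutant (killed if false):", strongSuite(isDomainAllowedMutant));
```

Both suites give the same line coverage of the original function, which is why coverage numbers alone cannot reveal the difference between them.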
14. No Node.js version matrix for integration tests
Integration tests run only on Node.js 22. The build verification matrix tests Node 20 and 22, but integration tests are not validated on Node 20, even though Node 20 is the supported LTS version used in most CI steps.
📋 Actionable Recommendations
High Priority
- […] test-integration-suite.yml or test-chroot.yml
- […] ci-doctor.md
Medium Priority
- […] build.yml measuring container startup time
- […] dependency-audit.yml or a dedicated workflow
- […] docs-site/** path to a docs build validation step in build.yml
- […] smoke-basic.md workflow that runs a minimal smoke test unconditionally on every PR without requiring real AI agent credentials
Low Priority
- […] du -sh dist/ and compare against threshold in build.yml; use @pkg-size/action or similar
- […] squid-config.ts, domain-patterns.ts)
- […] node-version: ['20', '22'] matrix to integration test suite
📈 Metrics Summary
- […] .yml)
- […] .md compiled)
Key takeaway: The foundational quality gates (lint, type-check, unit tests, build, CodeQL) are healthy and passing consistently. The critical gaps are in the integration test layer, where 20% of integration tests never run in CI, and in the reliability of the agentic workflows (Security Guard, Build Test Suite) that provide the highest-signal quality checks.
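As an illustration of the Low Priority recommendation to compare the dist/ size against a threshold, the following is a minimal sketch using only Node.js built-ins. The 5 MiB budget is a hypothetical value (the assessment does not state one), and the script sums apparent file sizes, which can differ slightly from the disk usage that du -sh reports.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively sum the sizes (in bytes) of all regular files under `dir`.
function directorySizeBytes(dir: string): number {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += directorySizeBytes(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Compare a directory's total size against a byte budget.
function checkBundleSize(dir: string, budgetBytes: number): { bytes: number; ok: boolean } {
  const bytes = directorySizeBytes(dir);
  return { bytes, ok: bytes <= budgetBytes };
}

// Hypothetical budget; choose a value based on the current dist/ size.
const BUDGET_BYTES = 5 * 1024 * 1024;

// Only run the check when dist/ exists, so the script is safe before a build.
if (fs.existsSync("dist")) {
  const result = checkBundleSize("dist", BUDGET_BYTES);
  console.log(`dist/ is ${result.bytes} bytes (budget ${BUDGET_BYTES})`);
  if (!result.ok) process.exitCode = 1; // non-zero exit fails the CI step
}
```

A fixed budget is the simplest gate; comparing against the size recorded on the main branch would additionally show the per-PR delta.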