[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1475
Closed
This discussion was automatically closed because it expired on 2026-04-03T22:23:49.764Z.
📊 Current CI/CD Pipeline Status
The repository has a mature and well-structured CI/CD pipeline with 20+ active workflows covering build verification, unit and integration testing, security scanning, documentation, and AI-assisted review. The majority of PR-blocking checks pass reliably.
Total active workflows: 52 (GitHub Actions API count), including 22 on-disk workflow files plus dynamic Copilot and CodeQL workflows.
✅ Existing Quality Gates
The following checks run on every PR targeting `main`:

Build & Compilation
- `build.yml`: TypeScript compilation, Node.js 20 + 22 matrix
- `test-integration.yml`: `tsc --noEmit` strict type checking

Code Quality
- `lint.yml`: ESLint on `src/` + markdownlint on all `.md` files
- `pr-title.yml`: Conventional commit format enforcement (`amannn/action-semantic-pull-request`)

Testing
- `test-coverage.yml`: Unit tests with coverage; PR comment with comparison to base branch; fails on regression
- `test-integration-suite.yml`: 5 parallel integration test jobs (Domain, Network, Protocol & Security, Container Ops, API Proxy)
- `test-chroot.yml`: Chroot language support (Python, Go, Java, .NET, Ruby, Node)
- `test-examples.yml`: Validates example scripts
- `test-action.yml`: Tests the setup GitHub Action

Security
- `codeql.yml`: CodeQL analysis (javascript-typescript + actions)
- `dependency-audit.yml`: `npm audit` with SARIF upload to Security tab
- `security-guard.lock.yml`: AI-powered PR review (Claude) for security boundary changes

Documentation
- `docs-preview.yml`: Builds documentation site for doc-touching PRs
- `link-check.yml`: Lychee link validation on `.md` file changes

Automated AI Review
- `build-test.lock.yml`: Copilot agent runs builds/tests and comments on PRs

🔍 Identified Gaps
🔴 High Priority
1. Coverage thresholds are dangerously low
Current global thresholds: branches 30%, functions 35%, lines/statements 38%. The two most critical files have alarmingly low coverage:
- `cli.ts`: 0% coverage (0/69 statements)
- `docker-manager.ts`: 18% coverage (45/250 statements, 4% functions)

Neither file trips the coverage gate, because only the global thresholds are enforced. A PR could delete every test for these files and the threshold check would still be green.
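
As a rough illustration of what a per-file floor would catch, the sketch below checks a coverage summary against per-file statement minimums. It assumes the summary follows the shape of Istanbul's `json-summary` reporter (file path keys, each with a `statements.pct` field). The function name `findFloorViolations`, the `/repo/` path prefix, and the 40% floor are illustrative assumptions, not part of the repository.

```typescript
// One metric entry, following the shape of Istanbul's json-summary output.
interface MetricSummary {
  total: number;
  covered: number;
  pct: number;
}

// Per-file entry; only the metric used in this sketch is declared.
interface FileSummary {
  statements: MetricSummary;
}

// Hypothetical helper: returns one message per file below its statement floor.
// `floors` maps a path suffix to a minimum statement percentage.
function findFloorViolations(
  summary: Record<string, FileSummary>,
  floors: Record<string, number>,
): string[] {
  const violations: string[] = [];
  for (const [suffix, floor] of Object.entries(floors)) {
    const entry = Object.entries(summary).find(([path]) => path.endsWith(suffix));
    if (!entry) {
      // Treat a missing entry as a violation so a renamed file cannot slip through.
      violations.push(`${suffix}: no coverage data`);
      continue;
    }
    const pct = entry[1].statements.pct;
    if (pct < floor) {
      violations.push(`${suffix}: statements ${pct}% < ${floor}%`);
    }
  }
  return violations;
}

// Sample data using the statement counts reported above; paths are assumed.
const sampleSummary: Record<string, FileSummary> = {
  "/repo/src/cli.ts": { statements: { total: 69, covered: 0, pct: 0 } },
  "/repo/src/docker-manager.ts": { statements: { total: 250, covered: 45, pct: 18 } },
};

const violations = findFloorViolations(sampleSummary, {
  "src/cli.ts": 40,
  "src/docker-manager.ts": 40,
});
console.log(violations);
```

With a 40% floor, both files are reported, whereas the global thresholds alone let them through.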
2. No container image security scanning
Docker images (`containers/squid/`, `containers/agent/`, `containers/api-proxy/`) are built and published without any image vulnerability scanning (no Trivy, Grype, or Snyk container scan). These images run with elevated privileges and process untrusted agent traffic, so a compromised base image is a critical attack vector.

3. Performance benchmarks do not run on PRs
`performance-monitor.yml` runs weekly on a schedule only. Startup latency and memory regressions introduced by PRs can go undetected for up to a week. The benchmark script (`scripts/ci/benchmark-performance.ts`) already exists and works; it just isn't triggered on PRs.

4. Smoke tests are fully opt-in (reaction-based)
End-to-end smoke tests (smoke-claude, smoke-codex, smoke-copilot) require a specific emoji reaction on the PR to run; they are not required checks. A PR that breaks real agent execution (the primary use case of this firewall) can be merged without any end-to-end validation.
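
Returning to gap 3: a PR-time benchmark needs some rule for deciding when a result counts as a regression. The sketch below compares the median of a few PR samples against the median of baseline samples, with a tolerance for CI noise. The function names, the 20% tolerance, and the sample values are assumptions for illustration; the existing benchmark script's output format is not known from this assessment.

```typescript
// Median of a non-empty list of samples.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("median of empty list");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical rule: flag a regression when the candidate median exceeds
// the baseline median by more than `tolerance` (0.2 = 20%).
function isRegression(
  baseline: number[],
  candidate: number[],
  tolerance: number = 0.2,
): boolean {
  return median(candidate) > median(baseline) * (1 + tolerance);
}

// Made-up startup latencies in milliseconds, three iterations each.
const baselineMs = [100, 110, 105];
const slowerPrMs = [150, 140, 160];
const noisyPrMs = [108, 112, 104];

console.log(isRegression(baselineMs, slowerPrMs)); // true: 150 > 105 * 1.2
console.log(isRegression(baselineMs, noisyPrMs)); // false: 108 <= 126
```

Using the median rather than the mean keeps a single slow iteration from failing the check, which matters when PR runs use only a few iterations.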
🟡 Medium Priority
5. No per-file minimum coverage enforcement for critical security files
The current `coverageThreshold` configuration in Jest enforces only global minimums. Critical files like `host-iptables.ts` (iptables management), `squid-config.ts` (proxy config generation), and `containers/agent/setup-iptables.sh` (network isolation) have no individual floor. A PR that introduces a new security function without tests would not be caught by coverage gating.

6. `performance-monitor.yml` uses unpinned action references
While most workflows pin actions to full SHAs (security best practice), `performance-monitor.yml` uses floating tags. This is inconsistent with the rest of the repo and exposes CI to supply chain attacks.
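
One way to keep floating tags from reappearing is a small check that scans workflow text for `uses:` references not pinned to a 40-character commit SHA. The sketch below is line-based rather than a real YAML parser, and the sample workflow lines and the placeholder SHA are invented for illustration; they are not taken from `performance-monitor.yml`.

```typescript
// Matches `uses: owner/repo@ref` (optionally as a list item or quoted),
// capturing the action name and the ref. Local `./path` actions have no
// `@` and therefore never match.
const USES_PATTERN = /^\s*(?:-\s*)?uses:\s*["']?([^@\s"']+)@([^\s"'#]+)/;
const FULL_SHA = /^[0-9a-f]{40}$/;

// Hypothetical helper: returns every `action@ref` whose ref is not a full SHA.
function findUnpinnedActions(workflowYaml: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowYaml.split(/\r?\n/)) {
    const match = USES_PATTERN.exec(line);
    if (!match) continue;
    const [, action, ref] = match;
    // Container references use digests, not commit SHAs; skip them here.
    if (action.startsWith("docker://")) continue;
    if (!FULL_SHA.test(ref)) {
      unpinned.push(`${action}@${ref}`);
    }
  }
  return unpinned;
}

// Invented sample; the 40-character SHA is a placeholder, not a real commit.
const sampleWorkflow = [
  "steps:",
  "  - uses: actions/checkout@v4",
  "  - uses: actions/setup-node@0123456789abcdef0123456789abcdef01234567 # v4",
  "  - uses: ./.github/actions/local",
].join("\n");

const unpinned = findUnpinnedActions(sampleWorkflow);
console.log(unpinned);
```

In this sample only `actions/checkout@v4` is reported: the second reference is pinned to a full SHA, and the local action has no ref to pin.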
7. No SBOM (Software Bill of Materials) generation
There is no workflow that generates or attests an SBOM for published releases or container images. As a security-focused tool distributed to AI agent pipelines, the absence of an SBOM makes supply chain verification impossible for consumers.
8. No concurrency group on integration test workflow
`test-integration-suite.yml` lacks a `concurrency:` group. Rapid PR pushes can queue multiple simultaneous runs that compete for Docker networks and container names, causing flaky failures. (The `build.yml` workflow has the same gap.)

9. No mutation testing
With overall coverage at ~38%, it's unclear whether existing tests actually assert correct behavior or just execute code paths. Mutation testing (e.g., Stryker) would reveal whether tests fail when logic is broken, which is especially important for security-critical logic.
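
To make the distinction between executing code and asserting behavior concrete, here is a hand-written sketch of what a mutation tool automates. The `isAllowed` function is a hypothetical domain allow-list check written for this example, not the project's implementation, and the mutant is the kind of change a string-literal mutation would produce.

```typescript
type Check = (host: string, allowed: string[]) => boolean;

// Hypothetical allow-list check: exact match or a true subdomain.
const isAllowed: Check = (host, allowed) =>
  allowed.some((d) => host === d || host.endsWith("." + d));

// Mutant: the "." separator is replaced with an empty string, so any host
// that merely ends with the allowed domain's characters is accepted.
const isAllowedMutant: Check = (host, allowed) =>
  allowed.some((d) => host === d || host.endsWith("" + d));

// Weak test: exercises the happy path only.
function weakTest(check: Check): boolean {
  return check("api.example.com", ["example.com"]) === true;
}

// Stronger test: also asserts that a look-alike domain is rejected.
function strongTest(check: Check): boolean {
  return (
    check("api.example.com", ["example.com"]) === true &&
    check("evilexample.com", ["example.com"]) === false
  );
}

console.log(weakTest(isAllowed), weakTest(isAllowedMutant)); // true true: mutant survives
console.log(strongTest(isAllowed), strongTest(isAllowedMutant)); // true false: mutant killed
```

Both tests give the original function full line coverage, yet only the stronger one notices the broken logic. That difference is what a mutation score measures and what a coverage percentage cannot show.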
🟢 Low Priority
10. No bundle/artifact size tracking
The compiled `dist/` output is not tracked for size regressions. Accidental dependency imports (e.g., bundling all of `node_modules`) could bloat the published package undetected.

11. No spell check on documentation
With extensive documentation (README, docs-site, many `.md` files), there is no cspell or similar check. The `lint.yml` workflow runs markdownlint for formatting but not spelling.

12. Coverage threshold history not tracked
Coverage thresholds are hardcoded in `jest.config.js` at low values (last raised incrementally). There is no automated PR that raises thresholds as coverage improves; the `test-coverage-improver.lock.yml` agentic workflow exists but runs weekly and may not keep thresholds current.

13. No changelog enforcement
There is no check that `CHANGELOG.md` or release notes are updated with user-facing changes. The `update-release-notes.lock.yml` workflow only runs at release time.

📋 Actionable Recommendations
- Add per-file coverage thresholds for `cli.ts` and `docker-manager.ts` (e.g., statements: 40%); raise global thresholds incrementally to 60%
- Add a container image scan to `test-integration-suite.yml` after `docker build`; fail on CRITICAL vulnerabilities
- Add a `pull_request` trigger to `performance-monitor.yml` with fewer iterations (e.g., 3 vs 5)
- Add a `smoke-required.yml` that runs a lightweight sanity check (e.g., `awf --allow-domains example.com curl (example.com/redacted)`) without needing a live LLM, triggered on every PR
- Add `coverageThreshold` per-file entries for the 5 most critical source files in `jest.config.js`
- Pin the actions in `performance-monitor.yml` to full SHAs (use `gh api` to get the SHA for each tag)
- Add an `actions/attest-sbom` or `anchore/sbom-action` step to `release.yml`; upload SBOM as release artifact
- Add `concurrency: group: ci-${{ github.ref }} cancel-in-progress: true` to `test-integration-suite.yml` and `build.yml`
- Run mutation testing on the `src/squid-config.ts` and `src/domain-patterns.ts` modules as a weekly scheduled check
- Add a `bundlesize` or `size-limit` check comparing `dist/` total size against a threshold

📈 Metrics Summary
Metrics covered: unit test files (`src/**/*.test.ts`), integration test files (`tests/integration/**/*.test.ts`), `cli.ts` coverage, and `docker-manager.ts` coverage.

The pipeline is generally healthy, with high pass rates for core checks. The primary risks are the critically low coverage on the most important files, the absence of container image scanning despite the security-focused nature of the project, and the lack of mandatory end-to-end smoke testing on PRs.