[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1471
Closed
This discussion was automatically closed because it expired on 2026-04-02T22:20:57.353Z.
📊 Current CI/CD Pipeline Status
The repository has a mature and well-structured CI/CD pipeline with both standard GitHub Actions workflows and an AI-powered agentic layer. As of March 2026:
- … `.lock.yml`)
- … `performance-monitor.yml`)
✅ Existing Quality Gates
- `tsc --noEmit` strict type checking
- … `/proc` filesystem, edge cases (4 parallel jobs)
- `npm audit` for main package and docs-site, SARIF upload to Security tab, fails on `high`/`critical`
- … `feat:`, `fix:`, `docs:`, etc.)
- … `.md` changes only) + weekly
🔍 Identified Gaps
🔴 High Priority
1. Critically Low Test Coverage on Core Modules
The two most important source files have negligible unit test coverage:
- `cli.ts`: 0% coverage (69 statements, 17 branches, 10 functions)
- `docker-manager.ts`: 18% coverage (45/250 statements, 22% branches, 4% functions)
These files implement the entire CLI argument parsing, container orchestration, environment variable handling, and cleanup lifecycle, i.e., the core functionality. The overall threshold of 38% statements is barely enforced and provides false confidence.
2. No Container Image Security Scanning (CVE Scanning)
The three Docker images (`containers/squid/`, `containers/agent/`, `containers/api-proxy/`) are built and used in integration tests but never scanned for OS-level CVEs. The dependency audit only covers npm packages, not: …
Tools like Trivy, Grype, or Docker Scout can catch this class of vulnerabilities.
3. Performance Regression Not Detected on PRs
The `performance-monitor.yml` workflow runs weekly only. A PR could introduce a 10x startup time regression and it would not be caught until the weekly run, by which time the change would already be merged into `main`.
4. Smoke Tests Cannot Block PR Merges
The smoke tests (real AI agent runs against Claude, Codex, Copilot) are triggered by emoji reactions or on a 12-hour schedule, not as required status checks. A breaking change to the firewall or agent container that would cause real-world AI agent runs to fail can be merged to `main` before smoke tests catch it.
🟡 Medium Priority
5. Shell Script Linting Absent
The project has substantial shell code that is never statically analyzed:
- `containers/agent/entrypoint.sh` (~600 lines)
- `containers/agent/setup-iptables.sh`
- `containers/squid/entrypoint.sh`
- `scripts/ci/*.sh` (cleanup, test scripts)
[shellcheck](www.shellcheck.net/redacted) would catch common bugs (unquoted variables, missing exit code checks, portability issues) that could create security vulnerabilities in the container entrypoints.
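A CI job covering the scripts listed above could be sketched roughly as follows. This is a minimal sketch, not the repository's configuration: the job name is hypothetical, the `actions/checkout@v4` ref is a floating tag used only for readability, and it assumes `shellcheck` is available on the runner (it ships on GitHub-hosted Ubuntu images).

```yaml
# Hypothetical lint job; the job name is a placeholder.
jobs:
  shellcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # pin to a commit SHA in practice
      - name: Run shellcheck on container and CI scripts
        # shellcheck exits non-zero on any finding, which fails the job
        run: |
          shellcheck \
            containers/agent/entrypoint.sh \
            containers/agent/setup-iptables.sh \
            containers/squid/entrypoint.sh \
            scripts/ci/*.sh
```

An existing codebase may produce many findings on the first run, so starting with a stricter `--severity` floor (for example `--severity=warning`) and tightening it later is one possible rollout path.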
6. Coverage Thresholds Are Too Low for a Security-Critical Tool
Current thresholds: 38% statements, 30% branches, 35% functions, 38% lines. For a network security firewall whose correctness is critical to isolating AI agents, these thresholds provide very weak guarantees. A security bug in uncovered code paths would not be caught by tests.
7. No Dockerfile Linting
Three Dockerfiles exist (`containers/*/Dockerfile`) but no `hadolint` or equivalent runs in CI. Common Dockerfile issues (running as root unnecessarily, using a `latest` base tag, missing `--no-cache` flags, improper layer ordering) could affect security and reproducibility.
8. Documentation Build Failures Don't Fail PRs
In `docs-preview.yml`, the docs build step uses `continue-on-error: true`. Documentation build failures are reported as comments but don't fail the PR check. A broken docs site can be merged silently.
9. Integration Tests Have No Path Filtering
Unlike smoke-chroot (which is path-filtered to `src/**`, `containers/**`), the main integration test suite (`test-integration-suite.yml`) runs on all PRs regardless of what changed. Documentation-only PRs run a full 45-minute integration test suite unnecessarily.
🟢 Low Priority
10. Action SHA Pinning Inconsistency in `performance-monitor.yml`
`performance-monitor.yml` uses floating action tags (`actions/checkout@v4`, `actions/setup-node@v4`, `actions/github-script@v7`, `actions/upload-artifact@v4`) while all other workflows pin to specific commit SHAs. This creates an inconsistent security posture: a compromised upstream action version could inject code into performance benchmark runs.
11. No SBOM (Software Bill of Materials) Generation
Releases don't include an SBOM artifact. For a security tool distributed as a GitHub Action and CLI, SBOM generation would aid users in auditing the supply chain of the tool they're using to secure their own workflows.
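One way to close this gap is a release job built on `anchore/sbom-action`, sketched below under assumptions. The job name and output file name are hypothetical, and the version refs are floating tags for readability only; nothing here is taken from the repository's actual `release.yml`.

```yaml
# Hypothetical release job; job name and file name are placeholders.
jobs:
  sbom:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # needed if the SBOM is attached to the release
    steps:
      - uses: actions/checkout@v4        # pin to a commit SHA in practice
      - name: Generate SBOM for the checked-out source
        uses: anchore/sbom-action@v0     # pin to a commit SHA in practice
        with:
          format: spdx-json
          output-file: sbom.spdx.json
```

This scans the repository checkout; covering the container images as well would mean pointing the action's `image` input at each built image instead.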
12. No Stale Branch Coverage for Smoke Tests
Smoke tests run on `main` every 12 hours. There's no easy way to know if the smoke test that ran was against the latest commit or a significantly older one, making it harder to correlate smoke failures with specific changes.
13. API Proxy Container Not Built in Dependency Audit
`dependency-audit.yml` audits `containers/api-proxy` npm deps, but doesn't build and scan the `api-proxy` Docker image itself. The container includes a Node.js runtime that may have separate OS-level vulnerabilities beyond npm packages.
📋 Actionable Recommendations
Recommendation 1: Add Trivy Container Scanning to Build Verification
Complexity: Low | Impact: High
Add a container scanning step to `build.yml` using Trivy (already available as a GitHub Action, no secrets needed). Run the scan on all three images (squid, agent, api-proxy). Results flow to the GitHub Security tab.
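The scanning step could look something like the sketch below. It is an illustration under assumptions rather than the project's actual workflow: the job name, the local `:ci` image tags, the SARIF file names, and the build context paths (each image built from `containers/<name>`) are hypothetical, and the `uses:` refs are unpinned for readability.

```yaml
# Hypothetical job for build.yml; names, tags, and refs are placeholders.
jobs:
  scan-images:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # required to upload SARIF to the Security tab
    strategy:
      fail-fast: false
      matrix:
        image: [squid, agent, api-proxy]
    steps:
      - uses: actions/checkout@v4   # pin to a commit SHA in practice
      - name: Build image (assumes the Dockerfile lives in containers/<image>)
        run: docker build -t ${{ matrix.image }}:ci containers/${{ matrix.image }}
      - name: Scan image for OS-level and library CVEs
        uses: aquasecurity/trivy-action@master   # pin to a commit SHA in practice
        with:
          image-ref: ${{ matrix.image }}:ci
          format: sarif
          output: trivy-${{ matrix.image }}.sarif
          severity: CRITICAL,HIGH
      - name: Upload results to the Security tab
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-${{ matrix.image }}.sarif
          category: trivy-${{ matrix.image }}
```

As written the scan only reports findings. Setting the action's `exit-code` input to `1` would make high or critical findings fail the job, which is a policy choice to settle separately.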
Recommendation 2: Increase Test Coverage Thresholds Incrementally
Complexity: Medium | Impact: High
Add unit tests for `cli.ts` (focus on argument parsing, flag combinations, error paths) and `docker-manager.ts` (focus on `generateDockerCompose`, env var merging, cleanup logic). Then raise thresholds to 60%+. The `cli-workflow.test.ts` pattern (mocking `execa` and docker calls) is already established.
Target milestones:
Recommendation 3: Add shellcheck to Lint Workflow
Complexity: Low | Impact: Medium
Add a `shellcheck` job to `lint.yml`.
Recommendation 4: Add Hadolint Dockerfile Linting
Complexity: Low | Impact: Medium
Add hadolint scanning for the three Dockerfiles. It can be integrated as a job in `build.yml` or `lint.yml`.
Recommendation 5: Add Path Filtering to Integration Test Suite
Complexity: Low | Impact: Medium
Add `paths-ignore` to `test-integration-suite.yml` to skip integration tests on documentation-only PRs. This mirrors the pattern already used by other workflows and reduces unnecessary CI time.
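A trigger block along these lines would implement the filter. The ignored paths are assumptions: `**.md` covers Markdown files anywhere in the repository, and `docs-site/**` assumes the docs-site package lives in a directory of that name.

```yaml
# Hypothetical trigger for test-integration-suite.yml; ignored paths are assumptions.
on:
  pull_request:
    paths-ignore:
      - '**.md'
      - 'docs-site/**'
```

One caveat to check before adopting this: when a workflow is skipped by path filtering, its checks stay in a pending state, so if the integration suite is a required status check, documentation-only PRs would be blocked from merging unless the branch protection setup accounts for that.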
Recommendation 6: Pin Actions in `performance-monitor.yml`
Complexity: Low | Impact: Low
Replace floating tags with SHA-pinned versions (consistent with all other workflows). This is a one-time fix.
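For illustration, a pinned reference takes the following shape. `<full-commit-sha>` is a placeholder, not a real commit; the actual 40-character SHA for each release has to be looked up in the action's repository.

```yaml
# Illustrative only: <full-commit-sha> is a placeholder for a real 40-character commit SHA.
steps:
  - uses: actions/checkout@<full-commit-sha> # v4
  - uses: actions/setup-node@<full-commit-sha> # v4
```

Keeping the version tag in a trailing comment preserves readability and lets update tooling such as Dependabot bump the SHA and comment together.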
Recommendation 7: Make Smoke Tests a Required Merge Queue Gate
Complexity: High | Impact: High
Consider adding smoke tests to the merge queue, or requiring at least one smoke test pass before merging to `main`. This would catch real-world integration breakages before they hit `main`.
Alternatively, add a lightweight smoke test that runs on every PR (not a full AI agent run, but a basic `awf --allow-domains example.com curl (example.com/redacted)` validation) as a required check.
Recommendation 8: Generate SBOM in Release Workflow
Complexity: Low | Impact: Low
Add `anchore/sbom-action` to `release.yml` to generate and attach an SBOM to each release. Useful for enterprise users who need supply chain transparency.
📈 Metrics Summary
- `cli.ts` coverage
- `docker-manager.ts` coverage
- `squid-config.ts` coverage
- `logger.ts` coverage
Assessment generated on 2026-03-26 by the CI/CD Gaps Assessment workflow.