[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1326
This discussion was automatically closed because it expired on 2026-03-23T22:26:19.869Z.
📊 Current CI/CD Pipeline Status
This repository has a mature, multi-layered CI/CD pipeline with 52 registered workflows across conventional YAML workflows and compiled agentic (.md) workflows. The pipeline covers build verification, linting, security scanning, integration testing, and end-to-end smoke testing.
Pipeline health (recent PR runs): 18 distinct workflows run on pull requests. The failure rate is low; the main notable failure is the agentic Build Test Suite, which runs cross-language builds via AI. Most core checks (Build Verification, Lint, TypeScript Type Check, CodeQL, Integration Tests, Test Coverage) pass consistently.
✅ Existing Quality Gates
The following checks currently run on pull requests:
- lint.yml
- test-integration.yml
- build.yml
- test-coverage.yml
- test-integration-suite.yml
- test-chroot.yml
- test-examples.yml
- test-action.yml
- pr-title.yml
- codeql.yml
- container-scan.yml (containers/**)
- dependency-audit.yml
- link-check.yml (*.md)
- docs-preview.yml
- security-guard.md (Claude)
- build-test.md (Copilot)
- smoke-*.md
Scheduled-only checks: secret diggers (hourly), performance benchmarks (weekly), dependency security monitor (daily), doc maintainer (daily).
🔍 Identified Gaps
🔴 High Priority
1. Integration Test Pattern Coverage Has Blind Spots
The test-integration-suite.yml workflow uses --testPathPatterns regexes to split 33 integration test files across 4 parallel jobs. Several test files match no pattern and are therefore silently skipped in CI:
- api-target-allowlist.test.ts: tests automatic domain allowlisting for API targets
- gh-host-injection.test.ts: security test for GH_HOST injection protection
- ghes-auto-populate.test.ts: GHES domain auto-population feature
- skip-pull.test.ts: tests --skip-pull flag behavior
- workdir-tmpfs-hiding.test.ts: security test for workdir visibility hiding
This is a significant gap: security-critical tests (gh-host-injection, workdir-tmpfs-hiding) exist but may not run on PRs.
Recommendation: Either add the missing test names to the relevant job patterns, or replace the pattern-based split with explicit test file lists. Consider a periodic audit script that cross-checks test files against the CI patterns.
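The cross-check suggested in this recommendation could look roughly like the sketch below. It assumes the job patterns behave like plain regular expressions tested against each file path, which only approximates how Jest applies them. The function name, directory layout, and sample patterns are hypothetical and are not taken from the repository.

```typescript
// Sketch only: the function name, paths, and patterns below are hypothetical.

/**
 * Returns the test files that are not matched by any job pattern.
 * Each pattern is treated as a regular expression tested against the file path.
 */
function findUnmatchedTests(testFiles: string[], jobPatterns: string[]): string[] {
  const regexes = jobPatterns.map((pattern) => new RegExp(pattern));
  return testFiles.filter((file) => !regexes.some((regex) => regex.test(file)));
}

// Illustrative data: two patterns, three files, one file left uncovered.
const sampleFiles = [
  "tests/integration/example-feature.test.ts",
  "tests/integration/skip-pull.test.ts",
  "tests/integration/gh-host-injection.test.ts",
];
const samplePatterns = ["example-feature", "skip-pull"];

const unmatched = findUnmatchedTests(sampleFiles, samplePatterns);
if (unmatched.length > 0) {
  console.log("Test files not covered by any CI pattern:", unmatched);
}
```

A CI step built on this idea would read the real file list and the patterns from the workflow, then fail the job whenever the returned list is non-empty.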
2. Very Low Unit Test Coverage With Permissive Thresholds
Current unit test coverage is 38% of statements overall, and critical files have very low coverage:
- cli.ts: 0% coverage (0/69 statements)
- docker-manager.ts: 18% coverage (45/250 statements)
The coverage thresholds in the Jest config are set very low (≥38% statements, ≥30% branches), so PRs that further reduce coverage in these critical files can still pass. cli.ts and docker-manager.ts are the two largest, most complex files.
Recommendation: Incrementally raise coverage thresholds. Add per-file minimum thresholds for cli.ts and docker-manager.ts. The test-coverage-improver agentic workflow runs weekly, but coverage improvements should also be required when landing new features.
3. Agentic Build Test Suite Has Persistent Failures
The Build Test Suite agentic workflow has conclusion=failure in recent PR runs. This is an AI-driven workflow that runs multi-language build tests. A persistent failure here means an entire quality gate is effectively non-functional.
Recommendation: Investigate the root cause of the failure (likely a network or token issue), fix it, and add alerting via the ci-doctor workflow for prolonged failures.
🟡 Medium Priority
4. Container Security Scan Not Triggered on Source Code Changes
container-scan.yml (Trivy) only runs when files under containers/** change. A change to src/docker-manager.ts that alters how containers are configured (capabilities, seccomp, network) would not trigger a container rescan.
Recommendation: Consider running container security scans on every PR (with caching to limit cost), or expand the path trigger to include src/**, since source changes affect the runtime security posture.
5. Performance Benchmarks Never Run on PRs
performance-monitor.yml runs only on a weekly schedule. Regressions in startup time, container spin-up latency, or throughput introduced in a PR would go undetected until the next weekly run, and the weekly run doesn't comment on the offending PR.
Recommendation: Add a lightweight performance check step to the build workflow (e.g., measure startup time on a single iteration) that can detect significant regressions (>50%) and post a PR comment with the delta.
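The delta calculation behind such a check is small enough to sketch. The example below assumes a stored baseline measurement and a single fresh measurement, both in milliseconds, and uses the 50% threshold mentioned above as its default. The function name, report shape, and sample numbers are hypothetical, and posting the comment to the PR is left out.

```typescript
// Sketch only: names, units, and sample numbers are hypothetical.

interface RegressionReport {
  deltaPercent: number;
  isRegression: boolean;
  comment: string;
}

/**
 * Compares a fresh startup measurement with a baseline and builds the text
 * that a PR comment could carry.
 */
function compareStartupTime(
  baselineMs: number,
  currentMs: number,
  thresholdPercent: number = 50
): RegressionReport {
  if (baselineMs <= 0) {
    throw new Error("baselineMs must be greater than zero");
  }
  // Relative change against the baseline, expressed in percent.
  const deltaPercent = ((currentMs - baselineMs) / baselineMs) * 100;
  // Only a slowdown strictly above the threshold counts as a regression.
  const isRegression = deltaPercent > thresholdPercent;
  const sign = deltaPercent >= 0 ? "+" : "";
  const verdict = isRegression ? "exceeds" : "is within";
  const comment =
    `Startup time: ${baselineMs} ms -> ${currentMs} ms ` +
    `(${sign}${deltaPercent.toFixed(1)}%), which ${verdict} the ${thresholdPercent}% threshold.`;
  return { deltaPercent, isRegression, comment };
}

const report = compareStartupTime(1200, 1900);
console.log(report.comment);
```

A single-iteration measurement is noisy, which is one reason to keep the threshold as coarse as the 50% suggested here rather than flagging small deltas.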
6. Smoke Tests Are Effectively Optional for External Contributors
The smoke-*.md agentic smoke tests (Claude, Codex, Copilot) trigger on PRs but are gated by roles: all combined with reaction emoji requirements for non-team members. While this is intentional to prevent abuse, it means the most realistic end-to-end validation of the firewall (running a real AI agent through the AWF) does not run automatically for all PRs.
Recommendation: Consider making at least one smoke test a required status check for maintainer PRs, or adding a non-AI smoke test that exercises the same code paths and requiring that instead.
7. No License Compliance Checking for Dependencies
There is no automated check to verify that newly added npm dependencies use acceptable licenses (MIT, Apache-2.0, ISC, etc.) and don't introduce copyleft licenses (GPL, AGPL) that could create legal complications for a commercial product.
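As a rough illustration of what such a check involves, the sketch below takes an allowlist approach: any dependency whose license is not explicitly allowed gets flagged, which also catches copyleft licenses. It assumes the licenses have already been collected into a map of package name to SPDX identifier. The allowlist holds only the three examples named above, and the package names are hypothetical.

```typescript
// Sketch only: the allowlist mirrors the examples in the text; package names are hypothetical.

const ALLOWED_LICENSES: string[] = ["MIT", "Apache-2.0", "ISC"];

/**
 * Returns "name (license)" entries for every dependency whose license
 * is not on the allowlist.
 */
function findDisallowedLicenses(
  dependencyLicenses: { [name: string]: string },
  allowed: string[] = ALLOWED_LICENSES
): string[] {
  return Object.keys(dependencyLicenses)
    .filter((name) => allowed.indexOf(dependencyLicenses[name]) === -1)
    .map((name) => `${name} (${dependencyLicenses[name]})`);
}

// Illustrative input: two permissive licenses and one copyleft license.
const sampleDependencies = {
  "example-http-client": "MIT",
  "example-parser": "Apache-2.0",
  "example-copyleft-lib": "GPL-3.0-only",
};

const violations = findDisallowedLicenses(sampleDependencies);
if (violations.length > 0) {
  console.log("Dependencies with licenses outside the allowlist:", violations);
}
```

An allowlist fails closed: an unrecognized or missing license is reported instead of silently passing, at the cost of needing an update whenever a new acceptable license appears.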
Recommendation: Add license-checker or licensee to the dependency audit workflow to flag incompatible license additions.
8. Secret Scanning Is Not a PR Gate
The hourly secret-digger-* agentic workflows scan for secrets but run on a schedule, not on PRs. A secret committed in a PR would not be blocked; it would only be detected after the fact (up to 1 hour later).
Recommendation: Consider enabling GitHub's native secret scanning push protection (a repository setting), which blocks pushes containing recognized secrets at the git level and complements the AI-based scanning.
🟢 Low Priority
9. CLI Flag Consistency Check Not on PRs
cli-flag-consistency-checker.md runs weekly and checks for inconsistencies between CLI flags and documentation. A PR that adds a flag without updating docs would pass all checks and only be caught at the next weekly run.
Recommendation: Run the CLI flag consistency check on PRs that touch src/cli.ts or README.md.
10. Documentation Preview Doesn't Fail the PR
docs-preview.yml builds the Astro/Starlight docs site with continue-on-error: true. A broken docs build silently passes; contributors only see an artifact upload failure if they dig into the logs.
Recommendation: Remove continue-on-error: true or post a PR comment when the docs build fails. Currently, broken docs can be merged unnoticed.
11. No SBOM Generation
There is no Software Bill of Materials (SBOM) generation in the release or PR workflow. For a security-focused tool distributed as a GitHub Action/npm package, an SBOM aids downstream consumers in vulnerability tracking.
Recommendation: Add cyclonedx-npm or @cyclonedx/cdxgen to the release workflow to generate an SBOM artifact.
12. Missing api-proxy Container in Security Scans
container-scan.yml scans containers/agent/ and containers/squid/, but there is also a containers/api-proxy/ with its own Node.js dependencies (a separate package.json). The api-proxy container is not scanned by Trivy.
Recommendation: Add a third job to container-scan.yml to scan the api-proxy container image.
📋 Actionable Recommendations
- … gh-host-injection, workdir-tmpfs-hiding may not be running
- … cli.ts and docker-manager.ts
- … src/** changes
- … license-checker step to dependency audit workflow
- … src/cli.ts
- … continue-on-error from docs preview, add failure comment
- … container-scan.yml
📈 Metrics Summary
- cli.ts unit test coverage: 0% (0/69 statements)
- docker-manager.ts unit test coverage: 18% (45/250 statements)