[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1406
Closed
This discussion was automatically closed because it expired on 2026-03-30T22:22:45.667Z.
📊 Current CI/CD Pipeline Status
The repository has a mature, multi-tier CI/CD system with 19 workflows running on pull requests and an additional 17 scheduled/event-triggered workflows. All compiled agentic workflow `.lock.yml` files are up-to-date. Recent PR workflow runs show a high success rate (~82% on the most recent PR batch), with two notable failures: Build Test Suite (agentic, external dependency) and Smoke Codex (agentic smoke test).
Workflow Inventory
✅ Existing Quality Gates
Code Quality
- `lint.yml` (referenced by two checks)
- `tsc --noEmit` via `test-integration.yml`
- `pr-title.yml`
Testing
- `test-coverage.yml`
- `test-integration-suite.yml`
- `test-chroot.yml`
- `test-examples.yml`
- `action.yml` tested with multiple version scenarios (`test-action.yml`)
- `build-test.md`
- `smoke-*.md`
Security
- `security-extended` + `security-and-quality` queries
- `npm audit` for main package and docs-site with SARIF upload to Security tab
- `security-guard.md`
Documentation
Build
- `build.yml`
- `npm test` in `containers/api-proxy/` (`build.yml`)
🔍 Identified Gaps
🔴 High Priority
1. Coverage Thresholds Are Dangerously Low
Current thresholds in `jest.config.js`: branches: 30%, functions: 35%, lines: 38%, statements: 38%. A project this security-sensitive should have much higher minimums. Additionally, the coverage regression comparison step uses `continue-on-error: true`, meaning even a significant coverage drop may not fail the PR.
2. No Container Image Security Scanning on PRs
No workflow scans the Docker images (Squid, agent, api-proxy) for known CVEs using Trivy, Grype, or Anchore. The `docs/INTEGRATION-TESTS.md` coverage heat map lists "Container security scan" as having CI coverage only, but investigation shows this workflow doesn't exist: the ci-doctor references it, but it was never created. For a firewall tool, unpatched base image CVEs are a critical risk.
3. Performance Regression Testing Not on PRs
`performance-monitor.yml` runs weekly only (Mondays at 06:00 UTC). A PR that degrades container startup time by 50% would merge undetected and only surface in the next weekly report. Startup latency is a user-facing metric for this tool.
4. `--block-domains` Feature Has Zero Test Coverage
Per the `docs/INTEGRATION-TESTS.md` coverage heat map: "Domain deny-list (--block-domains) ❌ across all tiers: unit, integration, CI, smoke, build-test". This is a core security feature with no automated validation at any level.
5. `--env-all` Flag Has Zero Test Coverage
Also confirmed by the coverage heat map: zero coverage at all levels. The `--env-all` flag copies the host environment into the container, which has direct security implications (credential exposure risk).
🟡 Medium Priority
6. No Dockerfile Linting (hadolint)
None of the three Dockerfiles (`containers/squid/Dockerfile`, `containers/agent/Dockerfile`, `containers/api-proxy/Dockerfile`) is linted with hadolint or a similar tool. Hadolint catches `RUN apt-get` without `--no-install-recommends`, missing `USER` instructions, shell form vs. exec form, and other best-practice violations.
7. Smoke Test Role-Filtering Creates PR Coverage Gaps
Recent workflow runs show "Smoke Claude: skipped" and "Smoke Copilot: skipped" on PRs. Agentic smoke workflows use `roles: all` but may be skipped for external contributors or bot-authored PRs. This means the end-to-end AI agent pipeline isn't validated for all PR types.
8. Build Test Suite Fragility
`build-test.md` clones external repositories (`Mossaka/gh-aw-firewall-test-*`) for each PR. The most recent run showed a failure. Depending on external repositories makes CI flaky: if those repos are unavailable or have breaking changes, every PR gets a spurious failure.
9. No Structured Test Result Reports (JUnit XML)
Jest currently outputs text + LCOV + HTML + JSON summary but no JUnit XML. GitHub Actions can parse JUnit XML to show inline test failures in the PR interface (annotation on the specific line that failed) and track test trends over time. This would significantly improve developer experience when tests fail.
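A minimal sketch of how the reporter could be wired in, assuming the `jest-junit` package is installed as a dev dependency. The helper name `withJUnitReporter`, the `reports/junit` directory, and the `JestLikeConfig` type are illustrative and not taken from the repository; `outputDirectory` and `outputName` are `jest-junit` reporter options.

```typescript
// Minimal local stand-in for the parts of a Jest config used here
// (hypothetical type; a real config file would use Jest's own Config type).
type ReporterEntry = string | [string, Record<string, unknown>];

interface JestLikeConfig {
  reporters?: ReporterEntry[];
  [key: string]: unknown;
}

// Returns a copy of `base` with the jest-junit reporter appended.
// Jest's "default" reporter is kept so console output stays unchanged.
function withJUnitReporter(
  base: JestLikeConfig,
  outputDirectory: string = "reports/junit", // assumed output location
): JestLikeConfig {
  const existing: ReporterEntry[] = base.reporters ?? ["default"];
  const junit: ReporterEntry = [
    "jest-junit",
    { outputDirectory, outputName: "junit.xml" },
  ];
  return { ...base, reporters: [...existing, junit] };
}

// Example: extend an existing config object before exporting it
// from the Jest config file.
const exampleConfig = withJUnitReporter({ testEnvironment: "node" });
```

The resulting `junit.xml` could then be handed to whichever report-publishing step the workflow uses; that step is outside the scope of this sketch.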
10. Integration Tests Don't Contribute to Coverage Metrics
`test-coverage.yml` only runs `npm run test:coverage` targeting `src/**/*.ts` unit tests. The extensive integration test suite (`tests/integration/`) exercises the same code paths but is not included in coverage reporting. True coverage is likely much higher than reported.
11. No Docker Image Size Budget
No workflow tracks the size of built Docker images. An accidental dependency addition could bloat the agent or squid images, impacting pull times for users. A size regression gate (e.g., fail if image grows >20%) would catch this.
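The size gate described above could be sketched roughly as follows. This is an illustration under assumptions: the function name `checkSizeBudget`, the shape of the baseline data, and the sample sizes are all hypothetical, and the byte counts are assumed to have been collected beforehand (for example with `docker image inspect --format '{{.Size}}' <image>`).

```typescript
// Image name -> size in bytes (assumed to be collected before this runs).
type ImageSizes = Record<string, number>;

interface BudgetViolation {
  image: string;
  baselineBytes: number;
  currentBytes: number;
  growth: number; // fractional growth, e.g. 0.25 means +25%
}

// Returns every image whose size grew by more than `maxGrowth`
// relative to its baseline. Images without a usable baseline entry
// are skipped because there is nothing to compare against.
function checkSizeBudget(
  baseline: ImageSizes,
  current: ImageSizes,
  maxGrowth: number = 0.2,
): BudgetViolation[] {
  const violations: BudgetViolation[] = [];
  for (const [image, currentBytes] of Object.entries(current)) {
    const baselineBytes: number | undefined = baseline[image];
    if (baselineBytes === undefined || baselineBytes <= 0) continue;
    const growth = (currentBytes - baselineBytes) / baselineBytes;
    if (growth > maxGrowth) {
      violations.push({ image, baselineBytes, currentBytes, growth });
    }
  }
  return violations;
}

// Example with made-up sizes: agent grows by 30%, squid by 5%.
const sizeViolations = checkSizeBudget(
  { agent: 500e6, squid: 100e6 },
  { agent: 650e6, squid: 105e6 },
);
```

A CI step would fail the job when the returned list is non-empty. Keeping the comparison in a pure function makes the threshold logic unit-testable without Docker.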
🟢 Low Priority
12. Link Check Only Triggered on Markdown Changes
`link-check.yml` uses `paths: ['**/*.md', '.github/lychee.toml']`. Code-only PRs that remove or rename documentation anchors will break links without the link-check triggering.
13. No Changelog/Release Notes Enforcement
No workflow validates that a `CHANGELOG.md` entry or release notes update accompanies non-trivial changes. The `update-release-notes.md` workflow only runs on release publication, not on PRs.
14. Performance Monitor Uses Unpinned Action SHA
`performance-monitor.yml` uses `actions/checkout@v4`, `actions/setup-node@v4`, etc. (tag references, not pinned SHAs). Other workflows are correctly pinned. This inconsistency creates a supply chain risk specifically in the performance monitoring workflow.
15. No Mutation Testing
The codebase has low coverage thresholds and security-critical logic. Mutation testing (e.g., with Stryker) would reveal whether tests actually verify correctness or just achieve line coverage through execution without assertions.
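To make the distinction between execution and verification concrete, here is a small self-contained illustration. The domain matcher below is hypothetical and is not the project's implementation; the "mutant" is hand-written to resemble the kind of change a mutation testing tool might generate.

```typescript
// Hypothetical allow-list matcher (NOT the project's code).
type DomainMatcher = (domain: string, allowed: string[]) => boolean;

// Original: a domain matches if it equals an allowed entry
// or is a subdomain of one.
const original: DomainMatcher = (domain, allowed) =>
  allowed.some((a) => domain === a || domain.endsWith("." + a));

// Mutant: the "." prefix is dropped, so look-alike domains
// such as "evilexample.com" now match "example.com".
const mutant: DomainMatcher = (domain, allowed) =>
  allowed.some((a) => domain === a || domain.endsWith(a));

// Weak test: executes the matcher (so its lines count as covered)
// but asserts nothing about the result, so it always passes.
function weakTest(match: DomainMatcher): boolean {
  match("api.example.com", ["example.com"]);
  return true;
}

// Strong test: asserts a subdomain is accepted and a
// look-alike domain is rejected.
function strongTest(match: DomainMatcher): boolean {
  return (
    match("api.example.com", ["example.com"]) === true &&
    match("evilexample.com", ["example.com"]) === false
  );
}

// A test "kills" the mutant if it passes on the original
// and fails on the mutant.
const mutationResults = {
  weakKillsMutant: weakTest(original) && !weakTest(mutant),
  strongKillsMutant: strongTest(original) && !strongTest(mutant),
};
```

Both tests give the matcher the same line coverage, yet only the strong test detects the mutant. A mutation score surfaces that difference where a coverage percentage cannot.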
📋 Actionable Recommendations
- … `lines: 60%, branches: 50%, functions: 65%` incrementally; remove `continue-on-error` from comparison step
- … `trivy-action` step scanning all three Dockerfiles in `build.yml`; upload SARIF to Security tab
- … `build.yml` using existing `benchmark-performance.ts`; fail if >2x regression
- `--block-domains` untested: … `src/squid-config.test.ts` and integration test in `tests/integration/`
- `--env-all` untested: … `--env-all`
- … `hadolint/hadolint-action` step to `build.yml` scanning all three Dockerfiles
- … `workflow_dispatch` fallback test that always runs
- … `continue-on-error` with explicit issue creation for external failures
- … `jest-junit` reporter; configure `actions/junit-reporter` to annotate PR with test failures
- … `npm run test:all -- --coverage`) for the test-coverage workflow
- … `build.yml` that builds images and checks `docker image inspect` size against a threshold file
- … `link-check.yml` trigger to run on all PRs (not just markdown), with a 5-minute timeout
- … `dangoslen/changelog-enforcer` or a custom script
- … `performance-monitor.yml` action SHAs using `pinact` or similar
- … `src/squid-config.ts`, `src/domain-patterns.ts`)
📈 Metrics Summary
… `--block-domains`, `--env-all`, `--docker-warning-stub`
Key Finding
The most significant structural gap is the absence of a container image vulnerability scanner. For a tool whose core value proposition is security isolation, shipping Docker images with unscanned CVEs would be a trust-breaking issue. This should be the first gap addressed.
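As a rough sketch of what gating on scan results could look like, the function below filters a scanner's JSON report for blocking severities. The field names (`Results`, `Target`, `Vulnerabilities`, `VulnerabilityID`, `Severity`) are assumed to follow Trivy's JSON output and should be checked against the scanner version in use; the function name, sample report, and CVE identifiers are placeholders.

```typescript
// Assumed subset of a scanner's JSON report (modeled on Trivy's JSON output).
interface Vulnerability {
  VulnerabilityID: string;
  Severity: string;
}

interface ScanResult {
  Target: string;
  Vulnerabilities?: Vulnerability[] | null;
}

interface ScanReport {
  Results?: ScanResult[] | null;
}

// Returns one human-readable line per finding whose severity is in `blocking`.
function findBlockingVulnerabilities(
  report: ScanReport,
  blocking: string[] = ["HIGH", "CRITICAL"],
): string[] {
  const hits: string[] = [];
  for (const result of report.Results ?? []) {
    for (const vuln of result.Vulnerabilities ?? []) {
      if (blocking.includes(vuln.Severity)) {
        hits.push(`${result.Target}: ${vuln.VulnerabilityID} (${vuln.Severity})`);
      }
    }
  }
  return hits;
}

// Example with a made-up report; the IDs are placeholders, not real CVEs.
const sampleReport: ScanReport = {
  Results: [
    {
      Target: "agent-image",
      Vulnerabilities: [
        { VulnerabilityID: "CVE-0000-0001", Severity: "CRITICAL" },
        { VulnerabilityID: "CVE-0000-0002", Severity: "LOW" },
      ],
    },
    { Target: "squid-image", Vulnerabilities: null },
  ],
};

const blockingFindings = findBlockingVulnerabilities(sampleReport);
```

A PR check would fail when `blockingFindings` is non-empty. Many scanners can also enforce a severity threshold through their own exit code, which may make a wrapper like this unnecessary.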