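📊 Current CI/CD Pipeline Status
This repository has a mature, multi-layered CI/CD pipeline with a strong security emphasis. The pipeline covers building, testing, security scanning, documentation, and agentic smoke tests across 4 test tiers.
✅ Existing Quality Gates
On Every Pull Request
lint.yml
build.yml (containers/api-proxy/, tsc --noEmit)
test-integration.yml (tsconfig.check.json)
test-coverage.yml
test-integration-suite.yml
test-chroot.yml
test-examples.yml
test-action.yml (action.yml + 4 scenarios)
codeql.yml
dependency-audit.yml
pr-title.yml
link-check.yml (*.md changes)
docs-preview.yml
security-guard.md
build-test.md
smoke-claude/copilot/codex/chroot.md
Scheduled / Ongoing Monitoring
Hourly: Secret Diggers (Claude, Copilot, Codex), which scan the repo for secrets
On issues: Issue Monster, Issue Duplication Detector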
🔍 Identified Gaps
🔴 High Priority
1. Domain/Network Integration Tests Have No Dedicated CI Job
The test-integration-suite.yml workflow covers domain, network, and security tests, but the coverage heat map in docs/INTEGRATION-TESTS.md lists these ~50 tests (blocked-domains, dns-servers, wildcard-patterns, network-security, etc.) as having no CI workflow entry. The integration suite job pattern-matches specific test files, and the domain tests are included, but the workflow is named "Integration Tests", a mismatch that could lead contributors to miss that these tests are running. More critically, there is no coverage threshold enforcement: the workflow fails only on regression, not on absolute minimums.
2. Dependency Vulnerability Audit Currently Failing
Recent runs show dependency-audit.yml failing. A failing security workflow on main means PRs that would normally be blocked by this gate are proceeding. This should be resolved and the workflow stabilized.
3. Missing Coverage Minimum Threshold
test-coverage.yml detects regressions (coverage decrease vs. base branch) but enforces no absolute floor. A PR that starts from a low-coverage branch or adds code without tests can merge. No threshold like "lines must be ≥ X%" is configured.
4. Performance Benchmarks Not Run on PRs
performance-monitor.yml runs only on a weekly schedule. Performance regressions (startup time, container launch latency) can be silently merged. Given AWF's core value proposition involves container lifecycle timing, a PR-triggered performance check or at minimum a diff-aware benchmark comment would add value.
🟡 Medium Priority
5. --env-all Flag Has No Test Coverage
The coverage heat map explicitly calls out --env-all as having zero unit, integration, or CI coverage. This flag copies all host environment variables into the container — a high-stakes feature for both functionality and security — yet it is completely untested.
6. --block-domains (Domain Deny-List) Has No Test Coverage
The deny-list feature (--block-domains) has zero coverage at all test levels. This is a security-relevant feature; bugs would silently allow traffic that should be blocked.
7. Secret Digger (Copilot) Has Recurring Failures
2 of 5 recent runs failed. Secret scanners are a critical security control; flaky failures erode trust in the signal. Root cause should be investigated — likely a Copilot API reliability issue or a workflow configuration problem.
8. Integration Tests Missing CI Jobs for Several Categories
Per docs/INTEGRATION-TESTS.md, these integration test categories have no dedicated CI workflow:
Domain/Network (6 files, ~50 tests) — included in test-integration-suite.yml but not in a named/visible job
Protocol/Security (8 files, ~100 tests) — same
Container/Ops (7 files, ~45 tests) — same
All are bundled into the generic "Integration Tests" workflow but categorized separately in the docs, making it harder to see which are green/red.
9. No Container Image Vulnerability Scanning on PRs
codeql.yml includes language: actions analysis and dependency-audit.yml scans npm packages, but there is no Docker image vulnerability scanning (e.g., Trivy, Grype) on the Squid, Agent, or API Proxy container images. Container CVEs would not be caught until after release.
10. SSL Bump Only Has Unit Tests, No Integration Tests
The SSL/TLS inspection config has unit test coverage but zero integration test coverage. A regression in HTTPS proxy behavior would only be caught by smoke tests (real AI agents), which are expensive and give slower feedback.
🟢 Low Priority
11. No Windows or macOS CI Testing
All workflows run exclusively on ubuntu-latest. AWF uses Docker, which requires different setup on macOS/Windows. The install.sh script supports these platforms but they're untested in CI.
12. build.yml and lint.yml Duplicate the Lint Step
build.yml runs npm run lint as a step, and lint.yml also runs npm run lint as a separate job. This wastes ~5 minutes of compute per PR. The build.yml lint step could be removed in favor of the dedicated lint.yml workflow.
13. test-coverage-improver Agentic Workflow Not Enforced
The weekly test-coverage-improver.md opens PRs to improve coverage, but there's no mechanism to prevent coverage-reducing PRs when this bot's PRs aren't merged. The PR gate (test-coverage.yml) only blocks on regression, so the improver and the gate aren't tightly coupled.
14. No Mutation Testing
The test suite has ~200 unit tests and ~265 integration tests, but no mutation testing is configured. Mutation testing (e.g., Stryker for TypeScript) would reveal tests that pass even when code logic is broken — catching low-quality tests.
15. Integration Test Timeout Sensitivity
All integration test jobs have a 45-minute timeout, and tests run serially (1 worker). A single slow test can block the entire suite. There's no mechanism to detect newly-flaky tests or tests that are approaching the timeout boundary.
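One way to notice a suite drifting toward its timeout is to total the per-test durations from a run and warn once they consume most of the budget. The sketch below is illustrative only: the function name `check_timeout_budget`, the 80% warning default, and the `<test-name> <seconds>` input format are assumptions, not something the repository provides.

```shell
#!/usr/bin/env bash
# Sketch: warn when a serial test run is using most of its job timeout.
# Assumed input on stdin: one "<test-name> <seconds>" pair per line
# (a hypothetical format; adapt it to whatever the test reporter emits).

check_timeout_budget() {
  local budget_seconds="$1"      # e.g. 2700 for a 45-minute job timeout
  local warn_percent="${2:-80}"  # warn once usage reaches this percentage
  awk -v budget="$budget_seconds" -v warn="$warn_percent" '
    NF >= 2 {
      total += $2
      if ($2 > slowest) { slowest = $2; slowest_name = $1 }
    }
    END {
      used = (budget > 0) ? 100 * total / budget : 0
      printf "total=%ds used=%d%% slowest=%s(%ds)\n", total, used, slowest_name, slowest
      # Non-zero exit status signals that the warning threshold was reached.
      exit (used >= warn) ? 1 : 0
    }'
}
```

Because the tests run serially, the sum of durations approximates the job's wall time, which is why a simple total is a reasonable first signal here.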
📋 Actionable Recommendations
1. Fix Failing Dependency Audit (High Priority — Low Complexity)
Investigate and resolve the current dependency-audit.yml failure. This is likely a specific CVE in a dependency that needs a version bump or an audit override. Until it is fixed, the security gate is broken.
npm audit --audit-level=high
cd docs-site && npm audit --audit-level=high
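2. Add Coverage Minimum Threshold (High Priority — Low Complexity)
In test-coverage.yml, add a step after generating coverage that fails if absolute coverage drops below a floor (e.g., 60% lines). Use the existing coverage-summary.json:
- name: Enforce minimum coverage
  run: |
    # Read the total line coverage percentage from the coverage summary
    LINES=$(jq -r '.total.lines.pct' coverage/coverage-summary.json)
    # bc handles the floating-point comparison that bash arithmetic cannot
    if (( $(echo "$LINES < 60" | bc -l) )); then
      echo "::error::Coverage ${LINES}% is below minimum threshold of 60%"
      exit 1
    fi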
Expected impact: Prevents coverage decline from accumulating over time.
3. Add Integration Tests for --env-all and --block-domains (High Priority — Medium Complexity)
Create tests/integration/env-all.test.ts and tests/integration/block-domains.test.ts and add them to the pattern matching in test-integration-suite.yml. These test two security-relevant features with zero current coverage.
Expected impact: Closes security testing gaps for two critical CLI flags.
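4. Add Container Image Scanning to PRs (Medium Priority — Low Complexity)
Add a job to build.yml (or a new container-scan.yml) that builds the Docker images and scans them with Trivy.
Expected impact: Catches container CVEs before they reach GHCR.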
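The scanning step for the container image recommendation could look roughly like the following. This is a sketch under stated assumptions: the function name `scan_images`, the `TRIVY_BIN` override, and the image tags in the example are hypothetical, and the severity levels are a suggestion rather than project policy. The Trivy flags used (`image`, `--exit-code`, `--severity`, `--ignore-unfixed`) are standard Trivy CLI options.

```shell
#!/usr/bin/env bash
# Sketch: scan locally built images with Trivy and fail on serious findings.
# Image tags passed in are placeholders; substitute the tags the build produces.

scan_images() {
  local status=0 image
  for image in "$@"; do
    # --exit-code 1 makes Trivy return non-zero when matching CVEs are found;
    # --ignore-unfixed skips vulnerabilities that have no released fix yet.
    "${TRIVY_BIN:-trivy}" image --exit-code 1 --severity HIGH,CRITICAL \
      --ignore-unfixed "$image" || status=1
  done
  # Every image is scanned before failing, so one report covers all of them.
  return "$status"
}

# Example (hypothetical tags):
#   scan_images awf-squid:pr awf-agent:pr awf-api-proxy:pr
```

Scanning all images before returning keeps the PR feedback complete: a contributor sees findings for the Squid, Agent, and API Proxy images in a single run instead of fixing them one failure at a time.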
5. Add Benchmark Comment to PRs (Medium Priority — Medium Complexity)
Extend performance-monitor.yml to also run on pull_request with reduced iterations (e.g., 3 instead of 5), and post results as a PR comment. Keep the weekly full run for regression issue creation. Use a skip-if-label: skip-benchmark label for large refactors.
Expected impact: Startup time regressions caught at PR time, not a week later.
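A reduced-iteration benchmark for pull requests can be as small as a timing loop. The sketch below is one possible shape, not the logic in performance-monitor.yml: `bench_mean_ms` is a hypothetical helper, and it relies on GNU `date` supporting `%N`, which holds on Linux runners such as ubuntu-latest but not on macOS.

```shell
#!/usr/bin/env bash
# Sketch: time a command over a few iterations and report the mean in ms.
# Relies on GNU date's %N (nanoseconds), available on Linux runners.

bench_mean_ms() {
  local iterations="$1"; shift
  local total_ns=0 start end i
  for ((i = 0; i < iterations; i++)); do
    start=$(date +%s%N)
    # Output is discarded and failures are ignored; only wall time is measured.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s%N)
    total_ns=$((total_ns + end - start))
  done
  # Integer division: mean nanoseconds, then converted to whole milliseconds.
  echo $((total_ns / iterations / 1000000))
}

# Example: 3 iterations on pull requests instead of the weekly 5.
#   bench_mean_ms 3 <command under test>
```

With only 3 iterations the mean is noisy, so a PR comment built from it is better treated as a hint for reviewers than as a hard gate.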
6. Investigate Secret Digger (Copilot) Failures (Medium Priority — Low Complexity)
Review the 2 failed runs and determine whether they are authentication failures, timeout issues, or logic failures. Consider adding a failure notification or a fallback to the Claude-based digger if Copilot is unavailable.
Expected impact: Restores reliability of hourly secret scanning.
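To track whether the failure rate improves after a fix, the recent run conclusions can be summarized with a small helper. This is a sketch only: `failure_summary` is a hypothetical name, and the workflow file to query is left as a placeholder because it is not named above.

```shell
#!/usr/bin/env bash
# Sketch: summarize run conclusions (one per line on stdin) as "failed/total".
# The lines could come from the GitHub CLI, for example:
#   gh run list --workflow <workflow-file> --limit 5 \
#     --json conclusion --jq '.[].conclusion'
# where <workflow-file> is the Copilot digger's workflow (a placeholder here).

failure_summary() {
  awk '
    NF {
      total++
      # Count hard failures and timeouts; cancelled or skipped runs are not failures.
      if ($1 == "failure" || $1 == "timed_out") failed++
    }
    END { printf "%d/%d\n", failed, total }'
}
```

For example, piping five conclusions of which two are `failure` prints `2/5`, matching the "2 of 5 recent runs failed" observation.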
7. Deduplicate Lint in build.yml (Low Priority — Low Complexity)
Remove the npm run lint step from build.yml since lint.yml already covers it. This saves ~5 minutes of CI compute per PR without reducing coverage.
8. Add SSL/HTTPS Integration Test (Low Priority — Medium Complexity)
Create a minimal integration test that verifies HTTPS CONNECT tunneling works as expected via the Squid proxy. This would provide regression protection for TLS behavior without needing real AI smoke tests.
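The core check of such a test can be sketched with curl, which sends an HTTP CONNECT to the proxy for any https:// URL. Everything specific here is an assumption: `check_https_via_proxy` is a hypothetical helper, and the proxy address and target URL in the example are placeholders. If SSL Bump is enabled, curl would also need to trust the proxy's CA (for instance via `--cacert`), which this sketch does not cover.

```shell
#!/usr/bin/env bash
# Sketch: confirm an HTTPS request succeeds when tunneled through the proxy.
# curl issues an HTTP CONNECT to the proxy for https:// URLs.
# The proxy address and target URL are placeholders, not values from the repo.

check_https_via_proxy() {
  local proxy_url="$1" target_url="$2" code
  code=$(curl --silent --output /dev/null --max-time 15 \
    --proxy "$proxy_url" --write-out '%{http_code}' "$target_url") || return 1
  # Any 2xx or 3xx status means the tunnel was established and TLS completed.
  [[ "$code" =~ ^[23][0-9][0-9]$ ]]
}

# Example (3128 is Squid's default port; the host is an assumption):
#   check_https_via_proxy http://localhost:3128 https://example.com
```

In the real suite this check would more naturally live in a TypeScript test alongside the other integration tests; the shell form is only meant to show how little is needed to exercise the CONNECT path.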
9. Split Integration Test Suite into Named Jobs (Low Priority — Low Complexity)
Rename the jobs in test-integration-suite.yml to match the category names used in docs/INTEGRATION-TESTS.md (e.g., "Domain & Network Tests", "Protocol & Security Tests"). This improves PR status check readability and aligns docs with CI.
📈 Metrics Summary
| Metric | Value |
| --- | --- |
| Total workflow files | ~62 (40 YAML + 22 compiled Markdown) |
| PR-triggered workflows | ~19 (13 standard + 6 agentic) |
| Scheduled workflows | ~15 |
| Integration test files | 34 files |
| Integration test approximate count | ~265 tests |
| Unit test files | ~19 files |
| Unit test approximate count | ~200 tests |
| Multi-node CI matrix | Node 20 + 22 |
| Languages tested in build-test | 8 (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) |
| AI engines in smoke tests | 3 (Claude, Copilot, Codex) |
| Recent workflow success rate | ~85% (excluding Secret Digger Copilot failures) |
| Coverage enforcement | Regression detection only (no absolute floor) |
| Test execution model | Serial (1 worker, Docker constraints) |
| Container image scanning on PRs | ❌ None |
| --env-all test coverage | ❌ None |
| --block-domains test coverage | ❌ None |
Assessment generated on 2026-03-28 based on workflow files in .github/workflows/, docs/INTEGRATION-TESTS.md, and recent workflow run history.