Skip to content

test: add Docker Compose healthcheck audit and enforcement#376

Open
bugman-007 wants to merge 2 commits intoLight-Heart-Labs:mainfrom
bugman-007:test/compose-healthcheck-audit
Open

test: add Docker Compose healthcheck audit and enforcement#376
bugman-007 wants to merge 2 commits intoLight-Heart-Labs:mainfrom
bugman-007:test/compose-healthcheck-audit

Conversation

@bugman-007
Copy link
Contributor

Summary

  • Add audit tool to identify compose files missing healthcheck definitions
  • Create scripts/audit-compose-healthchecks.sh with categorization and strict mode
  • Create tests/test-compose-healthcheck-audit.sh with 9 test cases
  • Integrate test into CI pipeline

Motivation

Missing healthchecks = degraded observability

Current state:

  • Core services in docker-compose.base.yml all have healthchecks ✅
  • health-check.sh relies on healthchecks for monitoring ✅
  • 8 production compose files lack healthchecks
  • No automated enforcement or visibility ❌

This PR provides visibility and a foundation for enforcement.

Audit Results

Scanned 32 compose files across the repository:

Files with healthchecks: 16

  • All core services (llama-server, open-webui, dashboard-api, dashboard)
  • Most extension services (n8n, qdrant, searxng, embeddings, tts, etc.)

❌ Production files without healthchecks: 8

  1. docker-compose.intel.yml - Intel Arc GPU overlay
  2. docker-compose.arc.yml - Intel Arc GPU overlay
  3. docker-compose.amd.yml - AMD GPU overlay
  4. docker-compose.nvidia.yml - NVIDIA GPU overlay
  5. docker-compose.apple.yml - Apple Silicon overlay
  6. docker-compose.tier0.yml - Tier 0 configuration
  7. installers/windows/docker-compose.windows-amd.yml - Windows installer
  8. extensions/services/whisper/compose.nvidia.yaml - Whisper NVIDIA variant

⚠️ Local dev files without healthchecks: 7

  • docker-compose.local.yml
  • Various compose.local.yaml files in extensions
  • Non-critical (dev-only files)

ℹ️ Stub files: 1

  • extensions/services/comfyui/compose.yaml (empty stub, GPU overlays have healthchecks)

Tool Features

scripts/audit-compose-healthchecks.sh

# Basic audit
bash scripts/audit-compose-healthchecks.sh

# Strict mode (exit 1 if production files missing healthchecks)
bash scripts/audit-compose-healthchecks.sh --strict

# Quiet mode (minimal output for CI)
bash scripts/audit-compose-healthchecks.sh --quiet

Categorization:

  • Production files: GPU overlays, base configs, extension services
  • Local dev files: *.local.* files (warnings only)
  • Stub files: Empty services: {} (informational)

Test Coverage

The test suite validates:

  1. ✅ Script exists and is runnable
  2. ✅ Runs without shell errors
  3. ✅ Exit codes are valid (0=pass, 1=strict fail)
  4. ✅ Finds and counts compose files
  5. ✅ Reports files with healthchecks
  6. ✅ Identifies production files without healthchecks
  7. --strict flag enforcement works
  8. --quiet flag reduces output
  9. ✅ Script is executable

Test Results

$ bash tests/test-compose-healthcheck-audit.sh
╔═══════════════════════════════════════════════╗
║   Compose Healthcheck Audit Test Suite       ║
╚═══════════════════════════════════════════════╝

  ✓ PASS audit-compose-healthchecks.sh exists
  ✓ PASS audit-compose-healthchecks.sh runs without shell errors
  ✓ PASS audit-compose-healthchecks.sh exit code is valid (0|1): 0
  ✓ PASS audit-compose-healthchecks.sh finds compose files
  ✓ PASS audit-compose-healthchecks.sh reports files with healthchecks
  ✓ PASS audit-compose-healthchecks.sh identifies production files without healthchecks
  ✓ PASS audit-compose-healthchecks.sh --strict flag works (exit: 1)
  ✓ PASS audit-compose-healthchecks.sh --quiet reduces output
  ✓ PASS audit-compose-healthchecks.sh is runnable

Result: 9 passed, 0 failed

CI Integration

Added "Compose Healthcheck Audit Tests" step to .github/workflows/test-linux.yml to run on every PR.

Note: Currently runs in audit mode (non-blocking). Can be switched to --strict mode in future PR once healthchecks are added to identified files.

Impact

  • ✅ Provides visibility into healthcheck coverage across all compose files
  • ✅ Identifies 8 production files needing healthchecks
  • ✅ Establishes foundation for future enforcement
  • ✅ Improves service reliability monitoring
  • ✅ Aligns with project's observability standards
  • ✅ Supports health-check.sh monitoring effectiveness

Next Steps (Future PRs)

  1. Add healthchecks to identified production files
  2. Enable --strict mode in CI to enforce healthchecks
  3. Document healthcheck requirements in extension schema

Related

Copy link
Collaborator

@Lightheartdevs Lightheartdevs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Compose Healthcheck Audit

Script is well-structured and categorization logic is solid. A few issues to fix.

CLAUDE.md violation: 2>/dev/null

audit-compose-healthchecks.sh lines 50, 72, 77 — find and grep suppress stderr. CLAUDE.md rule 4 forbids this. Let errors surface or use explicit handling.

CLAUDE.md violation: || true in test

test-compose-healthcheck-audit.sh line 109 — Test #9 uses if [[ -x ... ]] || true; then pass. Always passes. Dead test. Violates both the || true ban and "let assertions fail visibly" rule.

Tests are structural, not behavioral

All 9 tests run the audit against the real repo and check for expected strings. None create temp compose files to verify detection/classification behavior. One fixture-based test with a known compose file (with and without healthcheck) would make these genuinely behavioral.

Bug: \s not portable to macOS

Line 72: grep -q "^services:\s*{}\s*$" uses \s which is PCRE, not supported in macOS grep's basic regex. CLAUDE.md calls out POSIX compatibility. Use [[:space:]] or grep -E.

CI conflict

PRs #373, #375, and #376 all insert CI steps at the same location in test-linux.yml. Whichever merges first conflicts with the others.

@Lightheartdevs
Copy link
Collaborator

What's needed to get this merged:

  1. Remove all 2>/dev/null — replace with || warn "..." or let errors propagate
  2. Remove || true on test Add supervisor pattern, background GPU automation, production guardia… #9 line 109 — it makes the test always pass
  3. Fix \s in grep (line 72) to [[:space:]] for macOS portability
  4. Add at least one fixture-based behavioral test — create a temp compose file with/without healthcheck and verify detection

Will conflict with #373 and #375 on test-linux.yml — whoever merges first wins, others rebase.

@bugman-007
Copy link
Contributor Author

I addressed review feedback for compose healthcheck audit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants