Skip to content

Fix health check failures when Docker proxy is configured#103

Merged
ezimuel merged 3 commits intomainfrom
fix/79-proxy-healthcheck-and-version-unbound
Mar 9, 2026
Merged

Fix health check failures when Docker proxy is configured#103
ezimuel merged 3 commits intomainfrom
fix/79-proxy-healthcheck-and-version-unbound

Conversation

@amitkanfer
Copy link
Copy Markdown
Contributor

@amitkanfer amitkanfer commented Mar 5, 2026

Summary

Fixes #79

Three bugs fixed, all affecting the same user flow.

Bug 1 — Health check fails when Docker proxy is configured

When users have http_proxy/https_proxy set in the Docker daemon config (e.g. /etc/systemd/system/docker.service.d/ or ~/.config/docker/config.json), Docker injects those proxy env vars into every container. The health check curl commands then attempt to route internal Docker network requests (e.g. http://elasticsearch:9200, http://kibana:5601) through the proxy, which either can't reach those hostnames or rejects them — causing the health check to time out after all retries (~584s) even though Elasticsearch is running fine (as confirmed by the reporter: manual curl http://localhost:9200 from the host returns 200).

Fix: Add --noproxy '*' to all curl commands that contact internal Docker network hostnames:

  • Elasticsearch container health check
  • Kibana container health check

Bug 1b — kibana_settings password setup silently broken inside bash -c '...'

The kibana_settings service command is bash -c '...' (outer single quotes). Adding --noproxy '*' inside that context breaks the shell quoting: the ' in '*' closes the outer single-quoted string, causing bash to receive a truncated script and exit immediately with an error — so the Kibana system password is never set and the stack fails to start.

Fix: Use --noproxy "*" (double quotes) for the kibana_settings curl command. Inside a single-quoted bash -c '...' string, double quotes are literal characters passed through to bash, which then interprets "*" as a double-quoted glob-safe * — the correct value for curl's --noproxy argument.

Bug 2 — VERSION: unbound variable crash in error log generation

The script uses set -eu. When a failure occurs, generate_error_log() calls get_os_info(), which sources /etc/os-release and then prints $VERSION. On rolling-release distributions like Arch Linux, /etc/os-release does not define VERSION (only VERSION_ID). With set -eu, this causes an unbound variable crash — so the error log is never written and the user sees a confusing VERSION: unbound variable message instead of the actual failure details.

Fix: Use ${VERSION:-} to safely default to an empty string when VERSION is not defined.

Tests added

tests/get_os_info_test.sh (runs in all CI jobs)

  • test_get_os_info_succeeds_when_VERSION_is_not_defined: creates a mock /etc/os-release without a VERSION field (simulating Arch Linux) and verifies the fixed code path (${VERSION:-}) exits 0 under set -eu
  • test_bare_VERSION_crashes_under_set_eu_when_not_defined: confirms the unfixed pattern ($VERSION) would have crashed, making the above test meaningful

tests/docker/proxy.sh (runs in Docker CI jobs)

Configures ~/.docker/config.json to inject a non-existent proxy (http://127.0.0.1:19999) into all containers — simulating a user with a corporate proxy in their Docker daemon config. Runs start-local.sh and asserts both services respond:

  • test_elasticsearch_accessible_with_proxy_configured: HTTP 200 from localhost:9200
  • test_kibana_accessible_with_proxy_configured: HTTP 200 from localhost:5601

All 4 CI jobs (ubuntu-22.04 Docker, ubuntu-24.04 Docker, ubuntu-22.04 Podman, ubuntu-24.04 Podman) pass.

Test plan

  • Run start-local with http_proxy set in Docker daemon config — health checks pass (verified in CI via tests/docker/proxy.sh)
  • Run start-local on Arch Linux — error log generates without crashing (verified in CI via tests/get_os_info_test.sh)
  • Run start-local on a standard distro (Ubuntu, Debian) — no regression (verified across all 4 CI matrix jobs)

🤖 Generated with Claude Code

…N unbound variable (#79)

Two bugs fixed:

1. Health check curl commands inside containers did not bypass the
   Docker daemon proxy (http_proxy/https_proxy). When users have a
   system-wide Docker proxy configured, curl routes internal Docker
   network requests (e.g. http://elasticsearch:9200) through the proxy,
   causing the health check to fail even though Elasticsearch is
   running correctly. Adding --noproxy '*' ensures health checks
   always connect directly within the Docker network.
   Affected: ES health check, kibana_settings password setup curl,
   and Kibana health check.

2. On rolling-release distributions like Arch Linux, /etc/os-release
   does not define the VERSION variable (only VERSION_ID). Because the
   script runs with set -eu, referencing $VERSION caused an unbound
   variable error that crashed the error-log generation, hiding the
   real failure message. Fixed by using ${VERSION:-} as a safe default.

Fixes #79

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cla-checker-service
Copy link
Copy Markdown

cla-checker-service bot commented Mar 5, 2026

❌ Author of the following commits did not sign a Contributor Agreement:
266b07b, e507a64, 1bbe023

Please, read and sign the above mentioned agreement if you want to contribute to this project

Amit Kanfer and others added 2 commits March 5, 2026 09:46
- tests/get_os_info_test.sh: unit test verifying ${VERSION:-} does not crash
  under set -eu when VERSION is absent from /etc/os-release (Arch Linux).
  Includes a negative test confirming bare $VERSION would have crashed.

- tests/docker/proxy.sh: integration test verifying ES and Kibana health checks
  succeed when Docker client proxy config injects HTTP_PROXY into containers.
  Configures ~/.docker/config.json to point at a non-existent proxy
  (127.0.0.1:19999), then runs start-local.sh and asserts both services
  return HTTP 200.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The kibana_settings Docker Compose command uses bash -c '...' with
single quotes. Using --noproxy '*' inside this context breaks the
outer single-quoting (the ' in '*' closes the bash -c argument).

Use --noproxy "*" (double quotes) instead, which is safe inside
a single-quoted context and passes the literal * to curl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@amitkanfer amitkanfer requested a review from ezimuel March 5, 2026 10:31
Copy link
Copy Markdown
Collaborator

@ezimuel ezimuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @amitkanfer

@ezimuel ezimuel merged commit 94d5602 into main Mar 9, 2026
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

es-local-dev health check keeps failing

2 participants