Skip to content

[client] Add health check flag to status command and expose daemon status in output#5650

Merged
mlsmaycon merged 1 commit intomainfrom
feat/status-health-checks
Mar 22, 2026
Merged

[client] Add health check flag to status command and expose daemon status in output#5650
mlsmaycon merged 1 commit intomainfrom
feat/status-health-checks

Conversation

@lixmal
Copy link
Collaborator

@lixmal lixmal commented Mar 22, 2026

Describe your changes

  • Add --check flag to netbird status for container/orchestrator health probes (live, ready, startup)

  • Expose daemonStatus field in JSON/YAML output so structured output is returned for all daemon states, including auth-required states that previously only printed plain text

  • Introduce ConvertOptions struct to replace the 10 positional parameters in ConvertToStatusOutputOverview

  • Add ParseDaemonStatus to validate raw status strings against known constants instead of raw casting

  • Update Docker entrypoint to use status --check instead of log file parsing for startup detection

  • live and ready use lightweight status RPC (no full peer status), startup requests full status with fresh probes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

netbirdio/docs#665

Summary by CodeRabbit

  • New Features

    • Added --check flag to the status command to validate daemon health (liveness, readiness, and startup checks).
  • Improvements

    • Increased Docker container entrypoint service and login timeouts from 5 to 30 seconds for more reliable startup.
    • Refactored startup verification to use health checks instead of log file monitoring.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 279568e2-0391-4f41-aa27-65f47226fd7a

📥 Commits

Reviewing files that changed from the base of the PR and between 239ad44 and 44f69f3.

📒 Files selected for processing (9)
  • client/Dockerfile
  • client/Dockerfile-rootless
  • client/cmd/status.go
  • client/internal/debug/debug.go
  • client/netbird-entrypoint.sh
  • client/status/status.go
  • client/status/status_test.go
  • client/wasm/cmd/main.go
  • proxy/internal/debug/handler.go
✅ Files skipped from review due to trivial changes (4)
  • client/Dockerfile-rootless
  • client/status/status_test.go
  • client/Dockerfile
  • client/internal/debug/debug.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • client/wasm/cmd/main.go
  • client/netbird-entrypoint.sh
  • client/cmd/status.go

📝 Walkthrough

Walkthrough

Adds a persistent --check health flag to the status CLI (live|ready|startup) with early-exit health probes, refactors status conversion to use an exported ConvertOptions and DaemonStatus, updates call sites, and replaces entrypoint log polling with netbird status --check polling and longer timeouts.

Changes

Cohort / File(s) Summary
Health Check & CLI
client/cmd/status.go
Adds persistent --check flag, implements runHealthCheck with live/ready/startup probes, changes statusFunc early-exit behavior, and updates getStatus call sites to new signature.
Status Model & API
client/status/status.go, client/status/status_test.go
Introduces DaemonStatus type and lifecycle constants, adds ConvertOptions struct, adds daemonStatus to OutputOverview, and changes ConvertToStatusOutputOverview to accept ConvertOptions; tests updated.
Call Site Updates
client/internal/debug/debug.go, client/wasm/cmd/main.go, proxy/internal/debug/handler.go
Call sites updated to pass nbstatus.ConvertOptions{...} instead of positional args; removed now-unneeded version imports at these locations.
Entrypoint Script & Dockerfiles
client/netbird-entrypoint.sh, client/Dockerfile, client/Dockerfile-rootless
Increases NB_ENTRYPOINT_SERVICE_TIMEOUT and NB_ENTRYPOINT_LOGIN_TIMEOUT defaults, removes log-file locating/grep logic, and replaces log polling with `netbird status --check live
Status RPC change
client/cmd/status.go (callers)
Updates getStatus(ctx, fullPeerStatus, shouldRunProbes) usage; health checks pass startup flag for both parameters while normal status uses (true, false).

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI
    participant Daemon
    participant Validator

    User->>CLI: netbird status --check ready
    CLI->>CLI: parse --check
    CLI->>Daemon: getStatus(ctx, fullPeerStatus=?, shouldRunProbes=?)
    Daemon-->>CLI: FullStatus + status string
    CLI->>Validator: checkReadiness(resp)
    alt status is Idle/Connecting/Connected
        Validator-->>CLI: success (exit 0)
    else NeedsLogin/LoginFailed/SessionExpired or other
        Validator-->>CLI: fail (exit non‑zero)
    end

    User->>CLI: netbird status --check startup
    CLI->>Daemon: getStatus(ctx, fullPeerStatus=true, shouldRunProbes=true)
    Daemon-->>CLI: FullStatus + status string (full)
    CLI->>Validator: checkStartup(resp) (mgmt/signal states, relays)
    alt all checks pass
        Validator-->>CLI: success (exit 0)
    else failure
        Validator-->>CLI: fail (exit non‑zero)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • mlsmaycon
  • pappz

Poem

🐇 I hop to check the daemon's state,
Live, ready, startup — I won't be late.
I poll and peep, no log to find,
Waiting on signals, calm and kind.
Hooray — the network wakes in time.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and accurately summarizes the two main changes: adding a health check flag to the status command and exposing daemon status in output.
Description check ✅ Passed The PR description covers key changes with bullet points, includes issue/docs references, and properly completes the repository template with appropriate checkboxes marked.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/status-health-checks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lixmal lixmal changed the title [client, proxy] Add health check flag to status command and expose daemon status in output [client] Add health check flag to status command and expose daemon status in output Mar 22, 2026
@lixmal lixmal force-pushed the feat/status-health-checks branch from fd75940 to 4f380f4 Compare March 22, 2026 04:53
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
client/status/status_test.go (1)

241-245: Extend this fixture matrix beyond Connected.

Worth adding NeedsLogin, LoginFailed, SessionExpired, and an unknown raw status here so the new daemonStatus serialization is protected across the auth-related states this PR is introducing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/status/status_test.go` around lines 241 - 245, Update the
TestConversionFromFullStatusToOutputOverview fixture to be table-driven and
include additional auth-related daemon statuses: NeedsLogin, LoginFailed,
SessionExpired, plus an unknown/raw status, so ConvertToStatusOutputOverview and
ParseDaemonStatus are exercised across these values; iterate over a slice of
cases that set resp.GetStatus()/resp.GetFullStatus() inputs and expected
DaemonVersion/daemonStatus outputs, call ConvertToStatusOutputOverview with
ConvertOptions (DaemonVersion, DaemonStatus:
ParseDaemonStatus(resp.GetStatus())), and assert the serialized daemonStatus
matches expectations for each case to protect the new auth-related states.
client/cmd/status.go (1)

206-271: Worth pinning the new probe rules down with unit tests.

A small table around checkReadiness and checkStartup would protect the auth-required, missing-full-status, disconnected-management/signal, and no-relay-available branches that now drive container health probes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/status.go` around lines 206 - 271, Add table-driven unit tests for
checkReadiness and checkStartup that cover all probe branches: for
checkReadiness, create resp with proto.StatusResponse wrapping
internal.StatusType values StatusNeedsLogin, StatusLoginFailed, and
StatusSessionExpired and assert each returns an error containing the status
string, plus a success case for a healthy status. For checkStartup, test: (1)
resp with nil FullStatus returns the "no full status available" error; (2)
FullStatus with ManagementState.Connected == false returns "management not
connected"; (3) FullStatus with SignalState.Connected == false returns "signal
not connected"; (4) FullStatus with relays that include rel:// or rels:// URIs
where relayCount>0 and relaysConnected==0 returns the "no relay servers
available" error; and (5) a success case where required fields are connected and
at least one relay is available. Use table-driven subtests, construct
proto.StatusResponse/FullStatus/ManagementState/SignalState/Relay messages
directly, and assert checkReadiness/checkStartup error presence and messages
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/netbird-entrypoint.sh`:
- Around line 66-79: The startup wait logic treats a zero timeout as a failure
path; change the behavior so NB_ENTRYPOINT_SERVICE_TIMEOUT=0 means "no-wait"
(immediately return) instead of exiting with failure: in
wait_for_daemon_startup(), check if timeout is 0 at the top and return
immediately; otherwise run the existing loop and only exit on real timeout.
Apply the same pattern to the other readiness loop that uses
NB_ENTRYPOINT_LOGIN_TIMEOUT (the login/readiness probe loop at lines 82-97): if
NB_ENTRYPOINT_LOGIN_TIMEOUT is 0, skip waiting and proceed with the fast path
(do not force a failure), otherwise perform the existing timed loop. Ensure you
reference and use the same timeout variable names (NB_ENTRYPOINT_SERVICE_TIMEOUT
and NB_ENTRYPOINT_LOGIN_TIMEOUT) and preserve current logging behavior when
non-zero timeouts expire.

In `@client/status/status.go`:
- Around line 40-49: The current ParseDaemonStatus function collapses unknown
values to DaemonStatusIdle; change the default behavior to preserve the raw
value instead of coercing to Idle by returning DaemonStatus(s) in the default
branch (or alternatively define and return a new DaemonStatusUnknown constant if
you prefer explicitness). Update the function ParseDaemonStatus and, if adding a
new constant, add DaemonStatusUnknown to the DaemonStatus type definition so
unknown serialized states remain distinguishable from DaemonStatusIdle.
- Around line 52-63: ConvertToStatusOutputOverview currently copies
ConvertOptions.DaemonVersion verbatim which results in blank daemon version when
callers (client/internal/debug/debug.go, client/wasm/cmd/main.go,
proxy/internal/debug/handler.go) omit it; fix by making
ConvertToStatusOutputOverview default an empty opts.DaemonVersion to a
meaningful value (e.g., "local" or a shared DaemonVersionDefault constant or
runtime-derived version) before copying into the output, or alternatively
restore DaemonVersion as a required field on ConvertOptions; update the
ConvertOptions/ConvertToStatusOutputOverview code path that sets
output.DaemonVersion to check for empty string and substitute the default so
same-process callers no longer emit a blank daemon version.

---

Nitpick comments:
In `@client/cmd/status.go`:
- Around line 206-271: Add table-driven unit tests for checkReadiness and
checkStartup that cover all probe branches: for checkReadiness, create resp with
proto.StatusResponse wrapping internal.StatusType values StatusNeedsLogin,
StatusLoginFailed, and StatusSessionExpired and assert each returns an error
containing the status string, plus a success case for a healthy status. For
checkStartup, test: (1) resp with nil FullStatus returns the "no full status
available" error; (2) FullStatus with ManagementState.Connected == false returns
"management not connected"; (3) FullStatus with SignalState.Connected == false
returns "signal not connected"; (4) FullStatus with relays that include rel://
or rels:// URIs where relayCount>0 and relaysConnected==0 returns the "no relay
servers available" error; and (5) a success case where required fields are
connected and at least one relay is available. Use table-driven subtests,
construct proto.StatusResponse/FullStatus/ManagementState/SignalState/Relay
messages directly, and assert checkReadiness/checkStartup error presence and
messages accordingly.

In `@client/status/status_test.go`:
- Around line 241-245: Update the TestConversionFromFullStatusToOutputOverview
fixture to be table-driven and include additional auth-related daemon statuses:
NeedsLogin, LoginFailed, SessionExpired, plus an unknown/raw status, so
ConvertToStatusOutputOverview and ParseDaemonStatus are exercised across these
values; iterate over a slice of cases that set
resp.GetStatus()/resp.GetFullStatus() inputs and expected
DaemonVersion/daemonStatus outputs, call ConvertToStatusOutputOverview with
ConvertOptions (DaemonVersion, DaemonStatus:
ParseDaemonStatus(resp.GetStatus())), and assert the serialized daemonStatus
matches expectations for each case to protect the new auth-related states.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee57640b-689c-4dd2-999c-40b272fe6676

📥 Commits

Reviewing files that changed from the base of the PR and between b550a2f and 9349770.

📒 Files selected for processing (7)
  • client/cmd/status.go
  • client/internal/debug/debug.go
  • client/netbird-entrypoint.sh
  • client/status/status.go
  • client/status/status_test.go
  • client/wasm/cmd/main.go
  • proxy/internal/debug/handler.go

@lixmal lixmal force-pushed the feat/status-health-checks branch from 4f380f4 to 49f7635 Compare March 22, 2026 05:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
client/netbird-entrypoint.sh (1)

35-48: ⚠️ Potential issue | 🟠 Major

Zero timeouts still skip the loops and change behavior.

deadline=$((SECONDS + timeout)) means timeout=0 never enters either loop, so daemon startup exits immediately with failure and login always falls through to netbird up. That still breaks the previous no-wait semantics for NB_ENTRYPOINT_SERVICE_TIMEOUT=0 and NB_ENTRYPOINT_LOGIN_TIMEOUT=0.

Also applies to: 51-66

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/netbird-entrypoint.sh` around lines 35 - 48, The timeout handling in
wait_for_daemon_startup treats timeout=0 as an immediate deadline and causes an
erroneous failure; change the logic so a zero timeout means "no-wait" (succeed
immediately) by adding an explicit check at the start of wait_for_daemon_startup
that returns immediately if timeout is 0 (or only run the deadline loop when
timeout > 0). Apply the same fix to the equivalent login/service wait routine
(the handlers that use NB_ENTRYPOINT_SERVICE_TIMEOUT and
NB_ENTRYPOINT_LOGIN_TIMEOUT) so both functions return immediately for a zero
timeout instead of computing a deadline and exiting with failure.
🧹 Nitpick comments (1)
client/status/status_test.go (1)

179-179: Add an auth-required daemon-status fixture.

These assertions only cover Connected. The new contract here is structured JSON/YAML for non-connected daemon states too, so please add at least one NeedsLogin/LoginFailed/SessionExpired case to catch regressions back to the old plain-text flow.

Also applies to: 242-245, 336-336, 460-460

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/status/status_test.go` at line 179, Test fixtures currently only
assert DaemonStatus: DaemonStatusConnected; add at least one additional fixture
case that uses a non-connected structured daemon state (e.g., DaemonStatus:
DaemonStatusNeedsLogin or DaemonStatusLoginFailed or DaemonStatusSessionExpired)
so tests validate JSON/YAML output for auth-required flows; update the test
table entries (the cases around the existing DaemonStatusConnected occurrences)
to include a sample structured payload for the chosen status (a map/object with
a human message and any error code fields) and assert the serialized output
matches the expected structured JSON/YAML instead of plain text, referencing the
DaemonStatus enum and the test case names where the other Connected fixtures
exist.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/cmd/status.go`:
- Around line 230-237: The readiness check in checkReadiness currently casts the
raw string with internal.StatusType(resp.GetStatus()) which accepts any string;
replace that with internal.ParseDaemonStatus(resp.GetStatus()) to validate the
value, return the parse error if it fails, and then switch on the validated
status. Update the function checkReadiness to call internal.ParseDaemonStatus,
handle the returned (status, error) pair (return the error when non-nil), and
only treat known statuses (internal.StatusNeedsLogin,
internal.StatusLoginFailed, internal.StatusSessionExpired) as non-ready; all
other unrecognized values should cause a non-zero exit via the returned error.
- Around line 206-227: runHealthCheck currently always calls getStatus(ctx,
false) so checkStartup uses cached connectivity; change the logic to request
fresh probes for startup by calling getStatus(ctx, true) when checkFlag ==
"startup" (or select the force-probe boolean based on checkFlag before invoking
getStatus) so checkStartup evaluates current management/signal/relay
reachability; update references to getStatus, runHealthCheck, and checkStartup
accordingly.

---

Duplicate comments:
In `@client/netbird-entrypoint.sh`:
- Around line 35-48: The timeout handling in wait_for_daemon_startup treats
timeout=0 as an immediate deadline and causes an erroneous failure; change the
logic so a zero timeout means "no-wait" (succeed immediately) by adding an
explicit check at the start of wait_for_daemon_startup that returns immediately
if timeout is 0 (or only run the deadline loop when timeout > 0). Apply the same
fix to the equivalent login/service wait routine (the handlers that use
NB_ENTRYPOINT_SERVICE_TIMEOUT and NB_ENTRYPOINT_LOGIN_TIMEOUT) so both functions
return immediately for a zero timeout instead of computing a deadline and
exiting with failure.

---

Nitpick comments:
In `@client/status/status_test.go`:
- Line 179: Test fixtures currently only assert DaemonStatus:
DaemonStatusConnected; add at least one additional fixture case that uses a
non-connected structured daemon state (e.g., DaemonStatus:
DaemonStatusNeedsLogin or DaemonStatusLoginFailed or DaemonStatusSessionExpired)
so tests validate JSON/YAML output for auth-required flows; update the test
table entries (the cases around the existing DaemonStatusConnected occurrences)
to include a sample structured payload for the chosen status (a map/object with
a human message and any error code fields) and assert the serialized output
matches the expected structured JSON/YAML instead of plain text, referencing the
DaemonStatus enum and the test case names where the other Connected fixtures
exist.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4685635b-fc17-465c-9630-846cd06a0aa9

📥 Commits

Reviewing files that changed from the base of the PR and between 9349770 and 4f380f4.

📒 Files selected for processing (7)
  • client/cmd/status.go
  • client/internal/debug/debug.go
  • client/netbird-entrypoint.sh
  • client/status/status.go
  • client/status/status_test.go
  • client/wasm/cmd/main.go
  • proxy/internal/debug/handler.go
✅ Files skipped from review due to trivial changes (1)
  • client/status/status.go

@lixmal lixmal force-pushed the feat/status-health-checks branch from 49f7635 to d9f5f6b Compare March 22, 2026 05:11
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/status/status.go (1)

157-169: ⚠️ Potential issue | 🔴 Critical

Use getter methods to safely access error fields in potentially nil state objects.

GetManagementState() and GetSignalState() can return nil pointers. Directly accessing .Error on these results causes a panic. Use GetError() instead, which includes a nil receiver guard.

Fix
 	managementOverview := ManagementStateOutput{
 		URL:       managementState.GetURL(),
 		Connected: managementState.GetConnected(),
-		Error:     managementState.Error,
+		Error:     managementState.GetError(),
 	}
 
 	signalOverview := SignalStateOutput{
 		URL:       signalState.GetURL(),
 		Connected: signalState.GetConnected(),
-		Error:     signalState.Error,
+		Error:     signalState.GetError(),
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/status/status.go` around lines 157 - 169,
ConvertToStatusOutputOverview currently dereferences potential nil pointers by
reading managementState.Error and signalState.Error directly; update the
construction of ManagementStateOutput and SignalStateOutput to use the safe
getters (managementState.GetError() and signalState.GetError()) or otherwise
call GetError() on the pbFullStatus.GetManagementState()/GetSignalState()
results so you never directly access .Error on a possibly nil object (refer to
ConvertToStatusOutputOverview, GetManagementState, GetSignalState,
ManagementStateOutput, SignalStateOutput).
🧹 Nitpick comments (1)
client/status/status.go (1)

47-58: Keep one source of truth for name filters.

ConvertOptions now requires callers to keep PrefixNamesFilter and PrefixNamesFilterMap in sync, but the filter path only uses the slice to enable filtering and the map to do the actual match. That makes future call sites easy to get subtly wrong. Consider deriving the map inside ConvertToStatusOutputOverview or keying the filter off len(PrefixNamesFilterMap) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/status/status.go` around lines 47 - 58, ConvertOptions currently
forces callers to keep PrefixNamesFilter and PrefixNamesFilterMap in sync;
update the implementation so ConvertToStatusOutputOverview is the single source
of truth by deriving the map from PrefixNamesFilter inside
ConvertToStatusOutputOverview (or if you prefer the map API, treat
PrefixNamesFilterMap as authoritative and derive the slice from it there) and
stop relying on callers to populate both; update logic in
ConvertToStatusOutputOverview to build/override PrefixNamesFilterMap from
PrefixNamesFilter (or vice versa) and use that derived map for matching so
callers only need to set one field.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/cmd/status.go`:
- Around line 206-225: runHealthCheck currently routes all probes through
getStatus with the full payload; change it to avoid full-status RPCs for "live"
and "ready": for "live" perform a lightweight connectivity/health RPC (or call
getStatus with GetFullPeerStatus=false) and return nil on success, for "ready"
call the lightweight status (getStatus with full=false) and then call
checkReadiness(resp) using only resp.Status, and keep the existing full
getStatus call (GetFullPeerStatus=true) only for the "startup" case so
checkStartup(resp) still receives the full payload; update the runHealthCheck
switch to call the appropriate lightweight or full status paths and reference
runHealthCheck, getStatus, checkReadiness and checkStartup when making the
changes.

---

Outside diff comments:
In `@client/status/status.go`:
- Around line 157-169: ConvertToStatusOutputOverview currently dereferences
potential nil pointers by reading managementState.Error and signalState.Error
directly; update the construction of ManagementStateOutput and SignalStateOutput
to use the safe getters (managementState.GetError() and signalState.GetError())
or otherwise call GetError() on the
pbFullStatus.GetManagementState()/GetSignalState() results so you never directly
access .Error on a possibly nil object (refer to ConvertToStatusOutputOverview,
GetManagementState, GetSignalState, ManagementStateOutput, SignalStateOutput).

---

Nitpick comments:
In `@client/status/status.go`:
- Around line 47-58: ConvertOptions currently forces callers to keep
PrefixNamesFilter and PrefixNamesFilterMap in sync; update the implementation so
ConvertToStatusOutputOverview is the single source of truth by deriving the map
from PrefixNamesFilter inside ConvertToStatusOutputOverview (or if you prefer
the map API, treat PrefixNamesFilterMap as authoritative and derive the slice
from it there) and stop relying on callers to populate both; update logic in
ConvertToStatusOutputOverview to build/override PrefixNamesFilterMap from
PrefixNamesFilter (or vice versa) and use that derived map for matching so
callers only need to set one field.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2a220d9c-2e47-43f0-99c1-8af0416fbc5b

📥 Commits

Reviewing files that changed from the base of the PR and between 4f380f4 and d9f5f6b.

📒 Files selected for processing (7)
  • client/cmd/status.go
  • client/internal/debug/debug.go
  • client/netbird-entrypoint.sh
  • client/status/status.go
  • client/status/status_test.go
  • client/wasm/cmd/main.go
  • proxy/internal/debug/handler.go
✅ Files skipped from review due to trivial changes (1)
  • client/wasm/cmd/main.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • client/status/status_test.go
  • client/internal/debug/debug.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
client/cmd/status.go (1)

206-275: Add direct coverage for the new probe contract.

This code now controls the container entrypoint, but there’s still no focused coverage for live/ready/startup, auth-required states, unknown statuses, or missing FullStatus. A small table-driven test around checkReadiness and checkStartup would make this much harder to regress.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/status.go` around lines 206 - 275, Add table-driven unit tests for
the new probe contract by creating tests (e.g. in client/cmd/status_test.go)
that directly exercise checkReadiness and checkStartup using crafted
*proto.StatusResponse values; cover success cases (internal.StatusIdle,
StatusConnecting, StatusConnected -> no error), auth-required error states
(StatusNeedsLogin, StatusLoginFailed, StatusSessionExpired -> expect error), an
unknown status case (expect error), and for checkStartup cover nil FullStatus
(expect error), management disconnected (expect error), signal disconnected
(expect error), relays present but none available (expect error), relays with
non-rel:// URIs ignored, and a happy-path where management/signal are connected
and at least one rel:// relay is available (no error); build responses with
resp.GetFullStatus() values (ManagementState, SignalState, Relays with
GetURI/GetAvailable) and assert errors/messages from checkReadiness and
checkStartup accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/cmd/status.go`:
- Around line 206-229: runHealthCheck currently initializes logging and calls
getStatus before validating the --check value, allowing invalid inputs to
trigger unnecessary RPCs; add validation of checkFlag (acceptable values:
"live", "ready", "startup") at the top of runHealthCheck (before util.InitLog
and getStatus) and return a clear error for invalid values so the CLI fails fast
without contacting the daemon; reference checkFlag, the local check variable,
and functions runHealthCheck and getStatus when making the change.

In `@client/netbird-entrypoint.sh`:
- Around line 4-5: The new 30s defaults in netbird-entrypoint.sh for
NB_ENTRYPOINT_SERVICE_TIMEOUT and NB_ENTRYPOINT_LOGIN_TIMEOUT are ineffective
because client/Dockerfile and client/Dockerfile-rootless export hardcoded
smaller values; update those Dockerfiles to stop exporting the old hardcoded
envs or change their exported values to the new defaults so the script can
actually use 30s. Specifically, remove or modify any ENV or export lines that
set NB_ENTRYPOINT_SERVICE_TIMEOUT and NB_ENTRYPOINT_LOGIN_TIMEOUT in
client/Dockerfile and client/Dockerfile-rootless so they either defer to
netbird-entrypoint.sh defaults or match the 30s/30s values referenced in
netbird-entrypoint.sh (ensure variable names NB_ENTRYPOINT_SERVICE_TIMEOUT and
NB_ENTRYPOINT_LOGIN_TIMEOUT are the ones updated).

---

Nitpick comments:
In `@client/cmd/status.go`:
- Around line 206-275: Add table-driven unit tests for the new probe contract by
creating tests (e.g. in client/cmd/status_test.go) that directly exercise
checkReadiness and checkStartup using crafted *proto.StatusResponse values;
cover success cases (internal.StatusIdle, StatusConnecting, StatusConnected ->
no error), auth-required error states (StatusNeedsLogin, StatusLoginFailed,
StatusSessionExpired -> expect error), an unknown status case (expect error),
and for checkStartup cover nil FullStatus (expect error), management
disconnected (expect error), signal disconnected (expect error), relays present
but none available (expect error), relays with non-rel:// URIs ignored, and a
happy-path where management/signal are connected and at least one rel:// relay
is available (no error); build responses with resp.GetFullStatus() values
(ManagementState, SignalState, Relays with GetURI/GetAvailable) and assert
errors/messages from checkReadiness and checkStartup accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4dbbdeb5-374e-4048-8614-f0ed0d0a2269

📥 Commits

Reviewing files that changed from the base of the PR and between d9f5f6b and 239ad44.

📒 Files selected for processing (7)
  • client/cmd/status.go
  • client/internal/debug/debug.go
  • client/netbird-entrypoint.sh
  • client/status/status.go
  • client/status/status_test.go
  • client/wasm/cmd/main.go
  • proxy/internal/debug/handler.go
✅ Files skipped from review due to trivial changes (2)
  • client/wasm/cmd/main.go
  • client/status/status_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • proxy/internal/debug/handler.go
  • client/status/status.go
  • client/internal/debug/debug.go

@lixmal lixmal force-pushed the feat/status-health-checks branch from 239ad44 to 44f69f3 Compare March 22, 2026 06:12
@sonarqubecloud
Copy link

@mlsmaycon mlsmaycon merged commit 8276228 into main Mar 22, 2026
42 checks passed
@mlsmaycon mlsmaycon deleted the feat/status-health-checks branch March 22, 2026 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants