Skip to content

Replace os.Exit in binoculars StartUp with error return#4782

Open
gnosyslambda wants to merge 1 commit intoarmadaproject:masterfrom
gnosyslambda:fix/replace-os-exit-binoculars
Open

Replace os.Exit in binoculars StartUp with error return#4782
gnosyslambda wants to merge 1 commit intoarmadaproject:masterfrom
gnosyslambda:fix/replace-os-exit-binoculars

Conversation

@gnosyslambda
Copy link
Copy Markdown

Fixes #4779

StartUp() in internal/binoculars/server.go was calling os.Exit(-1) on init errors, which makes the function impossible to test and prevents the caller from doing any cleanup.

Changed the function to return an error instead, and updated the caller in cmd/binoculars/main.go to handle it with log.Fatalf. The end result is the same (process exits on failure), but now the logic is testable and the caller is in control.

Changes:

  • StartUp signature: (func(), *sync.WaitGroup)(func(), *sync.WaitGroup, error)
  • Two os.Exit(-1) calls replaced with fmt.Errorf + %w wrapping
  • Removed unused os and log imports from server.go
  • Caller in main.go updated to check the returned error

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR cleanly refactors StartUp in internal/binoculars/server.go to return an error instead of calling os.Exit(-1), making the function testable and giving the caller control over cleanup. The caller in cmd/binoculars/main.go is updated to handle the error by explicitly invoking both deferred shutdown functions (shutdownGateway, shutdownMetricServer) before calling os.Exit(1), ensuring resources are cleaned up correctly since os.Exit skips deferred calls.

Key changes and observations:

  • StartUp signature changed from (func(), *sync.WaitGroup)(func(), *sync.WaitGroup, error), with two os.Exit(-1) calls replaced by fmt.Errorf("%w", ...) returns.
  • wg creation and wg.Add(1) are now correctly placed just before grpcCommon.Listen, removing the previously raised concern about incrementing the WaitGroup counter on error paths where wg.Done() would never be called.
  • The caller in main.go explicitly shuts down the gateway and metric server before os.Exit(1), correctly addressing the deferred-cleanup bypass issue.
  • Note for follow-up: grpcCommon.Listen (in internal/common/grpc/grpc.go) still calls log.Fatalf internally when port binding or serving fails (there is already a TODO comment on that line). This means StartUp is not yet fully free from hidden os.Exit paths, and a follow-up PR similar to this one for grpcCommon.Listen would complete the pattern.

Confidence Score: 5/5

  • This PR is safe to merge — the changes are minimal, correct, and address all previously raised concerns.
  • The refactor is straightforward: two os.Exit calls replaced with error returns, WaitGroup creation moved to the correct location, and the caller updated to handle explicit cleanup before exiting with a non-zero code. No logic changes, no regressions, and all previously raised review concerns are addressed.
  • No files require special attention.

Important Files Changed

Filename Overview
internal/binoculars/server.go Cleanly refactors StartUp to return errors instead of calling os.Exit(-1). The wg creation is correctly moved to just before grpcCommon.Listen, addressing the previously raised WaitGroup-before-fallible-init concern. Error messages use %w for proper wrapping. No issues found.
cmd/binoculars/main.go Caller correctly handles the new error return: logs the error, explicitly invokes both deferred shutdowns (shutdownGateway, shutdownMetricServer) before os.Exit(1), ensuring cleanup runs despite os.Exit skipping deferred calls. No issues found.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[main.go: main] --> B[ServeMetrics\ndefer shutdownMetricServer]
    B --> C[serveHttp\ndefer shutdownGateway]
    C --> D[binoculars.StartUp]
    D --> E{Kubernetes\nClient?}
    E -- error --> F[return nil, nil, error]
    F --> G[log.Errorf]
    G --> H[shutdownGateway explicit]
    H --> I[shutdownMetricServer explicit]
    I --> J[os.Exit 1]
    E -- ok --> K{ConfigureAuth?}
    K -- error --> F
    K -- ok --> L[CreateGrpcServer\nSetup Services]
    L --> M[wg := WaitGroup\nwg.Add 1\ngrpcCommon.Listen]
    M --> N[return GracefulStop, wg, nil]
    N --> O[Wait for stop signal\nshutdown]
    O --> P[wg.Wait]
    P --> Q[deferred shutdownGateway\ndeferred shutdownMetricServer]
Loading

Last reviewed commit: "refactor: replace os..."

@gnosyslambda gnosyslambda force-pushed the fix/replace-os-exit-binoculars branch from 3b7794f to 2c7c7fb Compare March 19, 2026 05:17
@gnosyslambda gnosyslambda force-pushed the fix/replace-os-exit-binoculars branch from c4f985c to 194f737 Compare March 19, 2026 05:49
- StartUp now returns error instead of calling os.Exit(-1)
- WaitGroup creation moved next to grpcCommon.Listen
- Caller explicitly shuts down gateway and metrics before os.Exit(1)

Fixes armadaproject#4779

Signed-off-by: gnosyslambda <gnosyslambda@gmail.com>
@gnosyslambda gnosyslambda force-pushed the fix/replace-os-exit-binoculars branch from 194f737 to 3f2bf70 Compare March 19, 2026 06:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace os.Exit in binoculars library with error return

1 participant