v2.1.0: Circuit breaker, metrics, event-driven architecture #2
Merged
Major enhancements to the Go rewrite:

- Circuit breaker with exponential backoff and restart budgets to prevent restart storms. Per-container action labels (restart/stop/notify/none).
- Prometheus /metrics endpoint with counters, gauges, and histograms for restarts, skips, notifications, and event processing.
- Event-driven Docker watcher with auto-reconnect (polling fallback for tests). Debouncing and real-time orchestration tracking.
- Notification rate limiting and retry with exponential backoff (3 attempts).
- Testability interfaces (docker.API, clock.Clock, notify.Notifier) with mock implementations and 25+ unit tests (60% guardian coverage).
- Config validation, WaitGroup goroutine management, defensive guards.
- CI: coverage reporting, govulncheck, Trivy scanning, 3 new acceptance tests (opt-out, circuit-breaker, custom-label), failure artifact capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
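To make the circuit-breaker item concrete, here is a minimal sketch of how exponential backoff and a restart budget might combine to block a restart storm. It is not the code from this PR: the `Breaker` type, its fields, and the helpers `newDemoBreaker` and `simulate` are hypothetical, and the rule that a restart must pass both the budget check and the backoff check is an assumption. The fields loosely mirror what the `AUTOHEAL_BACKOFF_*` and `AUTOHEAL_RESTART_*` variables appear to configure.

```go
package main

import (
	"fmt"
	"time"
)

// Breaker is a hypothetical per-container restart guard. Field names are
// illustrative and not taken from the guardian package.
type Breaker struct {
	Base       time.Duration // delay required after the first restart
	Multiplier float64       // growth factor per additional restart
	Max        time.Duration // upper bound on the delay
	Budget     int           // restarts allowed inside Window
	Window     time.Duration // sliding window for the budget

	restarts []time.Time // timestamps of restarts still inside Window
}

// Delay returns the backoff required after n recent restarts (n >= 1),
// capped at Max.
func (b *Breaker) Delay(n int) time.Duration {
	d := float64(b.Base)
	for i := 1; i < n; i++ {
		d *= b.Multiplier
		if d >= float64(b.Max) {
			return b.Max
		}
	}
	if d > float64(b.Max) {
		return b.Max
	}
	return time.Duration(d)
}

// Allow reports whether a restart at time now is permitted, and records
// it if so.
func (b *Breaker) Allow(now time.Time) bool {
	// Drop restarts that have aged out of the window.
	kept := b.restarts[:0]
	for _, t := range b.restarts {
		if now.Sub(t) < b.Window {
			kept = append(kept, t)
		}
	}
	b.restarts = kept

	if len(b.restarts) >= b.Budget {
		return false // budget exhausted: circuit is open
	}
	if n := len(b.restarts); n > 0 && now.Sub(b.restarts[n-1]) < b.Delay(n) {
		return false // still inside the backoff period
	}
	b.restarts = append(b.restarts, now)
	return true
}

// newDemoBreaker returns a breaker with made-up settings for the demo.
func newDemoBreaker() *Breaker {
	return &Breaker{
		Base:       10 * time.Second,
		Multiplier: 2,
		Max:        5 * time.Minute,
		Budget:     3,
		Window:     time.Hour,
	}
}

// simulate feeds restart requests at the given second offsets from a
// fixed start time and returns the decision for each request.
func simulate(b *Breaker, secs ...int) []bool {
	start := time.Date(2026, 2, 10, 0, 0, 0, 0, time.UTC)
	out := make([]bool, 0, len(secs))
	for _, s := range secs {
		out = append(out, b.Allow(start.Add(time.Duration(s)*time.Second)))
	}
	return out
}

func main() {
	fmt.Println(simulate(newDemoBreaker(), 0, 5, 10, 20, 30, 40))
}
```

Passing the timestamp into `Allow` rather than reading the wall clock keeps the sketch deterministic, which is the same motivation as the clock abstraction listed among the testability interfaces.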
Will-Luck added a commit that referenced this pull request on Feb 10, 2026
v2.1.0: Circuit breaker, metrics, event-driven architecture
Summary
- Circuit breaker with exponential backoff and restart budgets, plus per-container action labels (`autoheal.action=` restart/stop/notify/none), to prevent restart storms
- Prometheus `/metrics` endpoint with 9 metric definitions: restart counters, skip counters, notification tracking, event processing histograms, unhealthy/circuit gauges
- `docker.API`, `clock.Clock`, `notify.Notifier` interfaces with mock implementations. 25+ unit tests achieving 60% coverage on the guardian package

New Environment Variables
- `AUTOHEAL_BACKOFF_MULTIPLIER`
- `AUTOHEAL_BACKOFF_MAX`
- `AUTOHEAL_BACKOFF_RESET_AFTER`
- `AUTOHEAL_RESTART_BUDGET`
- `AUTOHEAL_RESTART_WINDOW`
- `METRICS_PORT`
- `NOTIFY_RATE_LIMIT`

Test plan
- `go build ./cmd/guardian` compiles
- `golangci-lint run ./...` lint clean
- `go test -count=1 ./...` all unit tests pass
- `docker build` image builds successfully

🤖 Generated with Claude Code
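
As a rough illustration of how the notification retry (3 attempts, exponential backoff) could be unit-tested through a clock abstraction, the sketch below injects a fake clock that records sleeps instead of blocking. The `Clock` interface shown here is an assumed shape and may not match the real `clock.Clock`. `fakeClock`, `sendWithRetry`, and `flaky` are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Clock is an assumed shape for a clock abstraction. The real clock.Clock
// interface in this PR may expose different methods.
type Clock interface {
	Sleep(d time.Duration)
}

// fakeClock records requested sleeps instead of blocking, so tests run
// instantly and can assert on the backoff schedule.
type fakeClock struct{ slept []time.Duration }

func (f *fakeClock) Sleep(d time.Duration) { f.slept = append(f.slept, d) }

// sendWithRetry calls send up to attempts times, doubling the wait between
// tries. It returns nil on the first success, otherwise the last error.
func sendWithRetry(c Clock, attempts int, base time.Duration, send func() error) error {
	var err error
	wait := base
	for i := 0; i < attempts; i++ {
		if err = send(); err == nil {
			return nil
		}
		if i < attempts-1 { // no sleep after the final attempt
			c.Sleep(wait)
			wait *= 2
		}
	}
	return err
}

// flaky returns a send function that fails its first n calls, plus a
// pointer to its call counter.
func flaky(n int) (func() error, *int) {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("temporary failure")
		}
		return nil
	}, &calls
}

func main() {
	clk := &fakeClock{}
	send, calls := flaky(2)
	err := sendWithRetry(clk, 3, time.Second, send)
	fmt.Println(err, *calls, clk.slept)
}
```

A production implementation would satisfy the same interface with a thin wrapper around `time.Sleep`, so only the tests depend on the fake.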