26 commits
84f5799
test: surface DrainRegistryInFlight failures immediately via result c…
bnema Mar 1, 2026
c7a2b36
fix: recover missing ImageID via inspect for redundant deploy skip
bnema Mar 2, 2026
5f1ff6a
fix: resolve existing container from runtime when memory is stale
bnema Mar 2, 2026
dca4ad5
fix: orphan cleanup never kills running canonical container
bnema Mar 2, 2026
1c38208
feat: add TCP and HTTP readiness probes
bnema Mar 2, 2026
43c7dfb
feat: readiness cascade — healthcheck > HTTP probe > TCP probe > delay
bnema Mar 2, 2026
b3c4051
feat: post-switch stabilization window with automatic rollback
bnema Mar 2, 2026
e6981de
feat: add deploy config for stabilization delay and probe timeouts
bnema Mar 2, 2026
2a109aa
test: add strict zero-downtime ordering tests
bnema Mar 2, 2026
5d857a7
fix: address code review findings for zero-downtime deploy
bnema Mar 3, 2026
0fd48ee
refactor: address CodeRabbit review nitpicks
bnema Mar 3, 2026
5c94f33
chore: remove dead dns_suffix config field
bnema Mar 3, 2026
8147c67
refactor: address CodeRabbit review and fix lint gate
bnema Mar 3, 2026
2eb86f5
feat(registry): add deploy event suppression for CLI-managed pushes
bnema Mar 4, 2026
5c9a313
feat(admin): add deploy-intent endpoint to suppress event-based deploys
bnema Mar 4, 2026
5ba38cf
feat(cli): call deploy-intent before push to prevent event-based doub…
bnema Mar 4, 2026
f5d1226
feat(admin): clear deploy event suppression after successful deploy
bnema Mar 4, 2026
b5f1ca3
fix: address code review findings from CR
bnema Mar 4, 2026
7d1dad7
fix: normalize suppression keys in registry suppression methods
bnema Mar 4, 2026
87a4f06
fix: context-aware sleep in TCP and HTTP probe retry loops
bnema Mar 4, 2026
593697c
fix: use SO_REUSEADDR in delayed listener test to prevent TIME_WAIT f…
bnema Mar 4, 2026
fae4729
refactor: use parseResponse in DeployIntent for consistent error hand…
bnema Mar 4, 2026
a2cc8dc
refactor: extract testMinDelayConfig helper in container service tests
bnema Mar 4, 2026
bf45a07
refactor: split deploy coordination from registry boundary
bnema Mar 4, 2026
08d30e7
fix: address code review findings in DeployIntent, probe sleep, drain…
bnema Mar 4, 2026
11dcb09
refactor: trim DeployIntent input, extract probeLoop helper, fix test…
bnema Mar 5, 2026
1 change: 0 additions & 1 deletion docs/config/index.md
@@ -106,7 +106,6 @@ preserve = true # Keep volumes on container removal
[network_isolation]
enabled = true # Per-app isolated networks
network_prefix = "gordon" # Network name prefix
dns_suffix = ".internal" # DNS suffix for services

# Auto-route
[auto_route]
16 changes: 0 additions & 16 deletions docs/config/network-isolation.md
@@ -8,7 +8,6 @@ Isolate applications in separate Docker networks for enhanced security.
[network_isolation]
enabled = true
network_prefix = "gordon"
dns_suffix = ".internal"
```

## Options
@@ -17,7 +16,6 @@ dns_suffix = ".internal"
|--------|------|---------|-------------|
| `enabled` | bool | `false` | Enable per-app network isolation |
| `network_prefix` | string | `"gordon"` | Prefix for created networks |
| `dns_suffix` | string | `".internal"` | DNS suffix for service discovery |

## How It Works

@@ -99,19 +97,6 @@ db = connect("postgresql://postgres:5432/mydb")
cache = connect("redis://redis:6379")
```

## DNS Resolution

The `dns_suffix` option adds a suffix for internal DNS resolution:

```toml
[network_isolation]
dns_suffix = ".internal"
```

Services can be accessed as:
- `postgres` (short form)
- `postgres.internal` (with suffix)

## Examples

### Basic Isolation
@@ -137,7 +122,6 @@ Each app gets its own network with its own database.
[network_isolation]
enabled = true
network_prefix = "prod"
dns_suffix = ".internal"

[routes]
"app.company.com" = "company-app:v2.1.0"
2 changes: 0 additions & 2 deletions docs/config/reference.md
@@ -105,7 +105,6 @@ enabled = false # Create routes from image labels a
[network_isolation]
enabled = false # Enable per-app Docker networks
network_prefix = "gordon" # Prefix for created networks
dns_suffix = ".internal" # DNS suffix for internal resolution

# =============================================================================
# VOLUMES
@@ -215,7 +214,6 @@ keep_last = 3 # Keep N newest tags per repository
| `auto_route.enabled` | `false` | Auto-route disabled |
| `network_isolation.enabled` | `false` | Isolation disabled |
| `network_isolation.network_prefix` | `"gordon"` | Network prefix |
| `network_isolation.dns_suffix` | `".internal"` | DNS suffix |
| `volumes.auto_create` | `true` | Auto-create volumes |
| `volumes.prefix` | `"gordon"` | Volume prefix |
| `volumes.preserve` | `true` | Keep volumes |
22 changes: 21 additions & 1 deletion docs/reference/docker-labels.md
@@ -41,7 +41,27 @@ Labels you can set in your Dockerfile:

| Label | Example | Description |
|-------|---------|-------------|
-| `gordon.proxy.port` | `"3000"` | Port to proxy HTTP traffic to |
+| `gordon.domains` | `"app.example.com,www.app.example.com"` | Comma-separated domains for auto-route |
+| `gordon.port` | `"3000"` | Port to proxy HTTP traffic to |
+| `gordon.proxy.port` | `"3000"` | Port to proxy HTTP traffic to (legacy alias for `gordon.port`) |
+| `gordon.health` | `"/healthz"` | HTTP health check endpoint path for readiness probing |
+| `gordon.env-file` | `"/app/.env.example"` | Path to env template file inside the image |

### Health Check Label

When `gordon.health` is set, Gordon performs HTTP GET requests to the specified
path during deployment and waits for a 2xx or 3xx response before routing traffic
to the new container:

```dockerfile
FROM node:20-alpine
LABEL gordon.health="/api/health"
EXPOSE 3000
CMD ["node", "server.js"]
```

Gordon probes `http://<container-ip>:3000/api/health` until it gets a successful
response or the `deploy.http_probe_timeout` is reached.
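
The polling behaviour described above can be sketched in Go roughly as follows. This is a minimal illustration, not Gordon's implementation: `probeHTTP`, `flakyTransport`, `demo`, the 10 ms retry interval, and the container address are all hypothetical, and a fake `http.RoundTripper` stands in for the container so the sketch runs without a network.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// probeHTTP issues GET requests against url until one returns a 2xx or 3xx
// status, or until ctx is cancelled or times out.
// Hypothetical helper for illustration; not Gordon's actual probe code.
func probeHTTP(ctx context.Context, client *http.Client, url string, interval time.Duration) error {
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		if resp, err := client.Do(req); err == nil {
			resp.Body.Close()
			if resp.StatusCode >= 200 && resp.StatusCode < 400 {
				return nil // container is ready
			}
		}
		// Context-aware sleep: stop waiting as soon as the deadline passes.
		select {
		case <-ctx.Done():
			return fmt.Errorf("probe %s: %w", url, ctx.Err())
		case <-time.After(interval):
		}
	}
}

// flakyTransport is a fake transport (an assumption, for the demo only): it
// answers 503 for the first two requests and 200 afterwards.
type flakyTransport struct{ calls int }

func (f *flakyTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	f.calls++
	status := http.StatusServiceUnavailable
	if f.calls >= 3 {
		status = http.StatusOK
	}
	return &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    r,
	}, nil
}

// demo runs the probe against the fake transport and reports how many
// requests were needed before the probe succeeded.
func demo() (int, error) {
	ft := &flakyTransport{}
	client := &http.Client{
		Transport: ft,
		// Return 3xx responses as-is instead of following the redirect.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	// The context timeout plays the role of deploy.http_probe_timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	err := probeHTTP(ctx, client, "http://10.0.0.5:3000/api/health", 10*time.Millisecond)
	return ft.calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err)
}
```

Setting `CheckRedirect` to return `http.ErrUseLastResponse` matters in this sketch because Go's default client follows redirects, so a 3xx status would otherwise never be observed directly.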

### Proxy Port Label

1 change: 1 addition & 0 deletions internal/adapters/in/cli/controlplane.go
@@ -34,6 +34,7 @@ type ControlPlane interface {

GetStatus(ctx context.Context) (*remote.Status, error)
Reload(ctx context.Context) error
DeployIntent(ctx context.Context, imageName string) error
Deploy(ctx context.Context, deployDomain string) (*remote.DeployResult, error)
Restart(ctx context.Context, restartDomain string, withAttachments bool) (*remote.RestartResult, error)
ListTags(ctx context.Context, repository string) ([]string, error)
19 changes: 18 additions & 1 deletion internal/adapters/in/cli/controlplane_local.go
@@ -18,6 +18,7 @@ type localControlPlane struct {
containerSvc in.ContainerService
backupSvc in.BackupService
registrySvc in.RegistryService
deployCoord in.DeployCoordinator
healthSvc in.HealthService
logSvc in.LogService
}
@@ -27,12 +28,21 @@ func NewLocalControlPlane(kernel *app.Kernel) ControlPlane {
return &localControlPlane{}
}

registrySvc := kernel.Registry()
var deployCoord in.DeployCoordinator
if registrySvc != nil {
if coordinator, ok := any(registrySvc).(in.DeployCoordinator); ok {
deployCoord = coordinator
}
}

return &localControlPlane{
configSvc: kernel.Config(),
secretSvc: kernel.Secrets(),
containerSvc: kernel.Container(),
backupSvc: kernel.Backup(),
-registrySvc: kernel.Registry(),
+registrySvc: registrySvc,
+deployCoord: deployCoord,
healthSvc: kernel.Health(),
logSvc: kernel.Logs(),
}
@@ -223,6 +233,13 @@ func (l *localControlPlane) Reload(_ context.Context) error {
return app.SendReloadSignal()
}

func (l *localControlPlane) DeployIntent(_ context.Context, imageName string) error {
	if l.deployCoord != nil {
		l.deployCoord.SuppressDeployEvent(imageName)
	}
	return nil
}

func (l *localControlPlane) Deploy(ctx context.Context, deployDomain string) (*remote.DeployResult, error) {
if l.containerSvc != nil && l.configSvc != nil {
route, err := l.configSvc.GetRoute(ctx, deployDomain)
4 changes: 4 additions & 0 deletions internal/adapters/in/cli/controlplane_remote.go
@@ -88,6 +88,10 @@ func (r *remoteControlPlane) Reload(ctx context.Context) error {
return r.client.Reload(ctx)
}

func (r *remoteControlPlane) DeployIntent(ctx context.Context, imageName string) error {
	return r.client.DeployIntent(ctx, imageName)
}

func (r *remoteControlPlane) Deploy(ctx context.Context, deployDomain string) (*remote.DeployResult, error) {
return r.client.Deploy(ctx, deployDomain)
}
9 changes: 9 additions & 0 deletions internal/adapters/in/cli/push.go
@@ -154,6 +154,15 @@ func runPush(ctx context.Context, imageArg, domainFlag, tag string, build bool,
}
fmt.Printf("Domain: %s\n", styles.Theme.Bold.Render(pushDomain))

// Signal the server to suppress event-based deploys for this image.
// The CLI will trigger an explicit deploy after push completes.
if !noDeploy {
if err := handle.plane.DeployIntent(ctx, imageName); err != nil {
// Non-fatal: worst case we get a redundant deploy via event
fmt.Fprintf(os.Stderr, "warning: failed to register deploy intent: %v\n", err)
}
}
Comment on lines +159 to +164
⚠️ Potential issue | 🔴 Critical

--no-deploy can still trigger a deployment.

At Line 159, suppression is skipped when noDeploy is true. That allows registry image.pushed events to trigger deploys, which breaks the “push only” contract for --no-deploy.

Suggested fix
-	if !noDeploy {
-		if err := handle.plane.DeployIntent(ctx, imageName); err != nil {
-			// Non-fatal: worst case we get a redundant deploy via event
-			fmt.Fprintf(os.Stderr, "warning: failed to register deploy intent: %v\n", err)
-		}
-	}
+	if err := handle.plane.DeployIntent(ctx, imageName); err != nil {
+		// Non-fatal: worst case event-based deploy behavior is unchanged
+		fmt.Fprintf(os.Stderr, "warning: failed to register deploy intent: %v\n", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapters/in/cli/push.go` around lines 159 - 164, The current logic
skips calling handle.plane.DeployIntent when noDeploy is true, which leaves the
registry image.pushed events free to trigger deployments; always record a
suppression intent for imageName instead of skipping registration: modify the
flow around noDeploy to either call a new method (e.g.,
handle.plane.SuppressDeploy(ctx, imageName)) or extend handle.plane.DeployIntent
to accept a noDeploy flag so an intent is stored for imageName even when
noDeploy is true, and ensure the event handler checks that stored intent before
performing an actual deploy.
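
To make the finding concrete, the interaction between `--no-deploy` and event suppression can be modelled with a small Go sketch. It is a simplified stand-in, not the project's code: `eventDeployer`, `OnImagePushed`, `simulatePush`, and the `alwaysRegister` switch are hypothetical names, and the real event handler may differ.

```go
package main

import "fmt"

// eventDeployer models a server that deploys on image.pushed events unless
// the image has a registered suppression. Hypothetical type for illustration.
type eventDeployer struct {
	suppressed map[string]bool
}

func newEventDeployer() *eventDeployer {
	return &eventDeployer{suppressed: make(map[string]bool)}
}

// SuppressDeployEvent mirrors the name of the coordinator method in this PR.
func (d *eventDeployer) SuppressDeployEvent(image string) {
	d.suppressed[image] = true
}

// OnImagePushed reports whether the push event would trigger a deploy.
func (d *eventDeployer) OnImagePushed(image string) bool {
	return !d.suppressed[image]
}

// simulatePush models the CLI flow. With alwaysRegister=false it follows the
// guarded code under review; with alwaysRegister=true it follows the
// reviewer's suggested fix.
func simulatePush(d *eventDeployer, image string, noDeploy, alwaysRegister bool) bool {
	if alwaysRegister || !noDeploy {
		d.SuppressDeployEvent(image)
	}
	// The registry emits image.pushed once the push completes.
	return d.OnImagePushed(image)
}

func main() {
	guarded := simulatePush(newEventDeployer(), "myapp", true, false)
	fixed := simulatePush(newEventDeployer(), "myapp", true, true)
	fmt.Printf("guarded: event deploy=%v, suggested fix: event deploy=%v\n", guarded, fixed)
}
```

In this model, a push with `noDeploy` set and no registered intent still leads to an event-driven deploy, which is the behaviour the review flags.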


if build {
if err := buildAndPush(ctx, version, platform, dockerfile, buildArgs, versionRef, latestRef); err != nil {
return err
14 changes: 14 additions & 0 deletions internal/adapters/in/cli/remote/client.go
@@ -724,6 +724,20 @@ func (c *Client) Deploy(ctx context.Context, deployDomain string) (*DeployResult
return &result, nil
}

// DeployIntent tells the server that a CLI-managed push is about to happen,
// suppressing event-based deploys for this image.
func (c *Client) DeployIntent(ctx context.Context, imageName string) error {
	imageName = strings.TrimSpace(imageName)
	if imageName == "" {
		return fmt.Errorf("image name cannot be empty")
	}
	resp, err := c.requestWithRetry(ctx, http.MethodPost, "/deploy-intent/"+url.PathEscape(imageName), nil)
	if err != nil {
		return err
	}
	return parseResponse(resp, nil)
}

// Restart API

// RestartResult contains the result of a restart.
62 changes: 60 additions & 2 deletions internal/adapters/in/http/admin/handler.go
@@ -7,6 +7,7 @@ import (
"fmt"
"io"
"net/http"
"net/url"
"strconv"
"strings"
"time"
@@ -27,6 +28,11 @@ const maxAdminRequestSize = 1 << 20 // 1MB
// maxLogLines is the maximum allowed number of log lines that can be requested.
const maxLogLines = 10000

type registryDeployService interface {
	in.RegistryService
	in.DeployCoordinator
}

// Handler implements the HTTP handler for the admin API.
type Handler struct {
configSvc in.ConfigService
@@ -37,7 +43,7 @@ type Handler struct {
healthSvc in.HealthService
secretSvc in.SecretService
logSvc in.LogService
-registrySvc in.RegistryService
+registrySvc registryDeployService
eventBus out.EventPublisher
log zerowrap.Logger
}
@@ -129,7 +135,7 @@ func NewHandler(
healthSvc in.HealthService,
secretSvc in.SecretService,
logSvc in.LogService,
-registrySvc in.RegistryService,
+registrySvc registryDeployService,
eventBus out.EventPublisher,
log zerowrap.Logger,
backupSvc in.BackupService,
@@ -213,6 +219,7 @@ func (h *Handler) matchRoute(path string) (routeHandler, bool) {
{"/routes/by-image", h.handleRoutesByImage},
{"/routes", h.handleRoutes},
{"/secrets", h.handleSecrets},
{"/deploy-intent", h.handleDeployIntent},
{"/deploy", h.handleDeploy},
{"/restart", h.handleRestart},
{"/tags", h.handleTags},
@@ -228,6 +235,48 @@ func (h *Handler) matchRoute(path string) (routeHandler, bool) {
return nil, false
}

// handleDeployIntent handles /admin/deploy-intent/:image endpoint.
// It registers a deploy intent, suppressing event-based deploys for the image.
func (h *Handler) handleDeployIntent(w http.ResponseWriter, r *http.Request, path string) {
	if r.Method != http.MethodPost {
		h.sendError(w, http.StatusMethodNotAllowed, "method not allowed")
		return
	}

	ctx := r.Context()
	if !HasAccess(ctx, domain.AdminResourceConfig, domain.AdminActionWrite) {
		h.sendError(w, http.StatusForbidden, "insufficient permissions for config:write")
		return
	}

	if h.registrySvc == nil {
		h.sendError(w, http.StatusServiceUnavailable, "registry service unavailable")
		return
	}

	rawName := strings.TrimPrefix(path, "/deploy-intent/")
	if rawName == "" || rawName == "/deploy-intent" {
		h.sendError(w, http.StatusBadRequest, "image name required")
		return
	}

	imageName, err := url.PathUnescape(rawName)
	if err != nil {
		h.sendError(w, http.StatusBadRequest, "invalid image name encoding")
		return
	}

	log := zerowrap.FromCtx(ctx)
	log.Info().Str("image", imageName).Msg("deploy intent registered, suppressing image.pushed events")

	h.registrySvc.SuppressDeployEvent(imageName)

	h.sendJSON(w, http.StatusOK, map[string]string{
		"status": "ok",
		"image":  imageName,
	})
}

// sendJSON sends a JSON response.
func (h *Handler) sendJSON(w http.ResponseWriter, status int, data any) {
w.Header().Set("Content-Type", "application/json")
@@ -1078,6 +1127,15 @@ func (h *Handler) handleDeploy(w http.ResponseWriter, r *http.Request, path stri
return
}

// Clear deploy event suppression now that the explicit deploy has completed.
// This re-enables event-based deploys for future direct docker pushes.
if route.Image != "" && h.registrySvc != nil {
// Use the registry package's image name normaliser so digest-form refs
// and multi-segment paths are handled correctly.
imageName := registry.ExtractImageName(route.Image)
h.registrySvc.ClearDeployEventSuppression(imageName)
}

log.Info().Str("domain", deployDomain).Str("container_id", container.ID).Msg("container deployed via admin API")
h.sendJSON(w, http.StatusOK, dto.DeployResponse{
Status: "deployed",
8 changes: 6 additions & 2 deletions internal/app/run.go
@@ -1187,12 +1187,14 @@ func createContainerService(ctx context.Context, v *viper.Viper, cfg Config, svc
VolumePreserve: v.GetBool("volumes.preserve"),
NetworkIsolation: v.GetBool("network_isolation.enabled"),
NetworkPrefix: v.GetString("network_isolation.network_prefix"),
DNSSuffix: v.GetString("network_isolation.dns_suffix"),
NetworkGroups: svc.configSvc.GetNetworkGroups(),
Attachments: svc.configSvc.GetAttachments(),
ReadinessDelay: v.GetDuration("deploy.readiness_delay"),
ReadinessMode: v.GetString("deploy.readiness_mode"),
HealthTimeout: v.GetDuration("deploy.health_timeout"),
StabilizationDelay: v.GetDuration("deploy.stabilization_delay"),
TCPProbeTimeout: v.GetDuration("deploy.tcp_probe_timeout"),
HTTPProbeTimeout: v.GetDuration("deploy.http_probe_timeout"),
DrainDelay: v.GetDuration("deploy.drain_delay"),
DrainMode: v.GetString("deploy.drain_mode"),
DrainTimeout: v.GetDuration("deploy.drain_timeout"),
@@ -2346,7 +2348,6 @@ func loadConfig(v *viper.Viper, configPath string) error {
v.SetDefault("auto_route.enabled", false)
v.SetDefault("network_isolation.enabled", false)
v.SetDefault("network_isolation.network_prefix", "gordon")
v.SetDefault("network_isolation.dns_suffix", ".internal")
v.SetDefault("volumes.auto_create", true)
v.SetDefault("volumes.prefix", "gordon")
v.SetDefault("volumes.preserve", true)
@@ -2374,6 +2375,9 @@ func loadConfig(v *viper.Viper, configPath string) error {
v.SetDefault("deploy.readiness_delay", "5s")
v.SetDefault("deploy.readiness_mode", "auto")
v.SetDefault("deploy.health_timeout", "90s")
v.SetDefault("deploy.stabilization_delay", "2s")
v.SetDefault("deploy.tcp_probe_timeout", "30s")
v.SetDefault("deploy.http_probe_timeout", "60s")
v.SetDefault("deploy.drain_mode", "auto")
v.SetDefault("deploy.drain_timeout", "30s")

10 changes: 10 additions & 0 deletions internal/boundaries/in/deploy.go
@@ -0,0 +1,10 @@
package in

// DeployCoordinator defines deploy coordination operations.
//
// These methods let the CLI/admin API suppress image-pushed deploy events
// while an explicit deploy flow is in progress.
type DeployCoordinator interface {
	SuppressDeployEvent(imageName string)
	ClearDeployEventSuppression(imageName string)
}
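
A minimal in-memory implementation of this interface might look like the sketch below. The `memoryCoordinator` type, the `IsSuppressed` query, and the whitespace-trimming key normalization are assumptions made for illustration; the PR's actual registry-backed implementation is not shown in this excerpt.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// DeployCoordinator is copied from the interface in the diff above.
type DeployCoordinator interface {
	SuppressDeployEvent(imageName string)
	ClearDeployEventSuppression(imageName string)
}

// memoryCoordinator is a hypothetical, concurrency-safe implementation that
// keeps the set of suppressed image names in a map.
type memoryCoordinator struct {
	mu         sync.Mutex
	suppressed map[string]struct{}
}

// Compile-time check that memoryCoordinator satisfies the interface.
var _ DeployCoordinator = (*memoryCoordinator)(nil)

func newMemoryCoordinator() *memoryCoordinator {
	return &memoryCoordinator{suppressed: make(map[string]struct{})}
}

// normalizeKey is an assumed normalization: it only trims whitespace so that
// the suppress and clear calls agree on the map key.
func normalizeKey(imageName string) string {
	return strings.TrimSpace(imageName)
}

func (c *memoryCoordinator) SuppressDeployEvent(imageName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.suppressed[normalizeKey(imageName)] = struct{}{}
}

func (c *memoryCoordinator) ClearDeployEventSuppression(imageName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.suppressed, normalizeKey(imageName))
}

// IsSuppressed is a hypothetical query an event handler could call before
// acting on an image.pushed event.
func (c *memoryCoordinator) IsSuppressed(imageName string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.suppressed[normalizeKey(imageName)]
	return ok
}

func main() {
	c := newMemoryCoordinator()
	c.SuppressDeployEvent("myapp")
	fmt.Println("suppressed after intent:", c.IsSuppressed("myapp"))
	c.ClearDeployEventSuppression("myapp")
	fmt.Println("suppressed after clear:", c.IsSuppressed("myapp"))
}
```

Normalizing the key in every method keeps the suppress and clear paths symmetric, so a name registered with stray whitespace can still be cleared later.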