-
Notifications
You must be signed in to change notification settings - Fork 3
Cluster Mode
Cluster mode lets a single Sentinel server manage containers across multiple Docker hosts via lightweight agents. Communication uses gRPC bidirectional streaming over mutual TLS (TLS 1.3 minimum), with protobuf-serialised messages.

```
┌──────────────────────────────┐
│ Server                       │
│  - Web UI :8080              │
│  - gRPC :9443                │
│  - Certificate Authority     │
│  - Host registry (BoltDB)    │
└──────┬───────────┬───────────┘
       │           │
   gRPC/mTLS   gRPC/mTLS
       │           │
  ┌────┘           └────┐
  ▼                     ▼
┌─────────────────┐   ┌─────────────────┐
│ Agent (host-a)  │   │ Agent (host-b)  │
│  Reports state  │   │  Reports state  │
│  Executes cmds  │   │  Executes cmds  │
└─────────────────┘   └─────────────────┘
```
| Component | Role |
|---|---|
| Server | Central dashboard, certificate authority, command dispatcher, host registry |
| Agent | Connects to server, reports container state, executes commands locally |
| Transport | gRPC with protobuf serialisation, bidirectional streaming |
| Security | Mutual TLS with built-in CA. Server signs each agent's certificate. |
The cluster protocol is defined in `internal/cluster/proto/sentinel.proto`.
| Service | RPC | Type | Description |
|---|---|---|---|
| `EnrollmentService` | `Enroll` | Unary | Agent presents token + CSR, receives signed cert |
| `AgentService` | `Channel` | Bidi-stream | Persistent command/event channel |
| `AgentService` | `ReportState` | Unary | Full container state snapshot |
Server to Agent:
| Message | Purpose |
|---|---|
| `ListContainersRequest` | Request a fresh container list |
| `UpdateContainerRequest` | Trigger a container update (pull, stop, recreate, start) |
| `ContainerActionRequest` | Stop, start, or restart a container |
| `FetchLogsRequest` | Retrieve container logs |
| `PullImageRequest` | Pre-pull an image |
| `RunHookRequest` | Execute a command inside a container |
| `PolicySync` | Push policy updates to the agent cache |
| `SettingsSync` | Push operational settings to the agent cache |
| `CertRenewalResponse` | Deliver a renewed certificate |
Agent to Server:
| Message | Purpose |
|---|---|
| `Heartbeat` | Periodic keepalive with version and feature flags |
| `ContainerList` | Container state snapshot |
| `UpdateResult` | Outcome of an update operation |
| `ContainerActionResult` | Outcome of a stop/start/restart |
| `FetchLogsResult` | Container log output |
| `HookResult` | Hook execution result with exit code |
| `RollbackResult` | Outcome of a rollback |
| `OfflineJournal` | Batch replay of actions taken while disconnected |
| `CertRenewalCSR` | Certificate renewal request |
```shell
docker run -d \
  --name sentinel-server \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v sentinel-data:/data \
  -p 8080:8080 \
  -p 9443:9443 \
  -e SENTINEL_CLUSTER=true \
  ghcr.io/will-luck/docker-sentinel:latest
```

| Port | Purpose |
|---|---|
| 8080 | Web UI and REST API |
| 9443 | gRPC cluster endpoint (mTLS) |
Alternatively, run without `SENTINEL_CLUSTER=true` and select the Server role with cluster mode enabled in the setup wizard at `http://server:8080/setup`.
On first start, the server:
- Creates a self-signed CA in `SENTINEL_CLUSTER_DIR` (default `/data/cluster`).
- Generates an HMAC signing key for enrollment tokens.
- Issues an ephemeral server certificate from the CA.
- Starts the gRPC listener on `SENTINEL_CLUSTER_PORT`.
- On the server, go to the Cluster page and click Generate Enrollment Token. Tokens are single-use and expire after the configured duration.
- Start a plain Sentinel container on the agent host:

```shell
docker run -d \
  --name sentinel-agent \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v sentinel-agent-data:/data \
  -p 8080:8080 \
  ghcr.io/will-luck/docker-sentinel:latest
```

- Navigate to `http://agent:8080/setup`, select the Agent role, enter the server address and enrollment token, and complete the wizard.
Set `SENTINEL_ENROLL_TOKEN` and the agent auto-enrolls on startup; no wizard is needed.
```shell
docker run -d \
  --name sentinel-agent \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v sentinel-agent-data:/data \
  -e SENTINEL_MODE=agent \
  -e SENTINEL_SERVER_ADDR=10.0.0.10:9443 \
  -e SENTINEL_ENROLL_TOKEN=<token> \
  -e SENTINEL_HOST_NAME=worker-1 \
  ghcr.io/will-luck/docker-sentinel:latest
```

Enrollment proceeds as follows:

- Server admin generates a one-time token (`POST /api/cluster/enroll-token`). Only the HMAC-SHA256 hash is stored; the plaintext is shown once.
- Agent generates an ECDSA P-256 key pair and creates a PKCS#10 CSR.
- Agent connects to the server without a client certificate (TLS with `InsecureSkipVerify`, since it doesn't have the CA cert yet).
- Agent sends the `Enroll` RPC with the token and CSR.
- Server validates the token (HMAC comparison), marks it used, and signs the CSR.
- Server returns the host ID, the CA certificate PEM, and the signed agent certificate PEM.
- Agent persists `ca.pem`, `agent.pem`, `agent-key.pem`, and `host-id` to `SENTINEL_CLUSTER_DIR`.
- All subsequent connections use mTLS with the enrolled certificate.
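Step 2 of the protocol can be sketched with the Go standard library. This is an illustrative sketch, not Sentinel's actual enrollment code; the function name `newEnrollmentCSR` and the use of the host name as the CSR CommonName are assumptions.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
)

// newEnrollmentCSR generates an ECDSA P-256 key pair and a PKCS#10
// certificate signing request, as an agent would before enrolling.
// Using the host name as CommonName is an assumption of this sketch.
func newEnrollmentCSR(hostName string) (*ecdsa.PrivateKey, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{
		Subject: pkix.Name{CommonName: hostName},
	}, key)
	if err != nil {
		return nil, nil, err
	}
	// PEM-encode the DER CSR for transport to the server.
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	return key, csrPEM, nil
}

func main() {
	_, csrPEM, err := newEnrollmentCSR("worker-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(csrPEM) > 0) // a non-empty PEM block was produced
}
```

The private key never leaves the agent; only the CSR travels to the server for signing.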
| Property | Detail |
|---|---|
| CA algorithm | ECDSA P-256 |
| CA validity | 10 years |
| Certificate validity | 1 year (server and agent) |
| Minimum TLS version | 1.3 |
| Server cert key usage | ServerAuth + ClientAuth |
| Agent cert key usage | ClientAuth only |
| Server SAN | localhost, 127.0.0.1, ::1, plus all private IPs from host interfaces, plus any entries from SENTINEL_CLUSTER_ADVERTISE |
| Token signing | HMAC-SHA256 with a random 32-byte key (persisted to hmac-key.bin) |
| Revocation | Certificate serial added to BoltDB CRL; checked at TLS handshake and per-RPC |
| Certificate renewal | Agent sends a new CSR when cert approaches expiry; server signs and delivers it inline |
The enrollment token signing key is a dedicated random secret (not derived from the CA certificate). It is generated on first run and stored with 0600 permissions.
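A hash-then-compare scheme like the one described (store only the HMAC-SHA256 digest, verify in constant time) can be sketched as follows. The function names `mintToken` and `verifyToken` are illustrative, not Sentinel's actual API.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// mintToken returns a random 64-hex-character token and the
// HMAC-SHA256 digest a server would persist instead of the plaintext.
func mintToken(key []byte) (plaintext string, digest []byte) {
	raw := make([]byte, 32)
	rand.Read(raw)
	plaintext = hex.EncodeToString(raw) // 64 hex characters
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(plaintext))
	return plaintext, mac.Sum(nil)
}

// verifyToken recomputes the HMAC over the presented token and
// compares in constant time, so timing leaks nothing about the digest.
func verifyToken(key []byte, presented string, stored []byte) bool {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(presented))
	return hmac.Equal(mac.Sum(nil), stored)
}

func main() {
	key := make([]byte, 32) // stands in for the persisted hmac-key.bin
	rand.Read(key)
	tok, digest := mintToken(key)
	fmt.Println(verifyToken(key, tok, digest))     // true
	fmt.Println(verifyToken(key, "wrong", digest)) // false
}
```

Because only the digest is stored, a leaked database does not reveal usable enrollment tokens.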
| State | Description |
|---|---|
| Active | Agent connected. Containers visible on the dashboard. Commands dispatched normally. |
| Paused | No new updates dispatched. In-progress operations finish. Agent remains connected. |
| Decommissioned | Certificate revoked. Agent cannot reconnect without re-enrolling. |
| Action | API | Effect |
|---|---|---|
| Pause | `POST /api/cluster/hosts/{id}/pause` | Stop scheduling updates on this host |
| Remove | `DELETE /api/cluster/hosts/{id}` | Disconnect agent, delete from registry |
| Revoke | `POST /api/cluster/hosts/{id}/revoke` | Add cert serial to CRL, disconnect, delete from registry |

Click Generate Token on the Cluster page to produce a one-time enrollment token, then run the displayed `docker run` command on the new host.


By default, the server TLS certificate includes localhost, 127.0.0.1, ::1, and all private IPs detected on host network interfaces. If agents connect via an address that is not in this set (e.g. a Tailscale IP, a DNS name, or a public IP), TLS verification will fail because the server address does not match any certificate SAN.
Set SENTINEL_CLUSTER_ADVERTISE to a comma-separated list of additional IPs or hostnames that should be included in the server certificate:
```shell
-e SENTINEL_CLUSTER_ADVERTISE="100.64.0.5,sentinel.example.com"
```

The values are parsed at certificate generation time. IP addresses become IP SANs; hostnames become DNS SANs. If the server certificate already exists, changing this variable takes effect on the next certificate renewal or after deleting the existing certificate files from `SENTINEL_CLUSTER_DIR`.
This can also be configured at runtime via Settings > Cluster in the web UI (the advertise_addr field).
When multiple sources monitor the same Docker daemon (e.g. the local socket, a Portainer endpoint, and a cluster agent all pointing at the same host), Sentinel can detect the overlap and automatically prevent duplicate container entries.
Each Docker daemon has a unique Engine ID. Sentinel collects this ID from:
- The local Docker socket on startup (stored as `local_engine_id` in the database).
- Each cluster agent, which reports its Engine ID during heartbeats (stored in the host registry).
- Each Portainer endpoint, which exposes the Engine ID via the Portainer API.
When a cluster agent reports its Engine ID, the server compares it against all configured Portainer endpoints. If a Portainer endpoint has the same Engine ID as a connected agent, the endpoint is automatically flagged to prevent scanning the same containers twice. A source_overlap event is emitted via SSE to notify the dashboard.
This is fully automatic and requires no configuration. The deduplication check runs whenever an agent reports its Engine ID or a Portainer endpoint is added.
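The overlap check amounts to a set intersection on Engine IDs. As a sketch (the function and the map-of-strings data model are illustrative, not Sentinel's actual types):

```go
package main

import "fmt"

// overlappingEndpoints returns the Portainer endpoints whose Docker
// Engine ID matches one reported by a connected agent; these are the
// endpoints that would be flagged to avoid scanning the same
// containers twice.
func overlappingEndpoints(agentEngineIDs, portainerEngineIDs map[string]string) []string {
	seen := make(map[string]bool)
	for _, id := range agentEngineIDs {
		seen[id] = true
	}
	var flagged []string
	for endpoint, id := range portainerEngineIDs {
		if seen[id] {
			flagged = append(flagged, endpoint)
		}
	}
	return flagged
}

func main() {
	agents := map[string]string{"worker-1": "ENGINE-A"}
	endpoints := map[string]string{
		"portainer-local":  "ENGINE-A", // same daemon as worker-1
		"portainer-remote": "ENGINE-B",
	}
	fmt.Println(overlappingEndpoints(agents, endpoints))
}
```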
From the server dashboard, operators can manage containers on any connected agent host:
| Operation | Description |
|---|---|
| List containers | Real-time container list with state, image, ports |
| Update | Pull new image, stop, remove, recreate with same config |
| Stop / Start / Restart | Container lifecycle actions |
| View logs | Fetch the last N lines of container output (max 500) |
| Run hooks | Execute commands inside containers |
All operations use synchronous request/response over the bidirectional gRPC stream. The server registers a response channel before sending a command and blocks until the agent replies or a timeout occurs.
- Server sends `UpdateContainerRequest` with container name, target image, and optional digest.
- Agent inspects the running container to capture its full configuration.
- Agent pulls the target image.
- Agent stops, removes, and recreates the container with the new image (preserving env vars, volumes, ports, networks).
- Agent starts the new container.
- Agent pushes a fresh container list followed by the `UpdateResult`.
If the agent loses connectivity for longer than `SENTINEL_GRACE_PERIOD_OFFLINE` (default `30m`), it enters autonomous mode:
- Monitors containers locally using its cached copy of the server's policies and settings.
- Does not attempt container updates (registry checks require the server).
- Journals all observed state changes to a JSON file on disk.
- Reconnects automatically when the server is reachable again, using exponential backoff (1s, 2s, 4s, ... capped at 30s).
On reconnection:
- Agent sends its full state report.
- Agent replays the offline journal to the server via the `OfflineJournal` message.
- Agent clears the local journal.
- Normal heartbeat/command loop resumes.
The agent caches policies and settings pushed by the server via `PolicySync` and `SettingsSync` messages. The cache is persisted to `policy_cache.json` in the agent's data directory so it survives agent restarts.
Policy resolution order (highest priority first):
- Container labels (`sentinel.policy`)
- Server-pushed per-container overrides
- Server-pushed default policy
- Hardcoded fallback: `manual`
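The precedence chain above reduces to "first non-empty wins". A minimal sketch, assuming an empty string means "not set" at a given level (the function name `resolvePolicy` is illustrative):

```go
package main

import "fmt"

// resolvePolicy applies the documented precedence: container label,
// then a server-pushed per-container override, then the server-pushed
// default, then the hardcoded fallback "manual".
func resolvePolicy(label, override, serverDefault string) string {
	for _, p := range []string{label, override, serverDefault} {
		if p != "" {
			return p
		}
	}
	return "manual"
}

func main() {
	fmt.Println(resolvePolicy("", "auto", "notify")) // per-container override wins
	fmt.Println(resolvePolicy("", "", ""))           // nothing set: fallback
}
```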
The server automatically updates agents running a different version. On each poll cycle, it compares its own version against the version reported in each agent's heartbeat. If they differ, the server:
- Finds the container with the `sentinel.self=true` label on the agent host.
- Sends an `UpdateContainerRequest` with the server's version tag.
- The agent pulls the new image, recreates its own container, and the new process reconnects.

Dev builds (version `dev` or empty) are skipped.
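The version check reduces to a simple predicate. This sketch assumes the dev/empty skip applies to either side of the comparison; the name `shouldSelfUpdate` is illustrative.

```go
package main

import "fmt"

// shouldSelfUpdate mirrors the documented poll-cycle check: trigger an
// agent self-update when the heartbeat version differs from the
// server's, skipping dev builds (version "dev" or empty).
func shouldSelfUpdate(serverVersion, agentVersion string) bool {
	for _, v := range []string{serverVersion, agentVersion} {
		if v == "dev" || v == "" {
			return false
		}
	}
	return serverVersion != agentVersion
}

func main() {
	fmt.Println(shouldSelfUpdate("1.4.0", "1.3.2")) // versions differ: update
	fmt.Println(shouldSelfUpdate("dev", "1.3.2"))   // dev build: skipped
}
```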
| Direction | Port | Protocol | Purpose |
|---|---|---|---|
| Agent to Server | 9443 (configurable) | TCP/TLS 1.3 | gRPC cluster communication |
| Server to Agent | None | — | Server does not initiate connections; agents connect outbound |
The gRPC connection is persistent (a long-lived bidirectional stream). Agents reconnect automatically on any interruption. Firewalls must allow agents to reach the server on `SENTINEL_CLUSTER_PORT`.
The Cluster page in the web UI provides:
- Host list with connection status, last seen time, agent version, and container counts.
- Generate Enrollment Token button for adding new agents.
- Per-host actions: pause, remove, revoke.
- Host grouping in the dashboard, with containers visually grouped by their host.
- Connection status indicators: connected (green), disconnected with timestamp and disconnect reason (network, cert, server).
Agents advertise their capabilities during heartbeat. Current feature set:
| Feature | Description |
|---|---|
| `update` | Container update lifecycle (pull, stop, recreate, start) |
| `hooks` | Execute commands inside containers |
| `pull` | Pre-pull images |
| `list` | List containers |
| `logs` | Fetch container logs |
| Variable | Default | Description |
|---|---|---|
| `SENTINEL_CLUSTER` | `false` | Enable the gRPC cluster listener |
| `SENTINEL_CLUSTER_PORT` | `9443` | gRPC listen port |
| `SENTINEL_CLUSTER_DIR` | `/data/cluster` | CA, certificates, and HMAC key storage |
| `SENTINEL_CLUSTER_ADVERTISE` | (empty) | Extra IPs or hostnames added to the server TLS certificate as Subject Alternative Names (comma-separated) |
| Variable | Default | Description |
|---|---|---|
| `SENTINEL_MODE` | (auto) | Set to `agent` for agent-only mode |
| `SENTINEL_SERVER_ADDR` | (none) | Server host:port for the gRPC connection |
| `SENTINEL_ENROLL_TOKEN` | (none) | One-time enrollment token (consumed on first start) |
| `SENTINEL_HOST_NAME` | (none) | Human-readable agent name displayed in the dashboard |
| `SENTINEL_GRACE_PERIOD_OFFLINE` | `30m` | Time offline before autonomous mode activates |
| `SENTINEL_CLUSTER_DIR` | `/data/cluster` | Local certificate and state storage |
| Symptom | Cause | Fix |
|---|---|---|
| `connection refused` | Server not listening or wrong port | Verify `SENTINEL_CLUSTER=true` on the server; check that `SENTINEL_CLUSTER_PORT` matches |
| `certificate has been revoked` | Agent cert was revoked via the UI | Re-enroll with a new token |
| `host not registered` | Agent's host ID not in server registry (data loss or manual deletion) | Re-enroll with a new token and a fresh data volume |
| `transport is closing` / repeated reconnects | Network instability | Check firewalls, MTU, and connectivity between agent and server |
| TLS handshake failure | Clock skew or expired certificate | Verify system clocks are synchronised (NTP); check cert validity dates |
The server marks an agent as disconnected when its gRPC stream ends. Disconnect reasons are classified:
| Category | Meaning |
|---|---|
| `network` | Connection lost (EOF, timeout, reset, transport closed) |
| `cert` | Permission denied or host not registered (certificate issue) |
| `server` | Stream cancelled by server (e.g. host revoked or replaced) |
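A classifier along these lines could be sketched with substring matching on the stream error. This is a hypothetical illustration of the mapping, not Sentinel's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyDisconnect maps a stream-termination error message onto the
// three documented categories. Unrecognised errors default to
// "network", mirroring the catch-all nature of that category.
func classifyDisconnect(errMsg string) string {
	switch {
	case strings.Contains(errMsg, "permission denied"),
		strings.Contains(errMsg, "host not registered"):
		return "cert"
	case strings.Contains(errMsg, "host revoked"),
		strings.Contains(errMsg, "host replaced"):
		return "server"
	default:
		return "network" // EOF, timeout, reset, transport closed, ...
	}
}

func main() {
	fmt.Println(classifyDisconnect("rpc error: EOF"))
	fmt.Println(classifyDisconnect("permission denied"))
}
```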
If an agent reconnects while the server still has an old stream open (e.g. after a network partition), the server automatically cancels the old stream and registers the new one. A "replaced stale stream" log entry is emitted.
| Problem | Fix |
|---|---|
| Token expired | Generate a new token from the Cluster page |
| Token already used | Each token is single-use; generate a new one |
| Token too short | Tokens are 64 hex characters; verify the full token was copied |
If the server's CA was regenerated (e.g. the cluster data volume was deleted, or the server was migrated to a new host), agents that still have the old CA certificate cached locally will fail to connect. The agent logs this once:
```
server CA mismatch -- the server's TLS certificate has changed since this agent enrolled
```
The log entry includes a `fix` field with the resolution steps and a `data_dir` field showing where the agent stores its certificate data.
Resolution:
- Stop the agent container.
- Delete the agent's cluster data directory (default:
/data/clusterinside the container, or the path set bySENTINEL_CLUSTER_DIR). - Generate a new enrolment token on the server via the Cluster page.
- Restart the agent with
SENTINEL_ENROLL_TOKENset to the new token.
The agent will re-enroll with the server's new CA and receive a fresh certificate. This message is logged once per agent lifecycle to avoid log spam.
If the server and agent are running different versions, the server automatically triggers an update. Check server logs for "agent version mismatch" entries. The agent container must have the `sentinel.self=true` label for auto-update to locate it.