
v2.12.0: Multi-instance Portainer, self-update, and bug fixes#75

Merged
Will-Luck merged 67 commits into main from dev on Mar 13, 2026

Conversation

@Will-Luck
Owner

Summary

  • Multi-instance Portainer support: connect multiple Portainer instances with per-endpoint toggles; containers appear as dashboard host groups; BoltDB migration for instance storage
  • Portainer self-update: Via portainer-updater helper container
  • NPM resolver hardening: Auto-detects local IPs to prevent cross-host port shadowing, skips wildcard domains
  • 10+ bug fixes: History page scan summaries, failed approval recording, images page alignment, filter bar borders, container detail for remote containers, scoped key lookups, and more

What changed

  • 40 commits since v2.11.1
  • New BoltDB bucket: portainer_instances with CRUD + migration from legacy single-instance settings
  • Multi-Portainer scanning with per-endpoint filtering and local socket auto-blocking
  • Portainer connector UI (settings page) with endpoint toggles
  • Smart local socket detection (IsLocalSocket) to prevent scanning the host Docker twice
  • NPM resolver improvements (local IP detection, wildcard skip)
  • Portainer self-update feature via helper container
  • Images page: column alignment, red unused badge
  • History page: scan summary row fix, failed approval recording
  • Filter bar bottom border

Test plan

  • Bug hunt on the diff
  • Build and deploy to test environment
  • Verify multi-Portainer connector UI
  • Verify Portainer self-update flow
  • Verify NPM URL resolution with multiple hosts
  • Verify history and images page fixes
  • Run full test suite

🤖 Generated with Claude Code

web-flow added 30 commits March 10, 2026 23:17
When accessing the dashboard via NPM domain (sen.lucknet.uk), the HTTP
Host header was used for NPM ForwardHost matching. Since NPM stores IPs
not domains, no proxy hosts matched and every port fell back to
sen.lucknet.uk:<port>.

For local containers, use Lookup() (matches against the resolver's
configured sentinelHost IP) instead of LookupForHost() with the request
domain. Cluster containers still use LookupForHost() with the remote
host's IP.
NPM proxy hosts with wildcard domains like *.s3.garage.example.com
produced broken URLs. The resolver now picks the first non-wildcard
domain from the list, falling back to skipping the entry entirely
if all domains are wildcards.
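The wildcard-skip rule can be sketched as follows. This is a minimal illustration, not the resolver's actual code; the function name `pickDomain` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// pickDomain mirrors the behaviour described above: return the first
// non-wildcard domain from an NPM proxy host's domain list, or "" to
// signal the entry should be skipped entirely.
func pickDomain(domains []string) string {
	for _, d := range domains {
		if !strings.HasPrefix(d, "*.") {
			return d
		}
	}
	return "" // all domains are wildcards: skip this proxy host
}

func main() {
	fmt.Println(pickDomain([]string{"*.s3.garage.example.com", "s3.garage.example.com"}))
	fmt.Println(pickDomain([]string{"*.example.com"}) == "")
}
```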
The filter bar on history, logs, and images pages was missing the bottom
border that the dashboard had. Added explicit border-bottom to .filter-bar.
… queue entries

Portainer settings now take effect immediately without a container restart,
using the same factory pattern as the NPM connector. The connection test
always recreates the provider from current DB settings so token changes are
picked up. Portainer endpoints that point at the same Docker socket Sentinel
monitors no longer produce duplicate queue entries (container IDs are
compared against the local scan). Updated help text and integration
descriptions for accuracy.
The dashboard stat card used a local-only pending count that excluded
Portainer queue items, while the nav badge used the full queue length.
Both now use the queue length directly so they always match. Removed
the checkmark icon from the zero-state as it added no value.
The detail page handler only knew about local and cluster containers.
Portainer containers (host=portainer:N) fell through to the cluster
lookup which returned "not found". Added a Portainer branch that
extracts the endpoint ID, fetches containers from the Portainer API,
and builds the detail view with policy, version, and queue info.
Portainer and cluster container detail pages looked up history and
snapshots using the bare container name, but records are stored under
scoped keys (e.g. "portainer:3::name"). Now uses hostFilter::name
when a host filter is present.
Covers data model, local socket detection, connector UI,
dashboard integration, engine changes, and migration path.
13 tasks across 7 chunks: store CRUD, migration, engine multi-instance
scanning, local socket detection, web API, dashboard host groups, and
frontend connector cards.
Adds MigratePortainerSettings() which converts the flat portainer_url/
portainer_token/portainer_enabled settings keys into a PortainerInstance
record (id "p1", name "Portainer") and clears the old keys. Safe to call
multiple times: skips if any instances already exist. Also adds
DeleteSetting() to bolt.go.
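The migration's shape (one-shot conversion, idempotent via an instances-exist check) can be sketched with in-memory maps standing in for the BoltDB buckets. Types and field names here follow the description above but are not the real store types.

```go
package main

import "fmt"

// PortainerInstance is a simplified stand-in for the stored record.
type PortainerInstance struct {
	ID, Name, URL, Token string
	Enabled              bool
}

// Store uses maps in place of the real BoltDB buckets.
type Store struct {
	settings  map[string]string
	instances map[string]PortainerInstance
}

// MigratePortainerSettings converts the legacy flat keys into a single
// instance record (id "p1", name "Portainer") and clears the old keys.
// Safe to call repeatedly: it skips if any instances already exist.
func (s *Store) MigratePortainerSettings() {
	if len(s.instances) > 0 {
		return // already migrated (or instances created manually)
	}
	url := s.settings["portainer_url"]
	if url == "" {
		return // nothing to migrate
	}
	s.instances["p1"] = PortainerInstance{
		ID:      "p1",
		Name:    "Portainer",
		URL:     url,
		Token:   s.settings["portainer_token"],
		Enabled: s.settings["portainer_enabled"] == "true",
	}
	for _, k := range []string{"portainer_url", "portainer_token", "portainer_enabled"} {
		delete(s.settings, k)
	}
}

func main() {
	s := &Store{
		settings:  map[string]string{"portainer_url": "https://p.example:9443", "portainer_token": "tkn", "portainer_enabled": "true"},
		instances: map[string]PortainerInstance{},
	}
	s.MigratePortainerSettings()
	s.MigratePortainerSettings() // second call is a no-op
	fmt.Println(len(s.instances), s.instances["p1"].Name)
}
```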
Replace single-instance Portainer settings (flat portainer_enabled/url/token
keys) with a full instance CRUD API backed by PortainerInstanceStore. The
PortainerProvider interface now takes instanceID parameters on all methods.

New routes: GET/POST /api/portainer/instances, PUT/DELETE instances/{id},
POST instances/{id}/test, GET instances/{id}/endpoints,
PUT instances/{id}/endpoints/{epid}.

Existing container detail handler updated to parse the new
"portainer:instanceID:epID" host filter format while remaining backwards
compatible with the legacy "portainer:epID" format.

Note: cmd/sentinel/ adapters will not compile until Task 7 updates them.
Three bugs found during live testing on the test cluster:

1. Portainer instances added via the API had no live scanner (only
   boot-time instances worked). Added ConnectInstance/DisconnectInstance
   to PortainerProvider, called from create/update/delete handlers.

2. Portainer self-signed certs caused TLS verification failures. Added
   InsecureSkipVerify to the Portainer HTTP client (standard for homelab
   and private network setups).

3. csrfToken is a function reference (window.csrfToken = getCSRFToken)
   but connectors.html passed it as a value. Changed all 11 occurrences
   to csrfToken() calls.
IsLocalSocket() was defined but never called. Now applied in two places:

1. Scanner.Endpoints() filters out local socket endpoints so the engine
   never scans them (defence in depth).

2. Test Connection handler auto-marks new local socket endpoints as
   blocked with reason "local Docker socket (duplicates direct
   monitoring)" so users see why the endpoint is disabled.

Updated scanner tests to use TCP URLs for mock endpoints (empty URL +
EndpointDocker type now correctly triggers IsLocalSocket).
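The detection and filtering logic reads roughly like this sketch. The `endpointDocker` constant value and the `Endpoint` struct are assumptions for illustration; only the empty-URL-plus-Docker-type and unix:// rules come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

const endpointDocker = 1 // assumed value for Portainer's direct-Docker endpoint type

// Endpoint is a simplified stand-in for a Portainer endpoint record.
type Endpoint struct {
	Type int
	URL  string
}

// isLocalSocket: a Docker-type endpoint with an empty URL is Portainer's
// bundled local socket; an explicit unix:// URL is also a local socket.
func isLocalSocket(epType int, url string) bool {
	if url == "" && epType == endpointDocker {
		return true
	}
	return strings.HasPrefix(url, "unix://")
}

// filterEndpoints drops local-socket endpoints so the engine never scans
// the same Docker host it already monitors directly.
func filterEndpoints(eps []Endpoint) []Endpoint {
	var out []Endpoint
	for _, ep := range eps {
		if !isLocalSocket(ep.Type, ep.URL) {
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	eps := []Endpoint{
		{Type: endpointDocker, URL: ""},                             // bundled local socket: dropped
		{Type: endpointDocker, URL: "unix:///var/run/docker.sock"},  // dropped
		{Type: endpointDocker, URL: "tcp://10.0.0.5:2375"},          // kept
	}
	fmt.Println(len(filterEndpoints(eps)))
}
```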
Only auto-block unix:// endpoints when the Portainer instance runs on the
same host as Sentinel. Previously all unix:// endpoints were blocked
regardless of host, which incorrectly disabled remote Portainer instances.

- Add isLocalPortainerInstance() to compare Portainer URL against local IPs
- Remove over-aggressive IsLocalSocket filter from Scanner.Endpoints()
- Wire engine into multiPortainerAdapter so runtime-added instances are
  scanned without restart
- Reconnect engine after endpoint config changes (test, update)
- Add unit tests for local detection helpers
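The host comparison behind `isLocalPortainerInstance()` might look like this sketch, assuming the helper parses the instance URL and checks its hostname against the detected local address set (`localAddrs` here is a stand-in for that set).

```go
package main

import (
	"fmt"
	"net/url"
)

// isLocalPortainerInstance reports whether the Portainer URL points at
// this host, so unix:// endpoints are only auto-blocked when the
// instance and Sentinel actually share a Docker host.
func isLocalPortainerInstance(rawURL string, localAddrs map[string]bool) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := u.Hostname()
	if host == "localhost" || host == "127.0.0.1" {
		return true
	}
	return localAddrs[host]
}

func main() {
	local := map[string]bool{"192.168.1.10": true}
	fmt.Println(isLocalPortainerInstance("https://192.168.1.10:9443", local)) // same host
	fmt.Println(isLocalPortainerInstance("https://192.168.1.99:9443", local)) // remote instance
}
```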
web-flow added 29 commits March 12, 2026 01:29
DetectLocalAddrs was including container-internal addresses (172.17.x.x,
localhost, hostname) which never match NPM ForwardHost values. This caused
Lookup() to silently filter out all proxies when SENTINEL_HOST was not set,
making port chips fall back to raw IP:port links.

Now only includes routable addresses: explicit SENTINEL_HOST values and
Docker host IP via host.docker.internal. Returns an empty set when neither
is available, which disables filtering (safe fallback matching all proxies).
…t shadowing

When hostAddr from the HTTP request is a valid IP (direct IP access or
SENTINEL_HOST), use LookupForHost to match only NPM proxies forwarding
to that specific host. Falls back to Lookup() when accessed via domain.

Fixes regression from 1c18dec where empty localAddrs disabled all
filtering, allowing port 8080 on host A to shadow port 8080 on host B.
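The decision between the two lookups can be sketched like this. `Proxy`, `Resolver`, and `Resolve` are illustrative names; only the rule (strict per-host matching when `hostAddr` is a literal IP, loose matching for domain access) comes from the commits above.

```go
package main

import (
	"fmt"
	"net"
)

// Proxy is a simplified NPM proxy host record.
type Proxy struct {
	Domain      string
	ForwardHost string
	Port        int
}

type Resolver struct{ proxies []Proxy }

// LookupForHost matches only proxies forwarding to the given host IP.
func (r *Resolver) LookupForHost(host string, port int) (Proxy, bool) {
	for _, p := range r.proxies {
		if p.ForwardHost == host && p.Port == port {
			return p, true
		}
	}
	return Proxy{}, false
}

// Lookup matches without a host constraint (used for domain access).
func (r *Resolver) Lookup(port int) (Proxy, bool) {
	for _, p := range r.proxies {
		if p.Port == port {
			return p, true
		}
	}
	return Proxy{}, false
}

// Resolve picks the strict per-host lookup only when hostAddr is a
// literal IP, preventing port 8080 on one host from shadowing 8080 on
// another, while domain access still falls back to Lookup().
func (r *Resolver) Resolve(hostAddr string, port int) (Proxy, bool) {
	if net.ParseIP(hostAddr) != nil {
		return r.LookupForHost(hostAddr, port)
	}
	return r.Lookup(port)
}

func main() {
	r := &Resolver{proxies: []Proxy{
		{Domain: "a.example.com", ForwardHost: "10.0.0.1", Port: 8080},
		{Domain: "b.example.com", ForwardHost: "10.0.0.2", Port: 8080},
	}}
	p, _ := r.Resolve("10.0.0.2", 8080) // IP access: strict match
	fmt.Println(p.Domain)
}
```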
…ction

Two bugs fixed:
- API control/queue handlers routed Portainer hostIDs to cluster branch
  (hostID != "" was too broad). Added portainer: prefix exclusion to all
  7 guards in api_control.go and reordered api_queue.go routing.
- Portainer containers passed empty digest to CheckVersionedWithDigest,
  causing digestsMatch("", remoteDigest) to always report false updates.
  Now fetches real repo digests via Portainer image inspect API.
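The corrected routing guard reduces to an ordering question: check the `portainer:` prefix before the generic non-empty test. A minimal sketch (the `routeTarget` name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// routeTarget shows the fixed guard order: a non-empty hostID is only a
// cluster host when it lacks the "portainer:" prefix. The old code
// tested hostID != "" first, sending Portainer IDs down the cluster path.
func routeTarget(hostID string) string {
	switch {
	case strings.HasPrefix(hostID, "portainer:"):
		return "portainer"
	case hostID != "":
		return "cluster"
	default:
		return "local"
	}
}

func main() {
	fmt.Println(routeTarget("portainer:p1:3")) // portainer
	fmt.Println(routeTarget("node-b"))         // cluster
	fmt.Println(routeTarget(""))               // local
}
```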
PullImage now properly drains the Docker streaming response to ensure
the image pull completes before container creation. Previously the
response body was closed without reading, causing create to fail with
"no such image" because the pull hadn't finished.

Also adds success history recording for remote updates (Portainer,
cluster agent, swarm). Previously only failures were recorded; the
success path only existed in the local UpdateContainer function.

Debug logging retained in Portainer scan path for ongoing diagnostics
(semverScope, digest, isLocal, up-to-date status).
Inside Docker, the gRPC server only sees container bridge IPs in its
network interfaces. Agents connecting via the host's external IP fail
TLS verification because that IP isn't in the server certificate SANs.

New SENTINEL_CLUSTER_ADVERTISE env var (or cluster_advertise DB setting)
accepts comma-separated IPs/hostnames to include as additional SANs in
the ephemeral server certificate. Also exposed in the cluster settings
API for UI configuration.
Detect when the same Docker host is reachable via multiple sources
(local socket, cluster agent, Portainer connector) and auto-block
the lower-priority source. Priority: local > cluster > Portainer.

- Proto: add engine_id field to StateReport (field 6)
- Agent: collect Engine ID on startup, include in state reports
- Hub: store local Engine ID as DB setting on boot
- Registry: persist agent Engine IDs in host state
- Portainer: probe endpoint Engine ID via Docker info API
- Web: findEngineOverlap checks local/cluster before Portainer
- Auto-block overlapping endpoints on Test Connection
- Connectors UI: show overlap reason + Force Enable button
- SSE source_overlap event for real-time dashboard notifications
- ForceAllow user override clears auto-block per endpoint
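The priority check at the heart of the overlap detection can be sketched as follows. Signature and block-reason strings are illustrative; the real `findEngineOverlap` lives in the web layer and reads Engine IDs from the DB and host registry.

```go
package main

import "fmt"

// findEngineOverlap applies the priority local > cluster > Portainer:
// given a Portainer endpoint's Engine ID, return the reason to
// auto-block it, or "" when no higher-priority source owns that engine.
func findEngineOverlap(engineID, localEngineID string, clusterEngines map[string]string) string {
	if engineID == "" {
		return "" // engine unknown: nothing to compare against
	}
	if engineID == localEngineID {
		return "duplicates direct monitoring (local Docker socket)"
	}
	for host, id := range clusterEngines {
		if id == engineID {
			return "duplicates cluster agent " + host
		}
	}
	return ""
}

func main() {
	cluster := map[string]string{"node-b": "engine-bbb"}
	fmt.Println(findEngineOverlap("engine-aaa", "engine-aaa", cluster) != "") // blocked by local
	fmt.Println(findEngineOverlap("engine-ccc", "engine-aaa", cluster) == "") // no overlap
}
```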
Two CSS issues when cluster hosts are present:
- tbody tr:last-child removed border-bottom from last row of each
  host-group, leaving no separator between groups
- section-divider border-top was on inner div (inset by td padding)
  instead of on the td itself
Swarm service rows (svc-header, svc-task-row) had an extra
<td class="col-actions"> that regular container rows and the
thead did not have. With table-layout:fixed, this created a
phantom column that stole ~300px of width, pushing the entire
table layout left and preventing dividers from spanning the
full UI width.

Closes #62
When approving updates from the Pending Updates page and navigating
to the Dashboard, containers could show "Updating" indefinitely.
The update completed and cleared the maintenance flag, but the SSE
event was published before the Dashboard's EventSource connection
was established -- a missed-event race between server-side page
render and client-side SSE subscribe.

Added a catch-up fetch in the SSE connected handler: on initial
connect, the Dashboard scans for any rows with .badge-updating and
re-fetches their current state from /api/containers/{name}/row,
picking up the cleared maintenance flag.
Agent side: detect TLS certificate errors (x509 unknown authority)
in the reconnection loop and log a clear ERROR message once with
the fix steps (stop agent, delete cluster data dir, re-enroll with
fresh token). Subsequent reconnect attempts log only the WARN
without repeating the guidance.

Cluster page: show troubleshooting section for ALL disconnected
hosts, not just those with a known disconnect category. When the
category is empty (e.g. host loaded from store after restart with
no prior disconnect event), show a generic section covering the
three common causes: agent not running, network issue, and CA
certificate mismatch after volume recreation.
LoadSetting returns ("", nil) for missing keys, which the old code
interpreted as show_stopped=false. Add a v != "" guard so a fresh
database preserves the default-true behaviour.

Closes #63
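The guard is small enough to show in full. A sketch, with the loader closure standing in for the real `LoadSetting`:

```go
package main

import "fmt"

// showStopped applies the v != "" guard described above: LoadSetting
// returns ("", nil) for a missing key, so an empty value must keep the
// default of true rather than being read as false.
func showStopped(load func(string) (string, error)) bool {
	v, err := load("show_stopped")
	if err != nil || v == "" {
		return true // fresh database: preserve default-true behaviour
	}
	return v != "false"
}

func main() {
	missing := func(string) (string, error) { return "", nil }
	off := func(string) (string, error) { return "false", nil }
	fmt.Println(showStopped(missing), showStopped(off))
}
```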
JS-created task rows had 7 cells (extra actions column) vs the 6-column
table, pushing status/ports right. Also missing col-status/col-policy
classes so centering rules didn't apply. Match the HTML template's
6-cell structure and add the correct column classes.

Closes #64
The previous commit (fa32a2a) fixed the source JS but the bundled
app.js was stale. Ran make frontend to produce the correct bundle
with col-* classes on shutdown task rows.
The "Service scaled to 0" placeholder had colspan="6" but starts at
column 2 (after the checkbox cell), making the browser allocate 7
columns in a 6-column table. With table-layout:fixed this caused
the table to shrink when any swarm service was stopped.

Changed to colspan="5" so the total (1 + 5) matches the 6-column
colgroup. The JS path in swarm.js already had the correct value.

When a service is scaled to 0, Docker removes all tasks within seconds.
Added an in-memory task cache to swarmAdapter that preserves last-seen
running tasks per service and serves them as "shutdown" when Docker
returns none. This ensures task rows with node names and SHUTDOWN badges
survive full page refreshes instead of showing a generic placeholder.
…check

- Scan() accessed u.portainerInstances without portainerMu lock at two
  sites (prune loop and len guard) while HTTP handlers mutate the slice
  concurrently. Snapshot under RLock, matching scanPortainerInstances.

- SavePortainerInstance and convertStoreInstance dropped EngineID and
  ForceAllow when converting between web and store types. ForceAllow
  loss caused manually unblocked endpoints to get re-blocked on
  reconnect.

- Replace zero-value struct comparison with map ok idiom for endpoint
  existence check.
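The snapshot-under-RLock fix follows a standard pattern: copy the slice while holding the read lock, then iterate the copy lock-free. A sketch with simplified types (the real slice holds instance structs, not strings):

```go
package main

import (
	"fmt"
	"sync"
)

// updater mirrors the field names from the description above.
type updater struct {
	portainerMu        sync.RWMutex
	portainerInstances []string
}

// snapshotInstances copies the slice under RLock so Scan() can iterate
// without holding the lock while HTTP handlers mutate the slice
// concurrently under the write lock.
func (u *updater) snapshotInstances() []string {
	u.portainerMu.RLock()
	defer u.portainerMu.RUnlock()
	return append([]string(nil), u.portainerInstances...)
}

func main() {
	u := &updater{portainerInstances: []string{"p1", "p2"}}
	snap := u.snapshotInstances()

	u.portainerMu.Lock()
	u.portainerInstances = nil // handler mutates concurrently
	u.portainerMu.Unlock()

	fmt.Println(len(snap)) // the snapshot is unaffected by the mutation
}
```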
@Will-Luck Will-Luck merged commit 4182fb5 into main Mar 13, 2026
2 checks passed