Copilot AI commented Nov 6, 2025

During Kubernetes Helm chart removal, pods terminate in no particular order, so Dali can shut down before its clients. Those clients then stall while trying to reconnect, until Kubernetes forcibly terminates them when the k8s grace period expires.

Changes

Core implementation (dali/base/dasess.cpp):

  • Added waitForClientsToDisconnect(unsigned timeoutMs) to CCovenSessionManager
    • Polls the connected-client count once per second, using an iteration-based timeout (avoids msTick() wraparound)
    • Logs progress every 5 seconds, including the remaining client count
    • Lists the remaining clients if the timeout is reached
  • Modified CDaliSessionServer::suspend() to wait during shutdown initiation
    • Reads shutdownGracePeriod config (default: 60s, max: 1 day)
    • Caps value with overflow protection (86400s * 1000 = 86.4M ms, well within uint32 range)
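
The polling loop described above could look roughly like the following. This is a minimal sketch, not the code from the PR: `waitForClientsToDisconnectSketch`, the `countClients` and `logRemaining` callbacks, and the `pollIntervalMs` parameter are hypothetical names introduced for illustration. The timeout is converted into a number of polls, so no tick-counter subtraction is involved.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the wait loop; all names here are illustrative.
// countClients: returns the number of currently connected clients.
// logRemaining: optional progress callback, invoked on every 5th poll
//               (every 5 seconds with the default 1000 ms poll interval).
// Returns true if the client count reached zero before the timeout.
bool waitForClientsToDisconnectSketch(unsigned timeoutMs,
                                      std::function<unsigned()> countClients,
                                      unsigned pollIntervalMs = 1000,
                                      std::function<void(unsigned)> logRemaining = nullptr)
{
    assert(pollIntervalMs > 0);
    // Derive the number of polls from the timeout (rounded up). Counting
    // iterations avoids comparing wall-clock tick values that may wrap.
    unsigned maxIterations = timeoutMs / pollIntervalMs
                           + (timeoutMs % pollIntervalMs ? 1 : 0);
    for (unsigned i = 0; ; ++i)
    {
        unsigned remaining = countClients();
        if (remaining == 0)
            return true;            // everyone has disconnected
        if (i >= maxIterations)
            return false;           // timed out with clients still connected
        if (logRemaining && (i % 5 == 0))
            logRemaining(remaining);
        std::this_thread::sleep_for(std::chrono::milliseconds(pollIntervalMs));
    }
}
```

With a timeout of 0 the loop checks the client count once and returns immediately, which is one way the "0 to disable" setting could behave.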

Configuration:

dali:
  - name: mydali
    shutdownGracePeriod: 120  # seconds, 0 to disable
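
As an illustration of how a value such as `shutdownGracePeriod: 120` might be turned into the millisecond timeout, here is a small sketch. The function name `gracePeriodToMsSketch` and the constants are assumptions made for this example; only the default (60s), the cap (1 day), and the 0-disables rule come from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Values taken from the PR description; the identifiers are hypothetical.
constexpr std::uint64_t kDefaultGraceSecs = 60;     // used when the key is absent
constexpr std::uint64_t kMaxGraceSecs     = 86400;  // 1 day

// 86400 s * 1000 = 86,400,000 ms, which fits comfortably in 32 bits.
static_assert(kMaxGraceSecs * 1000 <= std::numeric_limits<std::uint32_t>::max(),
              "capped grace period must fit in uint32 milliseconds");

// Converts the configured seconds to milliseconds. The default argument
// models a missing config key. A result of 0 means "do not wait".
std::uint32_t gracePeriodToMsSketch(std::uint64_t configuredSecs = kDefaultGraceSecs)
{
    // Cap in 64-bit before multiplying so an oversized value cannot overflow.
    std::uint64_t capped = std::min(configuredSecs, kMaxGraceSecs);
    std::uint64_t ms = capped * 1000;
    assert(ms <= std::numeric_limits<std::uint32_t>::max());
    return static_cast<std::uint32_t>(ms);
}
```

Under these assumptions, the example configuration (120) maps to 120000 ms, and any value above 86400 is treated as 86400.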

Behavior:

  1. Dali receives SIGTERM from Kubernetes
  2. suspend() waits for client count to reach zero or timeout
  3. stop() proceeds with normal shutdown sequence
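
The three steps above can be sketched as a tiny state sequence. This is only an illustration of the ordering, assuming a SIGTERM handler that sets a flag; `installTermHandler`, `termRequested`, `shutdownSequenceSketch`, and the step strings are hypothetical and do not reflect how Dali wires up its signal handling.

```cpp
#include <cassert>
#include <csignal>
#include <functional>
#include <string>
#include <vector>

// Flag set from the signal handler; sig_atomic_t is safe to write there.
volatile std::sig_atomic_t gTermRequested = 0;

void onSigTerm(int) { gTermRequested = 1; }

void installTermHandler() { std::signal(SIGTERM, onSigTerm); }

bool termRequested() { return gTermRequested != 0; }

// Returns the steps taken, in order. waitForClients stands in for the
// suspend-time wait and returns true if all clients disconnected in time.
std::vector<std::string> shutdownSequenceSketch(const std::function<bool()> &waitForClients)
{
    assert(waitForClients);
    std::vector<std::string> steps;
    if (!termRequested())
        return steps;                       // step 1 has not happened yet
    // Step 2: suspend() waits for the client count to reach zero or times out.
    bool drained = waitForClients();
    steps.push_back(drained ? "suspend: clients drained" : "suspend: timed out");
    // Step 3: stop() proceeds either way with the normal shutdown sequence.
    steps.push_back("stop");
    return steps;
}
```

Note that in this sketch a timeout does not block shutdown: stop() runs whether or not the wait succeeded, matching the "zero or timeout" wording in step 2.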

Thread safety: Uses existing CHECKEDCRITICALBLOCK(sessmanagersect) for client lookup access.
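
To illustrate the locking pattern, the sketch below guards a client table with a `std::mutex`, used here as a stand-in for the `CHECKEDCRITICALBLOCK(sessmanagersect)` macro, whose exact semantics are not shown in this PR description. `ClientRegistrySketch` and its members are hypothetical; the point is that the count and the list of remaining clients are read under the same lock that protects updates.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical client table; std::mutex substitutes for the real critical section.
class ClientRegistrySketch
{
public:
    void add(unsigned id, const std::string &name)
    {
        std::lock_guard<std::mutex> guard(lock_);
        clients_[id] = name;
    }
    void remove(unsigned id)
    {
        std::lock_guard<std::mutex> guard(lock_);
        clients_.erase(id);
    }
    std::size_t count() const
    {
        std::lock_guard<std::mutex> guard(lock_);
        return clients_.size();
    }
    // Copies the names while holding the lock, so the caller can log the
    // remaining clients afterwards without holding it.
    std::vector<std::string> snapshotNames() const
    {
        std::lock_guard<std::mutex> guard(lock_);
        std::vector<std::string> names;
        names.reserve(clients_.size());
        for (const auto &entry : clients_)
            names.push_back(entry.second);
        assert(names.size() == clients_.size());
        return names;
    }
private:
    mutable std::mutex lock_;
    std::map<unsigned, std::string> clients_;
};
```

Taking a snapshot keeps the lock hold time short, which matters when the polling loop sleeps for a second between checks.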

Documentation: Added dali/GRACEFUL_SHUTDOWN.md with configuration examples, behavior details, and Kubernetes integration notes.

Original prompt

Look at Dali and the code in dali/base.
At the moment, when an HPCC Helm chart is removed, the unordered removal of deployments causes Dali to be notified to shut down (which it does), leaving other clients dangling. Often they restart and then try to reconnect to Dali, which causes them to stall for ages until the k8s grace period terminates them.

When Dali begins to shut down, it should first wait for all connected clients to disconnect before it goes ahead and shuts itself down (with a configurable maximum time limit).


Copilot AI and others added 6 commits November 6, 2025 15:50
Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix Dali shutdown sequence for client disconnections" to "Add graceful shutdown to Dali with configurable client disconnect wait" Nov 6, 2025
Copilot AI requested a review from jakesmith November 6, 2025 16:05