Description
Because there is no agreed-upon way to signal to operators that a node is drained, operators handle it in different ways.
Rook detects node drain by observing pods on the node. This works fine but feels a bit fragile.
The problem is that some operators (e.g. the Zalando PostgreSQL Operator) "detect" drains by watching the node's labels. Whenever an expected label (e.g. "node-ready=true") is no longer set, the operator will (try to) fail over to another DB pod on another node.
This is a feature request to update the node's labels when a reboot is about to happen.
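To make the label-based approach concrete, here is a minimal sketch of how an operator might decide that a node is no longer usable. It is not the Zalando operator's actual code; the function names and the `REQUIRED` mapping are hypothetical, and only the `node-ready=true` label comes from the example above.

```python
def missing_readiness_labels(node_labels, required_labels):
    """Return the required labels that are absent or carry a different value."""
    return {
        key: value
        for key, value in required_labels.items()
        if node_labels.get(key) != value
    }


def should_fail_over(node_labels, required_labels):
    """Treat the node as unavailable as soon as one required label is missing."""
    return bool(missing_readiness_labels(node_labels, required_labels))


# Hypothetical configuration: the label an operator expects on a usable node.
REQUIRED = {"node-ready": "true"}

print(should_fail_over({"node-ready": "true", "zone": "a"}, REQUIRED))
print(should_fail_over({"zone": "a"}, REQUIRED))
```

With a check like this, removing or changing a single label is enough to trigger a failover, which is why a label set by machine-config-daemon before the drain would be useful to such operators.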
Steps to reproduce the issue:
1. update some machineconfig,
2. observe machine-config-daemon trying to drain a node,
3. failing to drain the node because there is a pdb on a pod on that node,
meanwhile
4. some operator not knowing that the machine is about to be rebooted and not updating the pdb (directly or indirectly),
5. the node not getting drained.
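The blocked drain in the steps above can be illustrated with a simplified model of a PodDisruptionBudget. This sketch only covers a budget expressed as an integer `minAvailable`; the function names are hypothetical and the real eviction logic lives in the Kubernetes API server.

```python
def disruptions_allowed(current_healthy, min_available):
    """Number of pods that may be evicted right now under a minAvailable budget."""
    return max(0, current_healthy - min_available)


def eviction_admitted(current_healthy, min_available):
    """True if one eviction would be admitted, False if the budget blocks it."""
    return disruptions_allowed(current_healthy, min_available) > 0


# One healthy DB pod and minAvailable=1: the drain cannot evict it.
print(eviction_admitted(current_healthy=1, min_available=1))

# After an operator brings up a healthy replica on another node, it can.
print(eviction_admitted(current_healthy=2, min_available=1))
```

As long as nothing tells the operator to prepare a second healthy pod or relax the budget, the first case persists and the drain keeps failing.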
Describe the results you expected:
1. update some machineconfig,
2. machine-config-daemon updating label machineconfiguration.openshift.io/pending-restart=false to =true,
3a. an operator removes active workload from the node, removing/updating pdbs that affect the node,
3b. machine-config-daemon drains the node,
4. node reboots successfully,
5. machine-config-daemon sets label machineconfiguration.openshift.io/pending-restart=false.
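The expected sequence can be sketched as a small simulation. Everything here is an assumption made for illustration: the label is the one proposed in this request and is not an existing machine-config-daemon feature, the node is a plain dict, and the operator hooks are hypothetical stand-ins for controllers that would really watch the node asynchronously.

```python
PENDING_RESTART = "machineconfiguration.openshift.io/pending-restart"


def rolling_update(node, operator_hook):
    """Walk one node through the proposed sequence and return the steps taken.

    `node` is a dict with "labels" and "blocking_pdbs" (a count of budgets
    that would currently reject an eviction). Both are hypothetical
    stand-ins for real cluster state.
    """
    steps = []

    # Step 2: announce the upcoming reboot before draining.
    node["labels"][PENDING_RESTART] = "true"
    steps.append("label-set-true")

    # Step 3a: an operator watching the label moves its workload away.
    operator_hook(node)

    # Step 3b: the drain only succeeds once no budget blocks it.
    if node["blocking_pdbs"] > 0:
        steps.append("drain-blocked")
        return steps
    steps.append("drained")

    # Steps 4 and 5: reboot, then clear the label again.
    steps.append("rebooted")
    node["labels"][PENDING_RESTART] = "false"
    steps.append("label-set-false")
    return steps


def failover_operator(node):
    """Hypothetical operator: reacts to the label by releasing its budgets."""
    if node["labels"].get(PENDING_RESTART) == "true":
        node["blocking_pdbs"] = 0


def unaware_operator(node):
    """Operator that ignores the label, as in the reproduction steps."""


node = {"labels": {PENDING_RESTART: "false"}, "blocking_pdbs": 1}
print(rolling_update(node, failover_operator))
```

Passing `unaware_operator` instead reproduces the current behaviour: the sequence stops at the blocked drain and the node never reboots.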