We have seen several incidents where:
- All pods on a node are marked as "Terminating"
- The underlying node's status shows that the kubelet has stopped posting node status
- The EC2 health checks for the backing instance are still passing
We would like to deploy a mechanism that automatically detects and terminates these instances rather than handling them manually; a rough sketch of one possible approach is below.
https://mission-control.thoughtbot.com/branch/main/aws/book/debug/cluster-errors.html#kubelet-stopped-posting-node-status
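As a starting point for discussion, here is a minimal sketch (not the final implementation) of what such a mechanism might look like: scan for nodes whose `Ready` condition has gone to `Unknown` (the state the kubelet enters when it stops posting status), then terminate the backing EC2 instance via its provider ID. The `STALE_THRESHOLD` value and the `DRY_RUN` flag are illustrative assumptions, not agreed-upon parameters.

```python
"""Sketch: detect nodes whose kubelet stopped posting status and terminate them."""
from datetime import datetime, timedelta, timezone

import boto3
from kubernetes import client, config

STALE_THRESHOLD = timedelta(minutes=15)  # assumed grace period before acting
DRY_RUN = True                           # flip to False once detection is trusted


def stale_nodes(v1: client.CoreV1Api):
    """Yield nodes whose Ready condition is Unknown and whose heartbeat is stale."""
    for node in v1.list_node().items:
        ready = next(
            (c for c in node.status.conditions or [] if c.type == "Ready"), None
        )
        if ready is None or ready.status != "Unknown":
            continue
        if ready.last_heartbeat_time is None:
            continue
        age = datetime.now(timezone.utc) - ready.last_heartbeat_time
        if age > STALE_THRESHOLD:
            yield node


def instance_id(node):
    """Extract the EC2 instance id from the node's providerID,
    e.g. aws:///us-east-1a/i-0123456789abcdef0."""
    provider_id = node.spec.provider_id or ""
    if not provider_id.startswith("aws://"):
        return None
    return provider_id.rsplit("/", 1)[-1]


def main() -> None:
    config.load_incluster_config()  # use config.load_kube_config() when running locally
    v1 = client.CoreV1Api()
    ec2 = boto3.client("ec2")

    for node in stale_nodes(v1):
        iid = instance_id(node)
        print(f"node {node.metadata.name} has a stale kubelet heartbeat; instance {iid}")
        if iid and not DRY_RUN:
            ec2.terminate_instances(InstanceIds=[iid])


if __name__ == "__main__":
    main()
```

This could run as a CronJob with an IAM role permitting `ec2:TerminateInstances`, or we could evaluate an existing tool that covers the same case before building anything custom.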