Description
Edit after initial posting:
The problem was not correctly identified at first. This issue now covers:
- Add a proper timeout on ceph commands (currently a command never returns if ceph doesn't respond)
- Add a proper error message when the quorum is not formed
- Make the vm_manager create command work even when one node is in standby or maintenance mode (the other commands work, from what I can see)
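To illustrate the first point, here is a minimal sketch of a timeout wrapper. It assumes the ceph call goes through an external command; if vm_manager talks to ceph through Python bindings instead, another mechanism would be needed. The names `run_with_timeout` and `CommandTimeoutError` are hypothetical and not taken from the vm_manager code.

```python
import subprocess


class CommandTimeoutError(RuntimeError):
    """Raised when an external command does not finish in time."""


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd (a list of arguments) and return its stdout as text.

    Raises CommandTimeoutError if the command is still running after
    timeout_s seconds. subprocess.run kills the child in that case, so
    no process is left hanging behind.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired as exc:
        raise CommandTimeoutError(
            f"{cmd[0]!r} did not respond within {timeout_s} s"
        ) from exc
    return result.stdout
```

With something like this, a call such as `run_with_timeout(["ceph", "status"], timeout_s=10)` would fail after ten seconds instead of blocking forever. The default of 10 seconds is an arbitrary placeholder.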
Original issue:
Describe the bug
When the cluster is in a degraded state (one of the hypervisors is powered off), no vm_manager command responds (start, stop, list ...)
To Reproduce
- Create a SEAPATH cluster (no matter the distribution) and configure it
- Deploy a VM in it
- Shut down one hypervisor
- Launch a `vm-mgr list` command on one of the other hypervisors
- The command never responds
Expected behavior
When the quorum is formed (at least two machines are up and connected), vm_manager should be able to respond correctly, even to create and deploy VMs
First investigations
It seems that vm_manager was developed with only a fully running cluster in mind. There is no mention of the word "quorum" in the code.
vm_manager should be able to detect if the quorum is formed on the current machine
- If it is: schedule the command correctly
- If it is not: fail with a meaningful message
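The two branches above could be sketched as follows. This is only an illustration under assumptions: quorum is taken to mean a strict majority of nodes, and how the node counts are obtained (Pacemaker, Corosync, or the ceph monitors) is deliberately left open. `QuorumError`, `has_quorum` and `run_if_quorate` are hypothetical names, not existing vm_manager functions.

```python
class QuorumError(RuntimeError):
    """Raised when a command is refused because the cluster has no quorum."""


def has_quorum(nodes_online, nodes_total):
    """Strict majority rule: more than half of the nodes must be online."""
    return nodes_online > nodes_total // 2


def run_if_quorate(nodes_online, nodes_total, command):
    """Call command() only when the quorum is formed.

    Otherwise fail immediately with a message that says why, instead of
    letting the underlying call hang.
    """
    if not has_quorum(nodes_online, nodes_total):
        raise QuorumError(
            f"quorum not formed: {nodes_online}/{nodes_total} nodes online, "
            f"at least {nodes_total // 2 + 1} required"
        )
    return command()
```

On a three-node cluster with one hypervisor powered off, `run_if_quorate(2, 3, ...)` would run the command, while `run_if_quorate(1, 3, ...)` would raise `QuorumError` with an explicit message.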
IMO, we should even be able to revert this PR seapath/ansible#870, as vm_manager should be able to deploy a VM even when one physical machine is out of the quorum.