vm_manager response in degraded cluster #81

Description

@eroussy

Edit after initial posting:
The problem was not correctly identified at first. This issue now covers:

  • Add a proper timeout on ceph commands (currently the command never returns if ceph doesn't respond)
  • Add a proper error message when the quorum is not formed
  • Make the vm_manager create command work even when one node is in standby or maintenance mode (the other commands appear to work already)
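The timeout point could be sketched as below. This is a minimal illustration only; the helper name, call site, and default timeout are assumptions, not vm_manager's actual code:

```python
import subprocess


def run_with_timeout(cmd, timeout=30):
    """Run an external command with a hard timeout instead of blocking forever.

    Hypothetical helper sketching the proposed fix; the real vm_manager
    invocation of ceph may differ.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,  # raises TimeoutExpired instead of hanging
            check=True,
        )
    except subprocess.TimeoutExpired as exc:
        # Turn the hang into an actionable error for the operator.
        raise RuntimeError(
            f"'{' '.join(cmd)}' did not answer within {timeout}s; "
            "the cluster may be degraded"
        ) from exc
    return result.stdout


# Example call site (assumed arguments):
# run_with_timeout(["ceph", "status", "--format", "json"], timeout=10)
```

With this in place, a degraded ceph cluster would produce a clear error after the timeout instead of an indefinitely blocked vm_manager command.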

Original issue:

Describe the bug
When the cluster is in a degraded state (one of the hypervisors is powered off), no vm_manager command responds (start, stop, list, ...).

To Reproduce

  • Create a SEAPATH cluster (no matter the distribution) and configure it
  • Deploy a VM in it
  • Shut down one hypervisor
  • Launch a vm-mgr list command on one of the other hypervisors
  • The command never responds

Expected behavior
When the quorum is formed (at least two machines are up and connected), vm_manager should be able to respond correctly, even to create and deploy VMs

First investigations
It seems that vm_manager was developed with only a fully running cluster in mind; the word "quorum" does not appear anywhere in the code.
vm_manager should be able to detect whether the quorum is formed on the current machine:

  • If it is: schedule the command correctly
  • If it is not: fail with a meaningful message
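A possible shape for that check, sketched with corosync-quorumtool (an assumption: vm_manager could equally query Pacemaker or a library binding):

```python
import subprocess


def parse_quorate(output):
    """Parse corosync-quorumtool output for the 'Quorate' line."""
    for line in output.splitlines():
        if line.strip().startswith("Quorate:"):
            return line.split(":", 1)[1].strip().lower() == "yes"
    return False


def quorum_is_formed(timeout=10):
    """Return True if the local node currently sees a formed quorum.

    Illustrative only; errors (tool missing, no answer) are treated as
    "no quorum" so callers can fail fast with a clear message.
    """
    try:
        proc = subprocess.run(
            ["corosync-quorumtool", "-s"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return parse_quorate(proc.stdout)


def guard_quorum():
    """Fail early with a meaningful message when no quorum is formed."""
    if not quorum_is_formed():
        raise SystemExit("Error: cluster quorum is not formed; refusing to run")
```

Each vm_manager subcommand could call a guard like this before scheduling anything, which directly gives the "fail with a meaningful message" behavior.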

IMO, we should even be able to revert this PR seapath/ansible#870, as vm_manager should be able to deploy a VM even when one physical machine is out of the quorum.

Labels: bug (Something isn't working)