set maintenance mode or stop corosync/pacemaker on update#563
set maintenance mode or stop corosync/pacemaker on update#563Itxaka wants to merge 1 commit intocrowbar:masterfrom Itxaka:make_update_aware_of_ha
Conversation
| block do | ||
| %x{#{command}} | ||
| node.run_state['found_ha_packages'] = $?.exitstatus | ||
| Chef::Log.debug("Run #{command}, got exit status #{node.run_state['found_ha_packages']}.") |
There was a problem hiding this comment.
Style/StringLiteralsInInterpolation: Prefer double-quoted strings inside interpolations. (https://github.com/SUSE/style-guides/blob/master/Ruby.md#stylestringliteralsininterpolation)
| end # block | ||
| end # ruby_block | ||
|
|
||
| %w(corosync pacemaker).each do |s| |
There was a problem hiding this comment.
Style/WordArray: Use [] for an array of words. (https://github.com/SUSE/style-guides/blob/master/Ruby.md#stylewordarray)
| only_if do | ||
| ha = node.run_state["found_ha_packages"] | ||
| is_cluster = !search(:node, "roles:pacemaker-cluster-member").empty? | ||
| return !ha && is_cluster && node.run_state["needs_update"] |
|
|
||
| service "pacemaker" do | ||
| action :start | ||
| end |
|
needs rebase + adam patch + some changes around |
| ::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
| ::Chef::Resource.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
|
|
||
|
|
There was a problem hiding this comment.
Style/EmptyLines: Extra blank line detected.
| ::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
| ::Chef::Resource.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) | ||
|
|
||
|
|
There was a problem hiding this comment.
Style/EmptyLines: Extra blank line detected.
|
@aspiers ready for re-review. Spent most of the day testing the different scenarios and it seems to work perfectly. On a node part of a cluster, non-remote: Sadly, there is some code duplication in regards to https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/libraries/maintenance_mode_helpers.rb#L60 This is because the helper on crowbar-pacemaker is already using an So instead we create several resources that get triggered in the case that maintenance mode is required due to the late evaluation of the variables. I could not find a better way of doing it while respecting the chef workflow. |
| service s do | ||
| action :stop | ||
| only_if { node.run_state["found_ha_packages"] } | ||
| not_if { node[:pacemaker][:is_remote] } |
There was a problem hiding this comment.
I think you should make sure that node[:pacemaker] exists; If I'm not missing sometthing, this could could be executed at non-pacemaker nodes as well
|
Missing exceptions for non-cluster nodes in some of the resources! |
| true | ||
| end | ||
| only_if do | ||
| ha = node.run_state["found_ha_packages"] |
There was a problem hiding this comment.
ha seems bit confusing, could you rename this variable? (Or just not create it at all and directly do a check for run_state at line 104)
| end | ||
| only_if do | ||
| ha = node.run_state["found_ha_packages"] | ||
| is_cluster = !search(:node, "roles:pacemaker-cluster-member").empty? |
There was a problem hiding this comment.
wtf, this is wrong. I wanted to check if the node is part of a cluster, this will just search for any nodes that have that role ¬_¬
There was a problem hiding this comment.
Oh, true. Look also at the same search used earlier.
| # limitations under the License. | ||
| # | ||
|
|
||
| ::Chef::Recipe.send(:include, CrowbarPacemaker::MaintenanceModeHelpers) |
There was a problem hiding this comment.
Just curious - how is this different from ::Chef::Recipe.include CrowbarPacemaker::MaintenanceModeHelpers ?
There was a problem hiding this comment.
Chef-ism. http://lists.opscode.com/sympa/arc/chef/2011-05/msg00286.html
Actually, I think that due the restructuring of the resources in my last update this is not required, I'll retry with the usual include to see
There was a problem hiding this comment.
Not needed anymore, reverting to the usual include
|
@Itxaka I'm still a bit confused how this is supposed to handle the remote case? |
|
@aspiers Lets see, for remote case, there should be no pacemaker/corosync services runnig so it wont stop them. The question is, for remote nodes, when upgrading them, do we need to set the node into maintenance mode as well? The HA guide does not mention remote nodes so Im a bit lost in there and what to do with them. |
… nodes (bsc#983617) Make the update barclamp aware of HA nodes and deal with them properly. When HA packages need to be updated, stop the HA services before the update and start them again after the package have been updated. When normal packages are updated, set the node in maintenance mode as mentioned on the HA guide. Also do not stop or set maintenance mode if the node is a remote_node. Updates on normal nodes should not be affected by this changes.
| end | ||
| end | ||
|
|
||
| ["corosync", "pacemaker"].each do |s| |
There was a problem hiding this comment.
The HA docs only say to stop pacemaker, why are both here? Could you add a comment?
| command += '|egrep -q "corosync|pacemaker"' | ||
| system("zypper #{command}") | ||
| # exit 0: found, 1 not found | ||
| node.run_state["found_ha_packages"] = $?.exitstatus ? true : false |
There was a problem hiding this comment.
I disagree with hound here, I don't think $? is a cryptic perlism.
But I think the logic is wrong - $?.exitstatus produces the numeric exit code, and grep will return 1 if no matches were found, so 1 ? true : false results in true, so the pacemaker services get stopped even when it's not necessary.
Also, if needs_update ends up being false you could probably skip the extra zypper call.
| true | ||
| end | ||
| only_if do | ||
| is_cluster = node.role? "pacemaker-cluster-member" |
There was a problem hiding this comment.
With crowbar/crowbar-ha#187 this could probably include pacemaker-remote roles too, yes?
| # HA packages are NOT gonna be updated | ||
| # And Node is part of a cluster | ||
| # And there is packages to update | ||
| execute "crm --wait node maintenance" do |
There was a problem hiding this comment.
Could add the remote node name following what crowbar/crowbar-ha#187 did
|
Revisiting this due to Bug 1061834 – installing updates causes HA failures. @Itxaka Any chance you could amend the commit message to contain the full bsc URL? Thanks! |
|
Ummm, seems that the branch for this PR migth have been removed somehow, so I would need to create a different PR with that branch covered somehow I guess? |
|
doesnt seem to be needed feel free to restore if so |
As per all the docs by pacemaker, the services should be stopped for a software update of the cluster stack. If other packages are being updated but not the cluster stack, then it's enough to set maintenance mode.
FIXME: however if the update is happening on a remote node, then the
crmcommand is unavailable so we need to set maintenance mode another way. UPDATE: something like crowbar/crowbar-ha#187 might fix this.Depends on
crowbar/crowbar-ha#183