
Inclusion of monit_wrapper::default stops executing the rest of the recipe if a Monit service cannot be started #21

@amalakar


Hi,

I have noticed that whenever the Spark recipe has a bug (such as an incorrect port number), later runs of the recipe that contain the fix still fail, because the run never gets past monit_wrapper::default, which checks whether the service is running. The fix is therefore never applied, so the next run fails again: a chicken-and-egg problem. I had to manually delete the Monit config (rm /etc/monit/conf.d/spark-standalone-worker.conf) before the Chef run would complete.

recipe: monit_wrapper::default

  • chef_gem[waitutil] action install (up to date)

    Recipe Compile Error in /var/chef/cache/cookbooks/analytics-spark-deploy/recipes/query-spark-worker-next-staging.rb

    RuntimeError

    Timed out waiting to get the status of spark-standalone-worker (currently "Does not exist")

    Cookbook Trace:

    /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:65:in `get_stable_monit_service_status'
    /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:87:in `monit_service_running?'
    /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:101:in `monit_service_exists_and_running?'
    /var/chef/cache/cookbooks/apache_spark/recipes/spark-standalone-worker.rb:104:in `block in from_file'
    /var/chef/cache/cookbooks/apache_spark/recipes/spark-standalone-worker.rb:95:in `from_file'
    /var/chef/cache/cookbooks/ooyala-apache-spark/recipes/spark-worker.rb:2:in `from_file'
    /var/chef/cache/cookbooks/analytics-spark-deploy/recipes/query-spark-worker-next-staging.rb:4:in `from_file'

    Relevant File Content:

    /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:

   58:        def get_stable_monit_service_status(service_name)
   59:          start_time = Time.now
   60:          timeout_sec = 120
   61:          logged_message = false
   62:          status = get_monit_summary[service_name]
   63:          until monit_status_stable?(status)
   64:            if Time.now - start_time >= timeout_sec
   65>>             raise "Timed out waiting to get the status of #{service_name} " +
   66:                    "(currently #{status.inspect})"
   67:            end
   68:            unless logged_message
   69:              Chef::Log.info('Waiting for Monit to initialize the status of service ' +
   70:                             "#{service_name} for up to #{timeout_sec} seconds")
   71:              logged_message = true
   72:            end
   73:            sleep(1)
   74:            status = get_monit_summary[service_name]
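One possible way to break the loop described above, sketched under assumptions: have the status check give up without raising once the timeout expires, so the rest of the run can still converge and rewrite the broken Monit config. The method name `wait_for_stable_status`, its keyword parameters, and the status strings in the usage lines are hypothetical and not part of monit_wrapper. The cookbook's `get_monit_summary[service_name]` lookup and `monit_status_stable?` check are stood in for by injected lambdas, so the snippet runs without Chef or Monit.

```ruby
# Hypothetical non-raising variant of get_stable_monit_service_status.
# fetch_status: stands in for get_monit_summary[service_name] (assumption)
# stable:       stands in for monit_status_stable? (assumption)
# clock/sleeper are injectable so the timeout path can be exercised instantly.
def wait_for_stable_status(fetch_status:, stable:, timeout_sec: 120, poll_interval: 1,
                           clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                           sleeper: ->(seconds) { sleep(seconds) })
  start = clock.call
  status = fetch_status.call
  until stable.call(status)
    # Return nil instead of raising, leaving the decision to the caller.
    return nil if clock.call - start >= timeout_sec
    sleeper.call(poll_interval)
    status = fetch_status.call
  end
  status
end

# Usage: a service that settles after two polls.
statuses = ['Initializing', 'Initializing', 'Running'].each
p wait_for_stable_status(fetch_status: -> { statuses.next },
                         stable: ->(s) { s == 'Running' },
                         poll_interval: 0)
# => "Running"

# Usage: a service that never settles; a fake clock advances 60 s per call.
t = 0
p wait_for_stable_status(fetch_status: -> { 'Does not exist' },
                         stable: ->(s) { s == 'Running' },
                         clock: -> { t += 60 },
                         sleeper: ->(_) {})
# => nil
```

In a recipe, a `nil` result could be logged with `Chef::Log.warn` and treated as "not running" rather than aborting compilation, which would let a corrected config file be written on the next run.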
