@renereimann
Contributor

This dragonfly script is intended to be an alarm system for your setup. The script checks two different things.

  1. It checks that all your docker containers are running and are not in an error state.
  2. You can define specific checks on endpoints to detect critical behaviour.

If a problem is detected, the script sends a message to Slack to warn you.
The script runs in a docker container; an example docker-compose.yaml file is included to show how to set up the container.
The script is configured via a YAML config file. The syntax is similar to our normal dripline config files. An example YAML config file is included in the pull request. The system works fine in the Mainz setup.
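
The two steps described above, finding containers that are not running and reporting them to Slack, can be sketched as follows. This is a minimal illustration and not the script from this pull request: the container records are plain dictionaries with hypothetical `name` and `status` keys, and `find_problem_containers`, `build_slack_request` and `make_alarm` are names invented for the example. It assumes only that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request


def find_problem_containers(containers):
    """Return names of containers whose status is not 'running'.

    `containers` is a list of dicts with hypothetical keys 'name' and
    'status'; the real script may obtain this information differently.
    """
    return [c["name"] for c in containers if c.get("status") != "running"]


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def make_alarm(containers, webhook_url):
    """Return a request to send if any container has a problem, else None."""
    problems = find_problem_containers(containers)
    if not problems:
        return None
    text = "Watchdog: containers not running: " + ", ".join(sorted(problems))
    return build_slack_request(webhook_url, text)
```

The returned request could be sent with `urllib.request.urlopen(request, timeout=10)`, ideally inside a `try`/`except` so that a network failure does not stop the watchdog loop.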

…t service. So far we have added the ability to send messages to Slack using a slack_hook. Next we will add diagnostics on the docker containers themselves.
…onfly standalone script. That has several advantages, one being that we do not depend on the rabbit broker, so the checks still work if the rabbit broker goes down.
…mented by Paul K. but not yet rolled out. We should update this once the fix is rolled out.
…ong in your setup. There are two different types of checks: 1. we check that all docker containers are running and have no errors, 2. we check some endpoints to see whether they fulfill some condition. If there are problems, we send a message to Slack. The script is configured by a YAML file. An example YAML file is included, as well as a docker-compose file that shows how to set up the script with docker compose. Testing this script worked well.
…bles it to prevent others from posting to your workspace; this is a placeholder and just for demonstration purposes
…you do not need to bind it externally; only the config file is needed
@renereimann renereimann requested a review from nsoblath April 9, 2025 13:02
@wcpettus

wcpettus commented May 9, 2025

Working from high-level comments to lower:

  • This mixes two functionalities, one of which (docker container checking) makes a lot of sense to exist outside of the normal dripline ecosystem, the other less so. Maybe unifying the watchdog functionality in one place is the right strategy, but more discussion on this point might be useful.
    • As an example, in DL2 there was a sensor_monitor which could check endpoint values for anomalous readings. This was more passive in that it watched the alerts exchange (and so wasn't actively querying). But with heartbeats and the docker watchdog, maybe the active query isn't necessary for endpoints?
  • The dripline authentication should probably be managed by scarab and draw from the authentications file, and not require specifying again here in the config file
  • Do the changes to dripline:service send impact the implementation here (@nsoblath)?
  • Robert was working on a slack relay service that lives within dripline. Without touching the value of having the watchdog exist totally distinct from the rest of the mesh, down the road we might want to unify the slack connection pieces.
    • Do you have thoughts on the relative merit of webhook (used here) vs token (used by Robert) for authentication/connection?

…evice or rabbit broker is down, we want this script to not crash at all
… pressure gauge values. There was no type conversion for numbers, the method was hard-coded to 'not_equal', and the error message did not come through correctly. We fixed this by adding type conversion based on the type of the value itself, using the method provided in the config file, and passing the error messages through.
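
The fix described in this commit message (convert the configured reference to the type of the measured value, then apply the comparison named in the config) might look roughly like the sketch below. It is an assumption-laden illustration, not the merged code: `check_endpoint` and the `METHODS` table are hypothetical, only the method name 'not_equal' appears in the commit message, and the sketch assumes an alarm is raised when the comparison evaluates to true.

```python
import operator

# Comparison methods a config file might name. Only 'not_equal' is
# mentioned in the commit message; the other names are assumptions.
METHODS = {
    "equal": operator.eq,
    "not_equal": operator.ne,
    "lower": operator.lt,
    "greater": operator.gt,
}


def check_endpoint(value, method, reference):
    """Return an error message if the alarm condition holds, else None.

    `reference` (for example, text read from the config file) is converted
    to the type of the measured `value` before the comparison.
    """
    if method not in METHODS:
        return f"unknown method '{method}'"
    try:
        converted = type(value)(reference)
    except (TypeError, ValueError):
        return f"cannot convert {reference!r} to {type(value).__name__}"
    if METHODS[method](value, converted):
        return f"value {value!r} is {method} {converted!r}"
    return None
```

Returning a message instead of raising keeps the caller in control, which fits the stated goal that the script should not crash. Note that this simple conversion is not suitable for boolean values, since `bool("False")` is `True` in Python.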
@renereimann
Contributor Author

Thanks a lot Walter,

  • I agree that one can separate the endpoint checking from the docker container checking. I do it here for my own simplicity, but I can definitely make separate scripts for the two and run them in separate containers.
  • I recently saw that SlowDash should also get an endpoint-checking feature, so some coordination here would be nice.
  • About token vs webhook: a webhook is bound to a specific channel and can only post messages. The verification is part of the URL. With a token you can use the chat.postMessage() function, post to several channels, and respond to user input, so you would need a token for an interactive multi-channel application or bot. Regarding security, I think that since a webhook has only very limited permissions it is fine to use, but I do not have more details on that.
  • The changes to dripline:service send do not impact this application (as far as I can tell), since it runs fine both before and after the fix.
  • I guess there are smarter ways to handle the authentication; however, I do not yet understand scarab's purpose and abilities (in terms of authentication and also more generally).
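
To make the webhook vs token comparison above concrete, here is a small sketch of how the two requests differ. It only builds the requests and sends nothing; `webhook_request` and `token_request` are hypothetical helper names, and the URL, token and channel values a caller passes in are placeholders.

```python
import json
import urllib.request


def webhook_request(webhook_url, text):
    """Incoming webhook: the secret is in the URL and the channel is fixed."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_request(token, channel, text):
    """Web API chat.postMessage: bearer token in a header, channel per call."""
    return urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=json.dumps({"channel": channel, "text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

With the webhook the only thing to keep secret is the URL itself, while the token variant chooses the channel on every call, which is what a multi-channel bot would need.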

@renereimann renereimann merged commit d5d2b49 into develop Jul 17, 2025
3 checks passed
@renereimann renereimann deleted the feature/watchdog branch July 17, 2025 06:10