Description
Feature epic details
- For the title of this issue, type: Documentation, Development epic name
- Link to development epic: Make mpHealth-4.0 work with EE7 and later apps open-liberty#30056
- Target GA release:
Operating systems
Does the documentation apply to all operating systems?
- Yes
- No; specify operating systems: ______
Summary
Provide a concise summary of your feature. What is the update, why does it matter, and to whom? What do 80% of target users need to know to be most easily productive using your runtime update?
The mpHealth-4.0 feature introduces a new file-based health check mechanism. It works in parallel with the /health, /health/started, /health/live, and /health/ready REST endpoints. It is intended for use with the exec configuration for Kubernetes health probes (which runs a command in the pod/container) instead of the httpGet configuration.
The file-based health check mechanism creates the started, live, and ready files in the /health directory under the server's output directory. This local (file-based) mechanism provides an alternative to the REST-based (network-based) probe configuration on Kubernetes. When paired with the accompanying scripts for the exec command in the Open Liberty container image, it lets users bypass network issues that cause false DOWN statuses and establish the UP status of the pod/container more quickly.
Configuration
List any new or changed properties, parameters, elements, attributes, etc. Include default values and configuration examples where relevant:
The file-based health-check mechanism is enabled when the checkInterval attribute is configured for the mpHealth element.
For example:
<mpHealth checkInterval="5s" />
- This configuration accepts only numerical values with an optional time unit of either s or ms. A value with no unit defaults to s.
- There is no default value out of the box; this value must be configured to enable the functionality.
- Invalid values default to 10s.
- 0s, 0ms, 0, or an empty value disables the functionality.
- There is also an environment variable configuration: MP_HEALTH_CHECK_INTERVAL. If both are configured, the server.xml configuration takes precedence.
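To make the checkInterval value rules concrete, here is a minimal Java sketch that models them. The class CheckIntervalRules and its resolveMillis method are hypothetical names used only for illustration; this is not the Open Liberty implementation. The sketch also assumes that negative numbers, decimals, and values containing spaces count as invalid values.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the documented checkInterval rules; NOT Open Liberty's implementation.
final class CheckIntervalRules {
    // A non-negative integer followed by an optional "ms" or "s" unit.
    private static final Pattern VALUE = Pattern.compile("(\\d+)(ms|s)?");
    private static final long INVALID_DEFAULT_MS = 10_000L; // invalid values fall back to 10s

    /** Returns the interval in milliseconds, or empty when the mechanism is disabled. */
    static Optional<Long> resolveMillis(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty(); // not configured or empty: disabled
        }
        Matcher m = VALUE.matcher(raw.trim());
        if (!m.matches()) {
            return Optional.of(INVALID_DEFAULT_MS); // for example "abc", "5m", "-5s"
        }
        try {
            long amount = Long.parseLong(m.group(1));
            if (amount == 0) {
                return Optional.empty(); // 0, 0s, or 0ms: disabled
            }
            // A value with no unit is read as seconds.
            return Optional.of("ms".equals(m.group(2)) ? amount : Math.multiplyExact(amount, 1000L));
        } catch (NumberFormatException | ArithmeticException e) {
            return Optional.of(INVALID_DEFAULT_MS); // number too large to represent
        }
    }
}
```

For example, resolveMillis("5s") and resolveMillis("5") both yield 5000 ms, resolveMillis("abc") falls back to 10000 ms, and resolveMillis("0ms") yields an empty result, meaning the mechanism stays disabled.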
There is an additional optional configuration, startupCheckInterval, with an associated environment variable, MP_HEALTH_STARTUP_CHECK_INTERVAL.
- This configuration accepts only numerical values with an optional time unit of either s or ms. A value with no unit defaults to ms.
- The default is 100ms.
- Invalid values, 0s, 0ms, 0, or an empty value default to 100ms.
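The startupCheckInterval rules can be modeled with a minimal Java sketch. Compared with checkInterval, a value without a unit is read as milliseconds, and every invalid, zero, or empty value falls back to 100ms instead of disabling anything. The class StartupCheckIntervalRules is a hypothetical name, and treating negative or decimal numbers as invalid is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the documented startupCheckInterval rules; NOT Open Liberty's implementation.
final class StartupCheckIntervalRules {
    // A non-negative integer followed by an optional "ms" or "s" unit.
    private static final Pattern VALUE = Pattern.compile("(\\d+)(ms|s)?");
    static final long DEFAULT_MS = 100L;

    /** Returns the startup polling interval in milliseconds. */
    static long resolveMillis(String raw) {
        if (raw == null) {
            return DEFAULT_MS; // not configured
        }
        Matcher m = VALUE.matcher(raw.trim());
        if (!m.matches()) {
            return DEFAULT_MS; // invalid or empty
        }
        try {
            long amount = Long.parseLong(m.group(1));
            if (amount == 0) {
                return DEFAULT_MS; // 0, 0s, or 0ms
            }
            // A value with no unit is read as milliseconds.
            return "s".equals(m.group(2)) ? Math.multiplyExact(amount, 1000L) : amount;
        } catch (NumberFormatException | ArithmeticException e) {
            return DEFAULT_MS; // number too large to represent
        }
    }
}
```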
Updates to existing topics
To update existing topics, specify a link to the topics that are affected. Include a copy of the current text and the exact text to which it will change. For example: Change ABC to XYZ
Add new section:
For more information, see [Adding health reports to microservices](https://openliberty.io/guides/microprofile-health.html).
<here>
The @Liveness annotation
The new text: (This should only be present in 25.0.0.6 docs and up!)
The MicroProfile Health 4.0 runtime in Open Liberty, which is enabled through the mpHealth-4.0 feature, provides an alternative file-based health check mechanism. When enabled, this mechanism works alongside the REST-based endpoints. The file-based mechanism creates three files, which correspond to the health check types, at the following locations:
- <server.output.dir>/health/ready
- <server.output.dir>/health/live
- <server.output.dir>/health/started
These files are created when all health checks return an UP status (that is, startup, liveness, and readiness all return UP), and they are deleted, along with the /health directory, when the server shuts down. After the files are created, the runtime queries the readiness and liveness health checks at a user-configured interval. For each check, if the status is UP, the runtime updates the last-modified timestamp of the associated health check file. The updated timestamp indicates to any observers that an UP status was reported at that time. A DOWN status results in the file's timestamp not being updated.
The update interval is defined by the checkInterval attribute of the <mpHealth> element when you use the mpHealth-4.0 feature. This configuration is required to enable the file-based health check mechanism. There is also an optional startupCheckInterval attribute, which defaults to 100ms. This value is used during the startup of the feature and sets the interval between health check queries until all three health check types report UP.
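The startup phase can be pictured as a polling loop. The following Java sketch is a simplified, hypothetical model of it, not the Open Liberty runtime code: the names StartupPhaseModel and awaitAllUpAndCreateFiles are invented, a single BooleanSupplier stands in for querying the startup, liveness, and readiness checks together, and the maxPolls bound exists only so that the sketch terminates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified model of the startup phase; NOT the Open Liberty runtime code.
final class StartupPhaseModel {
    /**
     * Queries allChecksUp every startupIntervalMs until it reports UP, then creates the
     * started, live, and ready files under outputDir/health. Returns the number of queries
     * made, or -1 if the checks did not all report UP within maxPolls queries.
     */
    static int awaitAllUpAndCreateFiles(BooleanSupplier allChecksUp, long startupIntervalMs,
                                        int maxPolls, Path outputDir)
            throws IOException, InterruptedException {
        for (int poll = 1; poll <= maxPolls; poll++) {
            if (allChecksUp.getAsBoolean()) { // startup, liveness, and readiness all UP
                Path healthDir = Files.createDirectories(outputDir.resolve("health"));
                for (String name : new String[] {"started", "live", "ready"}) {
                    Path file = healthDir.resolve(name);
                    if (Files.notExists(file)) {
                        Files.createFile(file);
                    }
                }
                return poll;
            }
            if (poll < maxPolls) {
                Thread.sleep(startupIntervalMs); // wait one startupCheckInterval, then query again
            }
        }
        return -1; // the maxPolls bound is specific to this sketch
    }
}
```

With startupCheckInterval set to 100ms, checks that first report UP on the third query lead to the files being created roughly 200 ms after the first query.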
There are also environment variable configurations for checkInterval and startupCheckInterval: MP_HEALTH_CHECK_INTERVAL and MP_HEALTH_STARTUP_CHECK_INTERVAL, respectively. If both an environment variable and a server.xml configuration are detected, the server.xml configuration takes precedence.
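The precedence rule can be expressed in a few lines. In this hypothetical Java sketch (the class IntervalSource and its method are invented for illustration), a server.xml value, when present, is used as is; otherwise the environment variable is consulted. The sketch assumes that an attribute present in server.xml, even an empty one, counts as configured.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the documented precedence rule; NOT Open Liberty's implementation.
final class IntervalSource {
    /**
     * Returns the raw interval value to use: the server.xml attribute when it is configured
     * (non-null), otherwise the named environment variable, otherwise empty.
     */
    static Optional<String> effectiveRawValue(String serverXmlValue, Map<String, String> env,
                                              String envVarName) {
        if (serverXmlValue != null) {
            return Optional.of(serverXmlValue); // server.xml takes precedence
        }
        return Optional.ofNullable(env.get(envVarName));
    }

    public static void main(String[] args) {
        // No server.xml value in this example, so MP_HEALTH_CHECK_INTERVAL (if set) is used.
        System.out.println(effectiveRawValue(null, System.getenv(), "MP_HEALTH_CHECK_INTERVAL"));
    }
}
```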
Example configuration:
<mpHealth checkInterval="10s" startupCheckInterval="250ms" />
The file-based health check mechanism offers users an alternative to the network-based method.
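Because the runtime refreshes the timestamps of the live and ready files only while those checks report UP, an observer such as a Kubernetes exec probe can treat a file as UP only if it was modified recently. The following Java sketch illustrates that idea. It is not the script shipped in the Open Liberty container image; the class name HealthFileProbe, the command-line arguments, and the 20-second maximum age are assumptions for illustration. Since the started file is not refreshed after it is created, the sketch checks it for existence only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

// Hypothetical observer-side check; NOT the scripts in the Open Liberty container image.
final class HealthFileProbe {
    /** True when the file exists and its last-modified time is no older than maxAge at 'now'. */
    static boolean isRecentlyUp(Path healthFile, Duration maxAge, Instant now) throws IOException {
        try {
            Instant lastModified = Files.getLastModifiedTime(healthFile).toInstant();
            return !lastModified.isBefore(now.minus(maxAge));
        } catch (NoSuchFileException e) {
            return false; // not created yet (startup incomplete) or deleted at server shutdown
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the server output directory; args[1]: "started", "live", or "ready"
        Path file = Paths.get(args[0], "health", args[1]);
        boolean up;
        if (args[1].equals("started")) {
            up = Files.exists(file); // the started file is not refreshed after it is created
        } else {
            // Assumed tolerance: about two missed updates at checkInterval="10s"
            up = isRecentlyUp(file, Duration.ofSeconds(20), Instant.now());
        }
        System.exit(up ? 0 : 1); // Kubernetes exec probes treat exit code 0 as success
    }
}
```

A maximum age of a small multiple of checkInterval tolerates an occasional late update while still reporting a failure soon after the checks stop returning UP.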
Create a new topic
To create a topic, specify a first draft of the topic that you want added and the section in the navigation where the topic should go.
### Attention
We're also intending to update the documentation on: https://github.com/OpenLiberty/open-liberty-operator/blob/main/doc/openshift-monitoring.adoc
The script to use with the Kubernetes exec command is included in the Open Liberty container image. It was introduced in OpenLiberty/ci.docker#629.
The operator team is working on adding support for this through the UI (OpenLiberty/open-liberty-operator#687), but that support is not available yet. We need to update the operator documentation to let users know how to configure the scripts and the probe configuration there.
I have not updated or created any Open Liberty docs that explain how to use the file-based health check functionality on Kubernetes (primarily the probe configuration), since I assume that is out of scope for these docs.
If this is wrong, we may need to work with an ID person on the best way to explain this.