diff --git a/content/en/docs/concepts/cluster-administration/system-metrics.md b/content/en/docs/concepts/cluster-administration/system-metrics.md
index aee94e415c888..e3eb9a33cec05 100644
--- a/content/en/docs/concepts/cluster-administration/system-metrics.md
+++ b/content/en/docs/concepts/cluster-administration/system-metrics.md
@@ -177,13 +177,14 @@ flag to expose these alpha stability metrics.
### kubelet Pressure Stall Information (PSI) metrics
-{{< feature-state for_k8s_version="v1.34" state="beta" >}}
+{{< feature-state feature_gate_name="KubeletPSI" >}}
-As a beta feature, Kubernetes lets you configure kubelet to collect Linux kernel
+The kubelet collects Linux kernel
[Pressure Stall Information](https://docs.kernel.org/accounting/psi.html) (PSI)
for CPU, memory and I/O usage. The information is collected at node, pod and container level.
-The metrics are exposed at the `/metrics/cadvisor` endpoint with the following names:
+
+*Prometheus Metrics*: Exposed at the `/metrics/cadvisor` endpoint as cumulative counters representing the total stall time in seconds, with the following names:
```
container_pressure_cpu_stalled_seconds_total
@@ -193,11 +194,75 @@ container_pressure_memory_stalled_seconds_total
container_pressure_memory_waiting_seconds_total
container_pressure_io_stalled_seconds_total
container_pressure_io_waiting_seconds_total
```
+*Summary API*: Exposed at the `/stats/summary` endpoint, providing both the cumulative `total` values and the moving averages (`avg10`, `avg60`, `avg300`) in JSON format. These averages represent the percentage of time that tasks were stalled on a resource over the respective 10-second, 60-second, and 5-minute intervals.
+
+The kernel also exposes these metrics natively through the node's `/proc/pressure/` files (`cpu`, `memory`, and `io`), in the following format:
+
+```
+some avg10=0.00 avg60=0.00 avg300=0.00 total=0
+full avg10=0.00 avg60=0.00 avg300=0.00 total=0
+```
+
+How can these metrics be interpreted together?
Take for example the following query from the Summary API:
+`kubectl get --raw "/api/v1/nodes/$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')/proxy/stats/summary" | jq '.pods[].containers[] | select(.name=="") | {name, cpu: .cpu.psi, memory: .memory.psi, io: .io.psi}'`.
+This returns the information in JSON format, as shown below.
+
+```
+{
+  "name": "",
+  "cpu": {
+    "full": {
+      "total": 0,
+      "avg10": 0,
+      "avg60": 0,
+      "avg300": 0
+    },
+    "some": {
+      "total": 35232438,
+      "avg10": 0.74,
+      "avg60": 0.52,
+      "avg300": 0.21
+    }
+  },
+  "memory": {
+    "full": {
+      "total": 539105,
+      "avg10": 0,
+      "avg60": 0,
+      "avg300": 0
+    },
+    "some": {
+      "total": 658164,
+      "avg10": 0.01,
+      "avg60": 0.01,
+      "avg300": 0.00
+    }
+  },
+  "io": {
+    "full": {
+      "total": 33190987,
+      "avg10": 0.31,
+      "avg60": 0.22,
+      "avg300": 0.05
+    },
+    "some": {
+      "total": 40809937,
+      "avg10": 0.52,
+      "avg60": 0.45,
+      "avg300": 0.12
+    }
+  }
+}
+```
+
+Here is a simple spike scenario. The `avg10` value of `0.74` indicates that in the last 10 seconds, at least one task in this container was stalled on the CPU for 0.74% of the time (0.074 seconds, or 74 milliseconds). Because `avg10` (0.74) is significantly higher than `avg300` (0.21) on the same resource, this suggests a recent surge in resource contention rather than a sustained long-term bottleneck. If continued monitoring shows `avg300` rising as well, that points to a more serious, lasting issue.
-This feature is enabled by default, by setting the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). The information is also exposed in the
-[Summary API](/docs/reference/instrumentation/node-metrics#psi).
+Additionally, notice how in this example `cpu.some` shows pressure while `cpu.full` remains at 0.00. This tells us that while some processes were delayed waiting for CPU time, the container as a whole was still making progress.
A non-zero `full` value would indicate that all non-idle tasks were stalled simultaneously, which is a much more serious problem.
+Although less human-readable, the `total` value of 35232438 is the cumulative stall time in microseconds. It allows detection of short latency spikes that may not show up in the averages, and monitoring systems such as Prometheus can use it to calculate precise rates of increase over specific time windows.
+As a final note, high I/O pressure alongside low memory pressure can indicate that the application is waiting on disk throughput rather than failing due to a lack of available RAM. In that case the node is not over-committed on memory, and disk performance should be investigated instead.
-You can learn how to interpret the PSI metrics in [Understand PSI Metrics](/docs/reference/instrumentation/understand-psi-metrics/).
+PSI metrics provide a more robust way to monitor real-time resource contention for every cgroup at the node, pod, and container level, opening up opportunities to dynamically manage workloads across the system. You can read more about the PSI metrics in [Understand PSI Metrics](/docs/reference/instrumentation/understand-psi-metrics/).

#### Requirements

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/KubeletPSI.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/KubeletPSI.md
index 0caecab23fec2..c9a0c8aab5aa0 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates/KubeletPSI.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/KubeletPSI.md
@@ -13,5 +13,10 @@ stages:
  - stage: beta
    defaultValue: true
    fromVersion: "1.34"
+    toVersion: "1.35"
+  - stage: stable
+    defaultValue: true
+    fromVersion: "1.36"
+    locked: true
---
Enable kubelet to surface Pressure Stall Information (PSI) metrics in the Summary API and Prometheus metrics.
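The percentage-to-time conversion used in the spike scenario above can be sketched as follows (a hypothetical helper for illustration, not part of any Kubernetes tooling):

```python
def stalled_seconds(avg_percent: float, window_seconds: float) -> float:
    """Convert a PSI avgN percentage into stalled wall-clock seconds
    over its averaging window (10, 60, or 300 seconds)."""
    return avg_percent / 100.0 * window_seconds

# avg10=0.74 corresponds to roughly 0.074 s (74 ms) of stall
# within the last 10-second window:
print(stalled_seconds(0.74, 10))
```

The same conversion applies to `avg60` and `avg300` with their respective window lengths.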
diff --git a/content/en/docs/reference/instrumentation/node-metrics.md b/content/en/docs/reference/instrumentation/node-metrics.md
index 042aed8c4244a..94006fa6528b4 100644
--- a/content/en/docs/reference/instrumentation/node-metrics.md
+++ b/content/en/docs/reference/instrumentation/node-metrics.md
@@ -45,13 +45,13 @@ the kubelet [fetches Pod- and container-level metric data using CRI](/docs/refer
## Pressure Stall Information (PSI) {#psi}
-{{< feature-state for_k8s_version="v1.34" state="beta" >}}
+{{< feature-state feature_gate_name="KubeletPSI" >}}
-As a beta feature, Kubernetes lets you configure kubelet to collect Linux kernel
+As a stable feature, Kubernetes lets you configure the kubelet to collect Linux kernel
[Pressure Stall Information](https://docs.kernel.org/accounting/psi.html) (PSI) for CPU, memory, and I/O usage. The information is collected at node, pod and container level. See [Summary API](/docs/reference/config-api/kubelet-stats.v1alpha1/) for detailed schema.
-This feature is enabled by default, by setting the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). The information is also exposed in
+Starting with Kubernetes v1.36, the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is locked to true and cannot be disabled. The information is also exposed in
[Prometheus metrics](/docs/concepts/cluster-administration/system-metrics#psi-metrics).
You can learn how to interpret the PSI metrics in [Understand PSI Metrics](/docs/reference/instrumentation/understand-psi-metrics/).
diff --git a/content/en/docs/reference/instrumentation/understand-psi-metrics.md b/content/en/docs/reference/instrumentation/understand-psi-metrics.md
index 405d0ed60374e..bf6ad55a61998 100644
--- a/content/en/docs/reference/instrumentation/understand-psi-metrics.md
+++ b/content/en/docs/reference/instrumentation/understand-psi-metrics.md
@@ -8,12 +8,12 @@ description: >-
-{{< feature-state for_k8s_version="v1.34" state="beta" >}}
+{{< feature-state feature_gate_name="KubeletPSI" >}}
-As a beta feature, Kubernetes lets you configure the kubelet to collect Linux kernel
+As a stable feature, Kubernetes lets you configure the kubelet to collect Linux kernel
[Pressure Stall Information](https://docs.kernel.org/accounting/psi.html) (PSI) for CPU, memory, and I/O usage. The information is collected at node, pod and container level.
-This feature is enabled by default by setting the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
+Starting with Kubernetes v1.36, the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is locked to true and cannot be disabled.
PSI metrics are exposed through two different sources:
- The kubelet's [Summary API](/docs/reference/config-api/kubelet-stats.v1alpha1/), which provides PSI data at the node, pod, and container level.
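As a sketch of how the cumulative `total` counters described in this change can be consumed: sampling the counter twice and dividing the delta by the interval yields a stall ratio, which is conceptually what Prometheus computes with `rate()` over the `container_pressure_*_seconds_total` counters. The helper name below is hypothetical; the Summary API reports `total` in microseconds.

```python
def stall_ratio(total_us_start: int, total_us_end: int, interval_seconds: float) -> float:
    """Fraction of wall-clock time spent stalled between two samples of a
    PSI `total` counter (reported in microseconds by the Summary API)."""
    delta_seconds = (total_us_end - total_us_start) / 1_000_000
    return delta_seconds / interval_seconds

# Two samples of cpu.some.total taken 60 seconds apart:
print(stall_ratio(35_232_438, 35_532_438, 60.0))  # ~0.005, i.e. stalled ~0.5% of the interval
```

Unlike the pre-computed `avg10`/`avg60`/`avg300` windows, this approach lets a monitoring system choose its own time window.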