content/en/docs/concepts/cluster-administration/system-metrics.md

### kubelet Pressure Stall Information (PSI) metrics

{{< feature-state feature_gate_name="KubeletPSI" >}}

The kubelet collects Linux kernel
[Pressure Stall Information](https://docs.kernel.org/accounting/psi.html)
(PSI) for CPU, memory and I/O usage.
The information is collected at node, pod and container level.

*Prometheus Metrics*: Exposed at the `/metrics/cadvisor` endpoint as cumulative counters representing the total stall time in seconds, under the following names:

```
container_pressure_cpu_stalled_seconds_total
Expand All @@ -193,11 +194,75 @@ container_pressure_memory_waiting_seconds_total
container_pressure_io_stalled_seconds_total
container_pressure_io_waiting_seconds_total
```
*Summary API*: Exposed at the `/stats/summary` endpoint, providing both the cumulative totals and the moving averages (`avg10`, `avg60`, `avg300`) in JSON format. These averages represent the percentage of time that tasks were stalled on a resource over the trailing 10-second, 60-second, and 5-minute windows.

These metrics are also natively exposed by the Linux kernel through the node's `/proc/pressure/` files (`cpu`, `memory`, and `io`) in the following format:

```
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
```
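
As an illustration (this helper is not part of Kubernetes or the kernel), a line in this format can be parsed with a few lines of Python; the field meanings follow the kernel PSI documentation:

```python
# Hypothetical helper: parse one /proc/pressure/* line such as
# "some avg10=0.74 avg60=0.52 avg300=0.21 total=35232438".
def parse_psi_line(line: str) -> dict:
    kind, *fields = line.split()
    values = dict(field.split("=", 1) for field in fields)
    return {
        "kind": kind,                       # "some" or "full"
        "avg10": float(values["avg10"]),    # % of time stalled over the last 10s
        "avg60": float(values["avg60"]),    # ... over the last 60s
        "avg300": float(values["avg300"]),  # ... over the last 300s
        "total": int(values["total"]),      # cumulative stall time in microseconds
    }

print(parse_psi_line("some avg10=0.74 avg60=0.52 avg300=0.21 total=35232438"))
```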

How can these metrics be interpreted together? Take, for example, the following query against the Summary API:
`kubectl get --raw "/api/v1/nodes/$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')/proxy/stats/summary" | jq '.pods[].containers[] | select(.name=="<CONTAINER_NAME>") | {name, cpu: .cpu.psi, memory: .memory.psi, io: .io.psi}'`.
This returns the information in JSON format, as shown below.

```
{
  "name": "<CONTAINER_NAME>",
  "cpu": {
    "full": {
      "total": 0,
      "avg10": 0,
      "avg60": 0,
      "avg300": 0
    },
    "some": {
      "total": 35232438,
      "avg10": 0.74,
      "avg60": 0.52,
      "avg300": 0.21
    }
  },
  "memory": {
    "full": {
      "total": 539105,
      "avg10": 0,
      "avg60": 0,
      "avg300": 0
    },
    "some": {
      "total": 658164,
      "avg10": 0.01,
      "avg60": 0.01,
      "avg300": 0.00
    }
  },
  "io": {
    "full": {
      "total": 33190987,
      "avg10": 0.31,
      "avg60": 0.22,
      "avg300": 0.05
    },
    "some": {
      "total": 40809937,
      "avg10": 0.52,
      "avg60": 0.45,
      "avg300": 0.12
    }
  }
}
```

Here is a simple spike scenario. The `avg10` value of `0.74` indicates that over the last 10 seconds, at least one task in this container was stalled on the CPU for 0.74% of the time (about 0.074 seconds, or 74 milliseconds). Because `avg10` (0.74) is significantly higher than `avg300` (0.21) for the same resource, this suggests a recent surge in resource contention rather than a sustained long-term bottleneck. If the `avg300` value rises as well under continuous monitoring, that points to a more serious, lasting issue.
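
To make the arithmetic concrete, here is a small sketch (an illustrative helper, not a Kubernetes API) converting an `avgN` percentage into stalled wall-clock time over its window:

```python
def stalled_seconds(avg_percent: float, window_seconds: float) -> float:
    """Wall-clock stall time implied by an avgN percentage over its window."""
    return avg_percent / 100.0 * window_seconds

# avg10=0.74 -> roughly 0.074 s (74 ms) stalled within the last 10 seconds
print(stalled_seconds(0.74, 10))
```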

Additionally, notice how in this example `cpu.some` shows pressure while `cpu.full` remains at 0. This tells us that although some processes were delayed waiting for CPU time, the container as a whole was still making progress. A non-zero `full` value would indicate that all non-idle tasks were stalled simultaneously, a much more severe condition.
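
The `some`/`full` distinction can be summarized with a hypothetical classification helper (the function and messages are illustrative, not a Kubernetes API):

```python
def describe_pressure(some_avg10: float, full_avg10: float) -> str:
    # full > 0: every non-idle task was stalled at once -- severe.
    if full_avg10 > 0:
        return "full stall: all non-idle tasks blocked on the resource"
    # some > 0: at least one task was delayed, but others made progress.
    if some_avg10 > 0:
        return "partial stall: some tasks delayed, workload still progressing"
    return "no measurable pressure"

# The CPU figures from the example above:
print(describe_pressure(0.74, 0.0))
```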
Although less human-readable, the `total` value of 35232438 represents the cumulative stall time in microseconds, which allows detection of short latency spikes that may not show up in the averages. It is also what monitoring systems such as Prometheus use to calculate precise rates of increase over specific time windows.
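
The rate-of-increase calculation a monitoring system performs on the `total` counter can be sketched as follows (the sample values and helper are illustrative):

```python
def stall_fraction(total_start_us: int, total_end_us: int, interval_s: float) -> float:
    """Fraction of the interval spent stalled, from two counter samples in microseconds."""
    return (total_end_us - total_start_us) / 1_000_000 / interval_s

# Suppose the counter grew by 150,000 us over a 30-second scrape interval:
fraction = stall_fraction(35_232_438, 35_382_438, 30.0)
print(f"{fraction:.2%}")
```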
As a final note, high I/O pressure alongside low memory pressure can indicate that the application is waiting on disk throughput rather than suffering from a lack of available RAM. The node is not over-committed on memory, so the investigation can focus on disk consumption instead.

PSI metrics unlock a more robust way to monitor real-time resource contention for every cgroup at the node, pod, and container level, opening up the opportunity to handle workloads dynamically across the system. You can read more about the PSI metrics in [Understand PSI Metrics](/docs/reference/instrumentation/understand-psi-metrics/).

#### Requirements

stages:
- stage: beta
defaultValue: true
fromVersion: "1.34"
toVersion: "1.35"
- stage: stable
defaultValue: true
fromVersion: "1.36"
locked: true
---
Enable kubelet to surface Pressure Stall Information (PSI) metrics in the Summary API and Prometheus metrics.
content/en/docs/reference/instrumentation/node-metrics.md

## Pressure Stall Information (PSI) {#psi}

{{< feature-state feature_gate_name="KubeletPSI" >}}

As a stable feature, the kubelet collects Linux kernel
[Pressure Stall Information](https://docs.kernel.org/accounting/psi.html)
(PSI) for CPU, memory, and I/O usage. The information is collected at node, pod and container level.
See [Summary API](/docs/reference/config-api/kubelet-stats.v1alpha1/) for detailed schema.
Starting with Kubernetes v1.36, the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is locked to true and cannot be disabled. The information is also exposed in
[Prometheus metrics](/docs/concepts/cluster-administration/system-metrics#psi-metrics).

You can learn how to interpret the PSI metrics in [Understand PSI Metrics](/docs/reference/instrumentation/understand-psi-metrics/).

<!-- overview -->

{{< feature-state feature_gate_name="KubeletPSI" >}}

As a stable feature, the kubelet collects Linux kernel
[Pressure Stall Information](https://docs.kernel.org/accounting/psi.html)
(PSI) for CPU, memory, and I/O usage. The information is collected at node, pod and container level.
Starting with Kubernetes v1.36, the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is locked to true and cannot be disabled.

PSI metrics are exposed through two different sources:
- The kubelet's [Summary API](/docs/reference/config-api/kubelet-stats.v1alpha1/), which provides PSI data at the node, pod, and container level.