#+title: Cloudnative NZ Down
* TLDR
Zach noted that space.cloudnative.nz was down.

When available space on the disk backing *imagefs* drops below 15%, the kubelet evicts (deletes) pods.

This affected us because the OS-level Ubuntu root filesystem, which also serves as *imagefs*, had reached 85% utilization.

See https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds

The temporary fix was to double the space (100GB to 200GB) allocated to the root Ubuntu logical volume from the physical 500GB volume.

The long-term fix will be to set up our nodes with a dedicated *imagefs* volume and monitor utilization.

#+begin_src shell
ssh root@k8s.cloudnative.nz df -h -t ext4 /
#+end_src

#+RESULTS:
#+begin_example
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv  197G   79G  109G  43% /
#+end_example

#+begin_src shell
curl https://space.cloudnative.nz --head | grep HTTP
#+end_src

#+RESULTS:
#+begin_example
HTTP/2 200
#+end_example
* Background Reading
** Ephemeral storage
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#configurations-for-local-ephemeral-storage
** Eviction
https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
Node-pressure eviction is the process by which the kubelet proactively terminates pods to reclaim resources on nodes.
** https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds
A hard eviction threshold has no grace period. When a hard eviction threshold is met, the kubelet kills pods immediately, without graceful termination, to reclaim the starved resource.

The kubelet has the following default hard eviction thresholds:

- memory.available<100Mi
- nodefs.available<10%
- imagefs.available<15%
- nodefs.inodesFree<5% (Linux nodes)

These defaults only apply if none of the thresholds is changed. If you change any one of them, the others are not inherited as defaults and are set to zero, so a custom configuration must specify all of the thresholds explicitly.
** https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#node-conditions
The kubelet reports node conditions to reflect that the node is under pressure because a hard or soft eviction threshold has been met, independent of configured grace periods.

- DiskPressure
  - nodefs.available, nodefs.inodesFree, imagefs.available, or imagefs.inodesFree
  - Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold
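** Checking this node's eviction thresholds
To confirm which hard eviction thresholds the kubelet is actually running with, its effective configuration can be read through the API server's node proxy. A minimal sketch against srv1 (the single node in this cluster, see below), assuming the configz endpoint is reachable and reusing yq as elsewhere in these notes:

#+begin_src shell :noeval
# Dump the kubelet's effective config and pull out the eviction thresholds;
# if nothing was customised, the defaults listed above are in effect.
kubectl get --raw "/api/v1/nodes/srv1/proxy/configz" \
  | yq -P .kubeletconfig.evictionHard
#+end_src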
* Check that it's down
#+begin_src shell
curl https://space.cloudnative.nz --head | grep HTTP
#+end_src

#+RESULTS:
#+begin_example
HTTP/2 503
#+end_example
* check on coder ingress
#+begin_src shell
kubectl -n coder get ingress
#+end_src

#+RESULTS:
#+begin_example
NAME    CLASS   HOSTS                                   ADDRESS           PORTS     AGE
coder   nginx   space.cloudnative.nz,*.cloudnative.nz   123.253.178.101   80, 443   10d
#+end_example
* check on coder ingress.spec.rules[0].http.paths

Here we look for the HTTP paths that route */* to a backend service.

#+begin_src shell
kubectl -n coder get ingress coder -o yaml \
  | yq '.spec.rules[0].http.paths'
#+end_src

#+RESULTS:
#+begin_example
- backend:
    service:
      name: coder
      port:
        name: http
  path: /
  pathType: Prefix
#+end_example
* check on coder svc
#+begin_src shell
kubectl -n coder get svc coder
#+end_src

#+RESULTS:
#+begin_example
NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
coder   ClusterIP   10.104.202.123   <none>        80/TCP    10d
#+end_example
* determine coder svc ports
#+begin_src shell :wrap "src yaml"
kubectl -n coder get svc coder -o yaml \
  | yq .spec.ports
#+end_src

#+RESULTS:
#+begin_src yaml
- name: http
  port: 80
  protocol: TCP
  targetPort: http
#+end_src
* determine coder svc selector
#+begin_src shell :wrap "src yaml"
kubectl -n coder get svc coder -o yaml \
  | yq .spec.selector
#+end_src

#+RESULTS:
#+begin_src yaml
app.kubernetes.io/instance: coder
app.kubernetes.io/name: coder
#+end_src
* search for coder svc target pods
#+begin_src shell
kubectl -n coder get pods -l app.kubernetes.io/name=coder
#+end_src

#+RESULTS:
#+begin_example
NAME                     READY   STATUS                   RESTARTS       AGE
coder-7996486845-6cph8   0/1     ContainerStatusUnknown   1              75m
coder-7996486845-bkffz   0/1     ContainerStatusUnknown   1              114m
coder-7996486845-bqmqp   0/1     ContainerStatusUnknown   1              30m
coder-7996486845-cf577   0/1     ContainerStatusUnknown   1              121m
coder-7996486845-dqnn8   1/1     Running                  0              14m
coder-7996486845-dsrbr   0/1     ContainerStatusUnknown   1              46m
coder-7996486845-ptc6n   0/1     ContainerStatusUnknown   1              107m
coder-7996486845-rtgcj   0/1     ContainerStatusUnknown   1              153m
coder-7996486845-rvkjx   0/1     ContainerStatusUnknown   1              92m
coder-7996486845-sdz9n   0/1     ContainerStatusUnknown   1              70m
coder-7996486845-vdgr9   0/1     ContainerStatusUnknown   1              137m
coder-7996486845-x5cvp   0/1     ContainerStatusUnknown   6 (2d8h ago)   4d11h
coder-7996486845-xz6b7   0/1     ContainerStatusUnknown   1              101m
#+end_example
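Only one replica is actually Running. As a quick way to summarise pod statuses before digging into individual pods, something like the following could be used (a sketch; the awk/sort/uniq summary is an addition here, not part of the original investigation):

#+begin_src shell :noeval
# Count coder pods per STATUS column, e.g. "12 ContainerStatusUnknown / 1 Running"
kubectl -n coder get pods -l app.kubernetes.io/name=coder --no-headers \
  | awk '{print $3}' | sort | uniq -c
#+end_src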
* inspect Events for pods that seem to be having issues
#+begin_src shell
kubectl -n coder events --for=pod/coder-7996486845-bqmqp
#+end_src

#+RESULTS:
#+begin_example
LAST SEEN           TYPE      REASON                OBJECT                       MESSAGE
30m (x2 over 35m)   Warning   FailedScheduling      Pod/coder-7996486845-bqmqp   0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
29m                 Normal    Scheduled             Pod/coder-7996486845-bqmqp   Successfully assigned coder/coder-7996486845-bqmqp to srv1
29m                 Normal    Pulling               Pod/coder-7996486845-bqmqp   Pulling image "ghcr.io/coder/coder:v0.27.1"
28m                 Normal    Pulled                Pod/coder-7996486845-bqmqp   Successfully pulled image "ghcr.io/coder/coder:v0.27.1" in 14.957685446s (14.957810454s including waiting)
28m                 Normal    Created               Pod/coder-7996486845-bqmqp   Created container coder
28m                 Normal    Started               Pod/coder-7996486845-bqmqp   Started container coder
28m (x2 over 28m)   Warning   Unhealthy             Pod/coder-7996486845-bqmqp   Readiness probe failed: Get "http://10.0.0.119:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
19m                 Warning   Evicted               Pod/coder-7996486845-bqmqp   The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage.
19m                 Normal    Killing               Pod/coder-7996486845-bqmqp   Stopping container coder
19m                 Warning   ExceededGracePeriod   Pod/coder-7996486845-bqmqp   Container runtime did not kill the pod within specified grace period.
#+end_example
* inspect status for pods that seem to be having issues
#+begin_src shell :wrap "src yaml"
kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
  | yq .status \
  | grep ^message:\\\|^phase:\\\|^reason:
#+end_src

#+RESULTS:
#+begin_src yaml
message: 'The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage. '
phase: Failed
reason: Evicted
#+end_src
* inspect status.containerStatuses for pods that seem to be having issues
#+begin_src shell :wrap "src yaml"
kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
  | yq .status.containerStatuses.0
#+end_src

#+RESULTS:
#+begin_src yaml
image: ghcr.io/coder/coder:v0.27.1
imageID: ""
lastState:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was deleted. The container used to be Running
    reason: ContainerStatusUnknown
    startedAt: null
name: coder
ready: false
restartCount: 1
started: false
state:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was terminated
    reason: ContainerStatusUnknown
    startedAt: null
#+end_src
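These evicted replicas stay behind as =Failed= pod records; they no longer run anything and can be removed in bulk once the node is healthy again. A possible cleanup, not run as part of this investigation, assuming we only want to drop pods in the Failed phase:

#+begin_src shell :noeval
# Remove leftover evicted pod records; the ReplicaSet keeps the Running replica.
kubectl -n coder delete pods --field-selector=status.phase=Failed
#+end_src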
* inspect status.conditions for pods that seem to be having issues
#+begin_src shell :wrap "src yaml"
kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
  | yq .status.conditions.0
#+end_src

#+RESULTS:
#+begin_src yaml
lastProbeTime: null
lastTransitionTime: "2023-07-28T06:44:40Z"
message: 'The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage. '
reason: TerminationByKubelet
status: "True"
type: DisruptionTarget
#+end_src
* figure out node for broken pod
#+begin_src shell :wrap "src yaml"
kubectl -n coder get pod/coder-7996486845-bqmqp -o jsonpath="{.spec.nodeName}"
#+end_src

#+RESULTS:
#+begin_src yaml
srv1
#+end_src

* get nodes
#+begin_src shell
kubectl get nodes
#+end_src

#+RESULTS:
#+begin_example
NAME   STATUS   ROLES           AGE   VERSION
srv1   Ready    control-plane   10d   v1.27.3
#+end_example
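It is a single-node cluster, so everything runs on srv1. Before moving on to node events, it can help to see which pods are actually consuming ephemeral storage there. The kubelet's stats summary endpoint (used again further below) reports per-pod usage; a sketch, assuming this kubelet version populates the =ephemeral-storage= pod stats:

#+begin_src shell :noeval
# Per-pod ephemeral-storage usage on srv1, sorted ascending so the biggest
# consumers end up at the bottom of the list.
kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" \
  | yq -P '[.pods[] | {"pod": .podRef.name, "usedBytes": ."ephemeral-storage".usedBytes}] | sort_by(.usedBytes)'
#+end_src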
* events for node
#+begin_src shell
kubectl events -A --for=node/srv1
#+end_src

#+RESULTS:
#+begin_example
NAMESPACE   LAST SEEN                  TYPE      REASON                  OBJECT      MESSAGE
default     60m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5100226969 bytes, but only found 4423240768 bytes eligible to free.
longhorn    52m                        Warning   Schedulable             Node/srv1   the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 25585254400 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default     44m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5104617881 bytes, but only found 4423240768 bytes eligible to free.
longhorn    38m (x2 over 44h)          Warning   Schedulable             Node/srv1   the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 26109542400 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default     34m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5029218713 bytes, but only found 301773 bytes eligible to free.
default     29m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5111343513 bytes, but only found 4423240768 bytes eligible to free.
longhorn    21m (x2 over 84m)          Warning   Schedulable             Node/srv1   the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 26214400000 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default     17m (x16 over 24h)         Normal    NodeHasDiskPressure     Node/srv1   Node srv1 status is now: NodeHasDiskPressure
longhorn    17m (x929 over 24h)        Warning   Ready                   Node/srv1   Kubernetes node srv1 has pressure: KubeletHasDiskPressure, kubelet has disk pressure
longhorn    5m (x1037 over 2d9h)       Normal    Ready                   Node/srv1   Node srv1 is ready
default     4m18s (x2379 over 2d16h)   Normal    NodeHasNoDiskPressure   Node/srv1   Node srv1 status is now: NodeHasNoDiskPressure
default     2m11s (x72 over 24h)       Warning   EvictionThresholdMet    Node/srv1   Attempting to reclaim ephemeral-storage
#+end_example
* node.spec.taints
#+begin_src shell :wrap "src yaml"
kubectl get node srv1 -o yaml \
  | yq .spec.taints
#+end_src

#+RESULTS:
#+begin_src yaml
- effect: NoSchedule
  key: node.kubernetes.io/disk-pressure
  timeAdded: "2023-07-28T07:38:40Z"
#+end_src

* node.status.allocatable
#+begin_src shell :wrap "src yaml"
kubectl get node srv1 -o yaml \
  | yq .status.allocatable
#+end_src

#+RESULTS:
#+begin_src yaml
cpu: "24"
ephemeral-storage: "94580335255"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 197909196Ki
pods: "110"
#+end_src

* node.status.capacity
#+begin_src shell :wrap "src yaml"
kubectl get node srv1 -o yaml \
  | yq .status.capacity
#+end_src

#+RESULTS:
#+begin_src yaml
cpu: "24"
ephemeral-storage: 102626232Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 198011596Ki
pods: "110"
#+end_src
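Allocatable ephemeral-storage is roughly capacity minus the 10% nodefs.available hard eviction threshold (node allocatable subtracts the eviction threshold plus any kube/system reservations). A quick arithmetic check of that relationship, added here for illustration:

#+begin_src shell :noeval
# 102626232Ki of capacity minus 10% ~= the 94580335255 bytes reported as allocatable
echo $(( 102626232 * 1024 * 90 / 100 ))   # => 94580335411
#+end_src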
* node.status.conditions
#+begin_src shell :wrap "src yaml"
kubectl get node srv1 -o yaml \
  | yq .status.conditions
#+end_src

#+RESULTS:
#+begin_src yaml
- lastHeartbeatTime: "2023-07-17T14:56:32Z"
  lastTransitionTime: "2023-07-17T14:56:32Z"
  message: Cilium is running on this node
  reason: CiliumIsUp
  status: "False"
  type: NetworkUnavailable
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet has sufficient memory available
  reason: KubeletHasSufficientMemory
  status: "False"
  type: MemoryPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-28T07:44:38Z"
  message: kubelet has no disk pressure
  reason: KubeletHasNoDiskPressure
  status: "False"
  type: DiskPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet has sufficient PID available
  reason: KubeletHasSufficientPID
  status: "False"
  type: PIDPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet is posting ready status. AppArmor enabled
  reason: KubeletReady
  status: "True"
  type: Ready
#+end_src
* node.status.condition of interest (DiskPressure)
#+begin_src shell :wrap "src yaml"
kubectl get node srv1 -o yaml \
  | yq .status.conditions.2
#+end_src

#+RESULTS:
#+begin_src yaml
lastHeartbeatTime: "2023-07-28T08:04:53Z"
lastTransitionTime: "2023-07-28T08:00:28Z"
message: kubelet has disk pressure
reason: KubeletHasDiskPressure
status: "True"
type: DiskPressure
#+end_src

* node.stats.runtime
#+begin_src shell :wrap "src yaml"
kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" \
  | yq -P .node.runtime
#+end_src

#+RESULTS:
#+begin_src yaml
imageFs:
  time: "2023-07-28T08:01:03Z"
  availableBytes: 15480643584
  capacityBytes: 105089261568
  usedBytes: 46888923136
  inodesFree: 4701743
  inodes: 6553600
  inodesUsed: 1577494
#+end_src

* node.stats.fs
#+begin_src shell :wrap "src yaml"
kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" \
  | yq -P .node.fs
#+end_src

#+RESULTS:
#+begin_src yaml
time: "2023-07-28T08:01:33Z"
availableBytes: 15624622080
capacityBytes: 105089261568
usedBytes: 84079153152
inodesFree: 4703165
inodes: 6553600
inodesUsed: 1850435
#+end_src

* Take a look at node ext4 filesystem from OS level
The root filesystem has filled up to 85%, which is exactly the point at which the 15% imagefs.available threshold is crossed and pods start getting evicted.
#+begin_src shell :nodir "/ssh:root@k8s.cloudnative.nz:/"
ssh root@k8s.cloudnative.nz df -h -t ext4
#+end_src

#+RESULTS:
#+begin_example
Filesystem                                               Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv                         98G   79G   15G  85% /
/dev/sda2                                                2.0G  253M  1.6G  14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827   7.8G  233M  7.6G   3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount
#+end_example
* extend the root logical volume
Double the root logical volume from 100GB to 200GB to get utilization well below the eviction threshold (run as root on the node):
#+begin_src shell :noeval
lvextend -L200G /dev/mapper/ubuntu--vg-ubuntu--lv
#+end_src

#+begin_example
  Size of logical volume ubuntu-vg/ubuntu-lv changed from 100.00 GiB (25600 extents) to 200.00 GiB (51200 extents).
  Logical volume ubuntu-vg/ubuntu-lv successfully resized.
#+end_example
* Inspect resized logical volumes
Confirm the logical volume now shows 200GB:
#+begin_src shell :nodir "/ssh:root@k8s.cloudnative.nz:/"
ssh root@k8s.cloudnative.nz lvs
#+end_src

#+RESULTS:
#+begin_example
  LV        VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  ubuntu-lv ubuntu-vg -wi-ao---- 200.00g
#+end_example

* Inspect physical volumes allocation
Check how much of the physical volume is still unallocated:
#+begin_src shell :nodir "/ssh:root@k8s.cloudnative.nz:/"
ssh root@k8s.cloudnative.nz pvs
#+end_src

#+RESULTS:
#+begin_example
  PV         VG        Fmt  Attr PSize    PFree
  /dev/sda3  ubuntu-vg lvm2 a--  <463.73g <263.73g
#+end_example
* Resize the root filesystem (on top of the now larger Logical Volume)
#+begin_src shell :noeval
resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
#+end_src
#+begin_example
resize2fs 1.46.5 (30-Dec-2021)
Filesystem at /dev/mapper/ubuntu--vg-ubuntu--lv is mounted on /; on-line resizing required
old_desc_blocks = 13, new_desc_blocks = 25
The filesystem on /dev/mapper/ubuntu--vg-ubuntu--lv is now 52428800 (4k) blocks long.
#+end_example
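For future resizes, lvextend can grow the filesystem in the same step; a sketch of the equivalent one-liner (the -r/--resizefs flag asks lvextend to run the filesystem resize itself):

#+begin_src shell :noeval
# Same outcome as the separate lvextend + resize2fs steps above
lvextend -r -L200G /dev/mapper/ubuntu--vg-ubuntu--lv
#+end_src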
* check free space at OS now that volume is extended
#+begin_src shell :nodir "/ssh:root@k8s.cloudnative.nz:/"
ssh root@k8s.cloudnative.nz df -h -t ext4
#+end_src

#+RESULTS:
#+begin_example
Filesystem                                               Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv                        197G   79G  109G  42% /
/dev/sda2                                                2.0G  253M  1.6G  14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827   7.8G  233M  7.6G   3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount
#+end_example
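* monitor imagefs until the dedicated volume exists
Until the long-term fix (a dedicated *imagefs* volume) is in place, a check like the one below could be run periodically to see how close imagefs is to the 15% eviction threshold. This is a sketch, not something wired into monitoring yet:

#+begin_src shell :noeval
# Percent of imagefs still available on srv1; the kubelet starts evicting below 15%
summary=$(kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary")
avail=$(echo "$summary" | yq .node.runtime.imageFs.availableBytes)
capacity=$(echo "$summary" | yq .node.runtime.imageFs.capacityBytes)
echo "imagefs available: $(( avail * 100 / capacity ))%"
#+end_src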