Division by zero when sampleCount is 0 #15

@wenjianhn

sampleCount was 0 when this line ran:

*averageUsage = sum/sampleCount;

The kubelet crashed with SIGFPE, which on x86 is raised for an integer "divide error":

Sep 06 16:51:50 foobar kernel: NVRM: Xid (PCI:0000:db:00): 95, pid=7101, name=kubelet, Uncontained: FBHUB. RST: Yes, D-RST: No
Sep 06 16:51:51 foobar kernel: NVRM: Xid (PCI:0000:db:00): 95, pid=2362883, name=python, Ch 0000000a
Sep 06 16:51:51 foobar kernel: NVRM: Xid (PCI:0000:db:00): 95, pid=7101, name=kubelet, Uncontained: FBHUB. RST: Yes, D-RST: No
Sep 06 16:51:51 foobar kernel: NVRM: Xid (PCI:0000:db:00): 95, pid=2362883, name=python, Ch 0000000a
Sep 06 16:51:51 foobar kernel: NVRM: Xid (PCI:0000:db:00): 95, pid=7101, name=kubelet, Uncontained: PCIE. RST: Yes, D-RST: No
Sep 06 16:51:51 foobar kernel: NVRM: Xid (PCI:0000:db:00): 95, pid=2362883, name=python, Ch 0000000a
Sep 06 16:51:51 foobar kubelet[7101]: fatal error: unexpected signal during runtime execution
Sep 06 16:51:51 foobar kernel: NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU7: 0x4c (GSP_RM_CONTROL)
Sep 06 16:51:51 foobar kubelet[7101]: [signal SIGFPE: floating-point exception code=0x1 addr=0x7ff06af196fc pc=0x7ff06af196fc]
Sep 06 16:51:51 foobar kubelet[7101]: runtime stack:
Sep 06 16:51:51 foobar kubelet[7101]: runtime.throw(0x4ba98c4, 0x2a)
Sep 06 16:51:51 foobar kubelet[7101]: /usr/local/go/src/runtime/panic.go:1117 +0x72
Sep 06 16:51:51 foobar kubelet[7101]: runtime.sigpanic()
Sep 06 16:51:51 foobar kubelet[7101]: /usr/local/go/src/runtime/signal_unix.go:718 +0x2e5
Sep 06 16:51:51 foobar kubelet[7101]: goroutine 14100013 [syscall]:
Sep 06 16:51:51 foobar kubelet[7101]: runtime.cgocall(0x3ad6530, 0xc0015368f8, 0x186408826e500)
Sep 06 16:51:51 foobar kubelet[7101]: /usr/local/go/src/runtime/cgocall.go:154 +0x5b fp=0xc0015368c8 sp=0xc001536890 pc=0x40707b
Sep 06 16:51:51 foobar kubelet[7101]: k8s.io/kubernetes/vendor/github.com/mindprince/gonvml._Cfunc_nvmlDeviceGetAverageUsage(0x7ff06b4df8a8, 0xc000000001, 0x63e1e0e4c5955, 0xc00578>
Sep 06 16:51:51 foobar kubelet[7101]: _cgo_gotypes.go:100 +0x48 fp=0xc0015368f8 sp=0xc0015368c8 pc=0x1f15348
Sep 06 16:51:51 foobar kubelet[7101]: k8s.io/kubernetes/vendor/github.com/mindprince/gonvml.Device.AverageGPUUtilization.func1(0x7ff06b4df8a8, 0x63e1e0e4c5955, 0xc005789b00, 0xffff>
Sep 06 16:51:51 foobar kubelet[7101]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/mindprince/gonvml/bindings.go:477 +0x6f fp=0xc>
Sep 06 16:51:51 foobar kubelet[7101]: k8s.io/kubernetes/vendor/github.com/mindprince/gonvml.Device.AverageGPUUtilization(0x7ff06b4df8a8, 0x2540be400, 0x166f200000, 0x0, 0x0)
Sep 06 16:51:51 foobar kubelet[7101]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/mindprince/gonvml/bindings.go:477 +0xf4 fp=0xc>
Sep 06 16:51:51 foobar kubelet[7101]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/accelerators.(*nvidiaCollector).UpdateStats(0xc0080c1c20, 0xc00c33b800, 0x1860a9fe1415b, 0>
Sep 06 16:51:51 foobar kubelet[7101]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/accelerators/nvidia.go:260 +0x>
Sep 06 16:51:51 foobar kubelet[7101]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).updateStats(0xc0064d6b40, 0xc2271b56b01531ab, 0x1860a9fd870c3)
Sep 06 16:51:51 foobar kubelet[7101]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:688 +0x9f>
Sep 06 16:51:51 foobar kubelet[7101]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).housekeepingTick(0xc0064d6b40, 0xc008b10300, 0x5f5e100, 0xc000d14>
Sep 06 16:51:51 foobar kubelet[7101]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:583 +0x15>
Sep 06 16:51:51 foobar kubelet[7101]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).housekeeping(0xc0064d6b40)
Sep 06 16:51:51 foobar kubelet[7101]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:531 +0x28>
Sep 06 16:51:51 foobar kubelet[7101]: runtime.goexit()
Sep 06 16:51:51 foobar kubelet[7101]: /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc001537fe0 sp=0xc001537fd8 pc=0x475f21
Sep 06 16:51:51 foobar kubelet[7101]: created by k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).Start
Sep 06 16:51:51 foobar kubelet[7101]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:119 +0x3f
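
One possible guard is to check the sample count before dividing and report an error instead. The following is only a minimal sketch in C, not the actual gonvml code: the helper name `safe_average`, its parameter types, and the `-1` error value are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not part of gonvml: computes an integer average
 * and returns an error instead of dividing when there are no samples. */
static int safe_average(unsigned long long sum, unsigned int sample_count,
                        unsigned int *average_out)
{
    if (average_out == NULL || sample_count == 0) {
        /* Nothing to average. Leave *average_out untouched and let the
         * caller decide how to report the missing data. */
        return -1;
    }
    *average_out = (unsigned int)(sum / sample_count);
    return 0;
}
```

With a check like this, a zero sample count would surface as an error return that the Go caller could handle, rather than as a SIGFPE inside the cgo call.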
