
Failed SSD leads to UNKNOWN instead of CRITICAL #110

@robert-scheck

Description


First of all, thank you very much for check_smart. Unfortunately, it doesn't seem to detect an SSD that has failed entirely.

Example of a working SSD:

$ /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme0n1 -i nvme
OK: Drive  ATP NVMe M.2 2280 SSD S/N 12345678-901234: no SMART errors detected. |Temperature=36 Available_Spare=100 Available_Spare_Threshold=10 Percentage_Used=2 Data_Units_Read=78184185 Data_Units_Written=17963617 Host_Read_Commands=277085817 Host_Write_Commands=1235704305 Controller_Busy_Time=43946 Power_Cycles=33 Power_On_Hours=21848 Unsafe_Shutdowns=12 Media_and_Data_Integrity_Errors=0 Error_Information_Log_Entries=0 Warning__Comp_Temperature_Time=0 Critical_Comp_Temperature_Time=0 Temperature_Sensor_1=39 Temperature_Sensor_2=36 Temperature_Sensor_3=45 Temperature_Sensor_4=36 Temperature_Sensor_5=36 Temperature_Sensor_6=34

Example of a non-working SSD (entirely failed):

$ /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme1n1 -i nvme
UNKNOWN: Drive  S/N :  No health status line found, |

When querying the non-working SSD directly with smartctl:

$ smartctl -a /dev/nvme1n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.14.0-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error
$ smartctl -x /dev/nvme1n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.14.0-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error 

However, the device node still exists:

$ ls -l /dev/nvme1n1
brw-rw---- 1 root disk 259, 1 May 22 14:45 /dev/nvme1n1

Did I overlook an option that reports an entirely failed SSD as CRITICAL instead of UNKNOWN? Or could check_smart catch "NVME_IOCTL_ADMIN_CMD: Input/output error" and report it as CRITICAL?
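To illustrate the second idea, here is a minimal POSIX shell sketch. It is not taken from check_smart, and the function name `classify_smartctl` is hypothetical. It assumes the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the smartctl exit-status bits documented in smartctl(8): bit 0 for a command-line error, bit 1 for a device that could not be opened or identified, and bit 3 for a "DISK FAILING" health status. Treating the admin-command I/O error (or bit 1) as CRITICAL is an assumption on my side: bit 1 can also be set for a mistyped device path or a drive in a low-power mode, so the narrower message match is probably the safer signal.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of check_smart: maps smartctl results to
# Nagios states so that a dead NVMe controller becomes CRITICAL.

# Nagios plugin exit codes
OK=0; WARNING=1; CRITICAL=2; UNKNOWN=3

# classify_smartctl RC OUTPUT
#   RC:     exit status returned by smartctl
#   OUTPUT: combined stdout/stderr of smartctl
# Prints one Nagios status line and returns the matching exit code.
classify_smartctl() {
    sc_rc=$1
    sc_out=$2

    # Bit 0: smartctl could not parse its command line, which is a
    # configuration problem rather than a drive problem.
    if [ $((sc_rc & 1)) -ne 0 ]; then
        echo "UNKNOWN: smartctl command line error (exit status $sc_rc)"
        return $UNKNOWN
    fi

    # Assumption: an I/O error on the NVMe admin command means the
    # controller no longer answers, so treat the drive as dead.
    case $sc_out in
        *"NVME_IOCTL_ADMIN_CMD: Input/output error"*)
            echo "CRITICAL: NVMe controller does not respond (admin command I/O error)"
            return $CRITICAL
            ;;
    esac

    # Bit 1: device open failed or the device could not be identified.
    if [ $((sc_rc & 2)) -ne 0 ]; then
        echo "CRITICAL: smartctl could not open or identify the device (exit status $sc_rc)"
        return $CRITICAL
    fi

    # Bit 3: the SMART health check returned "DISK FAILING".
    if [ $((sc_rc & 8)) -ne 0 ]; then
        echo "CRITICAL: SMART health status is FAILING (exit status $sc_rc)"
        return $CRITICAL
    fi

    # Any other bit set (e.g. entries in the error or self-test logs).
    if [ "$sc_rc" -ne 0 ]; then
        echo "WARNING: smartctl reported other issues (exit status $sc_rc)"
        return $WARNING
    fi

    echo "OK: smartctl reported no problems"
    return $OK
}

# Example use in a check command (device path taken from this issue):
#   out=$(smartctl -H /dev/nvme1n1 2>&1); rc=$?
#   classify_smartctl "$rc" "$out"; exit $?
```

With the output shown above for /dev/nvme1n1, the message match alone would already yield CRITICAL, independent of which exit-status bits smartctl sets in that situation.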

Just let me know if you need further details and/or command outputs from the entirely failed SSD.
