Collector 500s when drives resume from idle after exporter was started #326

@chrishoage

I run smartctl_exporter in a homelab environment, where my disks are spun down most of the time.

When smartctl_exporter starts while all the drives are spun down, /metrics returns 200 with no smartctl metrics (fine, and expected).

When the drives come back online, smartctl_exporter gets stuck in an error state, with output like this:

curl -Lv http://127.0.0.1:9633/metrics
*   Trying 127.0.0.1:9633...
* Connected to 127.0.0.1 (127.0.0.1) port 9633
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: 127.0.0.1:9633
> User-Agent: curl/8.14.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sun, 21 Dec 2025 17:07:14 GMT
< Transfer-Encoding: chunked
<
An error has occurred while serving metrics:

337 error(s) occurred:
* collected metric smartctl_device_smartctl_exit_status label:{name:"device" value:"ata-WDC_WD140EDGZ-11B1PA0_9LK4YMLG"} gauge:{value:0} with unregistered descriptor Desc{fqName: "smartctl_device_smartctl_exit_status", help: "Exit status of smartctl on device", constLabels: {}, variableLabels: {device}}
* collected metric smartctl_device label:{name:"ata_additional_product_id" value:"unknown"} label:{name:"ata_version" value:"ACS-2, ATA8-ACS T13/1699-D revision 4"} label:{name:"device" value:"ata-WDC_WD140EDGZ-11B1PA0_9LK4YMLG"} label:{name:"firmware_version" value:"85.00A85"} label:{name:"form_factor" value:"3.5 inches"} label:{name:"interface" value:"sat"} label:{name:"model_family" value:"Western Digital Ultrastar (He10/12)"} label:{name:"model_name" value:"WDC WD140EDGZ-11B1PA0"} label:{name:"protocol" value:"ATA"} label:{name:"sata_version" value:"SATA 3.2"} label:{name:"scsi_product" value:""} label:{name:"scsi_revision" value:""} label:{name:"scsi_vendor" value:""} label:{name:"scsi_version" value:""} label:{name:"serial_number" value:"9LK4YMLG"} gauge:{value:1} with unregistered descriptor Desc{fqName: "smartctl_device", help: "Device info", constLabels: {}, variableLabels: {device,interface,protocol,model_family,model_name,serial_number,ata_additional_product_id,firmware_version,ata_version,sata_version,form_factor,scsi_vendor,scsi_product,scsi_revision,scsi_version}}

I suspect this may be a fundamental limitation of how smartctl_exporter works: it seems to expect the drives to be queryable during its first run in order to register the available metrics.

I don't expect this to be solved any time soon (and will likely switch to a textfile collector for this reason). However, for a "version 2" of this exporter it may be ideal to use labels for most of these values instead of separate metrics like smartctl_device_block_size.

Or, at the very least, explicitly define some of the ones which will always be present.

Possibly related: #305

Separately, I also ran into #265. I worked around it with a separate exporter on a different port, but it would be handy to be able to define the nocheck option per drive.
