So, I've been trying to figure out why the report consistently shows some rather alarming numbers for Seek Error Health. The drives I'm testing against don't actually list any errors at all, and yet both the FARM stats from Seagate and the normalized (non-raw) value of SMART attribute 7 come back with extreme values. As an example, here is a section of the report for one of the drives in question:
| Device | Model | Serial Number | RPM | Capacity | SMART Status | Temp | Power-On Time (ymdh) | Start Stop Count | Spin Retry Count | Realloc Sectors | Realloc Events | Current Pending Sectors | Offline Uncorrectable Sectors | CRC Errors | Seek Error Health | Last Test Age (days) | Last Test Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /dev/sdg | ST18000NM000J-2TV103 | ZR5AR716 | 7200 | [18.0 TB] | PASSED | 34°C | 1y 0m 14d 5h | 7 | 0 | 0 | | 0 | 0 | 0 | 85% | 0 | Extended offline |
So what the heck is going on there? Digging in manually leads to some interesting conclusions. First, here's the raw SMART output for that drive, both the standard ata_smart attribute table and the still somewhat experimental --log farm output from smartmontools 7.4 (yes, this annoyed me enough that I backported it to SCALE).
.ata_smart_attributes.table[]

```json
{
  "id": 7,
  "name": "Seek_Error_Rate",
  "value": 85,
  "worst": 60,
  "thresh": 45,
  "when_failed": "",
  "flags": {
    "value": 15,
    "string": "POSR-- ",
    "prefailure": true,
    "updated_online": true,
    "performance": true,
    "error_rate": true,
    "event_count": false,
    "auto_keep": false
  },
  "raw": {
    "value": 343129453,
    "string": "343129453"
  }
}
```
and
.seagate_farm_log.page_5_reliability_statistics (truncated)

```json
"page_5_reliability_statistics": {
  "attr_error_rate_raw": 12821624,
  "error_rate_normalized": 71,
  "error_rate_worst": 64,
  "attr_seek_error_rate_raw": 343129453,
  "seek_error_rate_normalized": 85,
  "seek_error_rate_worst": 60,
  "high_priority_unload_events": 1,
  "helium_presure_trip": 0,
  ....
}
```
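For anyone who wants to pull these numbers out themselves, here's a quick Python sketch. The JSON paths are the ones shown in the snippets above; the exact smartctl invocation is my assumption (adjust flags and device to taste, and -l farm obviously needs smartmontools 7.4+ and a supported Seagate drive):

```python
import json
import subprocess

# Ask smartctl for JSON output with both the ATA attribute table (-A)
# and the Seagate FARM log (-l farm, smartmontools 7.4+).
out = subprocess.run(
    ["smartctl", "--json", "-A", "-l", "farm", "/dev/sdg"],
    capture_output=True, text=True, check=False,
).stdout
data = json.loads(out)

# Attribute 7 (Seek_Error_Rate) from the standard SMART table
attr7 = next(a for a in data["ata_smart_attributes"]["table"] if a["id"] == 7)
print(attr7["value"], attr7["raw"]["value"])        # 85 343129453

# The same counter as reported on the FARM reliability page
farm = data["seagate_farm_log"]["page_5_reliability_statistics"]
print(farm["seek_error_rate_normalized"], farm["attr_seek_error_rate_raw"])
```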
The interesting part is when we actually 'decode' the RAW value. Seagate's Seek Error Rate attribute consists of two parts -- a 16-bit count of seek errors in the uppermost 4 nibbles, and a 32-bit count of total seeks in the lowermost 8 nibbles. So in order to get usable data, we need to (a small decoding sketch follows the list):

1. Convert the raw value to hex: 0000 1473 BD6D
2. Split the hex into the 16-bit and 32-bit parts: 0000 and 1473BD6D
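Or, in code rather than on paper, a minimal Python sketch of that split, using nothing beyond the raw value reported above:

```python
# Split the 48-bit Seek_Error_Rate raw value: the upper 16 bits are the
# seek error count, the lower 32 bits are the total number of seeks.
raw = 343129453                      # raw value from attribute 7 / FARM log

seek_errors = (raw >> 32) & 0xFFFF   # uppermost 4 nibbles
total_seeks = raw & 0xFFFFFFFF       # lowermost 8 nibbles

print(f"raw         = 0x{raw:012X}")  # 0x00001473BD6D
print(f"seek errors = {seek_errors}") # 0
print(f"total seeks = {total_seeks}") # 343129453
```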
So, right away we can see that in reality, this drive has had literally zero seek errors recorded! This is important because, as you can see in the output above, the normalized value is being reported as 85 in both outputs.
The question becomes: yeah, but why? It turns out the normalized value represents a logarithmic error rate. Doing the math:
What I think many assume is that the calculation should be something like: $\frac{\mathrm{0000}}{\mathrm{1473BD6D}}=0$
The issue is that the normalization, as mentioned, is logarithmic, so the "real" equation is undefined (it would be the log of zero): $-10\log_{10}\frac{\mathrm{0000}}{\mathrm{1473BD6D}}$
So what they apparently do instead is: $-10\log_{10}\frac{\mathrm{0001}}{\mathrm{1473BD6D}}\approx 85$
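To sanity-check that, here's the same arithmetic in Python (the clamping of a zero error count to 1 is my inference from the numbers, not anything Seagate documents):

```python
import math

seek_errors = 0
total_seeks = 0x1473BD6D  # 343129453

# Assumed normalization: -10 * log10(errors / seeks), with the error count
# clamped to a minimum of 1 so the logarithm stays defined.
normalized = -10 * math.log10(max(seek_errors, 1) / total_seeks)
print(round(normalized))  # 85 -- matches both normalized values above
```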
And so we get the (mildly panic-inducing, if you're not aware) report -- it even highlights it in red! Woo! I'm not 100% sure of the best way to handle this, to be honest; I might just have to deal with self-maintaining an edited version of the script specifically for the RAW values that use this weird encoding, or something.