So, I've been trying to figure out why the report consistently shows some rather alarming numbers for Seek Error Health. The drives I'm testing against don't actually list any errors at all, and yet both the FARM stats from Seagate and the normalized (non-raw) value of SMART attribute 7 come back with extreme values. As an example, here is a section of the report for one of the drives in question:
| Device | Model | Serial Number | RPM | Capacity | SMART Status | Temp | Power-On Time (ymdh) | Start Stop Count | Spin Retry Count | Realloc Sectors | Realloc Events | Current Pending Sectors | Offline Uncorrectable Sectors | CRC Errors | Seek Error Health | Last Test Age (days) | Last Test Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /dev/sdg | ST18000NM000J-2TV103 | ZR5AR716 | 7200 | [18.0 TB] | PASSED | 34°C | 1y 0m 14d 5h | 7 | 0 | 0 | | 0 | 0 | 0 | 85% | 0 | Extended offline |
So what the heck is going on there? Digging in manually leads to some interesting conclusions. First, here's the raw SMART output for that drive, both the standard ata_smart attribute table and the still somewhat experimental --log farm output from smartmontools 7.4 (yes, this annoyed me enough that I backported it to SCALE).
.ata_smart_attributes.table[]

```json
{
  "id": 7,
  "name": "Seek_Error_Rate",
  "value": 85,
  "worst": 60,
  "thresh": 45,
  "when_failed": "",
  "flags": {
    "value": 15,
    "string": "POSR-- ",
    "prefailure": true,
    "updated_online": true,
    "performance": true,
    "error_rate": true,
    "event_count": false,
    "auto_keep": false
  },
  "raw": {
    "value": 343129453,
    "string": "343129453"
  }
}
```
and
.seagate_farm_log.page_5_reliability_statistics (truncated)

```json
"page_5_reliability_statistics": {
  "attr_error_rate_raw": 12821624,
  "error_rate_normalized": 71,
  "error_rate_worst": 64,
  "attr_seek_error_rate_raw": 343129453,
  "seek_error_rate_normalized": 85,
  "seek_error_rate_worst": 60,
  "high_priority_unload_events": 1,
  "helium_presure_trip": 0,
  ....
}
```
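For anyone who wants to pull these numbers out themselves, here's a quick Python sketch. The JSON paths are the ones shown in the snippets above; the exact smartctl invocation is my assumption (adjust flags and device to taste, and -l farm obviously needs smartmontools 7.4+ and a supported Seagate drive):

```python
import json
import subprocess

# Ask smartctl for JSON output with both the ATA attribute table (-A)
# and the Seagate FARM log (-l farm, smartmontools 7.4+).
out = subprocess.run(
    ["smartctl", "--json", "-A", "-l", "farm", "/dev/sdg"],
    capture_output=True, text=True, check=False,
).stdout
data = json.loads(out)

# Attribute 7 (Seek_Error_Rate) from the standard SMART table
attr7 = next(a for a in data["ata_smart_attributes"]["table"] if a["id"] == 7)
print(attr7["value"], attr7["raw"]["value"])        # 85 343129453

# The same counter as reported on the FARM reliability page
farm = data["seagate_farm_log"]["page_5_reliability_statistics"]
print(farm["seek_error_rate_normalized"], farm["attr_seek_error_rate_raw"])
```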
The interesting part is when we actually 'decode' the RAW value. Seagate's Seek Error Rate attribute consists of two parts -- a 16-bit count of seek errors in the uppermost 4 nibbles, and a 32-bit count of total seeks in the lowermost 8 nibbles. So in order to get usable data, we need to (a small decoding sketch follows the list):

1. Convert the raw value to hex: 0000 1473 BD6D
2. Split the hex into the 16-bit and 32-bit parts: 0000 and 1473BD6D
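Or, in code rather than on paper, a minimal Python sketch of that split, using nothing beyond the raw value reported above:

```python
# Split the 48-bit Seek_Error_Rate raw value: the upper 16 bits are the
# seek error count, the lower 32 bits are the total number of seeks.
raw = 343129453                      # raw value from attribute 7 / FARM log

seek_errors = (raw >> 32) & 0xFFFF   # uppermost 4 nibbles
total_seeks = raw & 0xFFFFFFFF       # lowermost 8 nibbles

print(f"raw         = 0x{raw:012X}")  # 0x00001473BD6D
print(f"seek errors = {seek_errors}") # 0
print(f"total seeks = {total_seeks}") # 343129453
```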
So, right away we can see that in reality, this drive has had literally zero seek errors recorded! This is important because, as you can see in the output above, the normalized value is being reported as 85 in both outputs.
The question becomes: yeah, but why? It turns out the normalized value represents a logarithmic error rate. Doing the math:
What I think many assume is that the calculation should be something like: $\frac{\mathrm{0000}}{\mathrm{1473BD6D}}=0$
The issue is that the normalization, as mentioned, is logarithmic, so the "real" equation is undefined (it would be the log of zero): $-10\log_{10}\frac{\mathrm{0000}}{\mathrm{1473BD6D}}$
So what they apparently do instead is: $-10\log_{10}\frac{\mathrm{0001}}{\mathrm{1473BD6D}}\approx 85$
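To sanity-check that, here's the same arithmetic in Python (the clamping of a zero error count to 1 is my inference from the numbers, not anything Seagate documents):

```python
import math

seek_errors = 0
total_seeks = 0x1473BD6D  # 343129453

# Assumed normalization: -10 * log10(errors / seeks), with the error count
# clamped to a minimum of 1 so the logarithm stays defined.
normalized = -10 * math.log10(max(seek_errors, 1) / total_seeks)
print(round(normalized))  # 85 -- matches both normalized values above
```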
And so we get the (mildly panic-inducing, if you're not aware) report -- it even highlights it in red! Woo! I'm not 100% sure of the best way to handle this, to be honest; I might just have to deal with self-maintaining an edited version of the script specifically for the RAW values that use this weird encoding, or something.