Dealing with single blockdev failure. #439

@star-39

Description

Hi team. I have a pool with two blockdevs.

$ stratis pool
Name                             Total Physical   Properties                                   UUID   Alerts
stratis-pool-1   8.19 TiB / 2.94 TiB / 5.25 TiB      ~Ca,~Cr   fc7cc699-82e6-46cc-8fa9-05f23283cd12

$ stratis blockdev
Pool Name        Device Node   Physical Size   Tier
stratis-pool-1   /dev/sda1          2.73 TiB   DATA
stratis-pool-1   /dev/sdd           5.46 TiB   DATA

$ stratis fs
Pool Name        Name      Size                          Created             Device                                UUID
stratis-pool-1   SPOOL_1   4 TiB / 2.93 TiB / 1.07 TiB   Feb 13 2022 16:18   /dev/stratis/stratis-pool-1/SPOOL_1   b74c306b-7377-45a8-894b-129d5a13357a

The pool was created on sdd and about 3 TB of data was written; sda1 was added later.

No snapshot has ever been created.

Now, the problem is sda1 has failed.

$ dmesg
[ 9086.999366] device-mapper: thin: 253:18: unable to service pool target messages in READ_ONLY or FAIL mode
[ 9095.724451] device-mapper: thin: 253:18: unable to switch pool to write mode until repaired.

dmesg is flooded with repeated "device-mapper: thin: 253:18: unable to switch pool to write mode until repaired." messages.
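The READ_ONLY/FAIL mode named in the dmesg lines can be checked directly on the dm device. A minimal sketch, assuming root access and the `dmsetup` tool; the grep pattern is an assumption based on the documented thin-pool status format (the status line carries `rw`, `ro`, or `out_of_data_space`, plus a `needs_check` flag when repair is required):

```shell
# List the status of all thin-pool targets; a pool stuck read-only shows
# "ro" (and usually "needs_check") in its status line. 253:18 in the dmesg
# output above is the major:minor of the affected dm device.
dmsetup status --target thin-pool | grep -E ' (rw|ro|out_of_data_space) '
```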

$ smartctl -a /dev/sda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     35414         502363895
# 2  Short offline       Completed: read failure       90%     35414         502363897

However, stratis reports nothing about this disk failure; no alerts are issued.

The filesystem is inaccessible and ls times out. Attempting a snapshot returns the following error:

$ stratis fs snapshot stratis-pool-1 SPOOL_1 spool1_shot1
Execution failed:
stratisd failed to perform the operation that you requested. It returned the following information via the D-Bus: ERROR: failed to create spool1_shot1 snapshot for SPOOL_1 - DM Core error: low-level ioctl error due to nix error; header result: Some(DeviceInfo { version: Version { major: 4, minor: 45, patch: 0 }, data_size: 16384, data_start: 312, target_count: 0, open_count: 0, flags: (empty), event_nr: 0, dev: Device { major: 0, minor: 0 }, name: Some(DmNameBuf { inner: "stratis-1-private-fc7cc69982e646cc8fa905f23283cd12-thinpool-pool" }), uuid: None }), error: EOPNOTSUPP: Operation not supported on transport endpoint.

According to stratis-storage/stratisd#2570, stratis's current data allocation strategy is similar to LVM's: data is written to the next blockdev only after the current one is full. In this case, sdd is the "current" blockdev in the pool.
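This allocation behavior can be observed on a live system: stratis maps the thin-pool's data onto the blockdevs with linear dm targets, in allocation order. A minimal sketch; the grep pattern and the exact dm naming are assumptions (names vary by pool UUID):

```shell
# Show the linear dm segments belonging to stratis devices. The major:minor
# pair in each line identifies which physical blockdev backs that segment,
# so data allocated only from sdd shows segments pointing only at sdd.
dmsetup table | grep -E '^stratis-.*linear'
```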

It seems that stratis is not detecting the error from the failing spare disk (in this case sda1, which I guess was never actually utilized).

Please check whether this issue is reproducible.

If so, is there a way to get the data out of the pool? The sdd disk is healthy.
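On getting the data out: none of this is stratis-specific advice, but the usual first step with a degraded dm-thin pool is to image every blockdev before attempting any repair with the `thin_check`/`thin_repair` tools from device-mapper-persistent-data. A minimal sketch; the output path is a placeholder and assumes enough free space:

```shell
# Image the healthy blockdev first (placeholder output path; needs ~5.5 TiB
# free). conv=sync,noerror keeps dd going past unreadable sectors.
dd if=/dev/sdd of=/backup/sdd.img bs=64M conv=sync,noerror status=progress

# Any thin_check / thin_repair attempt should then run against copies of the
# pool's thin metadata, never against the original devices.
```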

Thank you!

Some extra info:

OS: Fedora 35
$ stratis --version
3.0.1
$ stratis daemon version
3.0.4
