Dealing with single blockdev failure. #439
Description
Hi team. I have a pool with 2 blockdevs.
```
$ stratis pool
Name            Total Physical                  Properties  UUID                                  Alerts
stratis-pool-1  8.19 TiB / 2.94 TiB / 5.25 TiB  ~Ca,~Cr     fc7cc699-82e6-46cc-8fa9-05f23283cd12
```
```
$ stratis blockdev
Pool Name       Device Node  Physical Size  Tier
stratis-pool-1  /dev/sda1    2.73 TiB       DATA
stratis-pool-1  /dev/sdd     5.46 TiB       DATA
```
```
$ stratis fs
Pool Name       Name     Size                         Created            Device                               UUID
stratis-pool-1  SPOOL_1  4 TiB / 2.93 TiB / 1.07 TiB  Feb 13 2022 16:18  /dev/stratis/stratis-pool-1/SPOOL_1  b74c306b-7377-45a8-894b-129d5a13357a
```
The pool was created on sdd and roughly 3 TB of data was written to it; sda1 was added later. No snapshot was ever created.
Now the problem is that sda1 has failed:
```
$ dmesg
[ 9086.999366] device-mapper: thin: 253:18: unable to service pool target messages in READ_ONLY or FAIL mode
[ 9095.724451] device-mapper: thin: 253:18: unable to switch pool to write mode until repaired.
```
dmesg is flooded with "device-mapper: thin: 253:18: unable to switch pool to write mode until repaired."
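For what it's worth, the pool's read-only switch can also be confirmed from device-mapper itself: the mode field of the thin-pool status reads `rw`, `ro`, or `out_of_data_space` (a pool in fail mode reports `Fail`). A sketch below; the captured status line is purely illustrative, not taken from my system:

```shell
# On a live system you would run (device name from the D-Bus error below):
#   dmsetup status stratis-1-private-fc7cc69982e646cc8fa905f23283cd12-thinpool-pool
# Here we parse an illustrative captured status line instead.
status='0 6442450944 thin-pool 5 1885/2088960 768000/786432 - ro no_discard_passdown queue_if_no_space - 2'
# Field 8 of the thin-pool status line is the pool mode.
mode=$(printf '%s\n' "$status" | awk '{print $8}')
echo "pool mode: $mode"
```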
```
$ smartctl -a /dev/sda
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed: read failure  90%        35414            502363895
# 2  Short offline     Completed: read failure  90%        35414            502363897
```
However, stratis reports nothing about this disk failure; no alerts are issued.
The filesystem is inaccessible and `ls` times out. Attempting a snapshot returns the following error:
```
$ stratis fs snapshot stratis-pool-1 SPOOL_1 spool1_shot1
Execution failed:
stratisd failed to perform the operation that you requested. It returned the following information via the D-Bus: ERROR: failed to create spool1_shot1 snapshot for SPOOL_1 - DM Core error: low-level ioctl error due to nix error; header result: Some(DeviceInfo { version: Version { major: 4, minor: 45, patch: 0 }, data_size: 16384, data_start: 312, target_count: 0, open_count: 0, flags: (empty), event_nr: 0, dev: Device { major: 0, minor: 0 }, name: Some(DmNameBuf { inner: "stratis-1-private-fc7cc69982e646cc8fa905f23283cd12-thinpool-pool" }), uuid: None }), error: EOPNOTSUPP: Operation not supported on transport endpoint.
```
According to stratis-storage/stratisd#2570, the current Stratis data allocation strategy is similar to LVM's: data is written to the next blockdev only after the current one is full. In this case, sdd is the "current" blockdev in the pool.
It seems Stratis is not detecting the error from the failing disk (in this case sda1, which I guess was never actually used for data).
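If I understand correctly, that concatenation should be visible in the device-mapper tables: the thinpool's data device is built from `linear` segments, one per backing blockdev. A sketch below; the segment sizes and major:minor numbers are hypothetical, and on a live system `dmsetup ls` would list the actual stratis-private device names:

```shell
# On a live system:
#   dmsetup table <stratis-private-thinpool-data-device>
# Illustrative output (two linear segments, one per blockdev):
table='0 11721045168 linear 8:48 8192
11721045168 5860533168 linear 8:1 8192'
# Column 4 of each "linear" line is the backing device's major:minor.
devs=$(printf '%s\n' "$table" | awk '$3 == "linear" {print $4}')
echo "$devs"
```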
Please test whether this issue is reproducible.
If so, is there an option to get the data out of the pool? sdd itself is healthy.
Thank you!
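In case it helps the discussion, this is roughly what I would try for getting the data off, if the filesystem device can still be activated: mount it read-only and copy everything to known-good storage before touching the failed disk. A sketch only (untested; the mount point and backup path are placeholders, the device path is from the `stratis fs` output above):

```shell
fsdev=/dev/stratis/stratis-pool-1/SPOOL_1
dest=/mnt/rescue                      # hypothetical mount point
if [ -e "$fsdev" ]; then
    mkdir -p "$dest"
    mount -o ro "$fsdev" "$dest"      # read-only: avoid any further writes
    rsync -a "$dest/" /srv/backup/    # backup destination is a placeholder
else
    echo "skipping: $fsdev not present on this machine"
fi
```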
Some extra info:
OS: Fedora 35
```
$ stratis --version
3.0.1
$ stratis daemon version
3.0.4
```