Dealing with single blockdev failure. #439

@star-39

Description

Hi team. I have a pool with two blockdevs.

$ stratis pool
Name                             Total Physical   Properties                                   UUID   Alerts
stratis-pool-1   8.19 TiB / 2.94 TiB / 5.25 TiB      ~Ca,~Cr   fc7cc699-82e6-46cc-8fa9-05f23283cd12

$ stratis blockdev
Pool Name        Device Node   Physical Size   Tier
stratis-pool-1   /dev/sda1          2.73 TiB   DATA
stratis-pool-1   /dev/sdd           5.46 TiB   DATA

$ stratis fs
Pool Name        Name      Size                          Created             Device                                UUID
stratis-pool-1   SPOOL_1   4 TiB / 2.93 TiB / 1.07 TiB   Feb 13 2022 16:18   /dev/stratis/stratis-pool-1/SPOOL_1   b74c306b-7377-45a8-894b-129d5a13357a

The pool was created on sdd and about 3 TB of data was written; sda1 was added later.

No snapshot has ever been created.

Now, the problem is sda1 has failed.

$ dmesg
[ 9086.999366] device-mapper: thin: 253:18: unable to service pool target messages in READ_ONLY or FAIL mode
[ 9095.724451] device-mapper: thin: 253:18: unable to switch pool to write mode until repaired.

dmesg is flooded with repeated "device-mapper: thin: 253:18: unable to switch pool to write mode until repaired." messages.
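The READ_ONLY/FAIL mode named in the dmesg lines can be checked directly on the dm device. A minimal sketch, assuming root access and the `dmsetup` tool; the grep pattern is an assumption based on the documented thin-pool status format (the status line carries `rw`, `ro`, or `out_of_data_space`, plus a `needs_check` flag when repair is required):

```shell
# List the status of all thin-pool targets; a pool stuck read-only shows
# "ro" (and usually "needs_check") in its status line. 253:18 in the dmesg
# output above is the major:minor of the affected dm device.
dmsetup status --target thin-pool | grep -E ' (rw|ro|out_of_data_space) '
```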

$ smartctl -a /dev/sda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     35414         502363895
# 2  Short offline       Completed: read failure       90%     35414         502363897

However, stratis reports nothing about this disk failure; no alerts are issued.

The filesystem is inaccessible and ls times out. Attempting a snapshot returns the following error:

$ stratis fs snapshot stratis-pool-1 SPOOL_1 spool1_shot1
Execution failed:
stratisd failed to perform the operation that you requested. It returned the following information via the D-Bus: ERROR: failed to create spool1_shot1 snapshot for SPOOL_1 - DM Core error: low-level ioctl error due to nix error; header result: Some(DeviceInfo { version: Version { major: 4, minor: 45, patch: 0 }, data_size: 16384, data_start: 312, target_count: 0, open_count: 0, flags: (empty), event_nr: 0, dev: Device { major: 0, minor: 0 }, name: Some(DmNameBuf { inner: "stratis-1-private-fc7cc69982e646cc8fa905f23283cd12-thinpool-pool" }), uuid: None }), error: EOPNOTSUPP: Operation not supported on transport endpoint.

According to stratis-storage/stratisd#2570, stratis's current data allocation strategy is similar to LVM's: data is written to the next blockdev only after the current one is full. In this case, sdd is the "current" blockdev in the pool.
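This allocation behavior can be observed on a live system: stratis maps the thin-pool's data onto the blockdevs with linear dm targets, in allocation order. A minimal sketch; the grep pattern and the exact dm naming are assumptions (names vary by pool UUID):

```shell
# Show the linear dm segments belonging to stratis devices. The major:minor
# pair in each line identifies which physical blockdev backs that segment,
# so data allocated only from sdd shows segments pointing only at sdd.
dmsetup table | grep -E '^stratis-.*linear'
```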

It seems that stratis is not detecting the error from the failing spare disk (in this case sda1, which I guess was never actually utilized).

Please check whether this issue is reproducible.

If so, is there a way to get the data out of the pool? The sdd disk is healthy.
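On getting the data out: none of this is stratis-specific advice, but the usual first step with a degraded dm-thin pool is to image every blockdev before attempting any repair with the `thin_check`/`thin_repair` tools from device-mapper-persistent-data. A minimal sketch; the output path is a placeholder and assumes enough free space:

```shell
# Image the healthy blockdev first (placeholder output path; needs ~5.5 TiB
# free). conv=sync,noerror keeps dd going past unreadable sectors.
dd if=/dev/sdd of=/backup/sdd.img bs=64M conv=sync,noerror status=progress

# Any thin_check / thin_repair attempt should then run against copies of the
# pool's thin metadata, never against the original devices.
```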

Thank you!

Some extra info:

OS: Fedora 35
$ stratis --version
3.0.1
$ stratis daemon version
3.0.4
