
Luca Finzi Contini edited this page Nov 2, 2024 · 17 revisions

ZFS Pool Disk Change

This is actually the most serious topic in this whole wiki :)

And this is the area where, in my opinion, ZFS really shines.

Let's enumerate some scenarios:

  • One of the disks in my raidz vdev shows signs of malfunction and needs to be replaced

    This is a bad situation that could compromise all of your data if you do not act reasonably fast. Fortunately, a raidz1 vdev tolerates one disk failure and will preserve all your data anyway. The pool becomes DEGRADED, your data remains accessible, and you should get a new disk as soon as possible. Of course, the replacement disk must be at least the same size as the broken one.

  • I just want to add more disk space

    This seems easy, but it is a bit tricky with ZFS.

    Long story short: in a raidz vdev, you need to upgrade ALL of your disks before you see any increase in available disk space.
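As a rule of thumb, the usable space of a raidz1 vdev is roughly (N-1) times the size of its smallest member, minus some metadata overhead. A tiny dry-run sketch of that arithmetic, with hypothetical sizes (two disks already upgraded to 20GB, one still 10GB):

```shell
# raidz1 usable capacity = (N - 1) * smallest disk; sizes in GB.
# Hypothetical example: two 20GB disks, one not-yet-upgraded 10GB disk.
sizes="20 20 10"
n=0; min=""
for s in $sizes; do
    n=$((n + 1))
    if [ -z "$min" ] || [ "$s" -lt "$min" ]; then min=$s; fi
done
echo "usable: $(( (n - 1) * min ))GB"   # smallest disk still caps the vdev
```

With these sizes the result is 20GB: only once the last 10GB disk is replaced does the figure jump to (3-1) x 20GB = 40GB (the real `zfs list` AVAIL number will be a little lower because of overhead and GiB rounding).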

How to Change ZFS Disks on the raidz Local Server

You might want to change one or all of your raidz disks for various reasons: to replace a faulty drive after a single disk failure, or to upgrade to bigger storage.

Adding a new disk while keeping the previous one attached

The first command involved is

sudo zpool offline <pool_name> <disk_id_to_be_removed>

which takes the disk with id disk_id_to_be_removed offline, so that it no longer takes part in the pool's operations.

This command is then followed by:

sudo zpool replace <pool_name> <disk_id_to_be_removed> <disk_id_to_be_added>

On my nas server, I created a new disk with ID scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 and attached it to the (virtual) machine. I can do this because it is a virtual machine; on a real machine, you can only add a new disk if you have a spare disk port.

Here is the disk list, sorted by the last column (the device name):

user@nas:~$ ls -la /dev/disk/by-id/ | grep scsi-SATA | sort -k 11
lrwxrwxrwx  1 root root    9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574 -> ../../sda
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part1 -> ../../sda1
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part2 -> ../../sda2
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part3 -> ../../sda3
lrwxrwxrwx  1 root root    9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b -> ../../sdb
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b-part1 -> ../../sdb1
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b-part9 -> ../../sdb9
lrwxrwxrwx  1 root root    9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 -> ../../sdc
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1 -> ../../sdc1
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part9 -> ../../sdc9
lrwxrwxrwx  1 root root    9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 -> ../../sdd
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part1 -> ../../sdd1
lrwxrwxrwx  1 root root   10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part9 -> ../../sdd9
lrwxrwxrwx  1 root root    9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 -> ../../sde
user@nas:~$

sda is the system disk; sdb through sdd are the ZFS disks that are part of the pool; sde is the new disk.

As you can see, sde is 20GB whereas the previous disks are all 10GB.
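When the listing gets long, a little awk can pick out just the whole-disk entries, skipping the -partN partition symlinks. A sketch, run here against a few sample lines rather than the live /dev tree (on a real system you would pipe in `ls -la /dev/disk/by-id` instead):

```shell
# Pick out whole-disk "device id" pairs from an `ls -la /dev/disk/by-id`
# listing, skipping -partN partition symlinks.
# Three sample lines stand in for the real listing here.
ls_output='lrwxrwxrwx 1 root root 9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 -> ../../sde'

# Field 9 is the by-id name, field 11 the ../../sdX target.
echo "$ls_output" | awk '$9 !~ /-part[0-9]+$/ { sub("../../", "", $11); print $11, $9 }'
```

This prints only the sda and sde rows, which is usually all you need when deciding which ID to feed to zpool replace.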

user@nas:~$ lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0                       7:0    0  55.4M  1 loop /snap/core18/2846
loop1                       7:1    0  55.7M  1 loop /snap/core18/2829
loop2                       7:2    0 326.3M  1 loop /snap/nextcloud/44729
loop3                       7:3    0 309.1M  1 loop /snap/nextcloud/44812
loop4                       7:4    0  38.8M  1 loop /snap/snapd/21759
sda                         8:0    0     8G  0 disk
├─sda1                      8:1    0     1M  0 part
├─sda2                      8:2    0   1.8G  0 part /boot
└─sda3                      8:3    0   6.2G  0 part
  └─ubuntu--vg-ubuntu--lv 252:0    0   6.2G  0 lvm  /
sdb                         8:16   0    10G  0 disk
├─sdb1                      8:17   0    10G  0 part
└─sdb9                      8:25   0     8M  0 part
sdc                         8:32   0    10G  0 disk
├─sdc1                      8:33   0    10G  0 part
└─sdc9                      8:41   0     8M  0 part
sdd                         8:48   0    10G  0 disk
├─sdd1                      8:49   0    10G  0 part
└─sdd9                      8:57   0     8M  0 part
sde                         8:64   0    20G  0 disk
sr0                        11:0    1  1024M  0 rom
user@nas:~$

Oh, an important note: it is VERY important to use disk IDs and not device names like sda, simply because we cannot rely on those device names being consistent across reboots of the system. I have spoken.

This is the initial state:

user@nas:~$ zpool status
  pool: zfspool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:02 with 0 errors on Sun Jul 28 21:20:27 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        zfspool                                          ONLINE       0     0     0
          raidz1-0                                       ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2  ONLINE       0     0     0

errors: No known data errors
user@nas:~$

Now let's put sdb aka scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b offline.

user@nas:~$ sudo zpool offline zfspool scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b
user@nas:~$ zpool status
  pool: zfspool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0B in 00:00:02 with 0 errors on Sun Jul 28 21:20:27 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        zfspool                                          DEGRADED     0     0     0
          raidz1-0                                       DEGRADED     0     0     0
            scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b  OFFLINE      0     0     0
            scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2  ONLINE       0     0     0

errors: No known data errors
user@nas:~$

We immediately see that the pool went to the DEGRADED state. Moreover, ZFS suggests what to do: either put the disk back online, or use the zpool replace command.

Be aware that, immediately after you issue zpool replace, the system begins the resilvering operation, which rebuilds the data on the newly added disk. That could easily take hours on, say, a multi-terabyte disk. You should just let it do its work.

So let's replace scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b with the newly added scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016:

user@nas:~$ sudo zpool replace zfspool scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016
user@nas:~$ zpool status
  pool: zfspool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 30 00:05:55 2024
        837M / 837M scanned, 533M / 836M issued at 133M/s
        169M resilvered, 63.76% done, 00:00:02 to go
config:

        NAME                                               STATE     READ WRITE CKSUM
        zfspool                                            DEGRADED     0     0     0
          raidz1-0                                         DEGRADED     0     0     0
            replacing-0                                    DEGRADED     0     0     0
              scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b  OFFLINE      0     0     0
              scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016  ONLINE       0     0     0  (resilvering)
            scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70    ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2    ONLINE       0     0     0

errors: No known data errors
user@nas:~$

Here we see that the system is actually replacing the disk and has started the resilvering operation on scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016.
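In a script, the percent-done figure can be pulled out of that "scan:" section; a small sketch, run here against a captured copy of the lines above (a live version would use `status=$(zpool status zfspool)` instead):

```shell
# Extract the percent-done figure from a zpool status "scan:" section.
# $status is a captured sample; live use: status=$(zpool status zfspool)
status='  scan: resilver in progress since Wed Oct 30 00:05:55 2024
        837M / 837M scanned, 533M / 836M issued at 133M/s
        169M resilvered, 63.76% done, 00:00:02 to go'
pct=$(echo "$status" | grep -o '[0-9.]*% done' | cut -d'%' -f1)
echo "resilver ${pct}% complete"
```

Polling this in a loop (with a `sleep` between iterations) is a simple way to wait for the resilver to finish before doing anything else with the pool.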

At the end of the operation we are back to a plain "normal" status, with the new disk in place:

user@nas:~$ zpool status
  pool: zfspool
 state: ONLINE
  scan: resilvered 281M in 00:00:06 with 0 errors on Wed Oct 30 00:06:01 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        zfspool                                          ONLINE       0     0     0
          raidz1-0                                       ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2  ONLINE       0     0     0

errors: No known data errors
user@nas:~$

As for available disk space, it has not changed:

user@nas:~$ zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
zfspool             558M  18.5G   128K  /mnt/raid
zfspool/Documents   555M  18.5G   421M  /mnt/raid/Documents

And all our snapshots are still there!

user@nas:~$ zfs list -t snapshot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
zfspool/Documents@2024.07.26-00.48.47   346K      -   474M  -
zfspool/Documents@2024.10.27-21.59.11   128K      -   474M  -
zfspool/Documents@2024.10.27-22.13.33   128K      -   555M  -
zfspool/Documents@2024.10.27-22.15.19   117K      -   421M  -
zfspool/Documents@2024.10.27-22.18.59  95.9K      -   421M  -
user@nas:~$

Now we could experiment with physically removing a disk and inserting a new one in its place.

Swapping a drive with a new one

Let's remove a drive: this simulates a real failure, or the situation where we have no spare SATA port to use for the disk replacement.

I removed the disk scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70, which was 10GB in size, and added a new 20GB disk.

Let's start up the nas server. As soon as it boots up, we check the zpool status:

user@nas:~$ zpool status
  pool: zfspool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: resilvered 281M in 00:00:06 with 0 errors on Wed Oct 30 00:06:01 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        zfspool                                          DEGRADED     0     0     0
          raidz1-0                                       DEGRADED     0     0     0
            scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016  ONLINE       0     0     0
            14696350533427175973                         UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1
            scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2  ONLINE       0     0     0

errors: No known data errors
user@nas:~$

As you can see, the ZFS commands provide loads of useful information! zpool status even tells us which specific disk was expected to sit as the 2nd disk in the pool, but is UNAVAILable!
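One aside: the long number in that row (14696350533427175973) is the GUID that ZFS assigned to the missing disk, and zpool replace also accepts that GUID in place of the old by-id name. Pulling it out of the status output is a one-liner, shown here against a captured copy of the UNAVAIL row:

```shell
# Grab the GUID of the UNAVAIL device from (captured) zpool status output.
# Live use: status=$(zpool status zfspool)
status='            14696350533427175973                         UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1'
guid=$(echo "$status" | awk '/UNAVAIL/ { print $1; exit }')
echo "$guid"
```

So `sudo zpool replace zfspool 14696350533427175973 <new_disk_id>` should work just as well as the by-id form used below; this is handy when the old by-id path is long or no longer resolves.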

Now the lsblk command tells us that there is a sdb with no partitions.

user@nas:~$ lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0                       7:0    0  55.4M  1 loop /snap/core18/2846
loop1                       7:1    0  55.7M  1 loop /snap/core18/2829
loop2                       7:2    0 326.3M  1 loop /snap/nextcloud/44729
loop3                       7:3    0 309.1M  1 loop /snap/nextcloud/44812
loop4                       7:4    0  38.8M  1 loop /snap/snapd/21759
sda                         8:0    0     8G  0 disk
├─sda1                      8:1    0     1M  0 part
├─sda2                      8:2    0   1.8G  0 part /boot
└─sda3                      8:3    0   6.2G  0 part
  └─ubuntu--vg-ubuntu--lv 252:0    0   6.2G  0 lvm  /
sdb                         8:16   0    20G  0 disk
sdc                         8:32   0    10G  0 disk
├─sdc1                      8:33   0    10G  0 part
└─sdc9                      8:41   0     8M  0 part
sdd                         8:48   0    20G  0 disk
├─sdd1                      8:49   0    20G  0 part
└─sdd9                      8:57   0     8M  0 part
sr0                        11:0    1  1024M  0 rom
user@nas:~$

That's interesting. Let's also check the /dev/disk/by-id list:

user@nas:~$ ls -la /dev/disk/by-id/
total 0
drwxr-xr-x  2 root root 1000 Nov  2 17:49 .
drwxr-xr-x 10 root root  200 Nov  2 17:49 ..
lrwxrwxrwx  1 root root    9 Nov  2 17:49 ata-VBOX_CD-ROM_VB2-01700376 -> ../../sr0
lrwxrwxrwx  1 root root    9 Nov  2 17:49 ata-VBOX_HARDDISK_VB1600af16-50c06ff2 -> ../../sdc
[...]
lrwxrwxrwx  1 root root    9 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 -> ../../sdc
lrwxrwxrwx  1 root root   10 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part1 -> ../../sdc1
lrwxrwxrwx  1 root root   10 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part9 -> ../../sdc9
lrwxrwxrwx  1 root root    9 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 -> ../../sdd
lrwxrwxrwx  1 root root   10 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016-part1 -> ../../sdd1
lrwxrwxrwx  1 root root   10 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016-part9 -> ../../sdd9
lrwxrwxrwx  1 root root    9 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75 -> ../../sdb
lrwxrwxrwx  1 root root    9 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574 -> ../../sda
lrwxrwxrwx  1 root root   10 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part1 -> ../../sda1
lrwxrwxrwx  1 root root   10 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part2 -> ../../sda2
lrwxrwxrwx  1 root root   10 Nov  2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part3 -> ../../sda3
user@nas:~$

OK, so we found that the only disk without partitions is /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75. Let's use this one to replace the missing disk.

user@nas:~$ sudo zpool replace zfspool scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75
user@nas:~$ zpool status
  pool: zfspool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Nov  2 18:10:38 2024
        837M / 837M scanned, 658M / 837M issued at 164M/s
        205M resilvered, 78.59% done, 00:00:01 to go
config:

        NAME                                               STATE     READ WRITE CKSUM
        zfspool                                            DEGRADED     0     0     0
          raidz1-0                                         DEGRADED     0     0     0
            scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016    ONLINE       0     0     0
            replacing-1                                    DEGRADED     0     0     0
              14696350533427175973                         UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1
              scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75  ONLINE       0     0     0  (resilvering)
            scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2    ONLINE       0     0     0

errors: No known data errors
user@nas:~$

... and after what would be some lengthy hours of resilvering on a real system, we finally get:

user@nas:~$ zpool status
  pool: zfspool
 state: ONLINE
  scan: resilvered 281M in 00:00:05 with 0 errors on Sat Nov  2 18:10:43 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        zfspool                                          ONLINE       0     0     0
          raidz1-0                                       ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2  ONLINE       0     0     0

errors: No known data errors
user@nas:~$

Now for the last disk replacement, the most important one: the one that will trigger the pool resizing!

Changing the last disk and augmenting pool capacity

To augment your vdev capacity, you perform the disk change operation N-1 times, where N is the number of disks in the vdev; then, before performing the Nth and last substitution, you need to modify a property of the pool: autoexpand. Let's get to it!
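To recap, the whole upgrade condenses to the loop below. This is a dry run: the `run` helper only echoes each command, and the disk IDs are placeholders; on a real system you would execute the commands and wait for each resilver to finish before starting the next replace.

```shell
# Dry-run outline of a full raidz1 disk upgrade; `run` only prints commands.
run() { echo "+ $*"; }

POOL=zfspool
run zpool set autoexpand=on "$POOL"    # at the latest, before the final replace
for pair in old1:new1 old2:new2 old3:new3; do   # placeholder disk ID pairs
    run zpool replace "$POOL" "${pair%%:*}" "${pair#*:}"
    # ...wait here until `zpool status` reports the resilver complete...
done
run zpool set autoexpand=off "$POOL"
```

The pool capacity only grows after the last (smallest) disk has been replaced, which is exactly what the rest of this section demonstrates.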

We detach the last disk from the system and replace it with the new device.

Before actually replacing the disk in the pool, we need to set the autoexpand property:

user@nas:~$ sudo zpool set autoexpand=on zfspool
user@nas:~$

Now we launch the replace operation:

user@nas:~$ sudo zpool replace zfspool scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 scsi-SATA_VBOX_HARDDISK_VB137e6d9d-217cfd0a
user@nas:~$ zpool status
  pool: zfspool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Nov  2 20:52:37 2024
        837M / 837M scanned, 433M / 837M issued at 144M/s
        129M resilvered, 51.73% done, 00:00:02 to go
config:

        NAME                                               STATE     READ WRITE CKSUM
        zfspool                                            DEGRADED     0     0     0
          raidz1-0                                         DEGRADED     0     0     0
            scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016    ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75    ONLINE       0     0     0
            replacing-2                                    DEGRADED     0     0     0
              11050710442926509085                         UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part1
              scsi-SATA_VBOX_HARDDISK_VB137e6d9d-217cfd0a  ONLINE       0     0     0  (resilvering)

errors: No known data errors
user@nas:~$

So the resilvering operation has started!

We will wait until the resilvering has completed to check for disk space...

user@nas:~$ zpool status
  pool: zfspool
 state: ONLINE
  scan: resilvered 281M in 00:00:04 with 0 errors on Sat Nov  2 20:52:41 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        zfspool                                          ONLINE       0     0     0
          raidz1-0                                       ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB137e6d9d-217cfd0a  ONLINE       0     0     0

errors: No known data errors
user@nas:~$

Now, the moment we have all been waiting for: let's check the pool capacity!

user@nas:~$ zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
zfspool             558M  37.9G   128K  /mnt/raid
zfspool/Documents   555M  37.9G   421M  /mnt/raid/Documents

YAY! Double the disk space! The whole thing worked!

One last command: now that the expansion is done, we set the autoexpand property back to off, so that the system does not keep watching for further automatic expansions:

user@nas:~$ sudo zpool set autoexpand=off zfspool
user@nas:~$ zpool get autoexpand
NAME     PROPERTY    VALUE   SOURCE
zfspool  autoexpand  off     default
user@nas:~$

We made it! We changed all 3 disks of the zfspool pool, and the capacity is finally that of 2 of the 3 bigger disks: about 2 x 20GB, so 40GB.

Now let's check another topic: My Backup Disk is almost full!
