ZFSPoolDiskChange
This is the most serious topic in this whole wiki actually :)
And this is the area where in my opinion ZFS really shines.
Let's enumerate some scenarios:
- One of the disks in my raidz vdev shows signs of malfunction and needs to be replaced
This is a bad situation that could compromise all of your data if you do not act reasonably fast. Fortunately, a raidz vdev tolerates the failure of one disk and preserves all your data anyway. The pool becomes DEGRADED, your data stays accessible, and you should get a new disk as soon as possible. Of course, the new disk must be at least the same size as the broken one.
- I just want to add more disk space
This seems easy, but it is a bit tricky with ZFS. Long story short: in a raidz vdev, you need to upgrade ALL of your disks before you see any increase in available disk space.
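A quick back-of-the-envelope check explains why: in a raidz1 vdev, usable capacity is roughly (N - 1) times the smallest disk, since one disk's worth of space goes to parity. A minimal sketch, using the 3 x 10GB layout of the pool shown later on this page:

```shell
# raidz1 usable capacity is roughly (N - 1) x smallest-disk-size:
# one disk's worth of space is consumed by parity. With three 10GB
# disks, about 20GB are usable (minus some ZFS metadata overhead).
n_disks=3
disk_gb=10
usable_gb=$(( (n_disks - 1) * disk_gb ))
echo "${usable_gb}GB usable"
```

This is also why replacing a single disk with a bigger one changes nothing: the smallest disk in the vdev still caps the capacity.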
You might want to change one or all of your raidz disks for various reasons: to replace a faulty drive after a single disk failure, or to upgrade to bigger storage.
The first command involved is
sudo zpool offline <pool_name> <disk_id_to_be_removed>
which takes the disk with ID disk_id_to_be_removed offline, so that it will not be part of the pool's operations.
This command shall then be followed by this one:
sudo zpool replace <pool_name> <disk_id_to_be_removed> <disk_id_to_be_added>
On my nas server, I created a new disk with ID scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 and attached it to the (virtual) machine. I can do this because it is a virtual machine; on real machines, you can add a new disk only if you have an available disk port.
Here is the list ordered by the last column:
user@nas:~$ ls -la /dev/disk/by-id/ | grep scsi-SATA | sort -k 11
lrwxrwxrwx 1 root root 9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b -> ../../sdb
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 -> ../../sdc
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 -> ../../sdd
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 Oct 29 23:46 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 -> ../../sde
user@nas:~$
sda is the system disk; sdb through sdd are the ZFS disks that are part of the pool; sde is the new disk.
As you can see, sde is 20GB whereas the previous disks are all 10GB.
user@nas:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 55.4M 1 loop /snap/core18/2846
loop1 7:1 0 55.7M 1 loop /snap/core18/2829
loop2 7:2 0 326.3M 1 loop /snap/nextcloud/44729
loop3 7:3 0 309.1M 1 loop /snap/nextcloud/44812
loop4 7:4 0 38.8M 1 loop /snap/snapd/21759
sda 8:0 0 8G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 1.8G 0 part /boot
└─sda3 8:3 0 6.2G 0 part
└─ubuntu--vg-ubuntu--lv 252:0 0 6.2G 0 lvm /
sdb 8:16 0 10G 0 disk
├─sdb1 8:17 0 10G 0 part
└─sdb9 8:25 0 8M 0 part
sdc 8:32 0 10G 0 disk
├─sdc1 8:33 0 10G 0 part
└─sdc9 8:41 0 8M 0 part
sdd 8:48 0 10G 0 disk
├─sdd1 8:49 0 10G 0 part
└─sdd9 8:57 0 8M 0 part
sde 8:64 0 20G 0 disk
sr0 11:0 1 1024M 0 rom
user@nas:~$
Oh, an important note: it is VERY important to use disk IDs and not device names like sda, simply because we cannot rely on those names being consistent across reboots of the system. I have spoken.
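If you only know the kernel device name, a tiny helper can map it back to its stable by-id aliases. This is a hypothetical sketch, not part of ZFS; the by-id directory is passed as an argument only to keep the function self-contained:

```shell
# Hypothetical helper: print the by-id names that resolve to /dev/<dev>,
# so we can feed a stable ID (not an sdX name) to the zpool commands.
ids_for_dev() {
  byid_dir=$1
  dev=$2
  for link in "$byid_dir"/*; do
    # readlink -f resolves the relative ../../sdX symlink target
    if [ "$(readlink -f "$link")" = "/dev/$dev" ]; then
      basename "$link"
    fi
  done
}
# On a live system: ids_for_dev /dev/disk/by-id sdb
```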
This is the initial state:
user@nas:~$ zpool status
pool: zfspool
state: ONLINE
scan: scrub repaired 0B in 00:00:02 with 0 errors on Sun Jul 28 21:20:27 2024
config:
NAME STATE READ WRITE CKSUM
zfspool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 ONLINE 0 0 0
errors: No known data errors
user@nas:~$
Now let's put sdb aka scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b offline.
user@nas:~$ sudo zpool offline zfspool scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b
user@nas:~$ zpool status
pool: zfspool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0B in 00:00:02 with 0 errors on Sun Jul 28 21:20:27 2024
config:
NAME STATE READ WRITE CKSUM
zfspool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b OFFLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 ONLINE 0 0 0
errors: No known data errors
user@nas:~$
We immediately see that the pool has entered the DEGRADED state. Moreover, ZFS suggests what to do: either bring the disk back online, or use the zpool replace command.
Be aware that, immediately after you issue zpool replace, the system begins the resilvering operation, which rebuilds the data on the newly added disk. That can easily take hours on, say, a multi-terabyte disk. Just let it do its work.
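If you want to poll the progress from a script rather than re-running zpool status by hand, the completion percentage can be extracted from the status output. A sketch, where the sample line is copied from the status output shown further down; on a live system you would pipe `zpool status <pool_name>` instead:

```shell
# Extract the resilver completion percentage from zpool status output.
# The sample line stands in for live command output here.
sample='169M resilvered, 63.76% done, 00:00:02 to go'
pct=$(printf '%s\n' "$sample" | grep -o '[0-9][0-9.]*% done' | cut -d% -f1)
echo "resilver ${pct}% complete"
```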
So let's replace scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b with the newly added scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016:
user@nas:~$ sudo zpool replace zfspool scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016
user@nas:~$ zpool status
pool: zfspool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Oct 30 00:05:55 2024
837M / 837M scanned, 533M / 836M issued at 133M/s
169M resilvered, 63.76% done, 00:00:02 to go
config:
NAME STATE READ WRITE CKSUM
zfspool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
replacing-0 DEGRADED 0 0 0
scsi-SATA_VBOX_HARDDISK_VBdecad012-4b4a891b OFFLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 ONLINE 0 0 0 (resilvering)
scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 ONLINE 0 0 0
errors: No known data errors
user@nas:~$
Here we see that the system is actually replacing the disk and starting the resilvering operation on scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016.
At the end of the operation there is a simple "normal" status, with the new disk:
user@nas:~$ zpool status
pool: zfspool
state: ONLINE
scan: resilvered 281M in 00:00:06 with 0 errors on Wed Oct 30 00:06:01 2024
config:
NAME STATE READ WRITE CKSUM
zfspool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 ONLINE 0 0 0
errors: No known data errors
user@nas:~$
As for available disk space, it has not changed:
user@nas:~$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
zfspool 558M 18.5G 128K /mnt/raid
zfspool/Documents 555M 18.5G 421M /mnt/raid/Documents
And also, all our snapshots are there!
user@nas:~$ zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zfspool/Documents@2024.07.26-00.48.47 346K - 474M -
zfspool/Documents@2024.10.27-21.59.11 128K - 474M -
zfspool/Documents@2024.10.27-22.13.33 128K - 555M -
zfspool/Documents@2024.10.27-22.15.19 117K - 421M -
zfspool/Documents@2024.10.27-22.18.59 95.9K - 421M -
user@nas:~$
Now we could experiment with physically removing a disk and inserting a new one in its place.
Let's remove a drive: this simulates a real failure, or the situation where we have no spare SATA ports to use for the replacement.
I removed the disk scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70, which was 10GB in size, and added a new 20GB disk.
Let's start up the nas server. As soon as it boots up, we check the zpool status:
user@nas:~$ zpool status
pool: zfspool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
scan: resilvered 281M in 00:00:06 with 0 errors on Wed Oct 30 00:06:01 2024
config:
NAME STATE READ WRITE CKSUM
zfspool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 ONLINE 0 0 0
14696350533427175973 UNAVAIL 0 0 0 was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 ONLINE 0 0 0
errors: No known data errors
user@nas:~$
As you can see, the ZFS commands provide loads of useful information! zpool status even tells us which specific disk was expected to sit as the 2nd disk in the pool, but is UNAVAILable!
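That GUID-to-device mapping can also be picked out by a script, for example to alert on failed disks from a cron job. A sketch, where the sample line is copied from the zpool status output above; on a live system, pipe `zpool status <pool_name>` instead:

```shell
# Print "<guid> <old device path>" for every UNAVAIL device in a
# zpool status listing. The sample line stands in for live output.
sample='    14696350533427175973  UNAVAIL  0  0  0  was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1'
unavail=$(printf '%s\n' "$sample" | awk '$2 == "UNAVAIL" { print $1, $NF }')
echo "$unavail"
```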
Now the lsblk command tells us that there is a sdb with no partitions.
user@nas:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 55.4M 1 loop /snap/core18/2846
loop1 7:1 0 55.7M 1 loop /snap/core18/2829
loop2 7:2 0 326.3M 1 loop /snap/nextcloud/44729
loop3 7:3 0 309.1M 1 loop /snap/nextcloud/44812
loop4 7:4 0 38.8M 1 loop /snap/snapd/21759
sda 8:0 0 8G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 1.8G 0 part /boot
└─sda3 8:3 0 6.2G 0 part
└─ubuntu--vg-ubuntu--lv 252:0 0 6.2G 0 lvm /
sdb 8:16 0 20G 0 disk
sdc 8:32 0 10G 0 disk
├─sdc1 8:33 0 10G 0 part
└─sdc9 8:41 0 8M 0 part
sdd 8:48 0 20G 0 disk
├─sdd1 8:49 0 20G 0 part
└─sdd9 8:57 0 8M 0 part
sr0 11:0 1 1024M 0 rom
user@nas:~$
That's interesting. Let's check it also in the disk/by-id list:
user@nas:~$ ls -la /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 1000 Nov 2 17:49 .
drwxr-xr-x 10 root root 200 Nov 2 17:49 ..
lrwxrwxrwx 1 root root 9 Nov 2 17:49 ata-VBOX_CD-ROM_VB2-01700376 -> ../../sr0
lrwxrwxrwx 1 root root 9 Nov 2 17:49 ata-VBOX_HARDDISK_VB1600af16-50c06ff2 -> ../../sdc
[...]
lrwxrwxrwx 1 root root 9 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 -> ../../sdc
lrwxrwxrwx 1 root root 10 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 -> ../../sdd
lrwxrwxrwx 1 root root 10 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75 -> ../../sdb
lrwxrwxrwx 1 root root 9 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574 -> ../../sda
lrwxrwxrwx 1 root root 10 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 2 17:49 scsi-SATA_VBOX_HARDDISK_VBfbc51701-74fd9574-part3 -> ../../sda3
user@nas:~$
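Spotting the blank disk can be scripted too: a freshly added disk is the whole-disk by-id entry with no -part links next to it. A sketch with a simulated name list; on a live system you would feed it from `ls /dev/disk/by-id | grep '^scsi-SATA'`:

```shell
# Print whole-disk IDs that have no partition links, i.e. blank disks.
# The list below is simulated input for illustration.
ids='scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part1
scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75'
blank=$(printf '%s\n' "$ids" | grep -v -- '-part' | while read -r d; do
  printf '%s\n' "$ids" | grep -q "^$d-part" || echo "$d"
done)
echo "$blank"
```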
OK, so we found that the only disk without partitions is /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75. Let's use this one to replace the missing disk.
user@nas:~$ sudo zpool replace zfspool scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70 scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75
user@nas:~$ zpool status
pool: zfspool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sat Nov 2 18:10:38 2024
837M / 837M scanned, 658M / 837M issued at 164M/s
205M resilvered, 78.59% done, 00:00:01 to go
config:
NAME STATE READ WRITE CKSUM
zfspool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 ONLINE 0 0 0
replacing-1 DEGRADED 0 0 0
14696350533427175973 UNAVAIL 0 0 0 was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB9568dd84-31039e70-part1
scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75 ONLINE 0 0 0 (resilvering)
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 ONLINE 0 0 0
errors: No known data errors
user@nas:~$
... and after the resilvering completes (which would take hours on real, multi-terabyte disks) we finally get:
user@nas:~$ zpool status
pool: zfspool
state: ONLINE
scan: resilvered 281M in 00:00:05 with 0 errors on Sat Nov 2 18:10:43 2024
config:
NAME STATE READ WRITE CKSUM
zfspool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 ONLINE 0 0 0
errors: No known data errors
user@nas:~$
Now the last disk replacement, the most important one: the one that will trigger the pool resizing!
To increase your vdev capacity, you need to perform the disk replacement operation on all N disks in the pool; before performing the Nth (and last) replacement, you need to set a pool property: autoexpand. Let's get to it!
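The whole procedure can be sketched as one shell function, under the assumptions of this page (a raidz1 pool, old/new disk IDs known up front). One simplification: it enables autoexpand before the loop rather than only before the last replacement, which is equivalent and easier to script. Illustrative only; it requires root and a real pool to actually run:

```shell
# Sketch: upgrade_pool POOL "OLD_ID NEW_ID" [...] replaces each disk in
# turn, waiting for each resilver to finish, with autoexpand enabled so
# the pool grows after the last replacement.
upgrade_pool() {
  pool=$1; shift
  zpool set autoexpand=on "$pool"
  for pair in "$@"; do
    old=${pair% *}
    new=${pair#* }
    zpool replace "$pool" "$old" "$new"
    # never touch the next disk while a resilver is still running
    while zpool status "$pool" | grep -q 'resilver in progress'; do
      sleep 60
    done
  done
  zpool set autoexpand=off "$pool"
}
```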
We detach the last disk from the system and replace it with the new device.
Before actually replacing the disk in the pool, we need to set the autoexpand property:
user@nas:~$ sudo zpool set autoexpand=on zfspool
user@nas:~$
Now we launch the replace operation:
user@nas:~$ sudo zpool replace zfspool scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2 scsi-SATA_VBOX_HARDDISK_VB137e6d9d-217cfd0a
user@nas:~$ zpool status
pool: zfspool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sat Nov 2 20:52:37 2024
837M / 837M scanned, 433M / 837M issued at 144M/s
129M resilvered, 51.73% done, 00:00:02 to go
config:
NAME STATE READ WRITE CKSUM
zfspool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75 ONLINE 0 0 0
replacing-2 DEGRADED 0 0 0
11050710442926509085 UNAVAIL 0 0 0 was /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB1600af16-50c06ff2-part1
scsi-SATA_VBOX_HARDDISK_VB137e6d9d-217cfd0a ONLINE 0 0 0 (resilvering)
errors: No known data errors
user@nas:~$
So the resilvering operation has started!
We will wait until the resilvering has completed to check for disk space...
user@nas:~$ zpool status
pool: zfspool
state: ONLINE
scan: resilvered 281M in 00:00:04 with 0 errors on Sat Nov 2 20:52:41 2024
config:
NAME STATE READ WRITE CKSUM
zfspool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB3705a07d-9aba9016 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VBb5503a9e-16b49d75 ONLINE 0 0 0
scsi-SATA_VBOX_HARDDISK_VB137e6d9d-217cfd0a ONLINE 0 0 0
errors: No known data errors
user@nas:~$
Now the moment we have all been waiting for: let's check the pool capacity!
user@nas:~$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
zfspool 558M 37.9G 128K /mnt/raid
zfspool/Documents 555M 37.9G 421M /mnt/raid/Documents
YAY! Double the disk space! The whole thing worked!
One last command: the expansion is done, so we no longer need the pool to watch for expansion opportunities; let's set the autoexpand property back to off:
user@nas:~$ sudo zpool set autoexpand=off zfspool
user@nas:~$ zpool get autoexpand
NAME PROPERTY VALUE SOURCE
zfspool autoexpand off default
user@nas:~$
We made it! We replaced all 3 disks of the zfspool pool, and the capacity is finally that of 2 of the 3 bigger disks: about 2 x 20GB, so 40GB (the 37.9G reported above is that figure minus ZFS overhead).
Now let's check another topic: My Backup Disk is almost full!
ZFS Backup, (c) 2024 Luca Finzi Contini - Use it at your own risk but enjoy doing so :)