simplesnap breaks if a dataset or volume is renamed #29

@craig-sanders

Description

I recently (Oct 12) renamed one of my zvols (used for VMs) from exp/debian10 to exp/volumes/debian10. All of my other VM zvols on my exp pool were under exp/volumes but this one was accidentally created at the top level of the pool and I wanted to tidy it up.

I only just noticed that this prevents simplesnap from backing up that zvol, failing with error 141 (exit status 141 is 128+13, i.e. zfs was killed by SIGPIPE when the pipeline collapsed).

Worse, due to #1, it also prevents simplesnap from backing up anything after it. That is particularly annoying on my system because the HDD exp bulk-storage pool sorts before my NVMe root pool hex, so hex's root pool hasn't been backed up to the backuphost since then. In fact, that's how I noticed the problem: I needed to examine some old kernel logs to find out exactly when a completely unrelated problem started happening, but the local zfsnap snapshot had expired and hadn't been backed up to my backuphost.

2022-12-17T04:23:05.694528+11:00 hex simplesnapwrap[866318]: Sending incremental stream back to exp/volumes/debian10@__simplesnap_mainset_2022-10-12T01:34:31__
2022-12-17T04:23:05.696821+11:00 hex simplesnapwrap[866318]: Running: /sbin/zfs send -I exp/volumes/debian10@__simplesnap_mainset_2022-10-12T01:34:31__ exp/volumes/debian10@__simplesnap_mainset_2022-12-17T04:23:05__
2022-12-17T04:23:06.741830+11:00 hex simplesnapwrap[866318]: zfs exited with error: 141

Some Workarounds:

  1. The user (i.e. me) can exclude those zvols with org.complete.simplesnap:exclude=on. Far from ideal: renaming a dataset shouldn't mean that it can no longer be backed up.

  2. A better, but still not ideal, workaround is for the user to undo the rename, i.e. rename them back to what they were. This is what I did. It works.

  3. The user can manually rename the receiving dataset on the backuphost - e.g. I could have run zfs rename backup/simplesnap/hex/exp/debian10 backup/simplesnap/hex/exp/volumes/debian10. At least I expect that this would work, but I didn't test it.

    This requires them to be aware of the issue with renaming datasets, so it needs to be documented.

    It also requires that the user has zfs admin privileges on the backup host (which may not be true if the machine is a desktop in, for example, a corporate or academic environment...they may have root on their desktop but not on the backup server), or at least be able to communicate the problem correctly to the backuphost's admin. So, again, documentation. And maybe some helpful messages in the log.
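Concretely, workarounds 1 and 3 amount to something like the following (dataset names taken from this report; the commands are echoed rather than executed here, since they change pool state and workaround 3 is untested):

```shell
# Print each command instead of running it; drop the echo to apply for real.
run() { echo "would run: $*"; }

# Workaround 1: on the source host, exclude the renamed zvol from backups.
run zfs set org.complete.simplesnap:exclude=on exp/volumes/debian10

# Workaround 3: on the backuphost, rename the receiving dataset to match
# the new name (needs zfs admin rights on the backuphost).
run zfs rename backup/simplesnap/hex/exp/debian10 \
    backup/simplesnap/hex/exp/volumes/debian10
```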

A possible solution

Simplesnap should detect whether a dataset has been renamed (e.g. existence of two __simplesnap_* snapshots but no corresponding dataset on the backuphost under $STORE) and either:

  1. Explicitly log that a renamed dataset has been detected along with a suggested fix (i.e. suggest one or more of the workarounds above) or, at least, point to the section of the man page/other documentation where this problem is detailed.

  2. Treat it as a completely new dataset and do a full backup rather than an incremental one. This should work, but has the downside of leaving the old backups behind as a duplicate under the original dataset name. It will also break if the renamed dataset is later renamed back to what it used to be, or if a new dataset is created with the old name.

  3. Somehow figure out that a) a dataset has been backed up and b) what its original name was, and then rename the dataset on the backup store to match.

    The only reliable way I can think of to do that is with a new property, e.g. org.complete.simplesnap:renamed_from=oldname. This would require the user to manually add that property when they rename a dataset.

    After renaming the dataset on the backuphost and completing the backup, simplesnap should probably delete that property on the active host and the backup host (e.g. with zfs inherit, which seems to be the only way to delete a user property from a dataset), or set it to an empty value or "-".
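The user-side sequence for option 3 would look roughly like this, with the hypothetical property name suggested above (nothing here is implemented; the commands are echoed rather than executed):

```shell
# Print each command instead of running it; drop the echo to apply for real.
run() { echo "would run: $*"; }

# The user renames the zvol and records its old name in the proposed
# (hypothetical) property for simplesnap to consume.
run zfs rename exp/debian10 exp/volumes/debian10
run zfs set org.complete.simplesnap:renamed_from=exp/debian10 exp/volumes/debian10

# After simplesnap has renamed the dataset under $STORE and completed a
# backup, it would clear the property again; zfs inherit drops a locally
# set user property.
run zfs inherit org.complete.simplesnap:renamed_from exp/volumes/debian10
```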

or

  1. Fix #1 (premature exit when a zfs receive has an error) so that it doesn't prevent the remaining snapshots from being backed up. A problem with one dataset on a host really shouldn't prevent other datasets on that host from being backed up: rather than logging the error and dying, simplesnap should log the error and continue with the next dataset.
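The detection step suggested above could be sketched roughly like this. The zfs queries are stubbed out with the values from this report, so this only shows the shape of the check, not a working implementation:

```shell
STORE=backup/simplesnap/hex

# Stub: on the real source host this would be something like
#   zfs list -t snapshot -o name -H "$1" | grep __simplesnap_
source_snapshots() {
    echo "$1@__simplesnap_mainset_2022-10-12T01:34:31__"
}

# Stub: on the real backuphost this would be
#   zfs list -H "$1" >/dev/null 2>&1
backup_dataset_exists() {
    return 1   # pretend no dataset exists under $STORE
}

# If a dataset already carries __simplesnap_* snapshots but has no
# counterpart under $STORE, it has probably been renamed on the source:
# log that explicitly instead of dying on the failed incremental send.
check_for_rename() {
    snaps=$(source_snapshots "$1")
    if [ -n "$snaps" ] && ! backup_dataset_exists "$STORE/$1"; then
        echo "WARNING: $1 has __simplesnap_* snapshots but no dataset under $STORE; it may have been renamed. See the man page."
        return 1
    fi
}

# With the stubs above this prints the warning and carries on.
check_for_rename exp/volumes/debian10 || true
```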
