My approach to backing up a Linux computer. I wrote it down as a quick reference for myself, but also for anyone else who finds it useful. This is an opinionated approach, and I do not guarantee that it will work for you.
I like the 3-2-1 backup rule, which states that there should always be 3 copies of your data, on two different devices, with one copy being off-site. Of course, exceeding this rule will not hurt, but it also increases effort and complexity.
In addition to that golden rule, here are eight more considerations:
- Since all primary instances of my data are encrypted at rest, backups on other devices must be encrypted as well.
- Backups should be taken frequently: maybe not for every saved state of a file, but definitely more often than daily.
- I want to be able to restore files immediately, meaning offline and without needing physical access to another device.
- Backups should run automatically in the background and should not require me to remember doing them manually every now and then.
- On day X, I will not need a backup; I will need a restore. Only if backups are tested regularly can I be sure to get one.
- I prefer simple solutions, where a backup is also just a bunch of files. A proprietary blob that can only be interpreted by a specific backup tool version is a nightmare in the making.
- I want to be able to store off-site copies on cheap public clouds without having to trust their confidentiality, integrity, or availability.
- The user account I work with on a daily basis shall be able to add data to and restore from a backup, but never to modify or delete it. Think: ransomware.
To tackle the requirements above, my backup approach is broken down into layers. Each layer serves a different purpose, and the tool used on one layer can be picked freely, or even replaced, ideally without affecting the other layers.
The computer holding the live version of my data keeps a local backup of these files. After the first backup run, there are two identical copies of each file. If a file is then changed and the backup runs again, three versions of it exist: the live version, the (identical) new backup, and the previous version in the backup. Since the underlying storage is already encrypted, the backup tool on this layer does not need to encrypt anything.
As with any electronic (or, even worse, mechanical) device, the question is never if it will fail, but when. To counter that, I want my data to be copied onto a second device, such as a USB hard drive, a NAS or a public cloud. This step requires file-level encryption: encrypting whole disks works well for local drives, but does not fit cloud storage, which stores individual files. On the other hand, this layer does not have to keep a time series of the data, as the previous layer already covers that.
Both previous layers can be run manually, and at first they should be, to get a better understanding of the tools and the process. For permanent use, however, no manual trigger should be necessary, neither for creating the backups nor for verifying that a restore produces the original data again.
For the local backup layer, I have chosen rdiff-backup. To quote from its project repository:
rdiff-backup is a simple backup tool which can be used locally and remotely, on Linux and Windows, and even cross-platform between both. [..] Beside its ease of use, one of the main advantages of rdiff-backup is that it does use the same efficient protocol as rsync to transfer and store data. Because rdiff-backup only stores the differences from the previous backup to the next one (a so called reverse incremental backup), the latest backup is always a full backup, making it easiest and fastest to restore the most recent backups, combining the space advantages of incremental backups while keeping the speed advantages of full backups (at least for recent ones).
Install rdiff-backup according to the instructions for your platform.
Decide whether you want to include all files and directories in the backup. I exclude most of my dot-directories, as especially .local and .cache tend to grow big fast and, for me, contain nothing of value. Run this command in your home directory to write them to a file:
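```shell
# -mindepth 1 keeps the starting point "." out of the list, since "." also matches '.*'
find . -mindepth 1 -maxdepth 1 -type d -name '.*' > .rdiff-exclude
```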
However, some dot-directories, like .ssh, you do want to back up! So review the .rdiff-exclude file and delete the line of every directory that should be included in the backup.
Prepare the target folder where the backup will be stored. Change the path if you prefer a different location.
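If you regenerate this list every now and then, the review can be partly scripted. Below is a minimal bash sketch, assuming GNU find; the function names and the KEEP_DIRS list (.ssh and .gnupg) are hypothetical examples, not part of rdiff-backup. It prints the same ./name lines as the command above, minus the directories you want to keep:

```bash
#!/usr/bin/env bash
# Sketch: list top-level dot-directories for the rdiff-backup exclude file,
# except those named in KEEP_DIRS (a hypothetical example keep list).
KEEP_DIRS=(.ssh .gnupg)

# Succeeds if a find result ("./name") is on the keep list.
is_kept() {
  local dir=$1 keep
  for keep in "${KEEP_DIRS[@]}"; do
    [ "$dir" = "./$keep" ] && return 0
  done
  return 1
}

# Prints one "./name" line per dot-directory to exclude, sorted.
make_exclude_list() {
  local home_dir=$1 dir
  # Run find from inside home_dir so the paths stay relative ("./.cache"),
  # exactly like the manual find command produces them.
  ( cd "$home_dir" || exit 1
    # -mindepth 1 keeps "." itself out of the list.
    find . -mindepth 1 -maxdepth 1 -type d -name '.*' | LC_ALL=C sort |
      while IFS= read -r dir; do
        is_kept "$dir" || printf '%s\n' "$dir"
      done )
}
```

Used as make_exclude_list "$HOME" > "$HOME/.rdiff-exclude", the result should still be reviewed before the first backup.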
```shell
sudo mkdir -p /home/.backup/$USER
sudo chown $USER /home/.backup/$USER
```
Now it is time to create the first backup. We cd into the home directory so that the relative paths in the .rdiff-exclude file work; otherwise, they would need to be absolute paths.
```shell
cd /home/$USER
rdiff-backup --no-fsync --exclude-special-files --exclude-globbing-filelist .rdiff-exclude . /home/.backup/$USER
```
Depending on the number of files, this may take a while. Running the command a second time is significantly faster, and subsequent runs only take up more disk space if you have modified files. Under /home/.backup/$USER you will now find your backup as plain files, next to an rdiff-backup-data directory in which rdiff-backup keeps its metadata and the increments for older versions. No tool is needed to access the latest full backup.
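Because the latest backup is a plain directory tree, a quick spot check needs nothing but diff, while older versions still require rdiff-backup itself (rdiff-backup --list-increments /home/.backup/$USER shows which ones exist). The following is a minimal sketch with two hypothetical helpers, check_mirror and restore_as_of. It assumes GNU diff and the classic rdiff-backup command line (-r, --restore-as-of) that the commands above also use; newer rdiff-backup releases may prefer a different syntax.

```bash
#!/usr/bin/env bash
# Sketch: two hypothetical restore helpers for the rdiff-backup target above.
# Override BACKUP_DIR if you chose a different location.
BACKUP_DIR=${BACKUP_DIR:-/home/.backup/${USER:-$(id -un)}}

# Compare a live path (relative to live_root, default $HOME) with its mirror.
# rdiff-backup-data holds rdiff-backup's metadata and increments, not your
# files, so it is ignored. Exit status 0 means both trees are identical.
check_mirror() {
  local rel=$1 live_root=${2:-$HOME}
  diff -rq --exclude=rdiff-backup-data "$live_root/$rel" "$BACKUP_DIR/$rel"
}

# Restore a path as it was at a given time (e.g. 2D = two days ago) to dest,
# leaving the live copy alone. Uses the classic rdiff-backup CLI
# (-r, --restore-as-of). With DRY_RUN=1, the command is only printed.
restore_as_of() {
  local when=$1 rel=$2 dest=$3
  local cmd=(rdiff-backup -r "$when" "$BACKUP_DIR/$rel" "$dest")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, check_mirror Documents compares ~/Documents with its mirror right after a backup run, and restore_as_of 2D Documents/report.odt /tmp/report.odt would bring back that (made-up) file as it was two days ago without overwriting the live copy.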
For the second layer, I combine:
- File-level encryption with gocryptfs, which offers a "reverse" mode: the plaintext files stay on disk as they are and are presented in encrypted form through an overlay mountpoint
- Synchronization with an external HDD or a NAS via rsync
- Synchronization with a public cloud (e.g. Dropbox) via the provider's own tools
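Initialize gocryptfs in the backup folder and create a mountpoint for the encrypted view. The init step asks for a password and prints a master key; store both in a safe place, because without them the encrypted copies cannot be decrypted: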
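```shell
gocryptfs -reverse -init /home/.backup/$USER/
# /home belongs to root, so creating the mountpoint needs sudo, like the backup folder
sudo mkdir -p /home/.backup-encrypted/$USER
sudo chown $USER /home/.backup-encrypted/$USER
```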
Unlock and mount the encrypted view. Afterwards, you should see encrypted files in /home/.backup-encrypted/$USER:
```shell
gocryptfs -reverse /home/.backup/$USER/ /home/.backup-encrypted/$USER
```
Synchronize the encrypted files to the external storage medium:
```shell
rsync -av --delete /home/.backup-encrypted/$USER/ /path/to/my/usb-hdd/encrypted-backups-of-$USER
```
On the external storage or the cloud service there should now be a copy of the encrypted backup. Subsequent runs of the synchronization should finish significantly faster, as only changed files will be transferred.
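One pitfall deserves a guard: if the gocryptfs mount is not active, /home/.backup-encrypted/$USER is just an empty directory, and rsync --delete would then empty the copy on the external drive as well. A minimal bash sketch of a hypothetical sync_encrypted helper, assuming the mountpoint utility from util-linux is installed:

```bash
#!/usr/bin/env bash
# Sketch: mirror the encrypted view only while gocryptfs is mounted on it.
# sync_encrypted is a hypothetical helper; mountpoint comes from util-linux.
sync_encrypted() {
  local view=$1 dest=$2
  # Unmounted, the view is an empty directory: rsync --delete would then
  # delete everything in dest, so refuse instead.
  if ! mountpoint -q "$view"; then
    echo "refusing to sync: $view is not mounted" >&2
    return 1
  fi
  # Same options as the manual command; the trailing slash copies the contents.
  rsync -av --delete "$view/" "$dest"
}
```

Called as sync_encrypted /home/.backup-encrypted/$USER /path/to/my/usb-hdd/encrypted-backups-of-$USER, it runs the rsync command above when the view is mounted and refuses otherwise.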
Feel free to have more than one remote copy of your local backup.
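Once both layers have been run by hand a few times, they can be chained for unattended use. The following bash sketch is one way to do it, not a finished tool: backup.sh, backup_all and run are hypothetical names, the paths are the ones used above, and it assumes the gocryptfs password is stored in a file readable only by you (gocryptfs reads it with -passfile; the PASSFILE location is a placeholder). Keeping that password on disk is a trade-off that makes unattended mounting possible, so decide on it consciously. With DRY_RUN=1, every command is printed instead of executed.

```bash
#!/usr/bin/env bash
# Sketch of a hypothetical backup.sh chaining both layers; not a finished tool.
# Stop at the first failing step, e.g. when gocryptfs cannot mount, so that
# rsync --delete never runs against an empty mountpoint.
set -e

USER_NAME=${USER:-$(id -un)}
SRC=${SRC:-/home/$USER_NAME}
BACKUP=${BACKUP:-/home/.backup/$USER_NAME}
VIEW=${VIEW:-/home/.backup-encrypted/$USER_NAME}
DEST=${DEST:-/path/to/my/usb-hdd/encrypted-backups-of-$USER_NAME}
# Assumption: the gocryptfs password lives in this file (placeholder path).
PASSFILE=${PASSFILE:-/home/$USER_NAME/.config/backup/gocryptfs.pass}

# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

backup_all() {
  # Layer 1: versioned local backup; the subshell keeps the cd local, and
  # running from SRC makes the relative paths in .rdiff-exclude match.
  (cd "$SRC" && run rdiff-backup --no-fsync --exclude-special-files \
    --exclude-globbing-filelist .rdiff-exclude . "$BACKUP")
  # Layer 2: mount the encrypted view, mirror it, and unmount it even if
  # rsync fails (on FUSE 3 only systems the tool may be called fusermount3).
  run gocryptfs -reverse -passfile "$PASSFILE" "$BACKUP" "$VIEW"
  local rc=0
  run rsync -av --delete "$VIEW/" "$DEST" || rc=$?
  run fusermount -u "$VIEW"
  return "$rc"
}

# Only start when called as "backup.sh run", so the functions can be sourced.
if [ "${1:-}" = run ]; then
  backup_all
fi
```

DRY_RUN=1 ./backup.sh run shows what would happen. To meet the "more often than daily" requirement, the script could then be scheduled, for example hourly with a crontab entry such as 0 * * * * /path/to/backup.sh run (path hypothetical) or with a systemd timer. If DEST sits on a removable drive, make sure it is actually mounted before the rsync step; otherwise rsync may write into the empty mount directory on the internal disk.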
Are all requirements met?