I’m not going to lecture anyone about backups. I’m going to trust you already know you need backups and that you’re curious about how to set them up or you want to compare your setup to mine. Either way, you’re here now.
My setup is pretty straightforward. I repurposed an old NUC with an external SATA enclosure to act as the borg server. The external enclosure has four drives in a RAID-5, providing enough capacity for me to back up for quite a while. In addition, I have the most important data replicated to object storage using rclone. This gives me multiple copies of the data with multiple layers of protection, covering the 3-2-1 strategy for my most important data; for the rest, I’m OK with the risk of loss.
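The rclone piece is nothing fancy; it’s essentially a scheduled one-way sync of the most important directories to an object storage remote. A rough sketch, where the remote name, bucket, and source path below are placeholders rather than my actual values:

```
# one-way sync of the most important data to an object storage remote
# ("offsite" and the bucket/path are placeholders for your own rclone remote)
rclone sync /path/to/important-data offsite:my-bucket/backups --transfers 4
```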
Borg works on a client-server principle: the clients back up data to the “remote” borg server (it’s not actually remote, just at the other end of the house). If you aren’t familiar with borg, I highly encourage you to read the docs and get at least a basic familiarity with its principle of operation.
My setup is loosely based on the configuration described here, so you may want to give that a read too if you like.
What and when to back up
I have a lot of things self-hosted in my home and on virtual machines in a couple of different services. I’m simply going to list a few of the things I back up, and leave you to evaluate what data you want backed up and where it lives.
Some of the things I back up:
- My home’s primary NAS share for miscellaneous files
- My Calibre library
- The decades of photos retrieved from various phones and cameras, or scanned in
- My MP3 collection
- Unifi controller configuration
- Pi-Hole config
- homelab bind and dhcpd config
- WireGuard config
- NextCloud data and config
Once you have identified what data you want to protect, you need two additional bits of information: how frequently to back it up and how long to retain those backups.
Notice that I didn’t say anything about how quickly you can recover the data. For me, this is my personal data and configuration data related to services I use when I work from home. It’s not critical that I get it back within minutes or hours, even though for most scenarios it’s a local recovery that will run at Gigabit speeds. But, you should carefully consider your needs and make decisions on the speed, quantity, frequency, and retention of your backups.
I chose the following retention policies:
- Daily frequency: retain 7 daily, 4 weekly, 3 monthly
- Weekly frequency: retain 4 weekly, 3 monthly
- Monthly frequency: retain 3 monthly
Which policy I use depends on how frequently the data changes and how much I’m willing to risk losing. If the source were to fail, I would lose data since the last backup. Alternatively, if I do something to break a config (e.g. I completely bork my DNS setup), then I might lose any valid changes I’ve made since the last backup.
For example, my phone uses the NextCloud app to automatically upload photos when they’re taken, so NextCloud’s data is backed up daily. But, WireGuard config rarely changes…I’ve only got so many clients and they don’t change often, so it gets backed up monthly. Some would argue that I could use the same policy for everything and if nothing changes it won’t matter because borg does deduplication. Well, you do you, I’m happy with what I’ve got.
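For what it’s worth, the “daily” policy above maps directly onto borg’s prune options, which is how the automation later in this post implements it; roughly:

```
# the daily retention policy expressed as borg prune options:
# for archives matching the daily prefix, keep 7 daily, 4 weekly, 3 monthly
borg prune --prefix '{hostname}-daily-' --keep-daily 7 --keep-weekly 4 --keep-monthly 3
```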
Ok, enough preamble, let’s do some work!
Setup
1. Configure the backup server
This assumes that the data will be stored at `/mnt/hive`. This could be any storage config (I happen to be using an `mdadm` RAID-5 array), just make sure that there’s enough room for your backups.

From the backup server, as `root`:

```
# install borg
yum install epel-release
yum install borgbackup

# create the borg user
useradd -m borg
passwd borg

# make directories for each of the clients
mkdir -p /mnt/hive/client-name

# set permissions
chown -R borg:borg /mnt/hive
```
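For reference, this is roughly the kind of `mdadm` setup I mean. The device names and filesystem below are placeholders, so adjust them to whatever your enclosure exposes (and add an `/etc/fstab` entry so the mount survives reboots):

```
# assemble four disks into a RAID-5 array (device names are examples)
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# put a filesystem on it and mount it where the borg repos will live
mkfs.xfs /dev/md0
mkdir -p /mnt/hive
mount /dev/md0 /mnt/hive
```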
2. From the clients, install borg and copy the SSH public key to the borg server

Copying the SSH key allows the borg process to connect to the backup server without a password.
From each client, as `root`. I’m choosing to execute backups as `root` to avoid permissions and access issues on the client. If you want to allow users to back up their home directories and other data, then use that user’s account to do this.

```
# install borg
yum install epel-release
yum install borgbackup

# copy ssh key to borg host
ssh-copy-id borg@collective.home.lan
```
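The `ssh-copy-id` step assumes root already has an SSH key pair on the client. If it doesn’t, generate one first; a quick sketch (no passphrase on the key, since the backups run unattended):

```
# generate a key pair for root if one doesn't already exist
test -f /root/.ssh/id_ed25519 || ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519

# then copy the public key to the borg server as shown above
ssh-copy-id -i /root/.ssh/id_ed25519.pub borg@collective.home.lan
```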
3. Configure server SSH access restrictions

We want to prevent the client connections from doing anything other than borg. This isn’t strictly necessary in a small home setup, but it doesn’t hurt, and it saves keeping track of paths for the backup repos.
From the backup server, as the `borg` user:
```
# edit the authorized_keys file to restrict each host
vi ~/.ssh/authorized_keys

# each of the host entries should follow this format:
# command="cd /mnt/hive/; borg serve --restrict-to-path /mnt/hive/"

# for example, here is the entry for server-1
# command="cd /mnt/hive/server-1; borg serve --restrict-to-path /mnt/hive/server-1" ssh-ecdsa root@server-1.home.lan
```
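If you want to lock this down a little further, newer OpenSSH releases (7.2+) also accept a blanket `restrict` option in authorized_keys, which disables port forwarding, PTY allocation, and so on in one word. A sketch of what an entry could look like with it; the key type and key material are placeholders, just like in the example above:

```
# same restriction as above, plus SSH's "restrict" option (OpenSSH 7.2+);
# <key-type> <key-data> stand in for the client's actual public key
command="cd /mnt/hive/server-1; borg serve --restrict-to-path /mnt/hive/server-1",restrict <key-type> <key-data> root@server-1.home.lan
```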
4. Configure clients for initial backup

With the server side set up, let’s configure the backups on the clients. For each client we will initialize a repo, back up the encryption key, and then kick off the initial backups. A quick note: the output from the `borg create` commands below reports files after they’re done, so if it looks like it has stalled on a small file, it may just be in the middle of backing up a large one.

From each client, do the following:
```
# init the backup repo
borg init --encryption=repokey-blake2 borg@collective.home.lan:backup

# export encryption keys, BE SURE TO PUT THESE AND THE PASSPHRASE SOMEWHERE SAFE!
borg key export borg@collective.home.lan:backup ~/borg-server-1-backup.key

# one less thing to input below
export BORG_REPO=borg@collective.home.lan:backup

# initialize a daily backup
borg create \
    --verbose --filter AME \
    --list --stats --show-rc \
    --compression zlib,5 --exclude-caches \
    ::'{hostname}-daily-{now}' \
    /home/me/photos \
    /some/important/data

# initialize a weekly backup
borg create \
    --verbose --filter AME \
    --list --stats --show-rc \
    --compression zlib,5 --exclude-caches \
    ::'{hostname}-weekly-{now}' \
    /path/to/data \
    /another/path/to/data

# initialize a monthly backup
borg create \
    --verbose --filter AME \
    --list --stats --show-rc \
    --compression zlib,5 --exclude-caches \
    ::'{hostname}-monthly-{now}' \
    /some/data/here
```
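Before digging into the flags, it’s worth a quick check that the initial archives actually landed in the repo. With `BORG_REPO` still exported, for example (the archive name shown is just what the `{hostname}-daily-{now}` template might produce):

```
# list the archives that now exist in the repo
borg list

# show size and deduplication stats for the whole repo
borg info

# or for a single archive
borg info ::server-1-daily-2021-01-15T03:00:01
```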
Let’s break the above command down a bit…

```
# we want borg to create a new backup
borg create

# output verbose logging
--verbose

# with --list, only report A(dded), M(odified), and E(rror) files; unchanged
# files are omitted from the listing (this doesn't change what gets backed up)
--filter AME

# output all files considered for backup, even if no action was taken
--list

# show statistics about the backup when done
--stats

# print the return code to the output
--show-rc

# compress the data using the zlib algorithm and level 5
# a higher level means better compression, but more CPU usage
--compression zlib,5

# exclude any directory with a CACHEDIR.TAG file
--exclude-caches

# this line is simplified because we exported the borg repo name into an
# environment variable, so the "::" is telling it to use the env value
# the remaining info, in the single quotes, is the template for the backup name
::'{hostname}-monthly-{now}'

# last, any paths we want backed up as a part of this job
/path/to/data
```
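One note on `--exclude-caches`: it keys off the standard CACHEDIR.TAG convention, so you can opt any directory out of the backup just by dropping a tag file into it. For example (the path is only an illustration):

```
# mark a scratch directory as a cache so --exclude-caches skips it
printf 'Signature: 8a477f597d28d172789f06886806bc55\n' > /var/tmp/scratch/CACHEDIR.TAG
```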
5. Configure automated backups and pruning

This section is a mashup of what this person did and the Borg docs here.
Backups that don’t happen are not useful, so we want to make sure that they occur on a regular schedule. This will require three things:
- A script which will trigger the backup and retention pruning. I chose to store mine in `/root/.local/bin`.

```
#!/bin/sh

# the repo and its passphrase
export BORG_REPO=borg@collective.home.lan:backup
export BORG_PASSPHRASE='verylongandsecure'

# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM

info "Starting backup"

# backup the directories
borg create \
    --verbose --filter AME \
    --list --stats --show-rc \
    --compression zlib,5 --exclude-caches \
    ::'{hostname}-daily-{now}' \
    /path/to/data 2>&1

backup_exit=$?

info "Pruning repository"

# prune the repo
borg prune \
    --list \
    --prefix '{hostname}-daily-' \
    --show-rc \
    --keep-daily 7 \
    --keep-weekly 4 \
    --keep-monthly 3 2>&1

prune_exit=$?

# use highest exit code as global exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))

if [ ${global_exit} -eq 1 ];
then
    info "Backup and/or Prune finished with a warning"
fi

if [ ${global_exit} -gt 1 ];
then
    info "Backup and/or Prune finished with an error"
fi

exit ${global_exit}
```
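Before handing the script to systemd, I’d suggest making it executable (mode 700 rather than 755, since the passphrase is sitting in it) and giving it one manual run, so any typos or SSH problems show up interactively rather than in a failed unit:

```
# make the script executable and test it once by hand
chmod 700 /root/.local/bin/borg-daily.sh
/root/.local/bin/borg-daily.sh

# 0 = success, 1 = warning, >1 = error, matching the exit handling in the script
echo $?
```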
- A service unit to execute the above backup script. This file is in `/etc/systemd/system/`.

```
[Unit]
Description=Borg Daily Backup Service

[Service]
Type=simple
Nice=19
IOSchedulingClass=2
IOSchedulingPriority=7
# the repo, so the break-lock line below has something to expand
Environment=REPOSITORY=borg@collective.home.lan:backup
# this will ensure the repo isn't locked for any reason, e.g. the
# system was rebooted while a previous backup was happening
ExecStartPre=/usr/bin/borg break-lock $REPOSITORY
ExecStart=/root/.local/bin/borg-daily.sh
```
- A timer unit to start the service unit on a schedule. This file is in `/etc/systemd/system/`.

If you have more than one or two clients you may want to consider staggering the time that backups happen (there’s a small example of this after the unit below). If 10 clients all back up at the same time, the network and disk subsystems of the borg server could become bottlenecks, leading to backups taking longer than if they were serialized.

```
[Unit]
Description=Borg Daily Backup Timer

[Timer]
WakeSystem=false
# this will trigger the unit at 3am every day
# within 10 minutes of 3am. to trigger on a weekday, use
# syntax like: Mon 01:00, which will happen every Monday
# at 1am. to trigger on the first of each month use syntax
# like *-*-01 03:00
OnCalendar=*-*-* 03:00
RandomizedDelaySec=10min

[Install]
WantedBy=timers.target
```
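If you do stagger things, the simplest levers are the `OnCalendar` time itself and a larger `RandomizedDelaySec`; for example, two clients’ timers might differ only in these lines (the values are illustrative):

```
# client one
OnCalendar=*-*-* 03:00
RandomizedDelaySec=10min

# client two, an hour later
OnCalendar=*-*-* 04:00
RandomizedDelaySec=10min
```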
For each client, and each set of backup locations + retention (in other words, the `borg create` commands above), create these three files.

```
# optionally check that there are no errors
systemd-analyze verify borg-backup-sample.timer

# after creating the files, reload systemd
systemctl daemon-reload

# enable and start the timer
systemctl enable --now borg-backup-sample.timer

# check for the next execution time
systemctl list-timers
```

If the last command, `list-timers`, doesn’t show the backup timers, try appending `--all`. If the timer is still not visible, there’s probably an issue with the unit file; make sure it was loaded using `systemctl status borg-backup-sample.timer`.

View the logs using the command `journalctl -u borg-backup-sample`.

On CentOS/RHEL 8+ and Ubuntu 18.04+ systems you can use `systemd-analyze calendar` to verify the `OnCalendar` statement used in the timer unit.
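For instance, to sanity check the schedule used in the timer above:

```
# normalize the calendar expression and show when it will next fire
systemd-analyze calendar "*-*-* 03:00"
```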
Last thoughts
You may want to configure some sort of email or other alert for your backups. Be conscious of alert fatigue, though: I strongly suggest filtering the events so that only actual failures, and other events that aren’t simply “success”, get sent.
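One simple way to do that with the script from step 5 is to add a conditional notification just before its final exit, so mail only goes out when the combined exit code is non-zero. This is only a sketch: it assumes a working local `mail` command (mailx/s-nail or similar), and the address is a placeholder.

```
# add just before the final "exit ${global_exit}" in borg-daily.sh:
# notify only on warnings or errors
if [ ${global_exit} -ne 0 ]; then
    echo "borg backup on $(hostname) exited with code ${global_exit}" \
        | mail -s "borg backup problem on $(hostname)" you@example.com
fi
```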
Backups are only as good as the testing you do. It’s a very, very good idea to periodically test recovery for each one of your clients and data sources. I have a calendar reminder for each one that rotates every two months. It takes literally five minutes to test. Remember, backups are worthless, but recoveries are priceless!
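A test restore doesn’t need to be elaborate: pulling one directory out of a recent archive into a scratch location proves that the repo, the key, and the passphrase are all healthy. A minimal sketch from a client; the archive name is just an example of what the `{hostname}-daily-{now}` template produces, so adjust it and the path to something you actually back up:

```
# restore a single directory from a chosen archive into a scratch location
export BORG_REPO=borg@collective.home.lan:backup
mkdir -p /tmp/restore-test && cd /tmp/restore-test

# pick an archive to test with
borg list

# extract one path from it and spot-check the contents
borg extract ::server-1-daily-2021-01-15T03:00:01 home/me/photos
ls -lR home/me/photos | head
```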
Comments

Look at borgmatic, which makes life so much easier. Also, why create different backups for day/week/month(/year)? Why not just create the same backup routine and then let prune take care of it?

It might be easier to see what is what when you have archives named -daily-, -weekly-, -monthly-, but that is a holdover from the old tape days. Once you store your backups in a repository it no longer matters, and borgbackup de-duplicates data, meaning that no matter what, a file is only saved once for each version you back up.
Thanks a lot for the guide!
A small correction for one of the comments in step 5: the –filter=AME option only influences which outputs the –list option gives you (so it outputs the A(dded), M(odified), and E(rror) files while omitting the U(nchanged) files). So it doesn’t affect what gets backed up, just which files get reported. (https://borgbackup.readthedocs.io/en/stable/usage/create.html#item-flags)
Thanks for the guide! It was very helpful.
There’s one thing I don’t understand about what you’re doing. In step 4 you generate 3 archives: daily, weekly, and monthly. Later, the script in step 5 only creates one daily archive and also prunes only this archive. I don’t understand the creation of the two other archives (and therefore didn’t do it). Am I missing something?