Backups using Borg

I’m not going to lecture anyone about backups. I’m going to trust you already know you need backups and that you’re curious about how to set them up or you want to compare your setup to mine. Either way, you’re here now.

My setup is pretty straightforward. I repurposed an old NUC with an external SATA enclosure to act as the borg server. The external enclosure has four drives in a RAID-5, providing enough capacity for me to back up for quite a while. In addition, I have the most important data replicated to object storage using rclone. This gives me multiple copies of data with multiple layers of protection, covering the 3-2-1 strategy for my most important data, and for the rest I am ok with the risk of loss.

Borg works on a client-server principle, so the clients back up data to the “remote” borg server (it’s not actually remote, just at the other end of the house). If you aren’t familiar with borg, I highly encourage you to read the docs and gain at least basic familiarity with its principles of operation.

My setup is loosely based on the configuration described here, so you may want to give that a read too if you like.

What and when to back up

I have a lot of things self-hosted in my home and on virtual machines in a couple of different services. I’m simply going to list a few of the things I back up, and leave you to evaluate what data you want backed up and where it lives.

Some of the things I back up:

  • My home’s primary NAS share for miscellaneous files
  • My Calibre library
  • The decades of photos that have been retrieved from various phones, cameras, and scans
  • My MP3 collection
  • Unifi controller configuration
  • Pi-Hole config
  • homelab bind and dhcpd config
  • WireGuard config
  • NextCloud data and config

Once you have identified what data you want to protect, you need two additional bits of information: how frequently to back it up and how long to retain those backups.

Notice that I didn’t say anything about how quickly you can recover the data. For me, this is my personal data and configuration data related to services I use when I work from home. It’s not critical that I get it back within minutes or hours, even though for most scenarios it’s a local recovery that will run at Gigabit speeds. But, you should carefully consider your needs and make decisions on the speed, quantity, frequency, and retention of your backups.

I chose the following retention policies:

  • Daily frequency: retain 7 daily, 4 weekly, 3 monthly
  • Weekly frequency: retain 4 weekly, 3 monthly
  • Monthly frequency: retain 3 monthly
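
These policies map fairly directly onto the --keep-daily, --keep-weekly, and --keep-monthly options of borg prune. A minimal sketch of that mapping, where the helper name retention_flags is hypothetical and only the numbers come from the list above:

```shell
# Map a backup frequency to the matching borg prune retention flags.
# retention_flags is a hypothetical helper, not part of borg.
retention_flags() {
  case "$1" in
    daily)   echo "--keep-daily 7 --keep-weekly 4 --keep-monthly 3" ;;
    weekly)  echo "--keep-weekly 4 --keep-monthly 3" ;;
    monthly) echo "--keep-monthly 3" ;;
    *)       echo "unknown frequency: $1" >&2; return 1 ;;
  esac
}
```

For example, `borg prune $(retention_flags weekly) REPO` would keep four weekly and three monthly archives in REPO.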

Which policy I use depends on how frequently the data changes and how much I’m willing to risk losing. If the source were to fail, I would lose data since the last backup. Alternatively, if I do something to break a config (e.g. I completely bork my DNS setup), then I might lose any valid changes I’ve made since the last backup.

For example, my phone uses the NextCloud app to automatically upload photos when they’re taken, so NextCloud’s data is backed up daily. But, WireGuard config rarely changes…I’ve only got so many clients and they don’t change often, so it gets backed up monthly. Some would argue that I could use the same policy for everything and if nothing changes it won’t matter because borg does deduplication. Well, you do you, I’m happy with what I’ve got.

Ok, enough preamble, let’s do some work!

Setup

  1. Configure the backup server

    This assumes that the data will be stored at /mnt/hive. This could be any storage config (I happen to be using an mdadm RAID-5 array); just make sure that there’s enough room for your backups.

    From the backup server, as root:
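
A minimal sketch of the server preparation, assuming borg itself is already installed from your distribution's package manager (the package is usually named borgbackup). Only /mnt/hive and the borg user come from this post; the function name and the override variables are illustrative.

```shell
# Assumed names: the "borg" account and /mnt/hive come from the post,
# the override variables and function name are illustrative.
BORG_USER="${BORG_USER:-borg}"
BORG_ROOT="${BORG_ROOT:-/mnt/hive}"

setup_borg_server() {
  # Create a dedicated, unprivileged account to own the repositories.
  id "$BORG_USER" >/dev/null 2>&1 ||
    useradd --create-home --shell /bin/bash "$BORG_USER" || return 1
  # Hand the storage location to that account and keep it private.
  mkdir -p "$BORG_ROOT" &&
    chown "$BORG_USER" "$BORG_ROOT" &&
    chmod 0700 "$BORG_ROOT"
}
```

Run `setup_borg_server` as root. It skips the useradd call if the account already exists.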

  2. From the clients, install borg and copy the SSH public key to the borg server

    Copying the SSH key allows the borg process to connect to the backup server without a password.

    Do this on each client as root. I’m choosing to execute backups as root to avoid permissions and access issues on the client. If you want to allow users to back up their home directories and other data, then use that user’s account to do this.
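
A sketch of the key setup, assuming borg is already installed on the client and that the borg account on the server can still log in with a password for this one step (ssh-copy-id needs some way in). The server name borgserver and the key path are placeholders.

```shell
# Assumed values: the server name and key path are placeholders.
BORG_SERVER="${BORG_SERVER:-borg@borgserver}"
BORG_KEY="${BORG_KEY:-$HOME/.ssh/id_ed25519}"

setup_client_key() {
  # Generate a key pair without a passphrase so scheduled runs are unattended.
  [ -f "$BORG_KEY" ] || ssh-keygen -t ed25519 -N "" -f "$BORG_KEY" || return 1
  # Append the public key to the borg user's authorized_keys on the server.
  ssh-copy-id -i "$BORG_KEY.pub" "$BORG_SERVER"
}
```

Afterwards, `ssh borg@borgserver` from the client should connect without asking for a password.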

  3. Configure server SSH access restrictions

    We want to prevent the client connections from doing anything other than borg. This isn’t necessary in a small home setup, but it doesn’t hurt either and eliminates keeping track of paths for the backup repos.

    From backup server, as borg user:
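
One way to do this, following the pattern in the Borg deployment docs, is to prefix each client's key in ~/.ssh/authorized_keys with a forced command. The sketch below only builds that line; the one-directory-per-client layout under /mnt/hive and the function name are assumptions.

```shell
# Build one authorized_keys line that pins a client's key to "borg serve"
# inside its own directory. The per-client layout is an assumption.
BORG_ROOT="${BORG_ROOT:-/mnt/hive}"

authorized_keys_entry() {
  client_dir="$BORG_ROOT/$1"   # $1: client name, e.g. its hostname
  pubkey="$2"                  # $2: the client's public key line
  printf 'command="cd %s; borg serve --restrict-to-path %s",restrict %s\n' \
    "$client_dir" "$client_dir" "$pubkey"
}
```

Create the client's directory, then replace the unrestricted line that ssh-copy-id added with the output of, say, `authorized_keys_entry nas "$(cat nas.pub)"`. Because the forced command changes into the client's directory first, clients can refer to their repo by a short relative name instead of a full server path.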

  4. Configure clients for initial backup

    With the server side set up, let’s configure the backups on the clients. For each client we will initialize a repo, back up the encryption key, and then kick off the initial backups. A quick note: the output from the borg create commands below reports each file only after it has finished. So if the output appears to stall on a small file, borg is probably in the middle of backing up a large file that hasn’t been reported yet.

    From each client, do the following:
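
A sketch of those three steps. The repo name main, the key export path, and the function names are placeholders; set BORG_PASSPHRASE beforehand or expect borg to prompt for a passphrase.

```shell
# Placeholders: the repo name "main" and the key export path are illustrative.
BORG_REPO="${BORG_REPO:-borg@borgserver:main}"
KEY_BACKUP="${KEY_BACKUP:-$HOME/borg-key-export.txt}"

init_repo() {
  # Create an encrypted repo; with repokey the key is stored inside the repo.
  borg init --encryption=repokey "$BORG_REPO" || return 1
  # Export a copy of the key; move it somewhere off this client and the repo.
  borg key export "$BORG_REPO" "$KEY_BACKUP"
}

first_backup() {
  prefix="$1"; shift   # archive name prefix, e.g. "wireguard"
  # --list prints each file once it has been processed, --stats prints a
  # summary at the end, and {now:...} expands to the archive's creation date.
  borg create --list --stats --compression lz4 \
    "$BORG_REPO::$prefix-{now:%Y-%m-%d}" "$@"
}
```

Run `init_repo` once per client, then one `first_backup` per data set, for example `first_backup wireguard /etc/wireguard`.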

    Let’s break the above command down a bit…

  5. Configure automated backups and pruning

    This section is a mashup of [what this person did](https://blog.andrewkeech.com/posts/170718_borg.html) and the Borg docs [here](https://borgbackup.readthedocs.io/en/1.1-maint/quickstart.html#automating-backups).

    Backups that don’t happen are not useful, so we want to make sure that they occur on a regular schedule. This will require three things:

    1. A script which will trigger the backup and retention pruning. I chose to store mine in /root/.local/bin.

    2. A service unit to execute the above backup script. This file is in /etc/systemd/system/.

    3. A timer unit to start the service unit on a schedule. This file is in /etc/systemd/system/.

      If you have more than one or two clients you may want to consider staggering the times at which backups happen. If 10 clients all back up at the same time, then the network and disk subsystems of the borg server could become bottlenecks, and the backups could take longer than if they ran one after another.

    For each client, and each set of backup locations + retention (in other words, the borg create commands above), create these three files.
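
A sketch of the three files for one backup set named borg-backup-sample, using the daily retention policy. The repo, the /srv/sample source path, the passphrase file, and the 02:30 schedule are all placeholders; only the two directories come from the list above.

```shell
# Write the three files for one backup set. Repo, source path, passphrase
# file, and schedule are placeholders; the directories match the list above.
SCRIPT_DIR="${SCRIPT_DIR:-/root/.local/bin}"
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

write_backup_files() {
  name="borg-backup-sample"
  mkdir -p "$SCRIPT_DIR" "$UNIT_DIR" || return 1

  # 1. The script: back up, then prune; exit non-zero if either step failed.
  cat > "$SCRIPT_DIR/$name.sh" <<'EOF'
#!/bin/sh
export BORG_REPO="borg@borgserver:main"
export BORG_PASSCOMMAND="cat /root/.borg-passphrase"
rc=0
borg create --stats --compression lz4 "::sample-{now:%Y-%m-%dT%H:%M:%S}" /srv/sample || rc=$?
borg prune --list --glob-archives 'sample-*' \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 3 || rc=$?
exit "$rc"
EOF
  chmod 0700 "$SCRIPT_DIR/$name.sh" || return 1

  # 2. The service unit: run the script once per activation.
  cat > "$UNIT_DIR/$name.service" <<EOF
[Unit]
Description=Borg backup (sample)
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=$SCRIPT_DIR/$name.sh
EOF

  # 3. The timer unit: fire daily, with a random delay to stagger clients.
  cat > "$UNIT_DIR/$name.timer" <<EOF
[Unit]
Description=Daily borg backup (sample)

[Timer]
OnCalendar=*-*-* 02:30:00
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Load the new units, start the timer, and confirm it is scheduled.
enable_backup_timer() {
  systemctl daemon-reload &&
    systemctl enable --now borg-backup-sample.timer &&
    systemctl list-timers
}
```

Run `write_backup_files && enable_backup_timer` as root. Only the timer is enabled; it activates the service of the same name when it fires.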

    If the last command, list-timers, doesn’t show the backup timers, try appending --all. If the timer is still not visible, there’s probably an issue with the unit file; check that it was loaded using systemctl status borg-backup-sample.timer.

    For CentOS/RHEL 8+ and Ubuntu 18.04+ systems you can use systemd-analyze calendar to verify the OnCalendar statement used in the timer unit.

    View the logs using the command journalctl -u borg-backup-sample.

Last thoughts

You may want to configure some sort of email or other alert for your backups. Be conscious of alert fatigue: I strongly suggest filtering the events so that only actual failures and other events that aren’t simply “success” are sent.
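
Borg exits with 0 on success, 1 on a warning, and 2 on an error, so one simple filter is to alert only on a non-zero exit status. A minimal sketch, where both function names are hypothetical and notify is a stand-in for whatever alerting you use:

```shell
# Hypothetical notifier: swap the body for mail, a webhook, or similar.
notify() {
  echo "ALERT: $1" >&2
}

# Run a command and alert only when it does not exit with status 0.
run_with_alert() {
  rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    notify "'$*' exited with status $rc"
  fi
  return "$rc"
}
```

For example, `run_with_alert /root/.local/bin/borg-backup-sample.sh` stays silent on success and reports the status otherwise.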

Backups are only as good as the testing you do. It’s a very, very good idea to periodically test recovery for each one of your clients and data sources. I have a calendar reminder for each one that rotates every two months. It takes literally five minutes to test. Remember, backups are worthless, but recoveries are priceless!
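
A sketch of one possible recovery test: restore a path from the newest matching archive into a scratch directory and compare it with the live copy. The helper name restore_test and its arguments are hypothetical.

```shell
# Restore one path from the newest matching archive into a scratch directory
# and compare it with the live copy. restore_test is a hypothetical helper.
restore_test() {
  repo="$1"        # e.g. borg@borgserver:main
  path="${2#/}"    # borg stores paths without the leading slash
  prefix="${3:-}"  # optional archive name prefix, e.g. "sample-"
  archive=$(borg list --last 1 --glob-archives "${prefix}*" \
    --format '{archive}{NL}' "$repo") || return 1
  scratch=$(mktemp -d) || return 1
  # borg extract always writes into the current directory.
  ( cd "$scratch" && borg extract "$repo::$archive" "$path" ) || return 1
  # A clean diff means the restored files match what is on disk now.
  diff -r "/$path" "$scratch/$path"
}
```

Files that changed since the last backup will show up in the diff, which is expected; what you are looking for is a restore that completes and content that is otherwise intact.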
