If you’ve ever had a hard drive fail without a backup, you know how painful it can be. It happened to me when I was 16, and I lost a hard drive full of music. My loss was far from important, though; it was only music I had obtained by sailing the high seas. Sure, it took a lot of time to acquire back then (does anyone else remember it taking 20 minutes to download a single song?), but it was far from irreplaceable. What happens if it’s your family photos, letters, or other things that simply cannot be replaced? I learned my lesson that day: backups are important!
The 3-2-1 rule for backups states that you should keep 3 copies of your important data, on 2 separate devices, with at least 1 copy in a separate location. It’s also important to understand that RAID is not a backup. RAID addresses downtime: it keeps the system running when a drive fails. It does not protect against accidental file deletions, because a deletion is applied to the whole array immediately; therefore, it is not a proper backup solution.
Now on to 3-2-1 and the software I use in my setup.
The servers sitting in the basement are just used ThinkCentre PCs bought from eBay for around $100 each, with 4 x 8TB drives connected to them. Using snapraid, I have 3 of those drives set up as data drives and one as a parity drive. This means that if any one of the 4 drives fails, I lose nothing. Well, to be accurate, snapraid is set to sync nightly at 2:00 AM, which means I could lose any data that has changed since the last sync.
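For reference, a 3-data-plus-1-parity layout like this could look roughly like the snapraid.conf below. This is a minimal sketch rather than the exact config used here: the mount points (/srv/disk1 to /srv/disk3 and /srv/parity1), the disk names d1 to d3, the content file locations and the helper name write_snapraid_conf are all hypothetical, and the crontab line assumes the nightly 2:00 AM run is scheduled with cron.

```bash
#!/bin/bash
# Hypothetical helper: write a snapraid.conf for 3 data drives + 1 parity drive.
# All paths and disk names below are examples; adjust them to your own mounts.
write_snapraid_conf() {
    local out="$1"
    cat > "$out" <<'EOF'
# one parity drive protects the three data drives
parity /srv/parity1/snapraid.parity

# content files store the array metadata; keep at least (number of parity drives + 1)
# and make sure each directory exists
content /var/snapraid/snapraid.content
content /srv/disk1/snapraid.content
content /srv/disk2/snapraid.content

# the three data drives
data d1 /srv/disk1/
data d2 /srv/disk2/
data d3 /srv/disk3/

exclude /lost+found/
EOF
}

# Nightly sync at 2:00 AM as a crontab entry (snapraid-runner path is hypothetical):
# 0 2 * * * /usr/local/bin/snapraid-runner -c /etc/snapraid-runner.conf
```

Writing to a scratch path first (for example write_snapraid_conf /tmp/snapraid.conf) and comparing it with an existing /etc/snapraid.conf is safer than overwriting the real file.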
I like this timing because if I’m working on something during the day and accidentally delete it, I can recover the files from the previous night’s sync. It’s also unlikely that, before the next sync runs, I’ll put something on the drives that is both irreplaceable and no longer on another device, so this works for me. But remember how RAID is not a backup? Well, despite its name, snapraid blurs the line between RAID and a backup solution. Because its parity is only updated when it syncs, it can rebuild files deleted since the last sync, much like a backup, while also covering a failed drive, like RAID. However, I do not consider it a backup solution, which is why I also use borg for my real backups.
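Undeleting a file works because the parity still describes the array as it was at the last sync. A minimal sketch using standard snapraid commands: snapraid diff lists what changed since the last sync, and fix with -m (only missing files) and -f (a filter pattern) rebuilds just the deleted files. The helper name restore_deleted and the example path are hypothetical, and the filter follows the PATTERN rules in the snapraid manual.

```bash
#!/bin/bash
# Hypothetical helper: undelete files using the parity from the last sync.
restore_deleted() {
    local pattern="$1"
    # list what changed since the last sync, so you know what the parity still covers
    snapraid diff
    # -m: only process missing (deleted) files; -f: only files matching the pattern
    snapraid -m -f "$pattern" fix
}
```

For example: restore_deleted "/documents/report.odt". Recovery is most reliable while the other drives have changed little since the last sync, since the parity is computed from all of them.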
Borg has deduplication, which means that if I have multiple copies of the same file in different folders, or even on different machines backed up to the same repository, they only take up roughly the space of one copy. Likewise, if only a small part of a big file changes, borg doesn’t need to store a completely new copy, just the changed chunks plus a little overhead. This saves a lot of storage space. Borg is also fast, easy to install and script, supports encryption and compression, and can back up to remote hosts. The backups borg creates are what I actually consider my backups.
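A minimal sketch of the borg side, assuming borg 1.x; the repository path, source directories and helper names are hypothetical. repokey keeps the passphrase-protected key inside the repository (borg reads the passphrase from BORG_PASSPHRASE if it is set), and {hostname}-{now} are borg placeholders for the archive name. Deduplication needs no option at all; it happens automatically within a repository.

```bash
#!/bin/bash
# Hypothetical helpers around standard borg 1.x commands.
init_repo() {
    # repokey: the encrypted key is stored in the repo, protected by the passphrase
    borg init --encryption=repokey "$1"
}

borg_backup() {
    local repo="$1"
    shift
    # {hostname}-{now} are expanded by borg (not bash) into the archive name;
    # chunks already in the repository are deduplicated automatically
    borg create --stats --compression zstd "${repo}::{hostname}-{now}" "$@"
}
```

For example: init_repo /mnt/external1/borg once, then borg_backup /mnt/external1/borg /srv/disk1 /srv/disk2 for each run. lz4 is borg’s default compression; zstd trades a little CPU for smaller archives.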
And finally, there’s timeshift, which is software for backing up your OS installation, and specifically only your OS installation. I use timeshift so that if my OS gets borked by an update or by my own misguided hands, I can easily revert it.
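A small sketch of using timeshift as a safety net around updates. It assumes a Debian/Ubuntu-style system with apt, and update_with_snapshot is a hypothetical helper; --create and --comments are standard timeshift options.

```bash
#!/bin/bash
# Hypothetical helper: snapshot the OS with timeshift, then update (apt assumed).
update_with_snapshot() {
    # --create takes an on-demand snapshot; --comments labels it in 'timeshift --list'
    sudo timeshift --create --comments "pre-update $(date +%F)" &&
        sudo apt update &&
        sudo apt upgrade
}
```

If the update does bork something, sudo timeshift --list shows the available snapshots and sudo timeshift --restore walks through rolling back to one.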
How does this look in actual practice?
My devices (phones, computers, laptops, etc.) hold pictures, videos, settings and more (copy 1, device 1, location 1) that I do not want to lose, so syncthing runs on them and copies those files and folders to the snapraid array (copy 2, device 2, location 1). The array is also where almost all of my living data is stored: movies, TV shows, music, documents, games, installs, self-hosted services and their data, etc. Everything of any importance is on the server in the snapraid array.
The server also has the snapraid parity, which in turn gives a backup (copy 3, device 3, location 1), but as I said, I don’t consider snapraid a real backup solution because it only syncs nightly, so…
Borg makes full backups of all of this to two additional large external drives, giving me another +2 copies and +2 devices of redundancy. Borg then makes backups to an online storage solution, giving another +1 copy, +1 device and +1 location.
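Backing up to several targets is just the same borg create run once per repository. In this sketch the two external drive paths, the ssh URL for the online storage (it assumes the provider accepts borg over ssh) and the helper name backup_everywhere are hypothetical. A failure on one target is reported but doesn’t stop the others.

```bash
#!/bin/bash
# Hypothetical targets: two external drives and one remote repository over ssh.
REPOS=(
    /mnt/external1/borg
    /mnt/external2/borg
    "ssh://user@backup.example.com/~/borg"
)

# Run the same borg create against every repository in REPOS.
backup_everywhere() {
    local repo failed=0
    for repo in "${REPOS[@]}"; do
        # keep going if one target is unavailable, but remember that it failed
        if ! borg create --stats --compression zstd "${repo}::{hostname}-{now}" "$@"; then
            echo "backup to ${repo} failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example: backup_everywhere /srv/disk1 /srv/disk2 /srv/disk3. The non-zero return value lets a cron job or wrapper notice that at least one target was missed.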
So the data on my devices exists as 6 copies, on 6 devices, at 2 locations, and my server data as 5 copies, on 5 devices, at 2 locations. It’s definitely overkill, but I had the extra external drives, so I figured I might as well use them this way.
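The tally above is simple arithmetic, and it can be checked against the 3-2-1 rule with a tiny helper. meets_321 is hypothetical, and it reads the "1" as at least one copy off-site, that is, at least 2 locations in total.

```bash
#!/bin/bash
# Hypothetical helper: does a plan meet 3-2-1?
# "1" is read as at least one copy off-site, i.e. at least 2 locations in total.
meets_321() {
    local copies="$1" devices="$2" locations="$3"
    (( copies >= 3 && devices >= 2 && locations >= 2 ))
}

# device data: device + snapraid array + parity + 2 external drives + online storage
device_copies=$(( 1 + 1 + 1 + 2 + 1 ))   # 6
# server data: the same chain minus the original device
server_copies=$(( device_copies - 1 ))   # 5

# in this tally every copy sits on its own device, so copies == devices
meets_321 "$device_copies" "$device_copies" 2 && echo "device data meets 3-2-1"
meets_321 "$server_copies" "$server_copies" 2 && echo "server data meets 3-2-1"
```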
Some things I learned while setting up this software:
- Make sure to exclude the data drive mounts from timeshift in the timeshift.json file. My data drives are mounted under /srv, and timeshift was including them in its snapshots, which exceeded the size of the drive and caused it to error.
- snapraid-runner is a nice script for running snapraid sync and scrub and for easily setting other options. I also learned that, for snapraid to run on my data without errors, I needed to add an exclude rule for *.log files (an "exclude *.log" line in snapraid.conf), because they were changing during the sync and causing it to error.
- When running a snapraid sync or a borg backup, shut down all docker containers and anything else that might hold a database or change files while the backup is running. Since most of my services run in docker containers, here is an example of how I do this when starting snapraid-runner:
#!/bin/bash
# stop all docker containers and, if successful, start snapraid-runner
docker stop $(docker ps -a -q) && snapraid-runner -c /path/to/config
# these services need to start before the other services that depend on them
docker start redis
docker start nginx
# start all the other services
docker start $(docker ps -a -q)
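For the timeshift tip about /srv, the exclude list is a JSON array in timeshift.json, typically at /etc/timeshift/timeshift.json on recent releases. A minimal sketch that adds a pattern with jq, assuming jq is installed; the helper name add_timeshift_exclude and the rsync-style pattern /srv/** are assumptions, so check the result in the timeshift GUI afterwards.

```bash
#!/bin/bash
# Hypothetical helper: add an exclude pattern to timeshift.json if it is not already there.
# Requires jq. Run as root with the timeshift GUI closed, and keep a backup of the file.
add_timeshift_exclude() {
    local conf="$1" pattern="$2" tmp rc
    tmp=$(mktemp) || return 1
    # append $pattern to the "exclude" array unless an identical entry exists;
    # existing entries keep their order
    jq --arg p "$pattern" \
        'if ((.exclude // []) | any(. == $p)) then . else .exclude += [$p] end' \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"   # cat keeps the file's owner and mode
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, as root: add_timeshift_exclude /etc/timeshift/timeshift.json '/srv/**'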
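For the snapraid tip about *.log files, the rule is an exclude line in snapraid.conf. A minimal sketch that appends it only once; ensure_snapraid_exclude is a hypothetical helper, and snapraid picks up the change the next time it runs.

```bash
#!/bin/bash
# Hypothetical helper: make sure snapraid.conf contains "exclude PATTERN" exactly once.
ensure_snapraid_exclude() {
    local conf="$1" pattern="$2"
    # -x: match the whole line, -F: treat the text literally (so * is not a regex)
    grep -qxF "exclude $pattern" "$conf" || printf 'exclude %s\n' "$pattern" >> "$conf"
}
```

For example, as root: ensure_snapraid_exclude /etc/snapraid.conf '*.log'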
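One possible variation on the container script, as a sketch: it records which containers are actually running, stops only those, runs whatever command you give it, and then restarts only those same containers, so containers you had deliberately stopped stay stopped and snapraid-runner and borg can share one wrapper. with_containers_stopped is a hypothetical helper, and mapfile needs bash 4 or newer.

```bash
#!/bin/bash
# Hypothetical helper: stop the running containers, run a command, then
# restart exactly the containers that were running before, even if it fails.
with_containers_stopped() {
    local -a running=()
    local rc=0
    # running containers only (no -a), so containers that were already stopped stay stopped
    mapfile -t running < <(docker ps -q)

    if (( ${#running[@]} > 0 )) && ! docker stop "${running[@]}"; then
        # a partial stop: bring everything back up and skip the backup
        docker start "${running[@]}"
        return 1
    fi

    "$@" || rc=$?

    if (( ${#running[@]} > 0 )); then
        docker start "${running[@]}"
    fi
    return "$rc"
}
```

For example: with_containers_stopped snapraid-runner -c /path/to/config. Containers come back in whatever order docker ps listed them, so if some services must come up first, as redis and nginx do in the script above, start those by name before the rest.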