Posted on Mon 13 March 2017
Nobody wants backups. Everybody wants restores.
The questions are:

- what sort of disaster are you trying to recover from?
- how often do you expect each kind to happen?
- how much time are you willing to spend on recovery?
- how much money are you willing to spend?
Let’s take a few common scenarios.
First: a house full of personal-use machines, plus a server. We expect files to go missing or be accidentally deleted fairly often, and we want it to be easy and cheap to recover from that.
The general answer for that is to store files on a networked filesystem of some sort - NFS, SMB, sshfs, whatever - which resides on the server and is snapshotted every so often. Tools for snapshotting include LVM (not recommended), rsnapshot, btrfs, and ZFS. Anything with a user-accessible snapshot method is good here - sysadmins don't need to be involved in every oops.
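As a rough illustration, here is a minimal sketch of the kind of snapshot rotation the server might run from cron. It assumes ZFS and a dataset called tank/home holding everyone's files; the dataset name and the retention count are made up for the example.

```python
#!/usr/bin/env python3
"""Take a timestamped ZFS snapshot of the shared dataset and prune old ones.

The dataset name and retention count are placeholders; adjust for your pool.
"""
import subprocess
from datetime import datetime

DATASET = "tank/home"   # hypothetical dataset holding the shared files
KEEP = 48               # number of automatic snapshots to retain

def run(*args):
    return subprocess.run(args, check=True, capture_output=True, text=True)

# Create a new snapshot named after the current time.
stamp = datetime.now().strftime("%Y%m%d-%H%M")
run("zfs", "snapshot", f"{DATASET}@auto-{stamp}")

# List existing auto- snapshots, oldest first, and destroy the surplus.
out = run("zfs", "list", "-H", "-t", "snapshot", "-o", "name",
          "-s", "creation", "-r", DATASET).stdout
autos = [name for name in out.splitlines() if "@auto-" in name]
for old in autos[:-KEEP]:
    run("zfs", "destroy", old)
```

Run every half hour from cron, this leaves a trail of snapshots that users can browse themselves (with ZFS, under the dataset's .zfs/snapshot directory), which is exactly the "user-accessible snapshot method" the paragraph above asks for.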
Second: we have the same setup, but we would also like to make it reasonably easy to restore a whole machine when we have an accident with the hard disk.
For that, we need image backups over the network to the server. We don't want to snapshot these; keeping the most recent known-good image is enough. Testing these images every so often, by actually restoring from one, is necessary.
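For illustration, a sketch of pulling a compressed whole-disk image from a client over SSH. The hostname, device path, and destination directory are invented for the example, and the old image is only replaced once the new one has completed.

```python
#!/usr/bin/env python3
"""Pull a compressed whole-disk image from a client machine over SSH.

Hostname, device path, and destination path are placeholders for the example.
Only the most recent known-good image is kept.
"""
import subprocess
from pathlib import Path

HOST = "laptop"                      # hypothetical client machine
DEVICE = "/dev/sda"                  # disk to image on that machine
DEST = Path("/srv/images/laptop.img.gz")
TMP = DEST.with_name(DEST.name + ".partial")

# Stream the raw disk over SSH and compress it on the server side.
with open(TMP, "wb") as out:
    ssh = subprocess.Popen(
        ["ssh", f"root@{HOST}", "dd", f"if={DEVICE}", "bs=4M"],
        stdout=subprocess.PIPE)
    gz = subprocess.Popen(["gzip", "-c"], stdin=ssh.stdout, stdout=out)
    ssh.stdout.close()          # let gzip see EOF when dd finishes
    gz.wait()
    if ssh.wait() != 0 or gz.returncode != 0:
        raise SystemExit("image backup failed; keeping previous image")

# Replace the old image only after the new one completed successfully.
TMP.replace(DEST)
```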
Third scenario: running a service that makes you money. For this, we want to be up all the time. We can spend a lot more money on this, because we expect to make money from it.
The solutions here involve high availability: multiple machines, possibly in multiple locations, handling the same service in a coordinated fashion. Users need to be automatically directed to a working instance, and we need a monitoring system to tell us when a machine is down, because if the HA system is working we will not get user complaints.
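As a toy illustration of the monitoring side, here is a sketch that polls a health endpoint on each instance and reports the ones that are down. The instance URLs and the /health path are made up for the example; a real deployment would hand this job to a proper monitoring system and wire the failure into paging or chat rather than a print statement.

```python
#!/usr/bin/env python3
"""Poll a health endpoint on each service instance and report the dead ones.

Instance URLs and the /health path are placeholders. The point is that a
working HA setup hides failures from users, so something else has to notice.
"""
import urllib.request
import urllib.error

INSTANCES = [
    "https://app1.example.com/health",
    "https://app2.example.com/health",
]

def is_up(url, timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

down = [url for url in INSTANCES if not is_up(url)]
if down:
    # In real life this would page someone, not just print.
    print("DOWN:", ", ".join(down))
    raise SystemExit(1)
print("all instances healthy")
```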