Posted on Mon 04 March 2024

belts and suspenders at home

I have almost 30 years of professional experience as a sysadmin.

Failure is inevitable. Hardware will fail, software will turn out to have flaws, and configuration changes will go wrong.

The antidote is to have reliable recovery mechanisms: everything will eventually break or need to be changed, so you need to be able to get back to a stable position and try again.

Recovery mechanisms are not all-purpose. I usually describe them in three phases:

  • short-term examples: version control, RAID, failover systems, snapshots. These let your systems keep functioning after a specific failure, or let you revert to a known good state in a short period of time.

  • medium-term examples: backup (with tested recovery), cold spares, automated deployment systems, alternative paths. These keep you going after a major but limited-scope failure.

  • long-term examples: archives, remote backup, distributed remote deployment, disaster-recovery plans. These let you rebuild somewhere else.
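"Tested recovery" is the part of backup that tends to get skipped, so here is a minimal sketch of one way to check a test restore: hash every file in the original tree and in the restored copy, then list anything that is missing, extra, or different. The function names and the example paths are my own placeholders, not part of any backup tool, and it assumes the files are small enough to read into memory.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file at once; fine for config-sized files.
            data = path.read_bytes()
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(data).hexdigest()
    return digests


def verify_restore(original: Path, restored: Path) -> list[str]:
    """Return relative paths that are missing, extra, or different in the restored tree."""
    want = tree_digests(original)
    got = tree_digests(restored)
    return sorted(
        name for name in want.keys() | got.keys() if want.get(name) != got.get(name)
    )
```

Something like `verify_restore(Path("/etc"), Path("/tmp/restore-test/etc"))` (both paths hypothetical) returns an empty list when the restore matches. Anything else is a list of files to go look at before you need that backup for real.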

Understanding your systems allows you to plan how you will deal with the inevitable mishaps. Carrying out those plans allows you to have confidence that you won’t dig yourself into a hole that you can’t escape.

The differences between a house network and a small business network are not that great. The business probably has more money to spend, but both have similar needs for reliability. Your family and/or housemates are probably fewer people than your business associates – or perhaps not.

What reliability measures are worthwhile on even a small network?

  • NTP
  • DNS
  • DHCP
  • NAS
  • RAID
  • backup
  • central syslog
  • version control for your configuration
  • diverse routing to the Internet – if it’s cheap enough

What’s not worthwhile unless you have other goals, like “learning this skill”?

  • Kubernetes and similar orchestration systems
  • most multiple-system containerization or VM migration systems
  • internal multi-path networking
  • multi-path disks
  • ansible, chef, puppet, cfengine, nix…

© -dsr-.