<p>random strings - devops (blog.randomstring.org)</p>
<h1>one of the one true ways of ops</h1>
<p>2023-07-26, by -dsr-</p>
<p>I’m going to tell you the secret (it’s not a secret) to building
reliable, operable, debuggable infrastructure. This is going to be
terse, but hopefully understandable to someone with just a little
experience.</p>
<p>You’re going to need some infrastructure. Infrastructure is not the
stuff that you are building, and it’s not the tools that you are
building the stuff with. Infrastructure is the reliable services which
you depend on to help you build your stuff.</p>
<p>At a minimum:</p>
<ul>
<li>an Internet connection</li>
<li>a computer acting as a firewall/router to protect you from the
Internet</li>
<li>a network switch, preferably one which is configurable with
VLANs</li>
<li>more computers than you would think, some of which will be
specialized by speed or amount of storage, RAM, processors, special
hardware…</li>
</ul>
<p>The first rule is that nothing can be built without a firm
foundation. A firm foundation does not change unless someone makes an
active decision to change it, or something breaks. A broken foundation
must be detected and fixed.</p>
<p>To detect things changing, we need a monitoring system. The
monitoring system should make read-only inquiries via SNMP and check
the functionality of services on remote computers by running tests
against them, ranging from pings and port connections to HTTPS queries
and SQL queries. When it has checked on everything, it starts over.
The monitoring system needs a reliable way of sending an alert, and it
must reliably keep sending the alert periodically until a person stops
it or the detected problem is no longer detected.</p>
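<p>The re-alerting rule above can be sketched as a small decision
function. This is a hedged illustration: the function name
<code>should_alert</code> and the five-minute interval are assumptions,
not part of any particular monitoring tool.</p>

```python
# Decide whether to (re)send an alert for a detected problem.
# An alert keeps firing every `interval` seconds until a person
# acknowledges it or the problem clears. All names here are
# illustrative, not from any specific monitoring system.
def should_alert(problem_detected, acknowledged, last_sent, now, interval=300):
    if not problem_detected:
        return False          # problem cleared: stop alerting
    if acknowledged:
        return False          # a person has taken over
    if last_sent is None:
        return True           # first notification
    return now - last_sent >= interval  # periodic re-send

# Example: an unacknowledged problem last paged 10 minutes ago
print(should_alert(True, False, last_sent=0, now=600, interval=300))  # True
```

<p>The key property is the last branch: silence is never the default
once a problem is detected.</p>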
<p>The monitoring system needs to know what time it is. Use NTP.
Designate at least one machine as an NTP server, and have it talk to a
pool of NTP servers out on the Internet, as well as all of your internal
machines.</p>
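<p>As a sketch, the designated internal NTP server could run chrony
with a config along these lines; the pool name and subnet are
placeholder assumptions to replace with your own values:</p>

```
# /etc/chrony/chrony.conf sketch for the internal NTP server
# (pool name and subnet are placeholders)
pool 2.pool.ntp.org iburst      # sync with public pool servers
allow 10.0.0.0/8                # serve time to internal machines
local stratum 10                # keep serving if the Internet is down
```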
<p>The monitoring system needs to be able to send alerts. If the
Internet is up, send email, preferably to a paging service. How will you
get alerts if the Internet is down? You can try cellphone gateways, but
I recommend a different method: set up a small copy of part of your
monitoring system somewhere else. Have this one just monitor the
availability of your services from an outside perspective. Are you
pingable? Are the ports for your applications open? Can a login page be
retrieved? If not, shout via email.</p>
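<p>The outside-in checks described here can be as small as a TCP
connect test. A minimal sketch; <code>check_port</code> is a
hypothetical helper name, not part of any monitoring product:</p>

```python
import socket

# Minimal outside-in service check: can we open a TCP connection to a
# given host and port from this vantage point? `check_port` is a
# hypothetical helper, not part of any monitoring product.
def check_port(host, port, timeout=3.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

<p>The outside monitor would run checks like this against your public
addresses on a schedule, and shout via email when one turns False.</p>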
<p>From now on, your main monitoring system gets a new monitor for every
machine you put into service, and new alerts for every new service you
run, internally or externally.</p>
<p>Now you can detect changes. You need to track changes. On a reliable
server machine with lots of disk space, install your version tracking
system. On that or a similar machine, install a web server that can host
a copy of your preferred operating system’s installation system. And,
also, multiple copies of the complete repository of external software.
Why so much space? Someday you will upgrade the operating system, and
for some period of time you will need a copy of the old and a copy of
the new. And new is usually larger than old.</p>
<p>Install a system that can install operating systems on new machines.
That’s usually a combination of DNS, DHCP, PXE, and a PXE-boot menu.
Figure out how you want to name machines now. Figure out how you will
handle expansion in the future. Come up with a flexible network routing
and address allocation policy that is also reasonably efficient.
Remember that humans like unique names for things that they depend on,
but are okay with meaningful+serial names for machines that are
interchangeable.</p>
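<p>One way to provide the DHCP/PXE/TFTP pieces in a single daemon is
dnsmasq; this is a sketch, with the interface, address range, and boot
file as placeholders:</p>

```
# dnsmasq sketch: DHCP + PXE boot + TFTP in one daemon
# (interface, addresses, and boot file are placeholders)
interface=eth1
dhcp-range=10.0.10.50,10.0.10.150,12h
dhcp-boot=pxelinux.0            # bootloader offered to PXE clients
enable-tftp
tftp-root=/srv/tftp             # where pxelinux.0 and its menus live
```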
<p>You now need a way to take a freshly installed (via PXE) machine and
install and configure specific software on it. Study the available
configuration automation systems (ansible, puppet, chef, bcfg2,
cfengine, whatever) and pick one that you can live with for a long time.
Consider carefully whether things should be fundamentally pushed from a
server to a client or pulled from a server by a client. Always prefer
pull for repeated tasks.</p>
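<p>With a pull model, each machine fetches and applies its own
configuration on a schedule. With ansible, for example, that could be
a cron entry running <code>ansible-pull</code>; the repository URL and
playbook name here are placeholders:</p>

```
# /etc/cron.d/config-pull sketch (URL and playbook are placeholders)
# Every 30 minutes, each client pulls the repo and applies its own config.
*/30 * * * * root ansible-pull -U https://git.internal.example/infra.git site.yml >> /var/log/ansible-pull.log 2>&1
```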
<p>When someone tells you that technology Z doesn’t provide security,
just convenience, believe them.</p>
<p>You will probably find yourself in need of a database pretty soon. If
you do not have a burning need for a specific database, there are only
three you should consider (as of 2023): sqlite, mariadb (a fork of
mysql), and postgresql. Strongly consider using languages with a
built-in database layer that can use all three of these systems.
Consider picking Postgresql and just sticking with it, unless your needs
are very, very simple – in which case, sqlite might be exactly what you
want.</p>
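<p>Python, for instance, ships sqlite bindings in its standard
library, which is often all a very simple need requires. A minimal
sketch; the table and values are made up for illustration:</p>

```python
import sqlite3

# sqlite needs no server at all: the database is a single file
# (":memory:" here, so this example leaves nothing behind).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (name TEXT PRIMARY KEY, ip TEXT)")
conn.execute("INSERT INTO hosts VALUES (?, ?)", ("ns1", "10.0.0.53"))
conn.commit()

row = conn.execute("SELECT ip FROM hosts WHERE name = ?", ("ns1",)).fetchone()
print(row[0])  # prints 10.0.0.53
conn.close()
```

<p>Because the SQL is plain, the same code moves to mariadb or
postgresql with only a driver change.</p>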
<p>Learn a major web server: either nginx or apache. They both work
well. I think nginx has a slightly better configuration language, but in
the end you’re going to be deploying configs via that config automation
system.</p>
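<p>A typical deployed config is short. Here is a hedged nginx sketch
for fronting an internal application; the server name, certificate
paths, and upstream port are placeholders:</p>

```
# nginx sketch: TLS-terminating reverse proxy for an internal app
# (server_name, cert paths, and upstream port are placeholders)
server {
    listen 443 ssl;
    server_name app.example.com;
    ssl_certificate     /etc/ssl/app.example.com.pem;
    ssl_certificate_key /etc/ssl/app.example.com.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```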
<p>For every language you develop in, you must find out what library
management system they have and make a local repo of the libraries that
you use. You only build from the local repo. Only. Ever. Local. When you
want a new version of something you bring it down into your local repo.
Don’t remove the old one, it might be better. After three versions have
gone by, you might not care any more. This defends against someone
poisoning the upstream source – a supply chain attack. It is not a
perfect defense.</p>
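<p>For Python, as one example, the local repo can be a directory of
downloaded packages that builds never reach past; the directory and
package names here are placeholders:</p>

```
# Sketch for Python: populate a local mirror, then build only from it.
# (directory and package names are placeholders)
pip download --dest /srv/pip-mirror requests==2.31.0
pip install --no-index --find-links /srv/pip-mirror requests
```

<p><code>--no-index</code> is the important part: it forbids pip from
touching the upstream index at all.</p>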
<p>Which systems are ‘development’ and which are ‘production’? They
should look the same, be deployed the same, but you need a gateway
between them. At any moment you should be prepared to repel boarders,
including developers snooping where they should not and clients tugging
on exposed ports. A formal process with a gatekeeper is good, but
remember that codifying and practicing for emergencies makes everyone
feel better on the tragic but inevitable day when disaster strikes.</p>
<p>You need to know who you are trusting. OS developers? Package
maintainers, library authors, coworkers, contractors, clients? Figure
out the data flows and the trust relationships. Document this. You need
a wiki. Pick one that stores wiki pages in the filesystem, not in a
database: the wiki is going to be a precious documentation source, and
on the day you can’t run the wiki software but you can grep and read the
files, you will thank me.</p>
<p>Access control. You will need to get into your system remotely, which
means Wireguard or SSH or both, one over the other. You need to manage
special privileges, which means logins on each machine and sudo or doas
privileges. In whatever application you are building, consider your
security model first and every time you make a change. Keep it separate
from your infrastructure access control.</p>
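<p>The per-machine privilege piece can be a single line. A doas
sketch; the group name is a placeholder:</p>

```
# /etc/doas.conf sketch (group name is a placeholder)
# members of :wheel may run commands as root, re-authenticating
# at most once per session
permit persist :wheel as root
```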
<p>Now size the backups and make them, automatically and repeatedly. The
rule of backups is this: nobody cares about backups, they only care
about restores. You have three distinct backup targets:</p>
<ul>
<li>oops, I deleted/changed a thing. Can I get it back fast?
<ul>
<li>use a snapshotted filesystem, with automatic snapshots (I like
ZFS)</li>
<li>use a version control system (yes, for its own sake)</li>
<li>use a self-service per-user backup/restore system (don’t do
this)</li>
</ul></li>
<li>this computer died taking a lot of data with it. Can we restore it
fast?
<ul>
<li>have an onsite backup to disk</li>
<li>make those backups nightly</li>
<li>have multiple copies of freshly acquired data</li>
<li>have an offsite backup of the onsite backup for that day when
everything burns (or the power goes out)</li>
<li>could you have a live backup server? It costs more. That might be
worthwhile.</li>
</ul></li>
<li>the lawyer/accountant says we need to retain this for years. Can we
do that efficiently?
<ul>
<li>encrypt that data and store the passphrase in three different secure
places.</li>
<li>offsite is probably good</li>
<li>keep an onsite catalog of where you put it</li>
</ul></li>
</ul>
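<p>The snapshot piece of the first target can be a cron-driven zfs
command; the pool and dataset names are placeholders:</p>

```
# /etc/cron.d/snapshots sketch (pool/dataset names are placeholders)
# Nightly ZFS snapshot named after the date; old ones get pruned later.
30 1 * * * root zfs snapshot tank/home@nightly-$(date +\%F)
```

<p>Restoring a single deleted file is then a copy out of the
snapshot's hidden <code>.zfs/snapshot</code> directory, not a restore
job.</p>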
<p>I haven’t mentioned your load balancing, streaming database
replication, second site, internal firewalls, office systems, or
printing. If you can avoid ever buying a printer, do that. If you can
minimize printing, do that. Buy a larger monitor rather than more reams
of paper and toner. Use wired networking for every machine with a fixed
location, and treat your wireless networks as being outside visitors.
Survey the MAC addresses of the wired machines and refuse changes
without authorization. If you handle payments of any kind, read the PCI
documentation and do better than they demand. You can do it: they demand
the minimum that they can cope with.</p>
<p>Buy more capacity up front. Compare fully depreciated capital assets
versus the cash flow of rented/leased/flexible services, and bet that
you will be in it for the long haul. If you aren’t sure, scale back.
Don’t depend on the whims of giants: buy commodities that you can get
from anywhere.</p>
<p>There’s always more. This is enough to get you a firm enough
foundation that your organization can survive to find out what you need
to do differently.</p>
<h1>questions for packaging systems</h1>
<p>2016-02-27, by -dsr-</p>
<p>First, some history. Then, some questions largely of interest to
system administrators.</p>
<p>Once upon a time [UNIX] software distribution was simple. Your
operating system came as a huge blob over which you had no control.
Sometimes there would be updates. Sometimes the update would require a
reinstall, sometimes not. Additional software came in similar blobs,
“managed” in a similar way.</p>
<p>For a while, you needed to compile anything you wanted yourself.
Figuring out dependencies was hard, so people adopted makefiles and then
autoconf. Autoconf could figure out what sort of system you were
running, adapt to various inconsistencies and then even figure out what
libraries you were missing. Building a nontrivial system was, well,
nontrivial. If you were missing a key feature, you would need to stop
and build something else, which might in turn require an update or
hopelessly disable some feature in another system that you were quite
fond of.</p>
<p>Automating this mess was a good idea, so people did that. The BSD
Ports system, among others, was a huge library of source code stubs that
declared dependencies and could download code and compile it for you.
The Packages system then arose: software which declared dependencies and
was already compiled.</p>
<p>(Over in Windows Land, you had a single operating system with just a
bare handful of versions, so every software installation process had
relatively simple assumptions about what was available and what would
have to be provided. 22 high-density floppies? Everyone greeted the
arrival of CDs as a lifesaver.)</p>
<p>Modern packaging systems need to be able to answer the following
questions convincingly:</p>
<ul>
<li>Is the software integrity intact?</li>
</ul>
<p>Are the bits that I asked for the bits that I received? Did the
download happen over an encrypted link? Can I verify the integrity
pre-install and post-install? Manually and automatically?</p>
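<p>Pre-install verification usually reduces to comparing a
cryptographic digest against a value obtained over a trusted channel.
A Python sketch; the helper names and file paths are illustrative:</p>

```python
import hashlib

# Verify a downloaded package file against a digest you obtained over
# a trusted channel. Helper names and paths are illustrative.
def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    return sha256_of(path) == expected_hex
```

<p>Real packaging systems layer signatures (e.g. GPG) on top of
digests, so that the expected value itself is authenticated.</p>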
<ul>
<li>Is there an automated system for updates?</li>
</ul>
<p>Can I get a timely notification of an update? Is the notification
protected against fraud? Can I install packages without answering
questions interactively? Can I install updates automatically? Can I
prevent one or more updates while letting others go through?</p>
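<p>On Debian-style systems, for example, a selective hold looks like
this; the package name is a placeholder:</p>

```
# Sketch: let everything update except one held package
# (package name is a placeholder)
apt-mark hold somepkg        # pin this package at its current version
apt-get upgrade              # everything else proceeds
apt-mark unhold somepkg      # release the hold later
```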
<ul>
<li>Are dependencies automatically tracked and installed?</li>
</ul>
<p>Is there a clear difference between required dependencies and
suggested add-ons? Can I set a preference to automatically install all
the suggestions? If a suggested package install fails, does it abort the
install or upgrade of the requested package?</p>
<ul>
<li>Are included component libraries tracked?</li>
</ul>
<p>If you have a statically linked or co-packaged library – that is, a
dependency built in to the same package – is there visibility into that
from the outside? Can it be tracked for security or feature updates like
the main package?</p>
<ul>
<li>Is the responsibility for packaging and updates defined?</li>
</ul>
<p>Does your development team overlap with your security team? Do they
know their responsibilities? Are they documented and visible to the end
user?</p>
<ul>
<li>Are the tools for creating and maintaining repositories available
and well documented?</li>
</ul>
<p>Can I create a secure mirror of the upstream repos? Can I create a
local cache? Can I create and advertise a third-party repo for
distribution to other users? Can I create a local repo just for my own
use? Is the source of a package clear to the end-user before
installation? After installation? Can my packages conflict with upstream
packages and replace them? Can I selectively set priorities per-package?
Per-repo?</p>
<h1>the ops in devops means operations</h1>
<p>2016-02-05, by -dsr-</p>
<p>Lots of people will give you guff about “devops” and how it means
that you no longer need people to run your business infrastructure
because it is all a simple matter of programming.</p>
<p>I cannot emphasize enough how wrong this is.</p>
<p>What devops does is apply the lessons learned in software development
to the processes of running computer infrastructure. As a direct result,
you can improve your processes by minimizing human involvement in the
execution of processes. What you cannot do is improve your processes
without knowing what those processes are, how they are executed, and
what might go wrong.</p>
<p>Ask any developer: can you solve a business problem without
understanding it? No, you cannot. Programming is the expression of
decision-making in a formal automation. You cannot make (good, or even
reasonable) decisions without understanding the process.</p>
<p>The naive devops-uber-alles view concentrates on building an MVP, a
minimum viable product, and then elaborating on it in customer-desired
ways. The ops part is relegated solely to deployment, the process of
getting the application running on one or more servers.</p>
<p>Here is what operations brings to the table:</p>
<ul>
<li>statistics collection</li>
<li>systems monitoring</li>
<li>alerting of humans based on the above</li>
<li>load balancing</li>
<li>strategic deployment decisions (e.g. multiple vendors, geographic
distribution)</li>
<li>backup and restore decisions and methods</li>
<li>security policy</li>
<li>security implementation</li>
<li>security testing</li>
</ul>
<p>In modern development, it is considered good to write tests to ensure
that the code you write does the right thing. In modern devops, you need
to implement runtime tests to assure yourself that the computers you are
running on are still behaving as you desire.</p>
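<p>A runtime test is just an assertion about the live system, checked
repeatedly. A Python sketch; the function name and the ten-percent
threshold are assumed policy values, not universal rules:</p>

```python
import shutil

# Runtime test: is there still enough free disk? The 10% threshold is
# an assumed policy value, not a universal rule.
def disk_ok(path="/", min_free_fraction=0.10):
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction

# A monitoring loop would run checks like this on a schedule and
# alert a human when one turns False.
```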