If you are the sort of person who wants to know about the performance of one or more computers, you should know that lots of people share your desires. As a result, there are many software packages that do some combination of gathering information, storing it, graphing it, displaying the graphs, and perhaps generating alerts based on the information received.
Most of the time, though, you can skip researching all those and go directly to small, medium or large.
The small option is Munin. It’s small enough that installing it is reasonable even if you have just one machine. Configuration on the master node takes just a few minutes; the client nodes usually need no configuration at all unless you have funky DNS. You get pretty graphs with some history, the ability to group by nodes or services, and you can look at a live demo.
What don’t you get? Non-UNIX clients have to be monitored over SNMP, so there’s no Windows support other than SNMP, though Mac OS X can be either a master or a client. Finally, there isn’t much history and the statistics aren’t precise – Munin tells you more about trends and where you should be looking for problems.
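To give a feel for how little happens on the client side, here is a minimal sketch of asking a node for one plugin's values the way a master would. It relies on the munin-node text protocol (TCP port 4949 by default, a `fetch <plugin>` command, and a reply of `field.value N` lines ended by a lone dot). The function names `parse_fetch` and `fetch_plugin` are made up for this example, and the node has to allow connections from your address for the network part to work.

```python
import socket


def parse_fetch(response: str) -> dict:
    """Turn the body of a munin-node `fetch` reply into {field: value}."""
    values = {}
    for line in response.splitlines():
        line = line.strip()
        if line == ".":  # a lone dot marks the end of the reply
            break
        if not line or line.startswith("#"):  # banner or error comment
            continue
        name, _, raw = line.partition(" ")
        if not name.endswith(".value"):
            continue
        field = name[: -len(".value")]
        # Plugins report "U" when a value is unknown.
        values[field] = None if raw.strip() == "U" else float(raw)
    return values


def fetch_plugin(host: str, plugin: str, port: int = 4949, timeout: float = 5.0) -> dict:
    """Connect to a munin-node, fetch one plugin, and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="ascii", errors="replace")
        reader.readline()  # skip the "# munin node at ..." banner
        sock.sendall(f"fetch {plugin}\nquit\n".encode("ascii"))
        return parse_fetch(reader.read())
```

Something like `fetch_plugin("localhost", "load")` should hand back a small dictionary of the current values, which is roughly what the master collects on each polling run.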
The medium option is Observium. Observium only does SNMP, with no client-side agent, which makes it extremely easy to add a new client node – just tell Observium the IP or name and anything odd about the SNMP access configuration. On the other hand, you don’t set up individual SNMP checks – Observium will try everything it knows about, and Observium knows a lot about SNMP. There’s graphing and grouping and a fancier, more interactive web control panel than Munin’s. It looks like Observium will add monitoring/alerting capabilities soon, though I haven’t tried them out yet. Also, Observium really wants to run on its own FQDN – if you don’t run DNS, Munin is really a better choice for you.
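Since everything depends on SNMP access, it can save some head-scratching to confirm that a device answers SNMP before you add it. The sketch below is not part of Observium; it just shells out to the Net-SNMP `snmpget` command (assumed to be installed) and asks for the standard `sysDescr` OID. The community string `public`, SNMP version 2c, and the helper names are assumptions for illustration.

```python
import subprocess

# sysDescr.0 from the standard MIB-2 system group; nearly every SNMP agent answers it.
SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"


def snmpget_command(host, community="public", version="2c", oid=SYS_DESCR_OID):
    """Build the argument list for a single Net-SNMP `snmpget` query."""
    return ["snmpget", f"-v{version}", "-c", community, host, oid]


def answers_snmp(host, community="public", timeout=10.0):
    """Return True if the host replies to a sysDescr query, False otherwise."""
    try:
        result = subprocess.run(
            snmpget_command(host, community),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # snmpget is not installed, or the device never answered.
        return False
    return result.returncode == 0
```

If `answers_snmp("192.0.2.10")` comes back `False`, the problem is probably the community string, the SNMP version, or a firewall, which is exactly the "anything odd" you would need to tell Observium about.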
The large option, for people with really complex or large-scale needs, is a combination of collectd and Graphite. Collectd acts as the client, Graphite stores and charts the results, and there are options for storing all the history, writing your own queries, and everything else that you might want to do. The basic tradeoff is that you will need to invest time in learning the systems and writing your configuration carefully; in return, you can scale to horrendous numbers of systems and sort through exact numbers in search of patterns completely beyond the grasp of lesser systems.
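Part of the flexibility comes from how simple the data path is. Graphite's Carbon daemon accepts a plaintext protocol, one `path value timestamp` line per sample, on TCP port 2003 by default. Collectd can ship its readings in that form through its `write_graphite` plugin, but you can also push your own numbers. Here is a minimal sketch; the metric path and the function names are invented for the example, and it assumes a Carbon listener is reachable at the host and port you pass in.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Format one sample as a Graphite plaintext line: 'path value timestamp\\n'."""
    if any(ch.isspace() for ch in path):
        raise ValueError("metric paths must not contain whitespace")
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003, timeout=5.0):
    """Send already-formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A call such as `send_metrics([format_metric("servers.web1.load.shortterm", 0.42)])` would record one data point, and the dotted path is what you later navigate and query in Graphite.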