## Remote System Monitoring
## Steven St.Laurent
Many tools have been designed to check, monitor and inform the
administrator or user of the status of his/her systems. These
products, while very good in their own right, tend to require a good
deal of configuration for specific situations and in many cases do not
quite fit my personal needs. I like simple, I like low load, and I
like fast!
A large monitoring program might be worth using on a dedicated machine,
but I wanted something for a loaded server. "Why is this of any
importance to me?" you might ask. Well, I could find no documentation
on using existing tools to monitor, and learning something new is fun.
You might find this useful somewhere down the road.
I will not give you a ready-to-run C or Perl script; rather, this is
just a good outline, useful for developing your own applications.
None of this is magic, and many might already be using it. I also did
not invent this; I just present it as part of my work towards a good
monitoring solution.
First off, I needed a set of design goals for this first part. Mind you,
this is just one part of a monitoring system. My goals (yours might
vary) were:
- Simple design which could be incorporated into Perl or C scripts
- Uses existing tools available for most/all Unix machines
- Interoperability across operating systems: Solaris, DU, Linux, FreeBSD,
etc.
- High fault tolerance
With these goals in mind, I decided to use something we were already
familiar with: uptime. Uptime is typically used from the command line,
e.g.:
$ uptime
8:30AM up 14 days, 20:40, 7 users, load averages: 0.03, 0.05, 0.06
As we can see, this is all we really need from a remote monitoring
position, or at least all I need to start. I can clearly see total time
up, number of users, and load. From this one line I can easily write a
Perl script which can respond to conditions, emailing or paging me if
load gets too high or the machine reboots.
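To make the idea concrete, here is a rough sketch in Python rather than Perl. It assumes the line looks like the sample above; the exact wording of uptime output differs between systems, so treat the pattern as a starting point. The names parse_uptime and load_too_high, and the limit of 5.0, are made up for illustration.

```python
import re

# Pattern assumed from the sample line above; other systems print
# "load average:" (singular) or separate the numbers differently.
UPTIME_RE = re.compile(
    r"up\s+(?P<up>.+?),\s+(?P<users>\d+)\s+users?,\s+"
    r"load averages?:\s+(?P<l1>[\d.]+),?\s+(?P<l5>[\d.]+),?\s+(?P<l15>[\d.]+)"
)

def parse_uptime(line):
    """Return uptime text, user count and load averages, or None if no match."""
    m = UPTIME_RE.search(line)
    if m is None:
        return None
    return {
        "up": m.group("up"),
        "users": int(m.group("users")),
        "load": (float(m.group("l1")), float(m.group("l5")), float(m.group("l15"))),
    }

def load_too_high(info, limit=5.0):
    """True when the 1-minute load average exceeds the (hypothetical) limit."""
    return info["load"][0] > limit

sample = "8:30AM up 14 days, 20:40, 7 users, load averages: 0.03, 0.05, 0.06"
info = parse_uptime(sample)
if info is not None and load_too_high(info):
    print("load is too high, time to send a page")
```

What to do when the check fires (email, pager, log entry) is left out on purpose; that part depends on your site.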
The problem here is fault tolerance. If the machine reboots, the
script will start anew; if the machine loses its network interface, it
cannot communicate; and if load reaches a high point, it might not be
able to respond. Running a monitoring script from a separate machine
is the best solution. With several error conditions taken into account,
you can detect problems quickly and efficiently.
Here is how I set up my machines. You might choose otherwise, but the
scheme is the same. First, determine which machine will be the monitor.
For my situation, I decided my workstation would be best for the time
being. If redundancy is required, I would use two machines, with the
second machine monitoring the first and taking over IF the first
machine stopped responding. These steps do assume you have root on
the machines in question or can convince the admin to do these steps.
- Install tcp-wrappers. As I write this, the current version for
FreeBSD appears to be tcp_wrappers_7.6. Install it from the
ports collection if you like. You will want to install this on all
machines which will be monitored.
- Edit /etc/services. You will need to add the following line to
your services file:
uptime 333/tcp #Monitor Uptime
Inetd needs to know which services exist and which do not. I use
port 333 since it seems to be unassigned and it's not a good idea
to conflict with other services.
- Edit /etc/inetd.conf and add the following:
uptime stream tcp nowait nobody /usr/local/libexec/tcpd /usr/bin/uptime
I'm not going to get into a long discussion about the pros and
cons of using inetd. Security-wise it can be lacking unless you
know what you are doing. Since we installed tcp-wrappers I can
feel safe offering limited services through inetd. In this case,
let's just leave telnet and our new entry.
If you have not already, comment out all other services you are not
using. Also, consider using tcp wrappers for other services as
well.
- Secure the system. With inetd running, it's now time to secure
the system by editing the /etc/hosts.allow and /etc/hosts.deny
files. These are checked by tcpd, which inetd launches, to see if
the service is allowed to a particular client. If you do not already
use these, DO SO. They are vital for a secure system, especially
when running inetd. Here is an example from mine:
uptime: 10.0.0.14 \
10.0.0.25
You might have other entries here. Here I've allowed uptime (port
333 from services) access from just two machines, 10.0.0.14 (mine)
and 10.0.0.25 (a backup machine just in case). In hosts.deny, I
have:
ALL:ALL
I only want my machines to be able to access this information.
- Test it out. From an allowed host, telnet to port 333. You should
see something like this:
$ telnet 10.0.0.1 333
Trying 10.0.0.1...
Connected to 10.0.0.1.
Escape character is '^]'.
9:16AM up 60 days, 19:42, 20 users, load averages: 0.00, 0.03, 0.00
Connection closed by foreign host.
$
If you have problems, try restarting inetd. Using 'killall -HUP inetd'
should work fine. If inetd is not running, start it! If you still
have problems and inetd is running, try 'tail -f /var/log/messages'
to see if it is generating any errors.
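The telnet test can also be done from a script. The following is a minimal Python sketch, assuming the service answers on port 333 as configured above. The function name fetch_uptime and the 10 second timeout are my own choices for illustration, not part of any existing tool.

```python
import socket

def fetch_uptime(host, port=333, timeout=10.0):
    """Connect to the uptime service and return its output as one string.

    Returns None if the host cannot be reached, does not answer within
    the timeout, or closes the connection without sending anything
    (which is what a client denied by tcp-wrappers would see).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:          # server closed the connection
                    break
                chunks.append(data)
    except OSError:                   # refused, unreachable, timed out, ...
        return None
    text = b"".join(chunks).decode("ascii", errors="replace").strip()
    return text or None
```

Calling fetch_uptime("10.0.0.1") from an allowed host should hand back the same line telnet showed. A None result is the signal that something is wrong with the machine or the path to it.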
You can do almost anything with this tool. For my project, it gives
me the ability to monitor and track load. While not the best solution,
I can run this with a database backend and store the uptime
information. Running every 5 minutes, I can easily monitor gross load
figures. I also have the ability to monitor the machine by watching
for a response. If a certain machine does not respond to an uptime
request, I can set it to email me. In conjunction with other tools,
you can quickly create a very robust monitoring system.
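As one possible shape for the database and email parts, here is a small Python sketch. It assumes SQLite as the backend, which is my choice for the example and not a recommendation. Here reply stands for whatever text came back from the uptime port, or None when the machine did not answer. The table layout, function names and addresses are all hypothetical.

```python
import sqlite3
import time
from email.message import EmailMessage

def open_db(path=":memory:"):
    """Open the sample store; the table layout is only an illustration."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "host TEXT, taken_at REAL, reply TEXT)"
    )
    return db

def record(db, host, reply, now=None):
    """Store one poll result; reply is None when the host did not answer."""
    taken_at = time.time() if now is None else now
    db.execute(
        "INSERT INTO samples (host, taken_at, reply) VALUES (?, ?, ?)",
        (host, taken_at, reply),
    )
    db.commit()

def build_alert(host, reply, sender, recipient):
    """Return an email message if the host did not answer, else None."""
    if reply is not None:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{host} did not answer an uptime request"
    msg.set_content(f"No reply from {host} on the uptime port.")
    return msg
```

Run from cron every 5 minutes, a script would call record() for each machine and pass any message from build_alert() to your mailer of choice. Actually sending the mail is left out, since that depends on the local mail setup.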
This is not a panacea and does not pretend to perform as well as or
better than existing products; it is a different way, though. You can
do various other tricks through the same method. If you can imagine it,
you can probably do it. Plus, the learning experience is far more
valuable than you can imagine.
- Steven