
Service Monitoring Script

About the Service Monitoring Script
This is a *very* simple PHP script used to monitor the services running on a server. I wrote it in PHP because that was quickest and easiest at the time, but the idea could certainly be ported to any other language, or even a shell script. In fact, I'll probably rewrite it as a shell script at some point in case a server doesn't have PHP installed.


How the Script is able to monitor the services
The script needs to run every couple of minutes so it can check that the services are running (and do something if they are not). This is done by setting the script up as a cron job. More information about cron jobs can be found online, but here is a basic tutorial.

* * * * * user command_to_be_executed
- - - - -
| | | | |
| | | | +----- day of week (0 - 6) (Sunday=0)
| | | +------- month (1 - 12)
| | +--------- day of month (1 - 31)
| +----------- hour (0 - 23)
+------------- min (0 - 59)


For instance, the following would execute the command "/sbin/runscript" every hour on the hour as root:
0 * * * * root /sbin/runscript

The following would run the command "wget http://google.com" every day at 3am and 5pm as user "bob":
0 3,17 * * * bob wget http://google.com

The following would run the command "service apache2 start" every 5 minutes as the root user.
*/5 * * * * root service apache2 start
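
To make the five time fields more concrete, here is a minimal Python sketch that checks whether a time expression such as "*/5 * * * *" matches a given moment. The function names `field_matches` and `cron_matches` are made up for this illustration. It only handles numbers, "*", lists, ranges, and "/step"; it does not handle month or day names or 7 for Sunday, and the day-of-month/day-of-week rule is my assumption about how Vixie-style cron behaves.

```python
from datetime import datetime


def field_matches(field, value, first=0):
    """Return True if one cron time field (e.g. "*/5", "3,17", "1-5") matches value.

    `first` is the smallest legal value of the field (1 for day of month and month).
    """
    for part in field.split(","):
        # An optional "/step" suffix thins out the range it is attached to.
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            # "*" covers the whole field, so any value is inside the range.
            low, high = first, value
        else:
            start, _, end = rng.partition("-")
            low = int(start)
            high = int(end) if end else low
        if low <= value <= high and (value - low) % step == 0:
            return True
    return False


def cron_matches(expression, when):
    """Return True if the five time fields in `expression` match the datetime `when`."""
    minute, hour, dom, month, dow = expression.split()
    # cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    time_ok = (field_matches(minute, when.minute)
               and field_matches(hour, when.hour)
               and field_matches(month, when.month, first=1))
    dom_ok = field_matches(dom, when.day, first=1)
    dow_ok = field_matches(dow, cron_dow)
    # Assumption (Vixie-style cron): when both day fields are restricted,
    # a match on either one is enough; otherwise both must match.
    if dom.startswith("*") or dow.startswith("*"):
        return time_ok and dom_ok and dow_ok
    return time_ok and (dom_ok or dow_ok)


if __name__ == "__main__":
    # Only the time fields are passed in, not the user or the command.
    print(cron_matches("0 3,17 * * *", datetime(2024, 1, 1, 17, 0)))
    print(cron_matches("*/5 * * * *", datetime(2024, 1, 1, 9, 36)))
```

The first call uses the 3am/5pm example from above at 17:00, and the second uses the every-5-minutes example at 9:36, which is not a multiple of five.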


The cron commands and time configurations are saved in /etc/crontab (you can also use a user's own cron file by running crontab -e as that user; if you do, leave out the "user" field shown above). The crontab file holds a list of time configurations and commands like the examples above. It is also important to note that a "#" at the beginning of a line makes that line a comment. Below is the crontab file for my monitor script.

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
# this is the monitor script that emails us if apache or mysql stops
*/5 * * * * root php /root/monitor

Notice the line that runs /root/monitor: that is my monitoring script! As you can see, I have it running every 5 minutes as root. If running it as root concerns you, it should work as a normal user too (although a normal user usually cannot append to /var/log/messages, so the log lines would be lost). That said, the script accepts no input parameters and does not read any files, and I have set its permissions to 700 with root as the owner (which is why it lives in the /root directory), so running it as root should not really matter. 700 permissions mean that only the owner can use the file: other users cannot execute the script, write to it, or even read it. If you don't know how that works, look up the commands "chmod", "chown", and "chgrp", which are used to change the permissions and owners of a file in Linux. Let me make this clear: if you run the script from the crontab file as the root user, ONLY ALLOW THE ROOT USER ACCESS TO THAT FILE!
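
If you want to double check the permissions from code rather than by eye, something like the following Python sketch would do it. The helper names `only_owner_has_access` and `owned_by_root` are invented for this example, and it assumes a Unix-style system where uid 0 is root.

```python
import os
import stat


def only_owner_has_access(path):
    """Return True if group and other have no permission bits on path (e.g. mode 700)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def owned_by_root(path):
    """Return True if the file's owner is uid 0 (root)."""
    return os.stat(path).st_uid == 0
```

For the setup described above, you would expect both `only_owner_has_access("/root/monitor")` and `owned_by_root("/root/monitor")` to be true when run as root.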

For more information on cron and crontab files, see the cron(8) and crontab(5) man pages. You might also notice that most crontab documentation does not mention a user in the line, only the 5 time fields and the command. That is because the user field is used in the system-wide /etc/crontab, while per-user crontabs edited with crontab -e leave it out. Both servers I administer (one mine and one not mine) have a user field in /etc/crontab, so I used it. I suggest you take a look at your crontab file and take a cue from the lines already there to know whether you should specify the user or not.


How the Script Works
The script first checks whether the services are running. To do this, it needs the port that each service should be listening on. It then runs the command "netstat -an | grep 0.0.0.0:X" (where X is the port number) on the command line and checks the output. If the command returns nothing, the service is not listening on that port and is presumed to be in trouble. Keep in mind that this is a plain substring match: port 80 also matches a line for port 8080, and a service bound only to a specific address or to IPv6 will not show up as 0.0.0.0:X.
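
As a rough illustration of the same check with an exact port comparison, here is a Python sketch that looks at the local address column instead of grepping the whole line. The function name `listening_on_all_interfaces` and the sample output are made up, and the column layout is an assumption based on typical Linux "netstat -an" output.

```python
def listening_on_all_interfaces(netstat_output, port):
    """Return True if a line's local address is exactly 0.0.0.0:<port> in LISTEN state."""
    wanted = "0.0.0.0:%d" % port
    for line in netstat_output.splitlines():
        columns = line.split()
        # Assumed TCP row layout: Proto Recv-Q Send-Q Local Foreign State
        if len(columns) >= 6 and columns[3] == wanted and columns[5] == "LISTEN":
            return True
    return False


# Made-up sample output for illustration only. Real output could come from
# subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.5:22          192.168.1.9:50314       ESTABLISHED
"""

if __name__ == "__main__":
    for name, port in {"Apache": 80, "MySQL": 3306}.items():
        state = "is" if listening_on_all_interfaces(SAMPLE, port) else "is NOT"
        print("%s %s listening on port %d" % (name, state, port))
```

In the sample, nothing listens on port 80, but a plain grep for 0.0.0.0:80 would still find the 8080 line; comparing the whole column avoids that.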

The script will run through all the services, compiling a list of any that are stopped. If any are stopped, the script will check for a special lock file located in /tmp before the alerts are sent, and if it exists it will not send out any. If the lock file does not exist, it will create it and then send out the alerts. Once the script sees that no services are stopped anymore, it will remove the lock file if it exists. In this way, the script will only send out the alerts once, so it will not flood your email address or text your cellphone into oblivion. The full script can be seen below, or can be downloaded here.

<?php
/*============================================================================
    BEGIN OPTIONS SECTION.  THESE ARE THE OPTIONS FOR THE SCRIPT.
============================================================================*/
// email this person if things go wrong!!
$emails[0] = "myemail@mydomain.com";
$emails[1] = "anotheremail@mydomain.com";
$emails[2] = "somebody@somewhere.com";
$emails[3] = "youget@theidea.com";
// text these phone numbers using the appropriate SMS gateways
// for more info on SMS, see http://en.wikipedia.org/wiki/SMS_gateways
// we make the SMS email message shorter so the text message is not too expensive!
$sms[0] = "4045557777@mymetropcs.com"; // michael's cell with Metro PCS
$sms[1] = "7705558888@messaging.sprintpcs.com"; // Reece's Cell with Sprint
// list the services and ports we need to check.
// note that the service name is the key and the port is the value.
$service['Apache'] = 80;
$service['MySQL'] = 3306;
// set up the email information (name and email the message says it is from)
$from_name = "Monitor Script";
$from_email = "monitor@mydomain.com";
 
 
/*============================================================================
    BEGIN THE INITIAL SETUP.  THIS WILL INITIALIZE SOME VARIABLES.
============================================================================*/
// set up headers
$headers = "From: $from_name <$from_email>\n";
$headers.= "Reply-To: $from_name <$from_email>\n";
$headers.= "Content-Type: text/plain\n";
// get the current date/time
$time = date("Y-m-d H:i:s");
// the name of the lock file
$lockfile = "MONITOR.lck";
// services found not to be running are collected here (service name => port)
$error = array();
// clear cache used for file_exists() function so we get updated data
clearstatcache();
 
 
/* ============================================================================
    BEGIN SERVICE CHECKING TO SEE WHICH SERVICES ARE NOT RUNNING.
============================================================================*/
// loop through services
foreach($service as $k=>$v)
{
    // quick sanity checks
    if(!is_numeric($v)) continue;
    if($k=="") continue;
    // build the command
    $cmd = "netstat -an | grep 0.0.0.0:".$v;
    // get the output of the command and see if it's blank
    if(shell_exec($cmd)==""){
        // debug echo so you can run this manually.
        echo "[ERROR] $k is NOT listening on port $v\n";
        $error[$k] = $v;
    }else{
        // debug echo so you can run this manually.
        echo "[SUCCESS] $k is listening on port $v\n";
    }
}
 
 
/* ============================================================================
    BEGIN ACTIONS IF A SERVICE WAS FOUND NOT TO BE RUNNING
============================================================================*/
// if any service was not listening on its designated port, perform actions
if( count($error) > 0)
{
    // if lock file exists, exit and do nothing because we already did actions.
    if(file_exists("/tmp/$lockfile")){
        echo "Lock file exists already, exiting.\n";
        exit;
    }
    // try to set lock file so that we don't send more than one alert
    // lock file MUST be in tmp directory so file will be removed on reboot automatically
    echo "Creating lock file\n";
    if(strlen($lockfile)>0) shell_exec("echo \"FILLER\" > /tmp/$lockfile");
    // set main message that will be put in messages file
    $msg = "Monitor Script Found services *NOT* running at $time :";
    // set subject
    $subject = "[SERVER ERROR] Services Reporting as Stopped";
    // set the body for the regular (non-SMS) emails
    $body = "\nThis is a Monitor Script Alert email message.\n\n";
    $body.= "At $time, the following services have problems:\n\n";
    // build the list of services found stopped
    foreach($error as $k=>$v){
        $body.= "$k on port $v\n";
    }
    $body.= "\nAnother alert will not be sent until all services are found to be running.\n\n";
    // begin sending emails.
    foreach($emails as $email){
        if(mail($email, $subject, $body, $headers)){
            shell_exec("echo \"$msg Email Sent Successfully to $email\" >> /var/log/messages");
        }else{
            shell_exec("echo \"$msg EMAIL SEND ERROR TO $email\" >> /var/log/messages");
            // debug echo in case you manually run this.
            echo "Email Sending Failed.\n";
        }
    }
    // set up SMS body and send SMS emails to text cell phones
    $subject = "SERVER ERROR. ";
    $body = "Services stopped:";
    // add list of services not running to the sms message
    foreach($error as $k=>$v){
        $body.= " $k";
    }
    // loop through sms emails and send the text messages
    foreach($sms as $email){
        if(mail($email, $subject, $body, $headers)){
            shell_exec("echo \"$msg SMS Sent Successfully to $email\" >> /var/log/messages");
        }else{
            shell_exec("echo \"$msg SMS SEND ERROR TO $email\" >> /var/log/messages");
            // debug echo in case you manually run this.
            echo "SMS Sending Failed.\n";
        }
    }
}
// if all services are okay and running, do this.
else{
    // sanity check
    if(strlen($lockfile)>0){
        // see if lock file exists or not
        if(file_exists("/tmp/$lockfile")){
            // if lock file exists, remove it since everything is okay and running now.
            echo "Removing lockfile\n";
            shell_exec("rm -f /tmp/$lockfile");
        }else{
            echo "No lock file to remove\n";
        }
    }else echo "Error: Lockfile is blank\n";
}
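
The lock file handling in the script above boils down to one small decision, which may be easier to see in isolation. Here is a minimal Python sketch of that decision; the function name `should_send_alert` is invented for this example, and the lock path is passed in rather than hard-coded so it can be tried out in any directory.

```python
import os


def should_send_alert(stopped_services, lock_path):
    """Decide whether to alert, creating or removing the lock file as a side effect."""
    if stopped_services:
        if os.path.exists(lock_path):
            # Alerts already went out for this outage, so stay quiet.
            return False
        # Create the lock first so the next run does not alert again.
        with open(lock_path, "w") as lock:
            lock.write("FILLER\n")
        return True
    # Everything is running again: clear the lock so a future outage alerts.
    if os.path.exists(lock_path):
        os.remove(lock_path)
    return False
```

Called every 5 minutes with the list of stopped services, this returns True only on the first run of an outage, which is what keeps the alerts from flooding your inbox or phone.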


Download the script
You can download the script below. Please make sure you change the emails and SMS emails that the alerts are sent to before you run it! Also, you can run the file manually with the same command you put in the crontab file, and it should print out basic debugging messages for you. That way you don't have to wait 5 minutes every time you want to test a change.

Download the script code here: Monitor Script


Taking it further
The above PHP script does not take zombie processes into account, though, which might cause issues. I saw this a bit when dealing with Redmine running as its own Ruby process. Ideally you would set it to run under Apache via mod_ruby, but in this instance we wanted to use it as its own process. To fix it, I simply wrote a watchdog process in my new language of choice, Python! It basically checks every 5 minutes (via cron) to see if it can connect to the Redmine HTTP server, and if not it will start the server again.

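import http.client
import os

def main():
    # give up if the server does not answer within 5 seconds
    conn = http.client.HTTPConnection('localhost', 3000, timeout=5)
    try:
        conn.request("GET", "/")
        # wait for a reply so a hung server that still accepts connections is caught
        conn.getresponse()
    except (OSError, http.client.HTTPException):
        # connection refused, timed out, or broken reply: start redmine in the background
        cmd = "/redmine/script/manualrun -e production 1>/tmp/redmine.log 2>&1 &"
        os.system(cmd)
    finally:
        conn.close()

# just redirects the call of __main__ to main()
if __name__ == "__main__":
    main()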



