Loadwatch

From Just another day in the life of a linux sysadmin


http://mdwiki.hostbaitor.com/#!loadwatch.md

Purpose

LoadWatch was hacked together to help track down intermittent load spikes. It is meant as a short-term information-gathering script for load issues that we are having trouble catching in the act.

Installation

  • Create the necessary directories, create an empty file for the script, make it executable, and then open it in your editor to paste in the LoadWatch script:
mkdir -p /root/bin
mkdir -p /root/loadwatch
touch /root/bin/loadwatch.sh
chmod 700 /root/bin/loadwatch.sh
${EDITOR:-vi} /root/bin/loadwatch.sh

By default, the script monitors the 1-minute load average.

Set THRESH intelligently. Run

grep -c ^processor /proc/cpuinfo

to see how many cores there are. The default of 60 will not capture anything useful on servers with far fewer than 60 cores. Setting THRESH equal to or higher than the number of cores means logging only begins once the load already exceeds the core count (roughly 100% CPU), so you will miss any information leading up to the spike. Setting it lower than the number of cores captures more information but adds more overhead, because dumps are taken more often. Use good judgment.
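To make the core-count guidance concrete, here is a minimal sketch of a helper that turns a core count into a THRESH value below it. The function names `count_cores` and `suggest_thresh`, and the 75% default ratio, are hypothetical illustrations rather than values from this page; choose the ratio with the same judgment described above.

```shell
#!/bin/bash
# Hypothetical helpers: suggest a THRESH value somewhat below the core count.
# The 75% default is an arbitrary illustration, not a recommended value.

# Count cores by counting the "processor" lines in /proc/cpuinfo.
count_cores() {
    grep -c '^processor' /proc/cpuinfo
}

# suggest_thresh CORES [PERCENT]
# Prints CORES * PERCENT / 100 using integer math, never less than 1.
suggest_thresh() {
    local cores=$1
    local percent=${2:-75}
    local thresh=$(( cores * percent / 100 ))
    if [ "$thresh" -lt 1 ]; then
        thresh=1
    fi
    echo "$thresh"
}

# Example for the current machine:
# suggest_thresh "$(count_cores)"
```

For example, `suggest_thresh 8` prints 6. Because the script compares the truncated load with `-gt`, a THRESH of 6 trips once the 1-minute load reaches 7.0.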

  • Copy the script below into /root/bin/loadwatch.sh using the text editor of your choice:
#!/bin/bash
FILE=loadwatch.$(date +%F.%H.%M)
DIR=/root/loadwatch
# Load threshold for doing a dump.
THRESH=60

# Integer part of the 1-minute load average (decimals are discarded).
LOAD=$(awk '{print $1}' /proc/loadavg | cut -d. -f1)

echo "$(date +%F.%X) - Load: $LOAD" >> "$DIR/checklog"

if [ "$LOAD" -gt "$THRESH" ]
then
    echo "Loadwatch tripped, dumping info to $DIR/$FILE" >> "$DIR/checklog"
    date +%F.%H.%M > "$DIR/$FILE"
    free -m >> "$DIR/$FILE"
    mysqladmin processlist stat >> "$DIR/$FILE"
    /sbin/service httpd fullstatus >> "$DIR/$FILE"
    # Connection count per remote IP on port 80 (top 10)
    netstat -tn 2>/dev/null | grep ':80 ' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head >> "$DIR/$FILE"
    top -bcn1 >> "$DIR/$FILE"
    ps auxf >> "$DIR/$FILE"
    # Uncomment the following line to monitor exim:
    #/usr/sbin/exiwhat >> "$DIR/$FILE"
fi
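The trip test can be tried without waiting for a real spike. This is a minimal sketch that pulls the script's load parsing and `-gt` comparison into a hypothetical `load_trips` function, which takes a /proc/loadavg-style line and a threshold as arguments.

```shell
#!/bin/bash
# Sketch: the threshold test from loadwatch.sh as a function, so it can be
# tried without root. load_trips is a hypothetical name.

# load_trips LOADAVG_LINE THRESH
# Returns 0 (true) when the integer part of the 1-minute load is greater
# than THRESH, mirroring the script's [ "$LOAD" -gt "$THRESH" ] check.
load_trips() {
    local load
    # First field is the 1-minute average; drop everything after the dot.
    load=$(echo "$1" | awk '{print $1}' | cut -d. -f1)
    [ "$load" -gt "$2" ]
}

# Against the live value:
# load_trips "$(cat /proc/loadavg)" 60 && echo "would dump"
```

Because the fraction is discarded before the comparison, a load of 60.99 does not trip a THRESH of 60, while 61.52 does.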


  • Set up a cron job to run every 3 minutes or so. Paste the whole line below into a bash prompt. Do not add it to the crontab as written: it is a shell command that appends the cron entry to root's crontab, not the entry itself.
echo "*/3 * * * * /root/bin/loadwatch.sh > /dev/null 2>&1" >> /var/spool/cron/root
/etc/init.d/crond restart
  • Test your work:
crontab -l|grep loadwatch

Should show:

*/3 * * * * /root/bin/loadwatch.sh > /dev/null 2>&1
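Running the setup command twice appends the cron line twice, and each copy is a separate job, so loadwatch would run twice every 3 minutes. Below is a minimal sketch of a guard, assuming the same spool file as above; `add_cron_line` is a hypothetical helper, and you still need to restart crond afterwards as shown above.

```shell
#!/bin/bash
# Sketch: append the cron line only if it is not already present.
# add_cron_line is a hypothetical helper; the default file is the spool
# file used in the setup step above.

CRON_LINE='*/3 * * * * /root/bin/loadwatch.sh > /dev/null 2>&1'

# add_cron_line LINE [FILE]
add_cron_line() {
    local line=$1
    local file=${2:-/var/spool/cron/root}
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present in $file"
    else
        echo "$line" >> "$file"
        echo "added to $file"
    fi
}

# Usage (as root):
# add_cron_line "$CRON_LINE"
```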

If Plesk, What Do?

Assuming CentOS:

Edit /etc/httpd/conf/httpd.conf and, if it exists, uncomment

#ExtendedStatus On

Then uncomment the server-status <Location> block. It should look like this when you're done:

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from localhost
</Location>

Now restart Apache

/etc/init.d/httpd restart

Make sure the Allow line matches the block above: if yours reads "Allow from 127.0.0.1", you may need to change it to "Allow from localhost". Test your work by running "service httpd fullstatus" at the command line. If this is not set up, the dumps will contain no httpd data.
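Before testing with `service httpd fullstatus`, you can check that the ExtendedStatus line really is uncommented. This is a minimal sketch, assuming the CentOS path used above; `extended_status_on` is a hypothetical helper, and it only looks at the main httpd.conf, not at files pulled in with Include.

```shell
#!/bin/bash
# Sketch: succeed only if the config has an uncommented "ExtendedStatus On".
# extended_status_on is a hypothetical helper name.

# extended_status_on [CONF]
extended_status_on() {
    local conf=${1:-/etc/httpd/conf/httpd.conf}
    # Leading whitespace is allowed, a leading '#' is not.
    # Matched case-insensitively (-i); "Off" does not match.
    grep -Eiq '^[[:space:]]*ExtendedStatus[[:space:]]+On([[:space:]]|$)' "$conf"
}

# extended_status_on && echo "ExtendedStatus is on" || echo "still commented out"
```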

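The script's MySQL check also needs adjusting, because Plesk keeps the MySQL admin password in /etc/psa/.psa.shadow. Replace the mysqladmin line in the script with:

mysqladmin --user=admin --password="$(cat /etc/psa/.psa.shadow)" processlist stat >> "$DIR/$FILE"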

The Results

Any time the script runs and the load is over the threshold (decimal points are cut off, so a load of 60.99 counts as 60), it dumps a batch of information to a new file in /root/loadwatch named loadwatch.YYYY-MM-DD.HH.MM. Dig through these files to find out what is causing the load spikes. The file /root/loadwatch/checklog records the load and time of every check, so you can verify that the script is running correctly. To keep it from growing indefinitely, once you have verified the script is working, you can symlink it to /dev/null. Note that this also discards the "tripped" lines, so the checklog grep below will no longer find anything.

rm -f /root/loadwatch/checklog
ln -s /dev/null /root/loadwatch/checklog

Getting information quickly from checklog

 grep -B1 tripped /root/loadwatch/checklog
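`grep -B1 tripped` works because each "tripped" line is written right after the Load: line for the same check. If you want totals instead, the sketch below summarises checklog in one pass. It assumes the two line formats written by the script above; `summarize_checklog` is a hypothetical name. Like the grep, it has nothing to read once checklog is symlinked to /dev/null.

```shell
#!/bin/bash
# Sketch: count checks and trips in checklog and report the peak load.
# Assumes the formats written by loadwatch.sh:
#   <date> - Load: <N>
#   Loadwatch tripped, dumping info to <file>
# summarize_checklog is a hypothetical helper name.

# summarize_checklog [FILE]
summarize_checklog() {
    local log=${1:-/root/loadwatch/checklog}
    awk '
        /tripped/ { trips++ }
        {
            # Find the value after "Load:" regardless of the date format.
            for (i = 1; i < NF; i++) {
                if ($i == "Load:") {
                    checks++
                    if (checks == 1 || $(i + 1) + 0 > peak) peak = $(i + 1) + 0
                }
            }
        }
        END {
            printf "checks: %d\ntrips: %d\npeak load: %d\n", checks, trips, peak
        }
    ' "$log"
}
```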

LoadParse

LoadParse is a script that parses LoadWatch reports and summarizes the useful information at a glance.

Installing:

 mkdir -p /root/bin
 wget http://layer3.liquidweb.com/scripts/loadparse.sh -O /root/bin/loadparse
 chmod +x /root/bin/loadparse

Example output:

 michael@mikes-terminal:~/loadwatch$ loadparse loadwatch.2012-03-24.21\:09\:14 
 CPU percentage per user:
 113.1 rachellr
 23.5 nobody
 11.6 mysql
 6.8 games6z
 3 planetst
 2.3 root
 1.9 dragonnc
 1.1 sunreee
 0.6 mxtrackf
 0.6 brandla8
 
 Memory percentage per user:
 59.3 rachellr
 14 nobody
 9.6 games6z
 6 dragonnc
 4.2 planetst
 2.4 mxtrackf
 2 mysql
 1.8 root
 1.6 sunreee
 1.2 cloudroq
 
 Total Apache Requests:
 395
 
 Apache connections per domain:
     178 rachelleanselmi.com
      53 games62.com
      35 dragonnestplus.com
      26 planetsteelers.com
      25 mxtrackguide.com
       8 cloudromance.com
       7 sunreed.com
       6 scotthomesales.com
       5 daviecountyblog.com
       5 bettascapes.com
 
 Apache connections per IP:
     175 67.227.195.50   # note: this is the server's main IP, as the site was connecting to itself.
       9 66.87.96.248
       8 98.26.146.134
       7 24.255.193.156
       5 69.249.39.83
       5 157.55.39.91
       5 108.215.2.171
       4 98.109.157.165
       4 66.96.128.64
       4 24.7.112.55
 
 MySQL total queries:
 225
 
 MySQL running queries per dbname:
     120 rachellr_wrdp1
      28 dragonnc_wrdp1
      17 games6z_wp1
      13 mxtrackf_moto12
       7 sunreee_sunreedold
       5 planetst_plsteelerswordpress
       5 cloudroq_2
       4 planetst_vbulletin9
       3 primepah_wrdp
       2 scottham_wdps1
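The "CPU percentage per user" section above is the %CPU column of the dump's ps output summed per USER, which is why a total can exceed 100 on a multi-core server (as with 113.1). The script below runs one awk per user; this minimal sketch does the same sum in a single awk pass. `cpu_per_user` is a hypothetical name, and it expects ps aux-style text, header line included, on stdin.

```shell
#!/bin/bash
# Sketch: sum %CPU per user from ps aux text on stdin, highest first.
# cpu_per_user is a hypothetical helper name.

cpu_per_user() {
    # NR > 1 skips the USER ... COMMAND header line.
    # Column 1 is USER and column 3 is %CPU in ps aux output.
    awk 'NR > 1 { cpu[$1] += $3 }
         END { for (u in cpu) print cpu[u], u }' |
        sort -rn | head
}

# Live example:
# ps aux | cpu_per_user
```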


The actual Script


#!/usr/bin/env bash

# Set file_name to the first argument:
file_name=$1

###########################################################
#               get and verify file name
###########################################################

function get_file_name
{
# if file_name is not empty, is a file, and its base name starts with 'loadwatch':
if [ -n "$file_name" ] && [ -f "$file_name" ] && [[ $(basename -- "$file_name") == loadwatch* ]]
  then
    # Start parsing the logfile:
    parse_start
  else
    echo -e "\nThis script parses loadwatch files only.\n\nExamples:\nExecuted from the same folder as the loadwatch files:\nloadparse loadwatch.2012-03-24.21\\:09\\:14\n\nFrom elsewhere:\nloadparse /root/loadwatch/loadwatch.2012-03-24.21\\:09\\:14"
    # Non-zero exit status on bad input
    return 1
  fi
}



###########################################################
#                   ps aux section
###########################################################

function parse_psaux
{
  # store the ps aux from loadwatch (USER...COMMAND header through the next blank line)
  psaux=$(sed -n '/^USER.*COMMAND$/,/^$/p' "$file_name" | grep -Ev '^$|USER.*COMMAND')

  # Store a list of users in the ps aux
  psaux_users=$(echo "$psaux" | awk '{print $1}' | sort -u)

  # CPU/Memory usage per user (exact match on the USER column).
  # Entries are joined with a literal \n that echo -e expands in the report.
  for user in $psaux_users
    do
      psaux_cpu_per_user="$psaux_cpu_per_user\n$(echo "$psaux" | awk -v u="$user" 'BEGIN{total=0} $1 == u {total += $3} END{print total, u}')"
      psaux_mem_per_user="$psaux_mem_per_user\n$(echo "$psaux" | awk -v u="$user" 'BEGIN{total=0} $1 == u {total += $4} END{print total, u}')"
    done
}

###########################################################
#                   HTTP section
###########################################################

function parse_http
{
  # Store the http connection data
  http_cons=$(sed -n '/Srv.*VHost.*Request/,/__________________/p' "$file_name" | grep -Ev 'Srv.*VHost.*Request|__________________|..reading..')

  # Store a list of domains found
  http_domain_list=$(echo "$http_cons" | awk '{print $12}' | sort -u)

  # Total Apache requests (grep -c . counts non-empty lines, so no data gives 0):
  http_total_requests=$(echo "$http_cons" | grep -c .)

  # Top 10 per VHost (column 12) and per client IP (column 11)
  http_per_domain=$(echo "$http_cons" | awk '{print $12}' | sort | uniq -c | sort -rn | head)
  http_per_ip=$(echo "$http_cons" | awk '{print $11}' | sort | uniq -c | sort -rn | head)
}


###########################################################
#                   MySQL section
###########################################################

function parse_mysql
{
  mysql_queries_list=$(sed -n '/Id.*db.*Info/,/Uptime.*Opens.*Queries/p' "$file_name" | tail -n +3 | head -n -2 | grep '^|' | grep -v 'show processlist')

  mysql_db_list=$(echo "$mysql_queries_list" | awk '{print $8}' | sort -u | grep -v '|')

  # Count processlist rows (lines starting with a literal |; no data gives 0)
  mysql_total_queries=$(echo "$mysql_queries_list" | grep -c '^|')

  # The db name is field 8 when a table row is split on whitespace
  mysql_queries_by_dbname=$(echo "$mysql_queries_list" | awk '{print $8}' | sort | uniq -c | sort -rn | head)
}

###########################################################
#                   Run parses
###########################################################

function parse_start
{
  parse_psaux
  parse_http
  parse_mysql
  parse_report
}

###########################################################
#                   Reporting section
###########################################################

function parse_report
{
  # Report the data:

  echo -e "\nCPU percentage per user:"
  echo -e "$psaux_cpu_per_user" | grep -v '^$' | sort -rn | head

  echo -e "\nMemory percentage per user:"
  echo -e "$psaux_mem_per_user" | grep -v '^$' | sort -rn | head

  echo -e "\nTotal Apache Requests:\n$http_total_requests"

  echo -e "\nApache connections per domain:"
  echo -e "$http_per_domain"

  echo -e "\nApache connections per IP:"
  echo -e "$http_per_ip"

  echo -e "\nMySQL total queries:"
  echo -e "$mysql_total_queries"

  echo -e "\nMySQL running queries per dbname:"
  echo -e "$mysql_queries_by_dbname"
}

###########################################################
#      Start the script by getting the file_name
###########################################################

get_file_name



Add email notification to help desk when the script is triggered

Please be thoughtful about this. It is usually not necessary, and your coworkers will be sad if this starts spamming the help desk. If you do use it, add a ticket number to the subject line so we know why you are monitoring.

Also, since the email it creates contains no useful information, it is usually better to install swapwatch.sh on the server instead, as that will provide useful information.


Add these four lines at the top, under #!/bin/bash:

 SUBJECT="LoadWatch on $HOSTNAME triggered. Please Check it out."

 EMAIL="SET_EMAIL_HERE"   # replace with the help desk address

 EMAILMESSAGE="/tmp/emailmessage.txt"

 echo "LoadWatch on $HOSTNAME triggered. Please Check it out."> $EMAILMESSAGE

Then add this just above the fi statement at the bottom:

 /bin/mail -s "$SUBJECT" "$EMAIL" < $EMAILMESSAGE
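If you do set up the email, one way to keep it from spamming the help desk during a long spike is to send at most one message per interval. This is a minimal sketch, assuming GNU `stat` (as on CentOS); `should_email`, the stamp file and the one-hour default are hypothetical choices, not part of the original setup.

```shell
#!/bin/bash
# Sketch: allow a LoadWatch mail only if none was sent in the last
# INTERVAL seconds. should_email, the stamp path and the 3600-second
# default are illustrative assumptions.

# should_email STAMPFILE [INTERVAL_SECONDS]
# Returns 0 and refreshes the stamp when enough time has passed.
should_email() {
    local stamp=$1
    local interval=${2:-3600}
    local now last
    now=$(date +%s)
    if [ -f "$stamp" ]; then
        # GNU stat: modification time in seconds since the epoch
        last=$(stat -c %Y "$stamp")
        if [ $(( now - last )) -lt "$interval" ]; then
            return 1
        fi
    fi
    touch "$stamp"
    return 0
}

# Inside the if-block, in place of the bare mail line:
# should_email /root/loadwatch/.last_email && /bin/mail -s "$SUBJECT" "$EMAIL" < "$EMAILMESSAGE"
```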