LoadWatch
http://mdwiki.hostbaitor.com/#!loadwatch.md
Purpose
LoadWatch was hacked together to help track down intermittent load spikes. It is meant as a short-term information-gathering script for load issues that are hard to catch in the act.
Installation
- Create the necessary directories, touch an empty file for the script, make it executable and then edit the empty file to include the LoadWatch script:
mkdir /root/bin
mkdir /root/loadwatch
touch /root/bin/loadwatch.sh
chmod 700 /root/bin/loadwatch.sh
$EDITOR /root/bin/loadwatch.sh
By default, the script monitors the 1-minute load average.
Set THRESH intelligently.
Look at
grep -c proc /proc/cpuinfo
to see how many cores there are; that count is a good place to start. The default of 60 will not capture anything useful on servers with only a few CPU cores. Setting THRESH equal to the number of cores or higher means logging only begins once the load is already above 100% of CPU capacity, so you will miss any information leading up to the spike. Setting it lower than the number of cores captures more information but adds more overhead. Use good judgment.
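To see how a given THRESH behaves, it can help to replay the script's comparison by hand. The sketch below mirrors what loadwatch.sh does with the first field of /proc/loadavg (drop the decimals, then compare with -gt). The function name would_trip and the sample load lines are made up for illustration.

```shell
#!/bin/bash
# would_trip LOADAVG_LINE THRESH
# Mirrors the comparison in loadwatch.sh: take the 1-minute load,
# drop everything after the decimal point, then test with -gt.
would_trip() {
    local load
    load=$(echo "$1" | awk '{print $1}' | awk -F '.' '{print $1}')
    if [ "$load" -gt "$2" ]; then
        echo "trip"
    else
        echo "quiet"
    fi
}

# Sample lines in /proc/loadavg format (values are invented).
would_trip "4.97 3.10 2.05 2/345 12345" 4   # prints "quiet": 4 is not greater than 4
would_trip "5.01 3.10 2.05 2/345 12345" 4   # prints "trip"
```

In other words, THRESH=4 only starts dumping once the 1-minute load reaches 5.00, which is worth keeping in mind when picking a value close to the core count.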
- Copy the script below into /root/bin/loadwatch.sh using the text editor of your choice:
#!/bin/bash
FILE=loadwatch.`date +%F.%H.%M`
DIR=/root/loadwatch

#Load Threshold for doing a dump.
THRESH=60

LOAD=`cat /proc/loadavg | awk '{print $1}' | awk -F '.' '{print $1}'`
echo `date +%F.%X` - Load: $LOAD >> $DIR/checklog

if [ $LOAD -gt $THRESH ]
then
    echo Loadwatch tripped, dumping info to $DIR/$FILE >> $DIR/checklog
    echo `date +%F.%H.%M` > $DIR/$FILE
    free -m >> $DIR/$FILE
    mysqladmin processlist stat >> $DIR/$FILE
    /sbin/service httpd fullstatus >> $DIR/$FILE
    netstat -tn 2>/dev/null | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head >> $DIR/$FILE
    top -bcn1 >> $DIR/$FILE
    ps auxf >> $DIR/$FILE
    #uncomment the following line to monitor exim
    #/usr/sbin/exiwhat >> $DIR/$FILE
fi
- Set up a cron job to run about every 3 minutes or so. PASTE THIS LINE ENTIRELY INTO A BASH PROMPT. DO NOT ADD IT TO CRONTAB THE WAY IT IS.
echo "*/3 * * * * /root/bin/loadwatch.sh > /dev/null 2>&1" >> /var/spool/cron/root
/etc/init.d/crond restart
- Test your work:
crontab -l|grep loadwatch
Should show:
*/3 * * * * /root/bin/loadwatch.sh > /dev/null 2>&1
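Pasting the echo line above a second time leaves two identical cron entries. A minimal sketch of a guard against that is shown here; add_cron_line is a hypothetical helper, and the demonstration writes to a scratch file rather than /var/spool/cron/root.

```shell
#!/bin/bash
# add_cron_line CRON_FILE LINE
# Appends LINE to CRON_FILE only when an identical line is not already there.
add_cron_line() {
    local cron_file=$1 line=$2
    touch "$cron_file"
    # -x: match whole lines, -F: treat the pattern as a fixed string
    # (the cron line contains '*' and '.'), -q: no output.
    if ! grep -qxF -- "$line" "$cron_file"; then
        echo "$line" >> "$cron_file"
    fi
}

# Demonstration against a scratch file, not the real crontab.
demo_file=$(mktemp)
entry='*/3 * * * * /root/bin/loadwatch.sh > /dev/null 2>&1'
add_cron_line "$demo_file" "$entry"
add_cron_line "$demo_file" "$entry"
grep -c loadwatch "$demo_file"   # prints 1
rm -f "$demo_file"
```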
If Plesk, What Do?
Assuming CentOS:
Edit /etc/httpd/conf/httpd.conf and, if it exists, uncomment
#ExtendedStatus On
Then uncomment the server-status Location block. It should look like this when you're done:
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from localhost
</Location>
Now restart Apache
/etc/init.d/httpd restart
You may need to change "Allow from 127.0.0.1" to "Allow from localhost". Test your work by running "service httpd fullstatus" at the command line. If this is not set up, the LoadWatch dumps may be missing httpd data.
The script's MySQL check will need to be adjusted:
mysqladmin --user=admin --password=`cat /etc/psa/.psa.shadow` processlist stat >> $DIR/$FILE
The Results
Any time the script runs and the load is over the threshold (decimal points are cut off, btw), it dumps a bunch of information to a new file in /root/loadwatch named loadwatch.YYYY-MM-DD.HH.MM, following the script's date +%F.%H.%M format. (The LoadParse example further down shows a file named with an HH:MM:SS time, presumably from a different version of the script.) You can dig through this information to find out what is causing the load spikes. The file /root/loadwatch/checklog contains the load and time of each check, which lets you verify that the script is running correctly. To keep this file from growing indefinitely, once you have verified the script is working, you can symlink it to /dev/null if you wish.
rm -f /root/loadwatch/checklog
ln -s /dev/null /root/loadwatch/checklog
Getting information quickly from checklog
grep -B1 tripped /root/loadwatch/checklog
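For a slightly broader view than the grep above, the checklog can be summarized in one pass. This is only a sketch: checklog_summary is a made-up name, and it assumes the two line shapes the script writes ("<timestamp> - Load: N" and "Loadwatch tripped, ...").

```shell
#!/bin/bash
# checklog_summary FILE
# Prints the number of checks, the number of trips and the highest load seen.
checklog_summary() {
    awk '
        / - Load: /  { checks++; if ($NF + 0 > max) max = $NF + 0 }
        /tripped/    { trips++ }
        END { printf "checks=%d trips=%d max_load=%d\n", checks, trips, max }
    ' "$1"
}

# Demonstration with invented entries in the format the script writes.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2012-03-24.21:03:14 - Load: 2
2012-03-24.21:06:14 - Load: 7
2012-03-24.21:09:14 - Load: 61
Loadwatch tripped, dumping info to /root/loadwatch/loadwatch.2012-03-24.21.09
EOF
checklog_summary "$sample"   # prints checks=3 trips=1 max_load=61
rm -f "$sample"
```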
LoadParse
LoadParse is a script for parsing LoadWatch reports and providing useful information at a glance.
Installing:
mkdir /root/bin
wget http://layer3.liquidweb.com/scripts/loadparse.sh -O /root/bin/loadparse
chmod +x /root/bin/loadparse
Example output:
michael@mikes-terminal:~/loadwatch$ loadparse loadwatch.2012-03-24.21\:09\:14

CPU percentage per user:
113.1 rachellr
23.5 nobody
11.6 mysql
6.8 games6z
3 planetst
2.3 root
1.9 dragonnc
1.1 sunreee
0.6 mxtrackf
0.6 brandla8

Memory percentage per user:
59.3 rachellr
14 nobody
9.6 games6z
6 dragonnc
4.2 planetst
2.4 mxtrackf
2 mysql
1.8 root
1.6 sunreee
1.2 cloudroq

Total Apache Requests:
395

Apache connections per domain:
178 rachelleanselmi.com
53 games62.com
35 dragonnestplus.com
26 planetsteelers.com
25 mxtrackguide.com
8 cloudromance.com
7 sunreed.com
6 scotthomesales.com
5 daviecountyblog.com
5 bettascapes.com

Apache connections per IP:
175 67.227.195.50 # note this is the server's main IP as the site was connecting to itself.
9 66.87.96.248
8 98.26.146.134
7 24.255.193.156
5 69.249.39.83
5 157.55.39.91
5 108.215.2.171
4 98.109.157.165
4 66.96.128.64
4 24.7.112.55

MySQL total queries:
225

MySQL running queries per dbname:
120 rachellr_wrdp1
28 dragonnc_wrdp1
17 games6z_wp1
13 mxtrackf_moto12
7 sunreee_sunreedold
5 planetst_plsteelerswordpress
5 cloudroq_2
4 planetst_vbulletin9
3 primepah_wrdp
2 scottham_wdps1
The actual Script
#!/usr/bin/env bash

# Set file_name to first argument:
file_name=$1

###########################################################
# get and verify file name
###########################################################
function get_file_name {
    # if the file_name is not empty, is a file and its base name starts with 'loadwatch'
    # (basename is used so that a full path such as /root/loadwatch/loadwatch.* also passes):
    if [ -n "$file_name" ] && [ -f "$file_name" ] && [[ $(basename "$file_name") == loadwatch* ]]
    then
        # Start parsing the logfile:
        parse_start
    else
        echo -e "\nThis script parses loadwatch files only.\n\nExamples:\nExecuted from the same folder as the loadwatch files:\nloadparse loadwatch.2012-03-24.21\\:09\\:14\n\nFrom elsewhere:\nloadparse /root/loadwatch/loadwatch.2012-03-24.21\\:09\\:14"
    fi
}

###########################################################
# ps aux section
###########################################################
function parse_psaux {
    # store the ps aux from loadwatch
    psaux=`cat "$file_name" | sed -n "/^USER.*COMMAND$/,/^$/p" | egrep -v '^$|USER.*COMMAND'`
    # Store a list of users in the ps aux
    psaux_users=`echo "$psaux" | awk '{print $1}' | sort | uniq`
    # CPU/Memory usage per user
    for user in $psaux_users
    do
        psaux_cpu_per_user="$psaux_cpu_per_user\n`echo "$psaux" | egrep ^$user | awk 'BEGIN{total=0};{total += $3};END{print total, $1}'`"
        psaux_mem_per_user="$psaux_mem_per_user\n`echo "$psaux" | egrep ^$user | awk 'BEGIN{total=0};{total += $4};END{print total, $1}'`"
    done
}

###########################################################
# HTTP section
###########################################################
function parse_http {
    # Store the http connection data
    http_cons=`cat "$file_name" | sed -n "/Srv.*VHost.*Request/,/__________________/p" | egrep -v 'Srv.*VHost.*Request|__________________|..reading..'`
    # Store a list of domains found
    http_domain_list=`echo "$http_cons" | awk '{print $12}' | sort | uniq`
    # Total Apache requests:
    http_total_requests=`echo "$http_cons" | wc -l`
    http_per_domain=`echo "$http_cons" | awk '{print $12}' | sort | uniq -c | sort -rn | head`
    http_per_ip=`echo "$http_cons" | awk '{print $11}' | sort | uniq -c | sort -rn | head`
}

###########################################################
# MySQL section
###########################################################
function parse_mysql {
    # Keep only the table rows (lines starting with '|') of the processlist
    mysql_queries_list=`cat "$file_name" | sed -n "/Id.*db.*Info/,/Uptime.*Opens.*Queries/p" | tail -n +3 | head -n -2 | grep '^|' | grep -v 'show processlist'`
    mysql_db_list=`echo "$mysql_queries_list" | awk '{print $8}' | sort | uniq | grep -v '|'`
    mysql_total_queries=`echo "$mysql_queries_list" | grep '^|' | wc -l`
    mysql_queries_by_dbname=`echo "$mysql_queries_list" | awk '{print $8 }' | sort | uniq -c | sort -rn | head`
}

###########################################################
# Run parses
###########################################################
function parse_start {
    parse_psaux
    parse_http
    parse_mysql
    parse_report
}

###########################################################
# Reporting section
###########################################################
function parse_report {
    # Report the data:
    echo -e "\nCPU percentage per user:"
    echo -e $psaux_cpu_per_user | grep -v '^$' | sort -rn | head
    echo -e "\nMemory percentage per user:"
    echo -e $psaux_mem_per_user | grep -v '^$' | sort -rn | head
    echo -e "\nTotal Apache Requests:\n$http_total_requests"
    echo -e "\nApache connections per domain:"
    echo -e "$http_per_domain"
    echo -e "\nApache connections per IP:"
    echo -e "$http_per_ip"
    echo -e "\nMySQL total queries:"
    echo -e "$mysql_total_queries"
    echo -e "\nMySQL running queries per dbname:"
    echo -e "$mysql_queries_by_dbname"
}

###########################################################
# Start the script by getting the file_name
###########################################################
get_file_name
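The per-user totals that parse_psaux builds with a loop can also be computed in a single awk pass. The sketch below is an alternative illustration, not part of LoadParse; sum_per_user and the sample rows are invented.

```shell
#!/bin/bash
# sum_per_user COLUMN
# Reads "ps aux"-style rows (no header) on stdin and prints "<total> <user>"
# for the given numeric column (3 = %CPU, 4 = %MEM), highest first.
sum_per_user() {
    awk -v col="$1" '
        { total[$1] += $col }
        END { for (user in total) print total[user], user }
    ' | sort -rn
}

# Demonstration with made-up rows.
printf '%s\n' \
    'alice 101 10.5 2.0 0 0 ? S 21:00 0:01 php' \
    'bob   102  1.5 1.0 0 0 ? S 21:00 0:00 httpd' \
    'alice 103  2.5 3.0 0 0 ? S 21:00 0:02 php' |
    sum_per_user 3
# prints:
# 13 alice
# 1.5 bob
```

Because the totals are keyed on the exact user name, a user such as root is not lumped together with longer names that share the prefix, which can happen with egrep ^$user.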
Add email notification to help desk when the script is triggered
Please be thoughtful about this. It is usually not necessary, and your coworkers will be sad if this starts spamming hd.
If you do use it, you should add a ticket number to the subject line, so we know why you are monitoring.
Also, as this doesn't put any useful information in the email it creates, it would be better to just install swapwatch.sh on the server as that will provide us with useful information rather than setting the following up.
Add these four lines at the top, under #!/bin/bash:
SUBJECT="LoadWatch on $HOSTNAME triggered. Please Check it out."
EMAIL="$SET_EMAIL_HERE"
EMAILMESSAGE="/tmp/emailmessage.txt"
echo "LoadWatch on $HOSTNAME triggered. Please Check it out."> $EMAILMESSAGE
Then add this just above the fi statement at the bottom:
/bin/mail -s "$SUBJECT" "$EMAIL" < $EMAILMESSAGE
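One way to get the ticket number into the subject line is to build SUBJECT from a small helper. This is a sketch only: build_subject, the "[Ticket N]" prefix and the ticket number 123456 are all placeholders, so use whatever format the help desk actually expects.

```shell
#!/bin/bash
# build_subject HOST TICKET
# TICKET is a hypothetical ticket reference added so readers of the
# email know why the server is being monitored.
build_subject() {
    echo "[Ticket $2] LoadWatch on $1 triggered. Please Check it out."
}

# Placeholder ticket number; replace with the real one.
SUBJECT=$(build_subject "${HOSTNAME:-unknown-host}" "123456")
echo "$SUBJECT"
```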