Domlog Diving

From Just another day in the life of a linux sysadmin

cPanel

With information from sar you can see when the load spikes happened and use that to run a more targeted search of the domlogs. The timestamp is typically in the format day, three-letter month abbreviation, year, hour, minute, second (DD/Mon/YYYY:HH:MM:SS), so 01/Jan/2017:01:32:26 is January 1st, 2017 at 1:32:26 AM. If a load spike happened that day (January 1st, 2017) starting around 9 AM, we could run searches like the following. Do not forget that the domlog path changed to /var/log/apache2/domlogs for EA4.

EA3

grep -s "01/Jan/2017:09" /home/domlogs/* | grep POST | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
grep -s "01/Jan/2017:09" /home/domlogs/* | grep GET | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
grep -s "01/Jan/2017:09" /home/domlogs/* | egrep -i '(crawl|bot|spider|yahoo|bing|google)'| awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head

EA4


grep -s "01/Jun/2017:09" /var/log/apache2/domlogs/* | grep POST | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
grep -s "01/Jun/2017:09" /var/log/apache2/domlogs/* | grep GET | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
grep -s "01/Jun/2017:09" /var/log/apache2/domlogs/* | egrep -i '(crawl|bot|spider|yahoo|bing|google)'| awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head


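The above searches give the top ten domains by number of POST requests, GET requests, and bot/crawler requests respectively. This works because grep prefixes every matching line with the file name and a colon when it searches more than one file, so the first awk field is the domlog path, a colon, and then the client IP. Changing the cut to print the 2nd field [cut -d: -f2] gives us the top ten IPs instead. IPv6 addresses contain colons themselves, so use -f2- if you need to keep them whole.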

grep -s "01/Jan/2017:09" /home/domlogs/* | grep POST | awk '{print $1}' | cut -d: -f2 | sort | uniq -c | sort -rn | head
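
As a rough sketch, the IP search can be wrapped in a small function so the timestamp, method, and directory are easy to swap. The name top_ips and its arguments are made up for this example, and it assumes the usual combined log format and domlog paths that contain no colons. It differs from the one-liner above in two ways: it only matches the method at the start of the quoted request ("POST ), and it keeps everything after the first colon so IPv6 addresses are not cut short.

```shell
# Hypothetical helper, not part of cPanel.
# $1 = timestamp prefix such as "01/Jan/2017:09"
# $2 = HTTP method such as POST or GET
# $3 = domlog directory such as /home/domlogs
top_ips() {
    local stamp=$1 method=$2 dir=$3
    # -H forces the "file:" prefix even if the glob matches only one file.
    grep -sH "$stamp" "$dir"/* \
        | grep "\"$method " \
        | awk '{print $1}' \
        | cut -d: -f2- \
        | sort | uniq -c | sort -rn | head
}

# Example call:
#   top_ips "01/Jan/2017:09" POST /home/domlogs
```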

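If you change the awk to print the 7th field you can find the most requested URIs. The cut is dropped here because the 7th field carries no file name prefix, and cutting on the colon would mangle any URI that contains one.

grep -s "01/Jan/2017:09" /home/domlogs/* | grep POST | awk '{print $7}' | sort | uniq -c | sort -rn | head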
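
The URI search above does not show which domain each URI belongs to. As a hedged sketch, the domlog path and the URI can be printed together. The function name top_uris_by_domain is invented for this example. It assumes the combined log format, domlog paths without colons, and that lumping requests together by stripping the query string is what you want.

```shell
# Hypothetical helper, not part of cPanel.
# $1 = timestamp prefix, $2 = HTTP method, $3 = domlog directory
top_uris_by_domain() {
    local stamp=$1 method=$2 dir=$3
    grep -sH "$stamp" "$dir"/* \
        | grep "\"$method " \
        | awk '{
              sub(/:.*/, "", $1)    # keep only the domlog path from "path:ip"
              sub(/\?.*/, "", $7)   # drop the query string from the URI
              print $1, $7
          }' \
        | sort | uniq -c | sort -rn | head
}

# Example call:
#   top_uris_by_domain "01/Jan/2017:09" POST /home/domlogs
```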

Using egrep we can expand or shrink the time frame we are looking for in the logs as well.

egrep '01/Jan/2017:(09|10)' /home/domlogs/* | grep POST | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head

This will search for POST requests made January 1st, 2017 between 9:00 and 10:59 AM.

egrep '01/Jan/2017:09:[0-2]' /home/domlogs/* | grep POST | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head

This will search for POST requests made January 1st, 2017 between 9:00 and 9:29 AM, effectively the first half hour, which is useful if the load spike only lasted that long. The commands listed above are just examples and there is a lot more you can search for in this fashion. A more targeted time frame gives us a better idea of what was happening on the server around the time of the load spike.
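
Regex ranges get awkward when the window does not line up with whole hours or ten-minute blocks. One possible approach, sketched here under the assumption of the combined log format, is to let awk compare the HH:MM:SS part of the timestamp as a string. The function name hits_between is made up for this example, and unlike the searches above it counts every request per domain rather than only POSTs.

```shell
# Hypothetical helper, not part of cPanel.
# $1 = day such as "01/Jan/2017"
# $2 = start time and $3 = end time as HH:MM:SS (inclusive, same day)
# $4 = domlog directory
hits_between() {
    local day=$1 start=$2 end=$3 dir=$4
    grep -sH "$day:" "$dir"/* \
        | awk -v d="$day" -v s="$start" -v e="$end" '
              # $4 looks like "[01/Jan/2017:09:15:42"; skip lines where the
              # day only showed up somewhere else, such as in the URI.
              index($4, "[" d ":") == 1 {
                  t = substr($4, length($4) - 7)   # last 8 chars = HH:MM:SS
                  if (t >= s && t <= e) {
                      sub(/:.*/, "", $1)           # keep only the domlog path
                      print $1
                  }
              }' \
        | sort | uniq -c | sort -rn | head
}

# Example call:
#   hits_between "01/Jan/2017" "09:15:00" "09:45:00" /home/domlogs
```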

If the domlogs have rotated since the load spike occurred and the customer has archived access logs enabled we should be able to search those as well. These files are compressed with gzip so we will need to use zgrep and zegrep for our searches.

 zgrep "01/Jan/2017:09" /home/*/logs/* | grep POST | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
 zegrep '01/Jan/2017:(09|10)' /home/*/logs/* | grep POST | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -rn | head

IP looks DoS-like or has no referrer: Sometimes the above searches turn up IPs that send no referrer, or you see some really suspicious activity from a single IP and wonder whether it hit any other sites. These searches list the domlogs the IP appears in and how many matching lines each one contains:

 grep -l '123\.45\.67\.123' /home/domlogs/* | while read -r i; do echo "$i"; grep -c '123\.45\.67\.123' "$i"; done

Or if it's in an archived domlog:

 zgrep -l '123\.45\.67\.123' /home/*/logs/*-Nov-2014.gz | while read -r i; do echo "$i"; zgrep -c '123\.45\.67\.123' "$i"; done

Once you have this information, you can grep the listed domlogs for the IP to better find out what it was doing.
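
A minimal sketch of that follow-up step, assuming the combined log format: summarize what the IP asked for as method, URI, and status code, most frequent first. The function name ip_activity is invented for this example. It reads one uncompressed domlog and matches the IP against the client address field only, so the same digits inside a URI or referrer are not counted.

```shell
# Hypothetical helper, not part of cPanel.
# $1 = client IP, $2 = path to a single uncompressed domlog
ip_activity() {
    local ip=$1 log=$2
    # Fields when reading the file directly: $1 = client IP, $6 = "METHOD
    # (with the leading quote), $7 = URI, $9 = status code.
    awk -v ip="$ip" '$1 == ip { print substr($6, 2), $7, $9 }' "$log" \
        | sort | uniq -c | sort -rn | head
}

# Example call (the domlog name is a placeholder):
#   ip_activity 123.45.67.123 /home/domlogs/example.com
```

For an archived domlog, one option is to decompress it to a temporary file with zcat first and point the function at that.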

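Make sure to check the timezone of the server so that you are searching the correct time frame(s). By default the domlog timestamps are written in the server's local time, with the UTC offset shown after the seconds.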
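
If the time of the spike was reported in a different timezone, a small helper can turn it into the hour prefix to grep for. This is only a sketch: domlog_stamp is a made-up name, it relies on GNU date (the -d option) as found on typical cPanel servers, and it converts into whatever timezone the shell is currently using, which is assumed to match the one Apache logs in.

```shell
# Hypothetical helper, not part of cPanel. Requires GNU date.
# $1 = any time string GNU date understands, optionally with a timezone
domlog_stamp() {
    # LC_ALL=C keeps the month abbreviation in English, as in the domlogs.
    LC_ALL=C date -d "$1" +"%d/%b/%Y:%H"
}

# Show the server's current timezone abbreviation and UTC offset:
#   date +"%Z %z"
# Build a grep pattern from a time that was reported in UTC:
#   grep -s "$(domlog_stamp '2017-01-01 14:00 UTC')" /home/domlogs/*
```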