Disk Cleanup

From Just another day in the life of a linux sysadmin

Finding Disk Utilization

Partitions fill up. It's the way of things. Cleaning things up can be easy once you know where space is being utilized.

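Here's a lil' shortcut for quickly finding any partition over 90% full (change 90 to whatever threshold you'd like):

df -hP | awk 'NR > 1 && $5 + 0 > 90 { print $6 " is at " $5 }'

The -P flag keeps each filesystem on a single line, and adding 0 to the Use% column makes awk compare it as a number; a plain string comparison would miss a partition at 100%.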
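If you run this check often, it can be wrapped in a small shell function that also works for inode usage, since the Use%/IUse% figure is the fifth field of both df -P and df -Pi output. This is only a sketch: over_threshold is a made-up name, and it reads df output on standard input so it can also be pointed at saved output.

```shell
# over_threshold: hypothetical helper, not part of any standard tool.
# Reads POSIX-format df output (df -P or df -Pi) on stdin and prints every
# mount point whose usage column ($5) is above the given percentage.
over_threshold() {
    local limit="${1:-90}"
    # NR > 1 skips the header line; "$5 + 0" turns "93%" into the number 93.
    # A usage of "-" (filesystems without inode counts) becomes 0 and is skipped.
    awk -v limit="$limit" 'NR > 1 && $5 + 0 > limit { print $6 " is at " $5 }'
}

# Usage:
#   df -P  | over_threshold 90    # disk space
#   df -Pi | over_threshold 90    # inodes
```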

First off, du is your friend. Using it will allow you to track disk usage in any partition quite easily. I recommend the following command to check things out.

du -hx --max-depth 1

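or, for a listing sorted by size with human-readable units and a grand total (note that ./* skips hidden files and directories):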

du -sk ./* | sort -nr | awk 'BEGIN{ pref[1]="K"; pref[2]="M"; pref[3]="G";} { total = total + $1; x = $1; y = 1; while( x > 1024 ) { x = (x + 1023)/1024; y++; } printf("%g%s\t%s\n",int(x*10)/10,pref[y],$2); } END { y = 1; while( total > 1024 ) { total = (total + 1023)/1024; y++; } printf("Total: %g%s\n",int(total*10)/10,pref[y]); }'


It's easy to see how this technique gives a good breakdown of disk usage. Sometimes, though, the numbers do not "add up": the usage df reports for the partition is larger than what du shows. This typically happens when log files are deleted while the server process writing to them still has them open. Such a file continues to take up disk space until it is closed, even though it no longer appears in the filesystem tree; once it is closed, the space is freed. You can identify these deleted files by running the following command:

lsof | grep deleted

It should return output in the following format. (I inserted the column header for reference)

COMMAND     PID     USER   FD   TYPE     DEVICE      SIZE      NODE NAME
httpd       546     root    3u   REG        8,8         0        46 /tmp/ZCUDfVfcQp (deleted)
httpd       547     root    3u   REG        8,8         0        46 /tmp/ZCUDfVfcQp (deleted)
httpd       548     root    3u   REG        8,8         0        46 /tmp/ZCUDfVfcQp (deleted)
httpd       748     root    3u   REG        8,8         0        46 /tmp/ZCUDfVfcQp (deleted)
mysqld      844     root    6u   REG        8,8         0        17 /tmp/ib8chgxk (deleted)
mysqld      844     root    7u   REG        8,8      3000        25 /tmp/ibMnPjAD (deleted)
mysqld      844     root   12u   REG        8,8         0        30 /tmp/ibMtDLhA (deleted)
mysqld      966     root    6u   REG        8,8         0        17 /tmp/ib8chgxk (deleted)
mysqld      966     root    7u   REG        8,8      3000        25 /tmp/ibMnPjAD (deleted)
mysqld      966     root   12u   REG        8,8         0        30 /tmp/ibMtDLhA (deleted)
httpd      3943     root    3u   REG        8,8         0        46 /tmp/ZCUDfVfcQp (deleted)
cpbandwd   6309     root    1w   REG        8,5    151453    448340 /var/cpanel/updatelogs/update.1162606882.postinstall.log (deleted)
cpbandwd   6309     root    2w   REG        8,5    151453    448340 /var/cpanel/updatelogs/update.1162606882.postinstall.log (deleted)

The first two columns show the name and PID of the process keeping each file open. If the SIZE column shows that a file is large, you can free the space by killing or restarting that process.
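To gauge how much space those deleted files are actually holding, the lsof output can be totalled. As an alternative to a restart, a deleted-but-open file can often be emptied in place by truncating it through /proc/PID/fd/N (N is the FD column without its mode letter, so 3u becomes 3). That is a different technique from the kill/restart above, so treat this sketch as illustrative: the helper names are hypothetical, counting each DEVICE+NODE pair once is an assumption that avoids double-counting files shared by several processes, and a process that did not open its log in append mode will keep writing at its old offset, leaving a sparse file.

```shell
# deleted_open_bytes: hypothetical helper. Reads lsof output (as produced by
# "lsof | grep deleted") on stdin and prints the total bytes held by deleted
# files. Each file is counted once per DEVICE ($6) + NODE ($8) pair, because
# several processes or descriptors often share the same file.
deleted_open_bytes() {
    awk '/\(deleted\)$/ && !seen[$6 ":" $8]++ { total += $7 }
         END { print total + 0 }'
}

# truncate_fd: hypothetical helper. Empties the file behind descriptor FD of
# process PID via Linux /proc. Requires permission to write to the file.
truncate_fd() {
    local pid="$1" fd="${2%%[!0-9]*}"   # "3u" -> "3", "1w" -> "1"
    : > "/proc/$pid/fd/$fd"
}

# Usage:
#   lsof | grep deleted | deleted_open_bytes
#   truncate_fd 6309 1w    # PID and FD taken from the lsof output
```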

Note also that sometimes disk usage errors are not the result of physical disk space being used up, but rather the number of inodes. You can see the inode usage of a given drive thus:

df -hi

Should a partition show 100% inode usage, you can quickly find the directories on that partition with the most files in them with this sweet little one-liner (-xdev keeps find on the current filesystem, since inodes are counted per filesystem):

find . -xdev -type d | while IFS= read -r dir; do echo "$(find "$dir" -maxdepth 1 | wc -l) $dir"; done | sort -rn | less

This will give you a nice 'less' screen with the directories in order by file count, where you can pretty easily figure out which ones are eating the inodes. You may wish to drop the pipe to less and redirect to a text file for easier culling.
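On systems with GNU find, a single pass can produce a similar ranking much faster, because it does not start a new find for every directory: print the parent directory (%h) of every entry and count how often each one appears. This swaps in a different technique from the loop above, and inode_hogs is a hypothetical name; the counts cover direct entries only.

```shell
# inode_hogs: hypothetical helper (needs GNU find for -printf).
# Prints "<count> <directory>" for the directories holding the most direct
# entries under TOP (default .), staying on TOP's filesystem (-xdev), and
# shows the first LIMIT lines (default 20).
inode_hogs() {
    local top="${1:-.}" limit="${2:-20}"
    find "$top" -xdev -printf '%h\n' 2>/dev/null |
        sort | uniq -c | sort -rn | head -n "$limit"
}

# Usage:
#   inode_hogs /home 30
```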


Cleaning Partitions

Once the "offending" files/directories are identified, they can be cleaned. The following sections identify common sources of sizable disk utilization, and how to fix issues that result from that utilization.

/

There are typically only a few situations where the root (/) partition will fill up:

Broken /backup

There are instances where a backup process gets confused and thinks the backup drive is mounted at /backup when it really is not, so backups are written into the / partition. Run umount /backup a few times to make sure the backup drive is unmounted, and then check the disk usage in /backup. If there are backups present, that's probably your problem. If the server has a backup drive and should be backing up to it, removing these backups is advisable. Again, before you remove them it is very important to verify that the backup drive is not mounted.

This can be verified by running:

df -h

It should show something like this:

[root@server ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda6             2.0G  572M  1.4G  30% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/sda1             194M   78M  107M  43% /boot
/dev/sda8             864G  274G  547G  34% /home
/dev/sda7             2.0G  135M  1.8G   8% /tmp
/dev/sda2              20G   13G  6.4G  67% /usr
/dev/sda5              20G   14G  4.9G  74% /var 

There you can see that all of the partitions are on the drive /dev/sda and there is no /backup listed.

Next, run the command:

fdisk -l

This will list the drives in the server. It should list something like this:

[root@server ~]# fdisk -l

Disk /dev/sda: 147.0 GB, 147015821824 bytes
255 heads, 63 sectors/track, 17873 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1        13    104391   83  Linux
/dev/sda2            14      1283  10201275   83  Linux
/dev/sda3          1284      1537   2040255   82  Linux swap
/dev/sda4          1538     17849 131026140    f  Win95 Ext'd (LBA)
/dev/sda5          1538      2807  10201243+  83  Linux
/dev/sda6          2808      2938   1052226   83  Linux
/dev/sda7          2939      3069   1052226   83  Linux
/dev/sda8          3070     17849 118720318+  83  Linux

Disk /dev/hda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1             1     30401 244196001   83  Linux


By looking at this it is pretty clear that /dev/sda is the primary drive on the server (you can see all of its partitions in the df output), which means /dev/hda1 is probably the /backup drive. To check, you can run this:

e2label /dev/hda1

The output of that command looks like this:

[root@server ~]# e2label /dev/hda1
/backup

The drive letters are not always the same. A backup drive could be /dev/sdb1, /dev/sda1, /dev/hdb1, and so on, so make sure you are looking at the partitions and using the e2label command to determine the correct drive. Most of the time the backup drive will be the only partition on a single disk.

You can accidentally change the label of a partition with the e2label command (it sets the label when given a second argument), so make sure you run it exactly as displayed above!

If the backup drive does not have a label it is best to add one now. Run the following command to label the backup drive:

e2label /dev/hda1 /backup

At this point you should check /etc/fstab for an entry for the /backup drive:

vim /etc/fstab

It will look something like this:

LABEL=/                 /                       ext3    defaults,usrquota        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
LABEL=/home             /home                   ext3    defaults,usrquota        1 2
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
LABEL=/tmp              /tmp                    ext3    defaults,noexec,nodev,nosuid        1 2
LABEL=/usr              /usr                    ext3    defaults,usrquota        1 2
LABEL=/var              /var                    ext3    defaults,usrquota        1 2
/dev/sda3               swap                    swap    defaults        0 0


In this case the backup drive is not listed, so even if we mount it manually it will not be mounted again automatically after a reboot. The line should look like this when you add it to the file:

LABEL=/backup                 /backup                       ext3    defaults        1 2

If the backup drive is not labeled, you can also use the entry below in fstab. Labeling the drive is still recommended, though: if the drive is replaced down the road, the new drive can be labeled /backup and continue to work with the LABEL entry above.

/dev/hda1                    /backup                       ext3    defaults        1 2
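Before adding the line, it does no harm to confirm that fstab does not already contain a /backup entry. A minimal sketch with a hypothetical function name; it skips comment lines and matches the mount point field exactly.

```shell
# fstab_has_mount: hypothetical helper. Succeeds if the fstab file (default
# /etc/fstab) has a non-comment entry whose mount point field ($2) is exactly $1.
fstab_has_mount() {
    local mnt="$1" fstab="${2:-/etc/fstab}"
    awk -v mnt="$mnt" '$1 !~ /^#/ && $2 == mnt { found = 1 }
                       END { exit !found }' "$fstab"
}

# Usage:
#   fstab_has_mount /backup || echo "no /backup entry in /etc/fstab yet"
```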

You should now be able to run the following command and have the drive mount at the proper location:

mount /backup 

Before you mount the backup drive, make sure any broken backups have been cleaned out of /backup. Run umount /backup before deleting anything there, just to be sure you are not deleting good backups!
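Since cleaning a /backup directory that still has the real drive mounted on it would destroy good backups, that precaution can be built into a check that only reports usage when /backup is genuinely unmounted. This sketch assumes the util-linux mountpoint command is available; stray_backup_usage is a made-up name, and it deletes nothing.

```shell
# stray_backup_usage: hypothetical helper. Refuses to run if DIR is a mount
# point (i.e. the real backup drive is mounted there); otherwise prints how
# much data sits in DIR on the parent (usually /) filesystem.
stray_backup_usage() {
    local dir="${1:-/backup}"
    if mountpoint -q "$dir"; then
        echo "$dir is mounted; run umount $dir first" >&2
        return 1
    fi
    du -sxh "$dir"
}

# Usage:
#   umount /backup; stray_backup_usage /backup
```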

Finding Inode Utilization

If the / partition is the one out of inodes on a dedicated server, searching within /root/ first can save you time compared with looking through every partition at once. You can also use find's -xdev flag to keep it from descending into other partitions. There is also a script to assist with finding inode usage, though learning how to do this with find doesn't hurt either.

wget -O /scripts/inodes.sh http://layer3.liquidweb.com/scripts/inodes.sh
chmod +x /scripts/inodes.sh
/scripts/inodes.sh

While following these procedures, if you notice that the directory /.cpanel/comet is the primary culprit in inode usage, there happens to be a handy script to clear out unneeded files.

/usr/local/cpanel/bin/purge_dead_comet_files

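It is also worth checking the default email accounts on cPanel servers, e.g. /home/$user/mail/cur and /home/$user/mail/new, as well as the Exim mail queue (exim -bpc prints the number of messages in the queue).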



/lib/modules

On some older machines (FC2 boxes especially), there will be many kernels installed, and their module directories will reside in the / partition under /lib/modules. It is safe to remove older kernels as long as the server is not booted into the kernel you're removing. Just make sure to check grub.conf after you're done removing kernels to make sure that grub still points to a kernel that exists.

The following script can help with this task immensely: http://layer3.liquidweb.com/scripts/kernelcleaner.sh
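As a starting point for that cleanup, a sketch like the one below lists module directories that do not belong to the running kernel (uname -r). The function name is hypothetical and it only prints candidates; if the kernels were installed as RPMs (rpm -q kernel lists them), removing the matching package is usually tidier than deleting directories by hand.

```shell
# stale_module_dirs: hypothetical helper. Lists directories under the modules
# tree (default /lib/modules) whose name is not the running kernel version.
stale_module_dirs() {
    local root="${1:-/lib/modules}" running="${2:-$(uname -r)}"
    local d
    for d in "$root"/*/; do
        d="${d%/}"                    # strip the trailing slash from the glob
        [ -d "$d" ] || continue       # glob matched nothing
        [ "${d##*/}" = "$running" ] || printf '%s\n' "$d"
    done
}

# Usage:
#   stale_module_dirs     # compare /lib/modules with `uname -r`
#   rpm -q kernel         # installed kernel packages, for comparison
```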

Non-standard directories

Sometimes customers place data or program directories in the / partition, not knowing that it is a small filesystem. There's really no way around this except to advise the customer to move their data or program to another partition with more space.

/root

Since root's home directory (/root) is typically on the / partition, it can fill up with downloaded files, saved logs, or anything else that gets placed in /root. Cleaning out unneeded data from this directory can save a lot of space. Frequently, /root/loadMon fills up the fastest; disabling its written logs prevents this.

/tmp

Typically, there is nothing here that can't be removed except the mysql.sock socket file. After all, it is temporary space.
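When clearing /tmp (or /var/tmp), restricting the cleanup to regular files means mysql.sock, which is a socket, is never touched. A sketch with a hypothetical function name and an arbitrary default of 7 days; it lists files rather than deleting them, so the output can be reviewed first.

```shell
# old_temp_files: hypothetical helper. Prints regular files (-type f, so
# sockets such as mysql.sock never match) under DIR, on DIR's own filesystem,
# whose modification time is more than DAYS full days ago (find -mtime +DAYS).
old_temp_files() {
    local dir="${1:-/tmp}" days="${2:-7}"
    find "$dir" -xdev -type f -mtime +"$days" -print
}

# Usage (review first, then delete):
#   old_temp_files /tmp 7
#   old_temp_files /tmp 7 | xargs -r -d '\n' rm -f --
```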

/boot

The only time I've ever seen the /boot partition fill up was when the system mistakenly mounted /boot at /backup and tried to place backups there. These backups are obvious and easy to remove.

/usr

There are quite a few things in /usr that can eat up disk space. Most of them are CPanel or apache related.

/usr/src

/usr/src/kernels

Quite frequently, the OS will place kernel source directories under /usr/src/kernels. If the user on the box isn't compiling their own kernels, these sources are typically unneeded. If they are provided by the kernel-source RPM package, the following command will remove them (-r keeps xargs from running rpm -e with no arguments when the package isn't installed):

rpm -qa kernel-source | xargs -r rpm -e

If there is no kernel-source package installed, proceed with caution when removing these sources. Make sure the kernel the server is running isn't custom compiled and doesn't depend on that source directory.

Other

There may be other source directories in /usr/src. Unless they were put there by Liquid Web and we know that they can be removed, it's best to let them be. If it's imperative to free up the space they're taking, the directories can be tar'd up and (g|b)zipped, and the originals removed. If you do this, leave a note in the account in case this causes an issue.
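A sketch of that archive-and-remove step which deletes the original only after the archive has been written and can be listed back. The function name is hypothetical, it uses gzip (tar -z), and by default it writes the archive next to the directory; pass a second argument to put it elsewhere.

```shell
# archive_dir: hypothetical helper. Creates DEST (default DIR.tar.gz) from
# DIR, verifies the archive can be listed, and only then removes DIR.
archive_dir() {
    local dir="${1%/}"
    local dest="${2:-$dir.tar.gz}"
    local parent base
    parent=$(dirname -- "$dir")
    base=$(basename -- "$dir")
    # Build the archive relative to the parent so it unpacks as "$base/...".
    tar -czf "$dest" -C "$parent" "$base" || return 1
    # Only trust the archive if tar can read it back.
    tar -tzf "$dest" > /dev/null || return 1
    rm -rf -- "$dir"
}

# Usage (paths are illustrative):
#   archive_dir /usr/src/some-old-source /home/usr_src_some-old-source.tar.gz
```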

/usr/local/apache

On a CPanel machine, all of the logs for Apache will be placed under /usr/local/apache.

/usr/local/apache/domlogs

The domain logs (and other CPanel data) reside in /usr/local/apache/domlogs and are a common source of large disk utilization. Unless the customer is willing to remove the logs or set CPanel to rotate them frequently, moving the directory to another partition and making a symlink is the only option. Fortunately, /home is almost always huge and under-utilized, so moving the logs there is an easy band-aid. First, copy the logs to /home using the following commands:

mkdir /home/domlogs
rsync -avHP /usr/local/apache/domlogs/ /home/domlogs/

After the data is copied, stop Apache and do a final sync of the logs. It is then safe to move the original directory aside, create the symlink, and restart Apache.

service httpd stop
rsync -avHP /usr/local/apache/domlogs/ /home/domlogs/
mv /usr/local/apache/domlogs{,.bak}
ln -s /home/domlogs /usr/local/apache/domlogs
service httpd start

Once you verify that data is being written properly to the new /home/domlogs directory, it is safe to remove the old domlogs directory.

rm -rf /usr/local/apache/domlogs.bak
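The verification can be done from the shell before running that rm -rf: check that the old path is now a symlink to the new location and that something under the new location was written recently. A sketch; check_relocated is a hypothetical name, the 60-minute window is arbitrary, and a quiet site may legitimately show no recent writes. The same check fits the other move-and-symlink procedures on this page.

```shell
# check_relocated: hypothetical helper. Succeeds if LINK is a symlink that
# points at TARGET and TARGET holds a file modified within MINUTES minutes.
check_relocated() {
    local link="$1" target="$2" minutes="${3:-60}"
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink" >&2; return 1
    fi
    if [ "$(readlink -- "$link")" != "$target" ]; then
        echo "$link does not point at $target" >&2; return 1
    fi
    if [ -z "$(find "$target/" -type f -mmin -"$minutes" -print -quit)" ]; then
        echo "nothing written under $target in the last $minutes minutes" >&2; return 1
    fi
}

# Usage:
#   check_relocated /usr/local/apache/domlogs /home/domlogs && rm -rf /usr/local/apache/domlogs.bak
```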

/usr/local/apache/logs

The default Apache logs and a few other things reside in /usr/local/apache/logs. Large disk utilization in this directory is quite common when the error_log and access_log files aren't being rotated and/or compressed properly. Adding a logrotate script for Apache will normally take care of this problem. The default logrotate script is either /etc/logrotate.d/apache or /etc/logrotate.d/httpd, and sometimes both are present. If both are present, remove one and replace the contents of the other with the following:

#You may need to tweak this a bit for the customer's specific domains.  
#The below example will rotate all of apache's core logs, 
# Take a look at their domlogs and use your judgment to add additional paths to the line at the top.
# This part of the log line should not be added unless they have disabled awstats/cpanellogd:
# /usr/local/apache/domlogs/*.com /usr/local/apache/domlogs/*.net

 /usr/local/apache/logs/*log {
    compress
    weekly
    notifempty
    missingok
    rotate 3
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /usr/local/apache/logs/httpd.pid 2>/dev/null` 2> /dev/null || true
    endscript
 }

You can force the rotation of the logs by running the following command (replacing [filename] with either apache or httpd, whichever exists).

logrotate -f /etc/logrotate.d/[filename]

/usr/local/cpanel

This directory is a beast. CPanel keeps so much junk in here it's not even funny. It tends to be large, but it *cannot* be symlinked anywhere. Doing so will defeat the patched suexec that CPanel uses to make the mailman and cgi-sys directories work properly, and will result in internal server errors when those are accessed.

/usr/local/cpanel-rollback

Sort the contents by last-modified date and delete all but the latest one or two sets.
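A sketch for listing everything in a directory except the newest few entries by modification time. The function name and the default of keeping two are assumptions; it only prints the candidates, newest first, so they can be reviewed before removal.

```shell
# all_but_newest: hypothetical helper. Prints the entries directly inside DIR,
# newest first by modification time, skipping the KEEP newest ones.
# Uses GNU find's -printf and assumes names without newlines.
all_but_newest() {
    local dir="$1" keep="${2:-2}"
    find "$dir" -mindepth 1 -maxdepth 1 -printf '%T@ %p\n' |
        sort -rn | tail -n +"$((keep + 1))" | cut -d' ' -f2-
}

# Usage (review, then remove):
#   all_but_newest /usr/local/cpanel-rollback 2
```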

/usr/local/cpanel/3rdparty/mailman

  • If the server has even one busy mailing list on it, this directory can take up a ton of space. If you're lucky, only the logs/ directory will be large. This just holds the mailman logfiles, so if it's large and the logfiles are unneeded, they can be discarded.
  • The other directory that tends to be large is the archives/ directory. This is customer data, and shouldn't be removed. This is another candidate for symlinking to /home. Please make note that you *cannot* symlink the whole mailman directory to /home. Mailman will break, and you will be hearing from customers. The proper action is to symlink the archives directory to /home.
mkdir -p /home/mailman/archives
rsync -avHl /usr/local/cpanel/3rdparty/mailman/archives/ /home/mailman/archives/

After the data is copied, stop CPanel/mailman and do a final sync of the archives. It is then safe to move the original directory aside, create the symlink, and restart CPanel/mailman.

service cpanel stop
rsync -avHl /usr/local/cpanel/3rdparty/mailman/archives/ /home/mailman/archives/
mv /usr/local/cpanel/3rdparty/mailman/archives /usr/local/cpanel/3rdparty/mailman/archives.bak
ln -s /home/mailman/archives /usr/local/cpanel/3rdparty/mailman/archives
service cpanel start

Once you verify that data is being written properly to the new /home/mailman/archives directory, it is safe to remove the old archives directory.

rm -rf /usr/local/cpanel/3rdparty/mailman/archives.bak
  • A third directory that can, on occasion, get ridiculously large is the data/ directory. This can fill up with pickled held messages. DO NOT JUST RM THEM. Doing so may cause breakage to the mailman interface, and we wouldn't want that. Instead:
cd /usr/local/cpanel/3rdparty/mailman
bin/discard data/heldmsg-<listname>-*

If for some reason there are so many that the bin/discard program chokes on the wildcard expansion, try this fu:

find ./data -name 'heldmsg-<listname>-*' -print | xargs bin/discard

Mad props to mailman for actually knowing about this issue: http://wiki.list.org/pages/viewpage.action?pageId=4030620

As of this writing, that discard script is not entirely reliable.
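To see which list's held messages are filling data/ (and so which <listname> to pass to bin/discard), the file names can be counted per list. This assumes Mailman 2's heldmsg-<listname>-<id>.pck (or .txt) naming; held_counts is a hypothetical name, and it only reads the directory.

```shell
# held_counts: hypothetical helper. Counts Mailman held-message files per
# list in DIR (default: cPanel's mailman data directory), largest first.
held_counts() {
    local dir="${1:-/usr/local/cpanel/3rdparty/mailman/data}"
    find "$dir" -maxdepth 1 -type f -name 'heldmsg-*' -printf '%f\n' |
        sed -E 's/^heldmsg-(.*)-[0-9]+\.(pck|txt)$/\1/' |
        sort | uniq -c | sort -rn
}

# Usage:
#   held_counts | head
```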

/usr/local/cpanel/logs

These logs can get quite large without rotation, but can be very useful in investigating many problems. Set up logrotate for cPanel logs through WHM >> "cPanel Log Rotation Configuration" on the server, and set an appropriate threshold.

/usr/local/cpanel/src

This directory is where CPanel stores source code for software it builds. The only directory I've seen inside of it that has any notable size is the 3rdparty directory. It contains sources for third-party applications, and I've never seen an issue from removing all of the contents therein, since upcp will repopulate that directory if it needs to build something there.

rm -rf /usr/local/cpanel/src/3rdparty/*

/usr/local/lp/logs/httpd

This directory holds Mr. Radar logs. These logs can be symlinked to /dev/null. You will see a logfile with a name similar to servXXXXXX.sn.sourcedns.com.

 rm -f /usr/local/lp/logs/httpd/servXXXXXX.sn.sourcedns.com && ln -s /dev/null /usr/local/lp/logs/httpd/servXXXXXX.sn.sourcedns.com 

(where servXXXXXX is the actual name of the file in the directory)

/usr/local/jakarta

If the user is running Tomcat on their server, one of the log files can grow to be extremely large. To clear this file, do the following:

/usr/local/jakarta/tomcat/bin/shutdown.sh
rm /usr/local/jakarta/tomcat/logs/catalina.out
/usr/local/jakarta/tomcat/bin/startup.sh

/usr/share/

/usr/share/clamav/

This directory can be moved to /home, as long as you create a symlink back in its place:

mv /usr/share/clamav /home/usr_share_clamav
ln -s /home/usr_share_clamav /usr/share/clamav
/etc/init.d/exim restart

/var

/var/cpanel/bandwidth

Cpanel seems to have given us another reason to clean out /var as of late. The fix in this case would be to do the following:

killall cpanellogd
mkdir /home/bandwidth
chown root:wheel /home/bandwidth
chmod 755 /home/bandwidth
rsync -avHl /var/cpanel/bandwidth/ /home/bandwidth 

Double-check that cpanellogd has not been restarted, then stop tailwatchd and do the final sync:

/usr/local/cpanel/bin/tailwatchd stop
rsync -avHl /var/cpanel/bandwidth/ /home/bandwidth
mv /var/cpanel/bandwidth /var/cpanel/bandwidth.bak
ln -s /home/bandwidth /var/cpanel/bandwidth
/usr/local/cpanel/bin/tailwatchd start

After verifying the new directory is working correctly, remove the old:

rm -rf /var/cpanel/bandwidth.bak

/var/cache/yum

This is where yum stores a lot of its stuff, including downloaded RPMs. It's always completely safe to clean this directory with the command:

 yum clean all

/var/log

This directory is a haven for large log files. It is quite often the case that logrotate isn't set to compress log files here, and they grow to great size. This is easily fixed though. If logrotate isn't compressing the logs as they're rotated, you'll see something like this in a directory listing.

-rw-------    1 root     root       637071 Jan 24 21:26 messages
-rw-------    1 root     root      3563526 Jan 21 04:03 messages.1
-rw-------    1 root     root      3805857 Jan 14 04:03 messages.2
-rw-------    1 root     root      3421860 Jan  7 04:03 messages.3
-rw-------    1 root     root      1019255 Dec 31 04:04 messages.4

Fixing this is a simple three-step process:

  • Edit /etc/logrotate.d/syslog to include 'compress' on its own line, as per the following example.
vim /etc/logrotate.d/syslog
/var/log/messages /var/log/secure /var/log/maillog /var/log/spooler /var/log/boot.log /var/log/cron {
   sharedscripts
   compress
   postrotate
       /bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
   endscript
}
  • Compress the existing logfile.n files, so that logrotate will rotate them properly in the future.
cd /var/log
gzip *.?
  • Run logrotate to compress the current logfile, if needed.
logrotate -f /etc/logrotate.d/syslog
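To spot rotated logs that were never compressed (the messages.1, messages.2 pattern shown earlier) anywhere under /var/log, something like the sketch below can help. The function name is hypothetical, and matching only names that end in .<number> is an assumption that fits the default logrotate naming shown here but not date-suffixed logs.

```shell
# uncompressed_rotated: hypothetical helper (GNU find). Lists files under DIR
# whose names end in .<number> (e.g. messages.1), i.e. rotated but not
# gzipped, with their disk usage in kilobytes, largest first.
uncompressed_rotated() {
    local dir="${1:-/var/log}"
    find "$dir" -xdev -type f -regextype posix-extended -regex '.*\.[0-9]+' \
        -printf '%k\t%p\n' | sort -rn
}

# Usage:
#   uncompressed_rotated /var/log | head
```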

There may also be large Apache SSL log files present here. Setting up logrotate to compress and rotate the default Apache log files, as explained in the /usr/local/apache/logs section above, will take care of them.

If space is an issue even after compression, removal of the oldest log files is a viable option. Just make sure to leave the logs for at least the past two weeks if it is possible.
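To see which compressed logs fall outside that two-week window, a sketch like this lists them oldest first. The function name is hypothetical, and it relies on modification time, which for a rotated log is roughly when its last entry was written.

```shell
# old_rotated_logs: hypothetical helper (GNU find). Lists compressed rotated
# logs under DIR whose modification time is more than DAYS full days ago
# (default 14), oldest first.
old_rotated_logs() {
    local dir="${1:-/var/log}" days="${2:-14}"
    find "$dir" -xdev -type f -name '*.gz' -mtime +"$days" -printf '%T@\t%p\n' |
        sort -n | cut -f2-
}

# Usage (review, then remove):
#   old_rotated_logs /var/log 14
```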


/var/log/audit.d

This directory isn't typically a problem, but on some servers it can be. It's filled up by the audit daemon's binary log files, and I've seen it reach 7GB before on systems with large /var partitions. We've found no use for the audit daemon (and it's harmful in certain instances), so it can safely be disabled, and its logs removed.

/etc/init.d/auditd stop
chkconfig auditd off
rm -rf /var/log/audit.d/*

/var/tmp

This is temporary space, so it's generally safe to delete anything in here except the MySQL socket file (mysql.sock).

/var/lib/mysql

This is where mysql keeps its data by default. This can be problematic with our default partitioning system when the customer has large databases. Moving the mysql data directory to a larger partition is the only option. For instructions on how to do this, see Moving the MySQL Data Directory to /home.

Exim Stats ( /var/lib/mysql/eximstats/ )

This can at times grow very large. It will hold stats about exim which most users never interact with. Once you confirm they do not need to use these stats they can be cleaned out.

 mysql eximstats

Then, if you run:

 mysql> show tables;
 +---------------------+
 | Tables_in_eximstats |
 +---------------------+
 | defers              | 
 | failures            | 
 | sends               | 
 | smtp                | 
 +---------------------+

You will see the tables these stats are stored in. You can empty them:

 truncate defers;
 truncate failures;
 truncate sends;
 truncate smtp;

This can be accomplished from the command line too:

 mysql eximstats -e "truncate defers;truncate failures;truncate sends;truncate smtp;"

Then restart MySQL.

If eximstats is especially large and/or the drive is maxed out, using truncate can be a pain, so do the following instead. FYI, do not do this for other databases with similar issues.
mysqldump -d eximstats > /root/eximstats.sql
rm /var/lib/mysql/eximstats/*
mysql eximstats < /root/eximstats.sql

This will remove the information. You can also disable Exim stats in WHM -> Service Manager, or reduce the number of days the stats are kept in Tweak Settings (the default is 90).

/home

If /home is full, there are a few things that will provide some breathing room, but since the majority of the stuff in /home is customer data, there isn't much that can be done except add another drive.

/home/temp

This is a directory we add on setup of the server, and can sometimes contain a large amount of data that can be removed. If something large in this directory hasn't been touched in two weeks or more, I consider it safe to remove.

/home/cprestore

This directory can hold old backups; check their dates and remove them if they seem old enough. Also look for old backup files directly in /home/.

CPanel build directories

Since /home is typically large, CPanel does its large software builds in directories there. If space is a concern, the cpzendinstall/ and cpapachebuild/ directories can be removed.

CPanel account transfer file directories

CPanel places the tar archives it uses for account transfers in /home and extracts them there. These files and directories can be removed in a crunch: files/directories beginning with cpmove or cprestore can be removed safely if they exist.

CPAN directories

In /home there is quite typically a directory that perl's CPAN system uses to hold its cached information, including source and build directories. The source and build directories can be cleaned to save space.

rm -rvf /home/.cpan/build/*
rm -rvf /home/.cpan/sources/*

Directories symlinked from other partitions

Directories symlinked from other filesystems are placed in /home for a reason, but if space is limited on /home, moving them back to their original filesystem (if space permits) can provide some breathing room.


/backup

Since this is typically set up using a dedicated backup drive, it can quite often be highly utilized. However, if it's completely full, it can break the backup processes.

CPanel backups

If the drive is full of CPanel backups, there's not much that can be done except deleting backups. If all of the backup timeframes (daily, weekly, monthly) are enabled, disabling one of them may allow the backup process to complete properly.