Monday, September 15, 2014

XenServer Root Disk Maintenance [feedly]




The Basis for a Problem

For all that it does, XenServer has a tiny installation footprint: roughly 1.2 GB.  That is the modern-day equivalent of a 1.44 MB floppy disk, really.  While the installation footprint is tiny, well, so is the "root/boot" partition that the XenServer installer creates: 4 GB in size - no more, no less - and don't alter it!

The same is also true - during the install process - for the secondary partition that XenServer uses for upgrades and backups.

The point is that this amount of space does not leave much room for log retention, patch files, and other content.  As such, it is highly important to tune, monitor, and perform clean-up operations on a periodic basis.  Without attention, hotfix files, syslog files, temporary log files, and other forms of data can accumulate over time until the root disk becomes full.

One does not want a XenServer (or any server, for that matter) to have a full root disk, as the file system will go "read only" and processes - virtualization included - will grind to a halt.  Common symptoms are:

  • VMs appear to be running, but one cannot manage a XenServer host with XenCenter
  • One can ping the XenServer host, but cannot SSH into it
  • If one can SSH into the box, one cannot write or create files: "read only file system" is reported
  • xsconsole can be used, but it returns errors when "actions" are selected

So, while there is a basis for a problem, the following article offers the basis for a solution (with emphasis on regular administration).

Monitoring the Root Disk

Shifting into the first person, I am often asked how I monitor my XenServer root disks.  In short, I utilize tools that are built into XenServer along with my own "Administrative Scripts".  The most basic way to see how much space is available on a XenServer's root disk is to execute the following:

df -h

This command will show you "disk file systems", and the "-h" flag means "human readable", i.e. Gigs, Megs, etc.  The output should resemble the following; the line we care about is the one mounted on "/":

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             4.0G  1.9G  1.9G  51% /
none                  299M   28K  299M   1% /dev/shm
/opt/xensource/packages/iso/XenCenter.iso
                       56M   56M     0 100% /var/xen/xc-install

A more "get to the point" way is to run:

df -h | grep "/$" | head -n 1

Which produces the line we are concerned with:

/dev/sda1             4.0G  1.9G  1.9G  51% /

The end result is that we know 51% of the root partition is used.  Not bad, really.  Still, I am a huge fan of automation and will now discuss a simple way that this task can be run - automatically - for each of your XenServers.
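For scripting, that percentage is more useful as a bare number.  A minimal sketch, built on the same df/grep approach as above, that strips the "%" sign so the value can be compared numerically:

```shell
# Print the root partition's usage as a bare number (no "%"),
# ready for numeric comparison in a script.
df -h | grep "/$" | head -n 1 | awk '{print $5}' | sed -e "s/%//"
```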

What I am providing is essentially a simple BASH script that checks a XenServer's local disk.  If the local disk use exceeds a threshold (which you can change), it will send an alert to XenCenter so that the tactics described further in this document can be employed to keep as much free space as possible.

Using nano or VI, create a file in the /root/ (root's home) directory called "diskmonitor" and paste in the following content:

#!/bin/bash
# Get this host's UUID
thisUUID=$(xe host-list | grep -B 1 "$HOSTNAME" | head -n 1 | awk '{print $5}')

# Threshold of disk usage to report on
threshold=75    # how much disk (in percent) can be used before alerting

# Get root disk usage, stripping the trailing "%"
diskUsage=$(df -h | grep "/$" | head -n 1 | awk '{print $5}' | sed -n -e "s/%//p")
if [ "$diskUsage" -gt "$threshold" ]; then
     xe message-create host-uuid="$thisUUID" name="ROOT DISK USAGE" body="Disk space use has reached ${diskUsage}% on $HOSTNAME!" priority="1"
fi

After saving this file be sure to make it executable:

chmod +x /root/diskmonitor

The "#!/bin/bash" at the start of this script now becomes imperative, as it tells the system to invoke the BASH interpreter whenever the script is executed.

To automate the execution of this script, one can now leverage cron to add an entry defining the recurring times this script should be executed.  I will outline how to run this script 4 times per day via cron, but if you are looking for more information regarding cron, what it does, and how to configure it, then check out http://www.thegeekstuff.com/2009/06/15-practical-crontab-examples/

From the command line on the host where /root/diskmonitor exists, execute the following to edit the crontab - the list of jobs to run on a periodic basis:

crontab -e

This will open root's crontab where you will want to add the following line:

00 00,06,12,18 * * * /root/diskmonitor

After saving this, we now have a cron entry that runs "diskmonitor" at midnight, six in the morning, noon, and six in the evening (military time) every day of every week of every month.

Now that we have a basic tool to alert if disk space exceeds the default threshold set (75%), let's move on to ways to reclaim disk space for a XenServer's root partition.

Removing Old Hotfixes

After applying one or more hotfixes to XenServer, copies of each decompressed hotfix are stored in /var/patch.  The main reason for this - in short - is that in pooled environments, hotfixes are distributed from the pool master to each slave, eliminating the need to download a single file multiplied by the number of hosts in a pool.  The more complex reason is consistency: if a host becomes the master of the pool, it must reflect the same content and configuration as its predecessor did, and this includes hotfixes.

The following is an example of what the /var/patch/ directory can look like after the application of one or more hotfixes:

Notice the "applied" sub-directory?  We never want to remove that.  So, to remove the archived files (which use the UUID convention for names), one can execute the following command to safely remove only the old patch files without removing the /var/patch/applied sub-directory:

find /var/patch -maxdepth 1 -type f | grep "[0-9]" | xargs -r rm

Finally, if you are in the middle of applying hotfixes, do not perform the removal procedure above until all hosts are rebooted, fully patched, and verified to be in working order.  This applies especially to pools, where a missing patch file could throw off XenCenter's perspective of which hotfixes have yet to be installed and on which host.
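Before deleting anything, a dry run is cheap insurance.  A hedged sketch (the patchdir variable is mine, for illustration) that only lists what would be removed - regular files at the top level of /var/patch - while leaving the "applied" sub-directory alone:

```shell
patchdir="/var/patch"   # the hotfix cache on a XenServer host

# Dry run: list candidate files only. -type f skips the "applied"
# sub-directory, and the grep keeps just the UUID-named archives.
if [ -d "$patchdir" ]; then
    find "$patchdir" -maxdepth 1 -type f | grep "[0-9]"
fi
```

If the listing looks right, swap the dry run for the actual removal command above.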

The /tmp Directory

Plain and simple, the /tmp directory is truly meant for just that: holding temporary data.  To see this, one can access a XenServer's command-line and execute the following to see a quantity of ".log" files:

cd /tmp
ls

Over time, one can see an accumulation of many, many log files.  Admittedly, these are small from the individual-file perspective, but collectively... they take up space.

These files are generated by access - via XenCenter - to Guest VMs and are, in short, related to Stunnel.  Just as before, one can execute the following command to remove these log files while preserving the /tmp directory structure:

find /tmp -name "*.log" | xargs -r rm

This will remove only the ".log" files.
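As a variation of my own (not part of the stock procedure), one can keep the most recent logs around for troubleshooting and remove only stale ones, using find's -mtime test:

```shell
# Remove only ".log" files under /tmp that have not been modified
# in more than 7 days; newer logs are kept for troubleshooting.
# (xargs -r avoids running rm at all when nothing matches.)
find /tmp -name "*.log" -mtime +7 | xargs -r rm -f
```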

Compressed Syslog Files

Finally, the last item is to remove all compressed syslog files stored under /var/log.  These usually consume the most disk space and, as such, I will be authoring an article shortly to explain how one can tune and even forward these messages to a syslog aggregator.

In the meantime, just as before, one can execute the following command to keep current syslog files intact while removing old, compressed log files:

find /var/log -name "*.gz" | xargs -r rm
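To gauge how much space these files actually consume before removing them, one can total them up first.  A small sketch (the logdir variable is mine, for illustration) using du's -c flag, which prints a grand total as the last line:

```shell
logdir="/var/log"   # syslog location on the XenServer host

# Sum up the space consumed by compressed, rotated logs.
# With no matches, xargs -r simply produces no output.
find "$logdir" -name "*.gz" | xargs -r du -ch | tail -n 1
```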

So For Now...

At this point, one has a tool to know when a disk is nearing capacity, along with methods for cleaning up specific items.  These can be run by the admin in an automated fashion or a manual fashion.  It is truly up to the admin's style of work.

Feel free to post any questions, suggestions, or methods you may even use to ensure XenServer's root disk does not fill up.

 

--jkbs | @xenfomation
