Wait! Backup!

I hate to wait. Standing in lines at the supermarket, or heaven forbid the Department of Motor Vehicles, is a pain I'd rather avoid. But even more than the DMV, I hate to wait for a paused UniData system to come back online when the backup is running. I can avoid the DMV, except once every few years. With UniData backups, however, that delay can happen daily. Even more irritating, it may be entirely unnecessary!

Pausing your database to take a quiescent copy is a good thing. Historically, backups have paused the database for the entire time that the backup is running. With newer systems that's no longer necessary. For example, one can pause/lock a UniData/UniVerse database environment, take a snapshot or split a mirror, resume/unlock the database, and then run the backup off the secondary, quiescent copy. This can reduce your total downtime from several hours to a couple of seconds or less.

For those using AIX, IBM offers a backup strategy known as "split backup off mirror" (SBOM). With this strategy, the file system is mirrored using the logical volume manager. The backup procedure pauses the database and then splits off a mirror from the volume group, mounting it separately as a copy of the original file system. Once this mirror has been split, the database on the original drives is resumed and the backup commences from the copy. When the backup is complete, the previously split mirror is re-added to the volume group and re-synced to the other copies. The database is paused for merely a couple of seconds, rather than the hours it would take for the full backup procedure.

The same mirror-splitting approach can be used on Red Hat Enterprise Linux (RHEL) as well, although it's not the most efficient option there. Instead, on RHEL a snapshot of the file system can be taken, mounted, and backed up; when the backup is complete, the snapshot is removed. There are a couple of benefits to this approach:

  • Reduced storage: When a snapshot is taken on RHEL, the snapshot only needs enough storage to track the changes made to the original file system while the snapshot exists. If the snapshot is taken at a time when the system is not under load, the amount of disk needed for the snapshot can be very small. By contrast, a mirror will persistently need 100% of the original drive space.
  • No re-sync on the snapshot: When the snapshot backup is complete, the snapshot can be removed. It is merely a frozen moment in time, so by removing it you only lose that moment. Everything that happened since the snapshot remains intact on the original copy. And removing a snapshot is remarkably fast! With the SBOM approach, it can take hours for the mirror to be joined and updated with the original copy. The system isn't down - these are not hours of downtime, per se - but they are still hours of I/O that can impact the performance of the rest of the system.

A snapshot backup is really quite easy to do, assuming the logical volume manager has been configured with a little bit of unclaimed storage. Here's the 40,000-foot view of the process:

  • Pause the database
  • Take a snapshot of the file system
  • Resume the database
  • Mount the snapshot somewhere in the file system
  • Do the backup
  • Remove the snapshot

It's also important to add some basic validation to ensure that you don't start a backup while another backup is running, that the database isn't already paused, that sort of thing.
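As one possible way to handle the "another backup is running" check, here is a minimal bash sketch built around a lock directory. The lock path and the function names are my own placeholders, not part of any UniData or UniVerse tooling, and checking whether the database is already paused is left out because that depends on your database.

```shell
# bash sketch: refuse to start a second backup while one is in progress.
# LOCK_DIR is a hypothetical path; any location writable by the backup user works.
LOCK_DIR="${LOCK_DIR:-/tmp/snapshot_backup.lock}"

# mkdir is atomic: it succeeds for exactly one caller, so it doubles as a lock.
acquire_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$$" > "$LOCK_DIR/pid"    # record who holds the lock
        return 0
    fi
    echo "Another backup appears to be running (lock: $LOCK_DIR)" >&2
    return 1
}

release_lock() {
    rm -rf "$LOCK_DIR"
}

# True if a snapshot device from an earlier run is still present.
# $1 = snapshot device path, e.g. /dev/newsystem/snap
snapshot_exists() {
    [ -e "$1" ]
}
```

A backup script might start with `acquire_lock || exit 1`, followed by `trap release_lock EXIT`, so the lock is cleared however the script ends.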

Here are the commands to set up a snapshot backup on UniVerse and UniData systems:

Step 1: Pause the Database

On UniData, the dbpause command, run by a superuser, pauses the database. Programs can still read information from the files, but any attempt to write will block (hang) until the database is resumed.

On UniVerse, the comparable command is uv -admin -L, which locks the database against updates.

Step 2: Take a Snapshot of the Database

The lvcreate command is used to create the snapshot using this syntax:

lvcreate -L sizeG -s -n snap /dev/volumeGroup/volumeName

The size parameter defines how big the snapshot volume needs to be. Typically, 10% to 20% of the size of the original drive will suffice. However, that is merely a SWAG [educated guess - Ed] and should be adjusted based on the volume of updates happening while the backup is running. It's important that the snapshot is big enough: if updates fill it before the backup finishes, the snapshot is invalidated. It doesn't need to be excessively large, however.

The volumeGroup parameter defines the volume group where volumeName can be found. The lvs command can be useful for reporting the logical volumes currently defined. If, for example, your volume group is called newsystem and the logical volume is called root and it's 200 GB in size, a snapshot sized at 10% can be created with:

lvcreate -L20G -s -n snap /dev/newsystem/root

Note that the newsystem volume group will need to have 20 GB of unclaimed storage to allow this snapshot to be created. The pvs command can be used to see how much free space exists on the physical volumes used in the volume group.
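The sizing arithmetic and the free-space check can be scripted. The following is a rough bash sketch, not a finished tool. The function names are my own invention, and it uses the vgs and lvs reporting commands rather than pvs because they report per volume group and per logical volume.

```shell
# bash sketch: work out a snapshot size and confirm the volume group has room.

# Whole-gigabyte snapshot size for a given origin size and percentage,
# rounded up so the snapshot is never undersized.
# $1 = origin size in GB (integer), $2 = percentage (integer)
snap_size_gb() {
    echo $(( ($1 * $2 + 99) / 100 ))
}

# Unclaimed space in a volume group, in whole GB (fraction dropped).
# $1 = volume group name
vg_free_gb() {
    vgs --noheadings --nosuffix --units g -o vg_free "$1" | awk 'NF { print int($1) }'
}

# Size of a logical volume, in whole GB (fraction dropped).
# $1 = device path, e.g. /dev/newsystem/root
lv_size_gb() {
    lvs --noheadings --nosuffix --units g -o lv_size "$1" | awk 'NF { print int($1) }'
}

# Succeeds only if the volume group can hold a snapshot of the given size.
# $1 = volume group name, $2 = snapshot size in GB
has_room_for_snapshot() {
    [ "$(vg_free_gb "$1")" -ge "$2" ]
}
```

With the example above, `snap_size_gb 200 10` prints 20, and `has_room_for_snapshot newsystem 20` tells you whether lvcreate has a chance of succeeding before you pause anything.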

Step 3: Resume the Database

On UniData, the database can be resumed by a superuser using the dbresume command. On UniVerse, this command can be run from the operating system to unlock the database:

uv -admin -U
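Steps 1 through 3 belong together, because the one thing you never want is a script that pauses the database, fails on the snapshot, and exits without resuming. Here is a hedged bash sketch of that idea. The function name and argument order are mine; the commands themselves are the ones described above.

```shell
# bash sketch: pause, snapshot, and always resume.
# The pause and resume commands are passed in, so the same function works for
# UniData (dbpause / dbresume) or UniVerse ("uv -admin -L" / "uv -admin -U").
# $1 = pause command, $2 = resume command, $3 = snapshot size (e.g. 20G),
# $4 = origin device (e.g. /dev/newsystem/root), $5 = snapshot name
snapshot_while_paused() {
    local pause_cmd=$1 resume_cmd=$2 size=$3 origin=$4 name=$5 rc=0

    # Left unquoted on purpose so "uv -admin -L" splits into command + options.
    $pause_cmd || return 1      # nothing to undo if the pause itself fails

    # Remember a failure here, but do not return yet: the database is paused.
    lvcreate -L "$size" -s -n "$name" "$origin" || rc=$?

    $resume_cmd || { echo "WARNING: database resume failed" >&2; return 2; }
    return "$rc"
}
```

On UniData this might be called as `snapshot_while_paused dbpause dbresume 20G /dev/newsystem/root snap`, and on UniVerse as `snapshot_while_paused "uv -admin -L" "uv -admin -U" 20G /dev/newsystem/root snap`.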

Step 4: Mount the Snapshot Somewhere

This is where things can get a little tricky. The default file system on RHEL is XFS. Each XFS volume carries an ID that is supposed to be unique. Because the snapshot is a moment-in-time copy of the original volume, the snapshot and the original share the same ID. We can mount the snapshot by telling RHEL to ignore the duplicate ID, but we do that at our peril. If the system goes down while two volumes with the same ID are mounted, there's a strong likelihood that the system will not come back up again without some assistance.

To mitigate this, we'll mount the drive temporarily so that some final updates can be done, and then unmount it right away. There is still a risk that the system might go down in between these two operations, but that risk is minimal as both commands will finish nearly instantaneously. Then we'll use the xfs_admin utility to generate a new unique ID for that drive. And finally, when that's done, we can mount the drive somewhere that the backup process can see it. The following commands do this:

mount -n -o nouuid /dev/volumeGroup/snap /mnt/snap
umount /mnt/snap
xfs_admin -U generate /dev/volumeGroup/snap
mount /dev/volumeGroup/snap /mnt/snap

In this scenario, I'm mounting the snapshot volume at /mnt/snap. You can, of course, mount the snapshot anywhere you want, as long as the directory exists prior to the mount command.

In the first mount command, the -n option says this is a temporary mount and the /etc/mtab file should not be updated by the mount. This is one attempt to mitigate the issue where there are two volumes mounted with the same unique ID. However, I found that even with the -n, the /etc/mtab file is updated, so this option may not be as helpful as intended.

The -o nouuid option on the first mount command says to ignore the fact that the two volumes have the same ID, and mount it anyway. During the mount, the XFS file system replays its journal, cleaning up the snapshot volume so that it will later mount cleanly and so that xfs_admin will agree to change its ID. Because this is the riskiest command of the bunch, with the potential to leave two volumes mounted with the same ID, it is immediately followed by a umount to unmount the snapshot volume.

The xfs_admin command with the -U generate option will generate a new unique ID for the snapshot volume. Once this is done we can mount the snapshot drive again, cleanly and safely, without requiring any options.

Total elapsed time for these four commands is a second or less, and they generate so little overhead on the system that normal everyday users won't even know all this magic is happening.
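If you'd rather not type those four commands by hand, they can be wrapped in a small function that stops at the first failure. This is only a sketch in bash; the function name is made up, and it creates the mount point for you if it doesn't exist.

```shell
# bash sketch: the four-command mount sequence, stopping on the first error.
# $1 = snapshot device (e.g. /dev/newsystem/snap), $2 = mount point
mount_snapshot() {
    local dev=$1 mnt=$2

    mkdir -p "$mnt" || return 1

    # Brief mount that tolerates the duplicate ID, so the file system can tidy up.
    mount -n -o nouuid "$dev" "$mnt" || return 1
    # Get back out immediately; this is the risky window.
    umount "$mnt" || return 1
    # Give the snapshot an ID of its own, then mount it normally.
    xfs_admin -U generate "$dev" || return 1
    mount "$dev" "$mnt"
}
```

A call such as `mount_snapshot /dev/newsystem/snap /mnt/snap` returns non-zero if any step fails. If the failure is in the umount, the snapshot is still mounted with the duplicate ID, so the calling script should treat that as urgent rather than carrying on.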

Step 5: Do the Backup

Once the backup drive is mounted, you can use whatever tools you have to do the backup. Maybe you're backing up to an appliance à la Unitrends or Synology, maybe you're rsync-ing the data to off-site storage, or maybe you're backing up to tape. It really doesn't matter how you back up, except to say that you should be backing up to something external to the current system and preferably to something off-site, like the cloud storage replication option available on Unitrends appliances.
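Purely as an illustration, and not a recommendation of one tool over another, here is a bash sketch that archives the mounted snapshot with tar. The function name and the destination path in the usage line are placeholders; in real life the destination should sit on storage outside the system being backed up.

```shell
# bash sketch: archive the mounted snapshot into a compressed tar file.
# $1 = directory where the snapshot is mounted, $2 = archive file to create
backup_snapshot() {
    local src=$1 dest=$2

    [ -d "$src" ] || { echo "No such directory: $src" >&2; return 1; }

    # -C switches into the snapshot first, so archived paths are relative to it.
    tar -czf "$dest" -C "$src" .
}
```

For example, `backup_snapshot /mnt/snap /backups/nightly.tar.gz`, where /backups is assumed to be a mount of external storage.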

Step 6: Remove the Snapshot

Once the backup is complete, there's no point in holding on to the snapshot. The lvremove command can quickly dispatch it, freeing that storage space. However, before removing the snapshot logical volume, it must first be unmounted:

umount /mnt/snap
lvremove -f /dev/volumeGroup/snap

Of course, as in all previous commands, replace volumeGroup with the appropriate volume group for your system.

If you're thinking it might be good to hold on to that snapshot "just in case," allow me to talk you out of that idea. Remember, we only sized the snapshot volume to support the changes that would be happening to the original during the backup. The longer the snapshot sticks around, the more likely it is that you'll overflow that space and invalidate the snapshot. More importantly, when there's an active volume snapshot, everything that happens to the original volume gets logged for the snapshot, resulting in double updates. Deleting the snapshot frees up that storage and allows the system to go back to single updates for the original volume only.
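Since an overflowing snapshot is the main thing that can ruin a backup in progress, it may be worth watching how full it is. The bash sketch below assumes your version of LVM supports the data_percent report field on lvs; the function names and the 80% default threshold are my own guesses at something reasonable.

```shell
# bash sketch: check how full a snapshot is while the backup runs.

# Percentage of the snapshot's space already used, as a whole number.
# $1 = snapshot device, e.g. /dev/newsystem/snap
snapshot_used_pct() {
    lvs --noheadings -o data_percent "$1" | awk 'NF { print int($1) }'
}

# Succeeds while usage is below the threshold.
# $1 = snapshot device, $2 = threshold percent (hypothetical default: 80)
snapshot_has_headroom() {
    local used
    used=$(snapshot_used_pct "$1")
    # No reading at all counts as a failure, not as "plenty of room".
    [ -n "$used" ] && [ "$used" -lt "${2:-80}" ]
}
```

A backup script could call `snapshot_has_headroom /dev/newsystem/snap` periodically and raise an alert, or size the next night's snapshot larger, when it fails.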

This is admittedly a high-level view of the snapshot backup process, but hopefully it illustrates that the procedure is not all that difficult. Wrap these commands up in some validation to ensure that nobody does anything stupid, and watch your backup downtime decrease to virtually nothing!


Nov/Dec 2015