san storage

Sun Oracle Webcast Wrap Up

Last night I watched almost the entire 5 hour live webcast announcing Oracle's strategies regarding the Sun Microsystems acquisition. As a near-evangelist for Sun and Solaris, I'm very happy with the deal finally going through and even happier that most of what Oracle said makes sense to me as a customer.

No ZFS Support for EMC Replication Manager

As I originally blogged, I was hoping to use EMC snapshots to perform server-less/network-less backups. EMC provides two main tools for managing snapshots in this type of situation:

  • EMC Replication Manager
  • EMC PowerSnap Networker Module

The PowerSnap Module supposedly automates taking snapshots for the purpose of backups, while Replication Manager supposedly provides a much more robust package.

With Replication Manager you might create a policy to take a snapshot every five minutes, keep the last 10, and use those for backups whenever necessary.

To make a long story short, Replication Manager is useless for LUNs with ZFS. According to EMC, this won't change in the near future. PowerSnap also has no support for taking snapshots of LUNs with ZFS on them so basically EMC has no server-less backup offerings for Solaris with ZFS.

Caveats on Using Snapshots for Server-less Backups

Whether you are dealing with disk I/O in reading the data from the disks, or CPU for compressing or encrypting the data (or both- remember to compress and then encrypt!), or network for transferring the data to a backup server, the added load of a backup on your production servers is unwelcome. For this reason, the period of time during which backups can be made, aka. backup window, may be limited- even severely.

You may say, "It only takes me X hours to do a full backup of everything", but over time backup windows are notorious for becoming too small. Backups are split over multiple days, technologies upgraded, etc. When planning a backup strategy, my approach is to eliminate the backup window altogether- that is do whatever you can to take the backup off the production hardware altogether.

Storage Snapshots are one method for taking the production servers out of the backup equation. By creating a consistent, point in time snapshot on your storage, and mounting it on your backup server, you can backup your data using your backup server's resources while your production servers continue as usual.

Caveats of this method in general are:

EMC Replication Manager in Solaris

UPDATE: No ZFS Support for Replication Manager in the near future

Using storage level snapshots can be used to run backups without directly requiring resources from the original host.

EMC Replication Manager coordinates the creation of application consistent snapshots across all the hosts in your network. It handles scheduling creation/expiration of snapshots,  mounting and unmounting from backup servers, etc. from a single console.

Although it is not tightly integrated into EMC Networker like the similar Networker PowerSnap module, it can be used to start a backup process after taking a new snapshot and it has the capability to manage snapshots unrelated to backups from a GUI.

While the data sheet claims support for Solaris, there are several caveats which I have run into.

When 99.999% Isn't Good Enough

When discussing availability of a service, it is common to hear the term "Five Nines" referring to a service being available 99.999% of the time but "Five Nines" are relative. If your time frame is a week, then your service can be unavailable for 6.05 seconds whereas a time frame of a year, allows for a very respectable 5.26 minutes.

In reality, none of those calculations are relevant because no one cares if a service is unavailable for 10 hours, as long as they aren't trying to use it. On the other hand, if you're handling 50,000 transactions per second, 6.05 seconds of unavailability could cost you 302,500 transactions and no one cares if you met your SLA.

This problem is one I've come up against a number of times in the past and recently even more and the issue is orders of magnitude in IT. The larger the volume of business you handle, the less relevant the Five Nines become.