In case you haven't been paying attention, Linux is in a mad dash to copy everything that made Solaris 10 amazing when it launched in 2005. Everyone has recognized the power of Zones, ZFS and DTrace but licensing issues and the sheer effort required to implement the technologies has made it a long process.
You know the elephant in the room, the one no one wants to talk about? Well, it turns out there was a whole herd of them hiding in my cloud. There's a herd hiding in your cloud too, I'm sure of it. Here is my story, and how I learned to wrangle the elephants in the cloud.
Like many of you, my boss walked into my office about three years ago and said "We need to move everything to the cloud." At the time, I wasn't convinced that moving to the cloud had technical merit. The business, on the other hand, had decided that, for whatever reason, it was absolutely necessary.
As I began planning the move, selecting a cloud provider and picking tools with which to manage the deployment, I knew that I wasn't going to be able to provide the same quality of service in the cloud as I had in our server farm. There were too many unknowns.
The ball hit the net, but from which side? Can you tell? Over the past three years, companies have pushed themselves to the cloud for many reasons, but have they landed on the wrong side of the net?
Many companies have mistaken moving to the cloud for a goal to be achieved, and it is natural to make that mistake. Companies see the bottom line: building services in PaaS or IaaS clouds lowers the costs of bootstrapping risky projects, speeds up time to market and enables greater flexibility. They naturally make moving everything to the cloud a business target.
They miss that driving these benefits are the ways that automation and infrastructure as a service force the modernization and industrialization of a company's IT teams and processes. Even if a company isn't using any modern software-driven deployment techniques, it is the industrialization of infrastructure on the provider's side that allows a "machine" to be spec'ed, purchased, racked, cabled, and installed at the push of a button or the call of an API. It is this change in the way that IT works that is improving the bottom line, speeding time to market and increasing business agility.
I was recently discussing load balancers with someone. I said I was much happier with F5 than I was with Cisco, and he countered that although he preferred F5 head to head, going with Cisco across the entire network was better for them in the long run.
The situation with storage is similar. EMC makes a great SAN but a pretty bad NAS. Is it worth getting EMC's NAS for the one-stop-shop factor?
Storage tiering is nothing new. We use fast 15K RPM disks for high-performance applications, slower 10K RPM disks for less demanding applications, and 7.2K RPM SATA disks for archive storage. Recently, solid state disks (SSDs) have also become more common for the most demanding performance needs. The trick is managing it all.
Two or three years ago, if you wanted to implement automatic storage tiering, I would have pointed you in the direction of Sun's Storage and Archive Manager (SAM) and QFS, Sun's tightly integrated shared file system. SAM-QFS automatically moves files from one storage tier to another based on the SAM policy and transparently retrieves the files when requested. With tape remaining the least expensive storage available, this is still a great solution for archiving petabytes of documents/files.
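To make the policy idea concrete, here is a rough sketch of what a SAM archiver policy file can look like. The file-system name, paths and intervals are my own illustrative assumptions; check the exact directive syntax against the SAM-QFS documentation before using anything like this.

```
# /etc/opt/SUNWsamfs/archiver.cmd (sketch; names and intervals are examples)
fs = samfs1
logfile = /var/opt/SUNWsamfs/archiver/archiver.log

# Archive everything in the file system:
# copy 1 to the disk tier after 10 minutes,
# copy 2 to the tape tier after 24 hours.
allfiles .
    1 10m
    2 24h
```

Once a file has been archived and released, SAM stages it back in transparently the next time an application opens it, which is what makes the tiering invisible to users.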
Unfortunately, SAM works at the file level, so it will not help our databases run faster. What will help us is ZFS. ZFS is still making some fairly big waves in the storage community with its Hybrid Storage Pool feature. In a standard configuration, ZFS uses RAM for a level 1 read cache (ARC). In advanced configurations, the zpool can be configured to use a level 2 cache (L2ARC) on faster devices (e.g., SSDs as opposed to SAS or SATA disks). The zpool can also be configured to use separate, possibly faster disks for the ZFS Intent Log (ZIL), which is basically a write cache (without getting into why it is more than a write cache). Even without faster disks, the ability to store the read/write cache on a separate device can increase performance just by dedicating more IOPS to the cause.
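As a sketch of that advanced configuration, the `zpool` commands below build a hybrid pool. The pool name and device names are placeholders for illustration; substitute your own, and note that these commands need root and real (or test) devices.

```shell
# Create a pool on ordinary spinning disks (device names are examples)
zpool create tank mirror c0t1d0 c0t2d0

# Add an SSD as a level 2 read cache (L2ARC)
zpool add tank cache c0t3d0

# Add a separate, mirrored log device for the ZIL (the write path)
zpool add tank log mirror c0t4d0 c0t5d0

# Confirm the cache and log vdevs are attached
zpool status tank
```

Mirroring the log device is worth the extra disk: losing an unmirrored ZIL device can mean losing the most recent synchronous writes.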
Oracle/Sun's 7000 series storage builds on the success of the ZFS Hybrid Storage Pool, using Logzilla devices for the ZIL and Readzilla devices for the L2ARC. With the powerful flash acceleration in the storage pool, even 7.2K RPM disks can give performance equal to that of higher-speed 15K RPM disks.
Although ZFS does great things for performance by utilizing multiple tiers of storage devices, all the data is still physically stored on the same tier of storage, with the hot data stored again in the caches. This is arguably a waste of capacity, and it can also lead to performance issues in some cases. For example, an L2ARC that is cold after a reboot will give slower performance until it is fully warmed up. Oracle will probably fix this at some point by allowing the L2ARC to persist if stored on a non-volatile device (bug_id=6662467).
In the meantime, EMC recently announced an interesting new feature called FAST, short for Fully Automated Storage Tiering. FAST is available from FLARE version 04.30.000.5.004. FAST allows you to define a pool in the array composed of multiple RAID Groups, and then define a LUN on the pool as opposed to defining a LUN on the RAID Groups themselves. Once the LUN begins filling with data, the EMC will transparently begin migrating data between the tiers of the pool in 1GB chunks, storing the hottest data on the fastest tier and the coldest data on the slowest tier.