Blog
Virtual Block Storage Crashed Your Cloud Again :(
You know it's bad when you start writing an incident report with the words "The first 12 hours." You know you need a stiff drink, possibly a career change, when you follow that up with phrases like "this was going to be a lengthy outage...", "the next 48 hours...", and "as much as 3 days".
That's what happened to huge companies like Netflix, Heroku, Reddit, Hootsuite, Foursquare, Quora, and Imgur the week of April 21, 2011. Amazon AWS went down for over 80 hours, leaving them and others up a creek without a paddle. The root cause of this cloud-tastrophe echoed loud and clear. Heroku said:
Ynet on AWS. Let's hope we don't have to test their limits.
In Israel, more than in most places, no news is good news. Ynet, one of the largest news sites in Israel, recently posted a case study (at the bottom of this article) on handling large loads by moving their notification services to AWS.
"We used EC2, Elastic Load Balancers, and EBS... Us as an enterprise, we need something stable..."
In my opinion, they are contradicting themselves. EBS and Elastic Load Balancers (ELB) are the two AWS services that fail most often and fail hardest, with multiple outages that each spanned multiple days.
EBS: Conceptually flawed, prone to cascading failures
Linux and Solaris are Converging but Not the Way You Imagined
In case you haven't been paying attention, Linux is in a mad dash to copy everything that made Solaris 10 amazing when it launched in 2005. Everyone has recognized the power of Zones, ZFS, and DTrace, but licensing issues and the sheer effort required to implement the technologies have made it a long process.