Disclaimer: While I work for Amazon, we had no direct involvement in the massive S3 outage yesterday. We were affected, like many parts of the internet (and Amazon), but I had no hand in causing the problem or fixing it.

It’s amazing to see how much of the internet is using Amazon services. For the most part things really work well. But like anything that relies on other things working, they will occasionally fail. S3 has had a stellar record for uptime. But yesterday S3 in the US-East region was down for the count.

All you had to do was look on Twitter to see how widespread the outage really was.

But the thing is, you can design to allow for failures like this. If you look at the Amazon.com website, most of the functionality of the shopping and buying processes stayed up. And this isn’t because we have special access to S3 and such. It’s because decisions have been made to host critical functionality in different data centers.

Online I’ve even seen discussions about how some people store critical data and host services in both Amazon’s AWS and Microsoft’s Azure. Sure, you have to use the lowest common denominator for functionality, but the chances of both going down at the same time are way lower.
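
To make the idea concrete, here is a minimal sketch in Python of reading from a primary store and falling back to a secondary one. Everything in it is made up for illustration: `fetch_with_failover`, the store names, and the two stand-in fetch functions are hypothetical, and this is not how Amazon.com or any particular service actually does it. In a real setup each fetch function would wrap a storage client for a different region or provider.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical type: a "store" is a name plus a function that fetches bytes by key.
Store = Tuple[str, Callable[[str], bytes]]


def fetch_with_failover(key: str, stores: Iterable[Store]) -> Tuple[str, bytes]:
    """Try each store in order; return (store_name, data) from the first that works."""
    errors = []
    for name, fetch in stores:
        try:
            return name, fetch(key)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all stores failed for {key!r}: " + "; ".join(errors))


# Stand-ins for real storage clients (assumed, not a real API).
def primary_down(key: str) -> bytes:
    raise ConnectionError("primary region unavailable")


def secondary_up(key: str) -> bytes:
    return b"contents of " + key.encode()


source, data = fetch_with_failover(
    "cart/123.json",
    [("primary", primary_down), ("secondary", secondary_up)],
)
print(source, data)
```

The catch is that the secondary only helps if the data was already copied there before the outage, which is where the extra cost and complexity come in.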

There are costs associated with all of these… but it all depends on what you’re willing to lose.