Amazon EC2 Collapse and designing for cloud computing
As I'm sure most tech-geeky folks know by now, Amazon EC2 had a massive outage yesterday. It affected numerous online applications and websites totally unrelated to amazon.com. My current favorite new word is "cloudpocalypse."
Many folks have decided that "cloud computing" is the next golden hammer that will solve any and all computing problems. I hate to rain on their parade (pun intended), but at least from my perspective, "cloud computing" primarily provides the ability to acquire cheap, fast computing infrastructure, get it online quickly, and scale it massively. EC2 is GREAT for that; as a matter of fact, there probably isn't anything nearly as powerful and complete on the market right now.
Note that I didn't include the word "reliable" anywhere in my value proposition. Don't trust a cloud provider to be reliable; to quote Amazon's own CTO, "Everything fails all the time." The important thing to consider when designing software for the cloud is how you deal with failure.
Many folks with traditional datacenters try to deal with failure by flogging sysadmins, developers, and vendors every time something goes wrong, and generally by spending a lot of time pointing fingers. This is pretty unhealthy and actually creates more problems than it solves, but when you use Amazon as a platform, it simply isn't going to be an option.
What does this mean?
It means that when designing for a cloud provider, you might need a plan "B" or a plan "C" for when something goes wrong. Many traditional shops have a "disaster recovery" site that is physically separated from the main site. How do you do this in the cloud? Likely it means having alternate providers, or real-life physical servers that you control, as a disaster recovery option.
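A minimal sketch of the plan "B"/plan "C" idea in Python, assuming each provider is wrapped in a zero-argument callable that raises `OSError` (for example `ConnectionError`) when it can't be reached. The provider names and functions below are hypothetical stand-ins, not real EC2 or vendor APIs.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the failover list has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, result) of the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except OSError as exc:  # ConnectionError and TimeoutError are OSError subclasses
            errors.append(f"{name}: {exc}")
    # Plan A, B, and C all failed: surface every reason instead of hiding them.
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical providers: the primary cloud region is down, the backup works.
def primary_cloud() -> str:
    raise ConnectionError("region unavailable")


def backup_servers() -> str:
    return "served from backup"


if __name__ == "__main__":
    print(call_with_failover([("primary", primary_cloud), ("backup", backup_servers)]))
```

The ordering of the list is the whole policy here: put the cheapest or fastest option first and the physical servers you control last. A real setup would also need health checks and data replication so the backup has something current to serve.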
Moreover, it means your applications should be designed so that they can still function when they suddenly find themselves running on a different machine, in a different location, or with a different set of components. If you rely on physical files (outside your app) being present for your application to function... you'll fail. If you rely on a database having certain up-to-date information... you'll fail. If you rely on "your own" IP address (within your application code)... you'll fail.
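One way to avoid baking machine-specific details into application code is to resolve them at startup from the environment. This is a minimal sketch, assuming hypothetical environment variable names (`APP_BIND_HOST`, `APP_ADVERTISED_HOST`, `APP_STORAGE_URL`) that whatever launches the instance would set; they are illustrative, not a convention of EC2 or any library.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    bind_host: str        # interface to listen on
    advertised_host: str  # name other services should use to reach us
    storage_url: str      # shared storage instead of local files


def load_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Build config from (hypothetical) environment variables, never hardcoded IPs."""
    env = os.environ if env is None else env
    return RuntimeConfig(
        # Listen on all interfaces rather than a specific, machine-bound IP.
        bind_host=env.get("APP_BIND_HOST", "0.0.0.0"),
        # Fall back to whatever this machine is called right now.
        advertised_host=env.get("APP_ADVERTISED_HOST") or socket.gethostname(),
        # Required: raise KeyError at startup if missing, rather than failing later.
        storage_url=env["APP_STORAGE_URL"],
    )


if __name__ == "__main__":
    print(load_config({"APP_STORAGE_URL": "s3://example-bucket/app"}))
```

Because nothing in the code names a specific IP address or local path, the same build can start on a replacement instance somewhere else, and a missing storage location fails fast at startup instead of halfway through a request.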
In short, designing for cloud computing means designing for failure. Arguably ALL software should be designed this way, but I think cloud computing elevates the importance of this quality.