Mashable’s Ben Parr called it Cloudgate: Amazon’s Elastic Compute Cloud (EC2) went down and left some major websites in the dust, including Foursquare, Quora, Wildfire, and more. I think this caught everyone off guard because there had been numerous articles about Amazon’s lead in the cloud computing space. The company has been at it for a while and has a head start in learning how to build an efficient cloud infrastructure. Last week, though, we had a friendly reminder that even the good can die young, and that we always have to be prepared for the unexpected. (Yes, that’s one of my mottos.) Some valuable lessons:
- Build redundancy – and if your hosting partner can’t provide it, go find another partner to act as your backup.
- Take service level agreements (SLAs) seriously – some of my friends tell me theirs were not very stringent.
- Employ some of your own monitoring tools so you don’t lose any time finding out that your site is down.
- Cloud disaster recovery options – look into third parties to help you with this.
- Leverage Twitter to send out alerts and let people know what is going on.
- Follow up with customers: let them know what happened, apologize, and ask for their suggestions on how they would like to be notified in the future. (I only heard from one company that was impacted.)
- Impact of new features and functionality – when you add new eye candy to your site or new back-end functionality, make sure to conduct complete end-to-end testing.
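
For the "own monitoring tools" lesson above, here is a minimal sketch of what a do-it-yourself uptime check could look like in Python. It is only an illustration, not a replacement for a real monitoring service: the function names (`http_status`, `is_up`, `watch`), the three-failures-before-alerting threshold, and the 60-second interval are all assumptions of mine, and `notify` is a placeholder you would swap for email, SMS, or a Twitter post. Ideally you would run something like this from a machine outside the hosting provider you are watching.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, and so on


def is_up(status: Optional[int]) -> bool:
    """Treat 2xx and 3xx responses as 'up'; everything else as 'down'."""
    return status is not None and 200 <= status < 400


def watch(
    url: str,
    probe: Callable[[str], Optional[int]] = http_status,
    notify: Callable[[str], None] = print,  # placeholder alert channel
    failures_before_alert: int = 3,         # assumed threshold
    interval_seconds: float = 60.0,         # assumed polling interval
    max_checks: Optional[int] = None,       # None means run forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll url and notify once per outage. Returns the number of outage alerts."""
    consecutive_failures = 0
    alerts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        status = probe(url)
        checks += 1
        if is_up(status):
            # Only announce a recovery if we previously announced an outage.
            if consecutive_failures >= failures_before_alert:
                notify(f"RECOVERED: {url} is answering again (status {status})")
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Alert exactly once, when the threshold is first reached.
            if consecutive_failures == failures_before_alert:
                notify(f"DOWN: {url} failed {consecutive_failures} checks in a row")
                alerts += 1
        if max_checks is None or checks < max_checks:
            sleep(interval_seconds)
    return alerts
```

Calling `watch("https://www.example.com/")` would poll that placeholder address once a minute and print a message when the site goes down and when it comes back. Waiting for a few consecutive failures before alerting is a design choice to avoid paging yourself over a single dropped request; tune it to how quickly you need to know.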