June 25, 2012 | Mark Paradies

Monday Accident and Lessons Learned: When High Reliability Systems Fail

What if you had a system with two regular power supplies, two backup power supplies (diesels), and a battery backup with a separate diesel to keep it charged?

Wow! This should be highly reliable, right?

Read about how this system failed here:

feed://status.aws.amazon.com/rss/ec2-us-east-1.rss

Now here’s the question …

What did they miss in their “root cause analysis”?

I think they had great troubleshooting.

They even had actions to address generic problems.

But I don’t think they found the root causes of the “cloud failure” incident.

What do you think? Leave your comments here…
