When the Power Goes Out At Google
1sockchuck writes "What happens when the power goes out in one of Google's mighty data centers? The company has issued an incident report on a Feb. 24 outage for Google App Engine, which went offline when an entire data center lost power. The post-mortem outlines what went wrong and why, along with lessons learned and steps taken, including additional training and documentation for staff and new datastore configurations for App Engine. Google is earning strong reviews for its openness, which is being hailed as an excellent model for industry outage reports. At the other end of the spectrum is Australian host Datacom, where executives are denying that a Melbourne data center experienced water damage during weekend flooding, forcing the tech media to document the outage via photos, user stories and emails from the NOC."
title should read "Google App Engine NOT a Cloud" (Score:1, Funny)
Obviously if the power goes out, and the service goes offline, then it WASN'T a cloud. If it's a cloud, it can't go down. If it goes down, it wasn't a cloud.
What's there to get?
Re:and what about openness during the incident? (Score:4, Funny)
They had a perfect contingency plan for this case (Score:5, Funny)
...but it was stored on Google Docs.
Re:title should read "Google App Engine NOT a Clou (Score:1, Funny)
Whoosh.
Re:what about having people onsite? (Score:1, Funny)
You are thinking too small-scale. Of course there are people on-site. Google has data centers all over the world -- how are they going to drive there?
http://en.wikipedia.org/wiki/DUKW [wikipedia.org]
'nuff said.
Re:Generators plus UPS FTMFW (Score:3, Funny)
Yeah, and when the guys at the Jesus Christ of Datacenters that you describe have to do something like, say, switch from generator to utility power manually, and the document that details that process is 18 months old and refers to electrical panels that don't exist anymore, you get what you had here: a failure of fail-over procedures. If the lowliest help desk operator can't at least understand the documentation you've written, then you've failed.
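A minimal sketch of one way to catch that particular failure mode: a cron-able script that flags runbooks nobody has touched in too long. The directory path and the staleness threshold here are invented for illustration, not anything Google or anyone else actually uses.

    #!/usr/bin/env python3
    """Flag runbooks that haven't been updated recently (illustrative sketch)."""
    import os
    import time

    RUNBOOK_DIR = "/ops/runbooks"  # hypothetical location of your procedure docs
    MAX_AGE_DAYS = 180             # arbitrary threshold; 18 months is far too late

    now = time.time()
    for name in sorted(os.listdir(RUNBOOK_DIR)):
        path = os.path.join(RUNBOOK_DIR, name)
        if not os.path.isfile(path):
            continue
        age_days = (now - os.path.getmtime(path)) / 86400
        if age_days > MAX_AGE_DAYS:
            print(f"STALE: {name} last modified {age_days:.0f} days ago -- review it")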
The only equipment failure listed is a "power failure." Granted, that can be as simple as "car hits a telephone pole and knocks out a chunk of the grid, leaving your office in the dark," which should be an easily survivable event. But how do you handle a failure like "50 kVA inline UPS shits the bed, leaving nothing but a smoking chassis that no one wants to go anywhere near"? Or "HVAC unit fails on Christmas Eve when only a skeleton staff is on duty and fills the raised floor with 8 inches of water, shorting everything within an inch of its life and making it impossible to bring any hosted services back online"?
There's nothing like a little bit of "we had no idea these three or four unrelated circumstances could happen simultaneously" disaster porn to make you realize that A. outage / DR / fail-over planning is more than just throwing money at stuff (UPSes, generators, redundant lines, etc.) and B. no matter how good your plan is, it will never be 100% effective.
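To put rough numbers on point B: the usual redundancy math assumes failures are independent, and the "unrelated circumstances happening simultaneously" scenario is exactly where that assumption breaks. A back-of-the-envelope sketch, with every availability figure invented for illustration:

    # Back-of-the-envelope availability math (all figures invented for illustration).
    ups = 0.999        # assumed availability of a single UPS
    generator = 0.98   # assumed availability of the backup generator
    utility = 0.995    # assumed availability of utility power

    # If failures were truly independent, you lose power only when all three fail:
    p_all_fail = (1 - ups) * (1 - generator) * (1 - utility)
    print(f"Independent-failure estimate: {p_all_fail:.1e}")  # about 1e-07

    # A common-mode event (flood, fire, a botched manual cutover) takes out
    # everything at once. Even a modest 1-in-10,000 chance of that dwarfs
    # the independent estimate, which is why no plan is ever 100% effective:
    p_common_mode = 1e-4
    print(f"With common-mode failures:    {p_common_mode + p_all_fail:.1e}")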
Lessons Like (Score:3, Funny)