When the Power Goes Out At Google
1sockchuck writes "What happens when the power goes out in one of Google's mighty data centers? The company has issued an incident report on a Feb. 24 outage for Google App Engine, which went offline when an entire data center lost power. The post-mortem outlines what went wrong and why, lessons learned and steps taken, which include additional training and documentation for staff and new datastore configurations for App Engine. Google is earning strong reviews for its openness, which is being hailed as an excellent model for industry outage reports. At the other end of the spectrum is Australian host Datacom, where executives are denying that a Melbourne data center experienced water damage during weekend flooding, forcing tech media to document the outage via photos, user stories and emails from the NOC."
what about having people onsite? (Score:2, Insightful)
Aren't there any people in the data center to tell them that yes, there has been a power outage, such-and-such machines are affected, etc.? Sounds like all they have is remote monitoring, and if something happens then someone has to drive to the location to see what's wrong.
Re: (Score:2, Troll)
The post on the google-appengine group details all the things they did wrong and are going to fix, after the power went out. Fine, I have to plan for outages too. But what caused the unplanned outage?
Re:what about having people onsite? (Score:4, Insightful)
Who cares?
Power failures are expected, what you can do is have plans for when they occur - batteries, generators, service migration to other sites, etc, etc. Those plans (and the execution of them) are what they had problems with.
multiple datacenters (Score:2)
Power failures are expected, what you can do is have plans for when they occur - batteries, generators, service migration to other sites, etc, etc
Too small scale, too complex, too much human intervention and too unreliable. Minimum of 2 datacenters on opposite sides of the world and you only send half the traffic to each. When the first vanishes the second picks up the traffic. The exact mechanism depends on the level of service you want to provide.
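A minimal sketch of that idea in Python, with made-up endpoints and a naive client-side health check; real deployments would normally push this into DNS, anycast or a global load balancer rather than client code:

    import itertools
    import urllib.request

    # Hypothetical endpoints for two datacenters on opposite sides of the world.
    DATACENTERS = ["https://dc-east.example.com", "https://dc-west.example.com"]
    _rotation = itertools.cycle(DATACENTERS)

    def healthy(base_url, timeout=2):
        # A datacenter counts as up if its health endpoint answers with HTTP 200.
        try:
            with urllib.request.urlopen(base_url + "/healthz", timeout=timeout) as resp:
                return resp.status == 200
        except OSError:
            return False

    def send_request(path):
        # Alternate between datacenters; skip any that fail the health check,
        # so when one vanishes the other picks up the traffic.
        for _ in range(len(DATACENTERS)):
            base = next(_rotation)
            if healthy(base):
                with urllib.request.urlopen(base + path, timeout=5) as resp:
                    return resp.read()
        raise RuntimeError("no datacenter is answering its health check")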
Re: (Score:2)
Re:what about having people onsite? (Score:4, Insightful)
Re: (Score:2)
Re: (Score:3, Insightful)
This isn't just about keeping the people that use Google services informed, this is an admiss
Re: (Score:1, Flamebait)
This just goes to show that Google is as "incompetent" as anyone else. There was a discussion on here the other day and a poster asked why Microsoft, with all of their resources, hasn't come up with a secure OS yet. It was suggested that the know-how to create such an OS is out there, and that it would just take money and will on Microsoft's part. This seems like the Google equivalent.
Google is trying to push Apps as a replacement for Exchange and Office. They are trying to push it as a replacement for hosti
Useless for large scale problems (Score:5, Interesting)
Of COURSE there are people onsite. Most likely they have anywhere from a dozen to a hundred people onsite. But what's that going to do for you in the case of a large-scale problem?
The otherwise top rated 365 Main [365main.com] facility in San Francisco went down a few years ago. They had all the shizz, multipoint redundant power, multiple data feeds, earthquake-resistant building, the works. Yet, their equipment wasn't well equipped to handle what actually took them down - a recurring brown-out. It confused their equipment, which failed to "see" the situation as one requiring emergency power, causing the whole building to go dark.
So there you are, with perhaps 25 staff in a 4-story building with tens of thousands of servers, the power is out, nobody can figure out why, and the phone lines are so loaded they're worthless. Even when the power comes back on, it's not like you are going to get "hot hands" in anything less than a week!
Hey, even with all the best planning, disasters like this DO happen! I had to spend 2 nerve-wracking days driving to S.F. (several hours' drive) to witness a disaster zone. HUNDREDS of techs just like myself carefully nursing their servers back to health, running disk checks, talking in tense tones on cell phones, etc.
But what pissed me off (and why I don't host with them anymore) was the overly terse statement that was obviously carefully reviewed to make it damned hard to sue them. Was I ever going to sue them? Probably not, maybe just ask for a break on that month's hosting or something. I mean, I just want the damned stuff to work, and I appreciate that even in the best of situations, things *can* go wrong.
So now I host with the Herakles data center [slashdot.org], which is just as nice as the S.F. facility, except that it's closer, and it's even noticeably cheaper. Redundant power, redundant network feeds, just like 365 Main. (Better: they had redundancy all the way into my cage; 365 Main just had redundancy to the cage's main power feed.)
And, after a year or two of hosting with Herakles, they had a "brown-out" situation, where one of their main Cisco routers went partially dark, working well enough that their redundant router didn't kick in right away, leaving some routes up and others down while they tried to figure out what was going on.
When all was said and done, they simply sent out a statement of "Here's what happened, it violates some of your TOS agreements, and here's a claim form". It was so nice, and so open, that out of sheer goodwill, I didn't bother to fill out a claim form, and can't praise them highly enough!
Re:Useless for large scale problems (Score:5, Insightful)
The otherwise top rated 365 Main [365main.com] facility in San Francisco went down a few years ago. They had all the shizz, multipoint redundant power, multiple data feeds, earthquake-resistant building, the works. Yet, their equipment wasn't well equipped to handle what actually took them down - a recurring brown-out. It confused their equipment, which failed to "see" the situation as one requiring emergency power, causing the whole building to go dark.
I think you made the right decision in changing providers. I remember that story about the 365 outage, and while I am too lazy to look up the details again, I recall it being as you're telling it. To that end, I'd simply say that they most certainly did have the proper equipment to handle the brown-out, but obviously not the proper management. If you're having regular (if intermittent) power problems (brown-outs, phase imbalances, voltage harmonic anomalies, spikes, etc), just roll to generator; that's what they're there for.
I'm sick of people making the assumption that the operators of the facility were just at the mercy of a power quality issue because they have redundant power feeds and automatic transfer switches. Yes, in a perfect world, all the PLCs will function as designed, and the critical load will stay online by itself. However, it sometimes takes foresight and common sense to make a decision to mitigate where necessary. I direct all my guys to pre-emptively transfer to our generators if there are frequent irregularities on both of our power feeds (i.e. during a violent thunderstorm, simultaneous utility problems, etc).
In other words, I'm agreeing with you that the service you received was unacceptable. Along with that (and in rebuttal to the parent post), I'm saying that it's not enough to talk about how they came back from the dead, but why they got there in the first place.
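A rough sketch of that kind of pre-emptive transfer rule, with invented thresholds and no real control interface, just to make the policy concrete:

    from collections import deque
    import time

    WINDOW_SECONDS = 600     # consider the last 10 minutes (illustrative)
    EVENT_THRESHOLD = 3      # events per feed before acting (illustrative)

    events = {"feed_a": deque(), "feed_b": deque()}

    def record_power_event(feed, now=None):
        # Log a sag, spike or harmonic anomaly seen on one utility feed.
        now = now if now is not None else time.time()
        q = events[feed]
        q.append(now)
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()

    def should_roll_to_generator():
        # Pre-emptively transfer when both feeds look unstable at the same time,
        # rather than waiting for the transfer switch to decide on its own.
        return all(len(q) >= EVENT_THRESHOLD for q in events.values())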
Re: (Score:2)
Re: (Score:2)
But what pissed me off (and why I don't host with them anymore) was the overly terse statement that was obviously carefully reviewed to make it damned hard to sue them. Was I ever going to sue them? Probably not, maybe just ask for a break on that month's hosting or something.
You wouldn't but come on, you know how we Americans are. We sue when we can't play Halo for a few days [gamespot.com].
Chances aren't bad that someone was looking for a lawsuit; heading it off at the pass had a chance to prevent some stupid lawsuits, which would waste time and only benefit lawyers, possibly requiring some invasive, poorly thought-out, court-ordered hindrance that would have slowed the recovery.
Re: (Score:1)
Of COURSE there are people onsite. Most likely they have anywhere from a dozen to a hundred people onsite.
and as long as you're quiet and don't try to damage the control systems, you can move about freely and they'll generally ignore you [wordpress.com]
Re: (Score:1, Funny)
You are thinking too small-scale. Of course there are people on-site. Google has data centers all over the world -- how are they going to drive there?
http://en.wikipedia.org/wiki/DUKW [wikipedia.org]
'nuff said.
Oh, my lifestream (Score:2)
Read the comments (Score:5, Insightful)
I pity EvilMuppet. Guy is a tool. There are contractual agreements in place to prevent pictures, aka the "rules," but when the data center blatantly LIES, they are breaking the trust and violating the agreement. Case law exists where a contract can be broken when one party accuses the other of violating said contract.
That's what happened. The data center was lying about what happened to avoid responsibility for the equipment it was being paid to host. Pictures were taken and are being used to prove the company did violate the trust of the contract.
You can argue the semantics and legality of it but if this goes to court the pictures will be admissible and the data center will lose.
Re: (Score:2)
Re: (Score:1)
I love those guys.
They did a comedy show building up to the 2000 Sydney Olympics.
http://en.wikipedia.org/wiki/The_Games_(Australian_TV_series) [wikipedia.org]
Spawned "the Office" style of pseudo documentary. Excellent show.
Search: clarke dawe "the games" on youtube to see some clips.
Re: (Score:1)
Looking over the contract we have with Datacom, you'd be hard pressed to have the Managing Director's statements be material in affecting a contract violation. The fact that the photos were taken well before any statement was made to the public by a Datacom representative takes at least some of the basis away from your argument about trust.
As for evidence, colleagues of mine have damaged equipment and I have remote monitoring, MRTG graphs and other means of validating facts. How do you think that a particular pub
Re: (Score:2)
Yes, there is.
Whether and to what extent legal precedent is binding and, as such, "law" depends on the legal system; there are some legal systems in which it is not, and some in which, in specific circumstances, it is. The latter is true of the UK and many former British colonies, including the USA, Canada, and Australia, among others.
Judges who follow precedent
Re: (Score:1)
In a system without precedent, you could theoretically have judges decide cases based on the individual facts of each case, rather than on what the law says. You would have judges throwing the book at defenda
title should read "Google App Engine NOT a Cloud" (Score:1, Funny)
Obviously if the power goes out, and the service goes offline, then it WASN'T a cloud. If it's a cloud, it can't go down. If it goes down, it wasn't a cloud.
What's there to get?
Re: (Score:1, Insightful)
Re: (Score:1, Funny)
Whoosh.
Re: (Score:2)
Sounds more like fog to me.
Re: (Score:2)
Obviously if the power goes out, and the service goes offline, then it WASN'T a cloud. If it's a cloud, it can't go down. If it goes down, it wasn't a cloud.
The cloud got too big and it rained.
They had a perfect contingency plan for this case (Score:5, Funny)
...but it was stored on Google Docs.
Significantly higher latency? (Score:3, Interesting)
A new option for higher availability using synchronous replication for reads and writes, at the cost of significantly higher latency
Anyone know some numbers around what "significantly higher latency" means? The current performance [google.com] looks to be about 200ms on average. Assuming this higher availability model doesn't commit a DB transaction until it's written to two separate datacenters, is this around 300 - 400ms for each put to the datastore?
Re: (Score:2)
I suspect not, since the feature hasn't been implemented yet.
Re: (Score:2)
No, because the writes should happen in parallel.
Correct, the writes should happen roughly in parallel but for the offsite datacenter there will be the latency of the round-trip to send the data and receive the commit confirmation. So latency to the offsite datacenter will need to be factored in plus any overhead involved with managing simultaneous writes to geographically disparate datastores. That's my best guess as to why they said "significantly higher latency".
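To put rough numbers on that reasoning (only the ~200 ms current average comes from the status page cited above; the rest are guesses):

    # All figures are illustrative except the ~200 ms current average.
    local_commit_ms = 200      # a typical datastore put today
    cross_dc_rtt_ms = 150      # guessed round trip to a distant datacenter
    remote_commit_ms = 200     # assume the remote write costs about the same

    # With parallel writes, the caller waits for the slower of the two paths.
    sync_put_ms = max(local_commit_ms, cross_dc_rtt_ms + remote_commit_ms)
    print(sync_put_ms)         # 350 ms in this guess, in line with the
                               # 300-400 ms speculation above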
App Engine down again? (Score:3, Insightful)
App Engine must be Google's most poorly run project. It has been suffering from outages almost weekly (the status page [google.com] doesn't tell the whole truth, unfortunately), unexplainable performance degradations, data corruption (!!!), stale indexes and random weirdness for as long as it has been running. I am one of those who tried for a really long time to make it work, but had to give up despite it being Google and despite all the really cool technology in it. I pity the fool who pays money for that.
The engineers who work with it are really helpful and approachable both on mailing lists and irc, and the documentation is excellent. But it doesn't help when the infrastructure around it is so flaky.
ISO9001 (Score:1, Insightful)
This should be standard practice... It's like the good bits of ISO9001 with a bit more openness. When done right, ISO9001 is a good model to follow.
the worst nightmare of data center peeps (Score:4, Interesting)
Re: (Score:3, Informative)
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Across town could be 20 miles away in London. A site on the other side of the Thames is very likely to have its power and data coming from completely independent systems, even a different power station and over a different part of the national grid.
Since BT was historically the only telecoms provider, even now they are plenty big enough to easily be in a position to have multiple independent data feeds, and if they all fail, nothing else in the capital is working anyway, so a DC's survival would be a minor issue
Re: (Score:2)
Well, I’m no expert, but it’s not very hard to get a building watertight, now is it?
floods (Score:2, Insightful)
Did you ever actually see a big flood? Freaking awesome power, like a fleet of bulldozers. Smashes stuff, rips houses off foundations, knocks huge trees over, will tumble multiple ton boulders ahead of it, etc. Just depends on how big the flood is. We had one late last year here, six inches of rain in a couple of hours, just tore stuff up all over. The "building" that can withstand a flood of significant size exists, it is called a submarine. Most buildings of the normal kind just aren't designed to deal wi
Re: (Score:1)
The structure that can withstand a flood has existed for a lot longer than submersible warships - it's called a "hill". If you don't have one conveniently nearby to use you can even build an artificial one.
Re: (Score:1)
A hill isn't a building. He was talking about waterproofing a building. Under normal conditions, sure, buildings are pretty good at keeping you from the weather, but in big floods, most will suffer leakage or outright destruction. That's why you always see people trying to save their homes or businesses with sandbags. It just isn't that common for buildings to be built tough enough for a bad flood. Some probably exist, but not too many. And yep, a good building on top of the biggest hill around would be the safest. I was
Re: (Score:3, Informative)
An "artificial hill" intended to protect an area from floods is usually called a "levee", and while certainly extremely useful for their intended purpose, they aren't exactly an ironclad guarantee. So having contingency plans for the case where they fail isn't a bad idea.
Re: (Score:1)
An "artificial hill" intended to protect an area from floods is usually called a "levee", and while certainly extremely useful for their intended purpose, they aren't exactly an ironclad guarantee. So having contingency plans for the case where they fail isn't a bad idea.
Buildings that are built _on top_ of a hill (even an artificial one), don't have quite the same set of severe problems with flooding that occurs in low-lying areas. ;)
Re: (Score:2)
Simple: the power equipment gets an unscheduled watering and your servers go down.
If you want to minimize the impact that a disaster can wreak on your servers in a datacenter, then you need to have your entire setup running and synchronously replicated in another datacenter.
Re: (Score:2)
I have set up all servers under my responsibility in VMs (using VirtualBox) and am ready to deploy on a minimum of servers with only databases available. (I have roughly 3 TB of data and about 22 TB of images.)
I've been patiently standing by, waiting for a data center agreement to be formalized, whereby we'll have a hot-site setup in a center about twenty mi
"no online database will replace your daily news" (Score:2)
Huh? (Score:3, Informative)
Generators plus UPS FTMFW (Score:2, Insightful)
Epic fail.
Any data center worth its weight in dirt must have UPS devices sufficient to power all servers plus all network and infrastructure equipment, as well as the HVAC systems, for a minimum of 2 full hours on batteries, in case the backup generators have difficulty getting started up and online.
Any data center without both adequate battery-UPS systems plus diesel (or natural gas or propane powered) generators is a rinky-dink, mickey-mouse amateur operation.
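For a sense of what "2 full hours on batteries" actually implies, here is a back-of-the-envelope sizing; every figure is invented, the point is only the order of magnitude:

    load_kw = 500                 # servers + network + HVAC for a modest facility
    runtime_hours = 2.0
    inverter_efficiency = 0.92
    battery_bus_voltage = 480.0   # volts DC across the battery string

    energy_needed_kwh = load_kw * runtime_hours / inverter_efficiency
    amp_hours_needed = energy_needed_kwh * 1000 / battery_bus_voltage

    print(round(energy_needed_kwh))  # ~1087 kWh of usable battery energy
    print(round(amp_hours_needed))   # ~2264 Ah at 480 V - a very large and
                                     # very expensive battery plant

Which is part of why replies further down argue that the batteries should only bridge the gap until the generators pick up the load.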
Re: (Score:2)
It's hard to believe that freakin' Google wouldn't be at that lev
Re: (Score:1)
Re:Generators plus UPS FTMFW (Score:4, Insightful)
Re: (Score:3, Informative)
You cannot have uptime without power. A mains outage coupled with an unexpected generator failure *will* result in downtime - your decision now is whether you wish your servers to be gracefully shut down, or just have the rug pulled from under them and hours or days of potential angst as a result. Which is it?
And before you suggest larger UPSes for longer pr
Re: (Score:2)
you must have overlooked his closing statement regarding 'if the backup generator failed to kick in'.
If the backup generator fails, then you want 2 hours, not 20 minutes, so that you can have a third set of generators and specialist engineers flown in to install them; if the outage looks long-term, get a tanker truck running between the building and the fuel depot, etc.
"20 minutes to shut down gracefully" might be better than "nothing", but it's certainly not great
Re: (Score:2)
UPSes are *that* expensive. If you really wanted to guard against a generator failure, you double up on generators but you do not spend more than you need on UPSes. A UPS should be there to smooth the transition to the generator, not to run the site for any significant length of time.
Re: (Score:2)
Re: (Score:3, Funny)
Yeah, and when the guys at the Jesus Christ of Datacenters that you describe have to do something like, say, switch from generator to utility power manually, and the document that details that process is 18 months old and refers to electrical panels that don't exist anymore, you get what you had here: a failure of fail-over procedures. If the lowliest help desk / operator can't at least understand the documentation you've written, then you've failed.
The only equipment failure listed is a "power failure." Gra
Re: (Score:2)
Google's setup appears to rely on the fact that they have redundant data centers, so failover to another data center addresses this problem. The problem here, as identified in their post-mortem, is that
Re: (Score:2)
That's what I was thinking; the local battery design that was previously praised became the fault. A large central UPS can monitor and test its batteries more than just plugging an SLA battery into the DC side of a server power supply and patting yourself on the back for being a genius. A UPS gives more telemetry, too. How did Google monitor those individual batteries? Not all SLA batteries are perfect. Were they tested and maintained? I'm guessing "no" to both if 25% of the servers lost power before the ge
Re: (Score:2)
This is false. Google details at some length the causes of the customer-facing outage. The power going out is an early problem, but it's not a particularly important issue because that's an accepted risk in their plans. The failure was in the fact that the procedures that are intended to prevent a power loss at a data centre from producing a customer-affecting outage had inadequate coverage of partial losses of power, and on top o
Re: (Score:2)
This is false. Google details at some length the causes of the customer-facing outage.
I only sort of skimmed over TFA to get the big points, but if you can point out the part where they explain why 25% of the servers lost power, I'd appreciate it.
Re: (Score:2)
Of course you can verify battery performance safely. My UPS has a battery test (checks if the batteries can still be used; if it fails, the batteries need replacing) and run time calibration (discharges the batteries to 25% and monitors how long it took; based on that it can estimate how long it will be able to hold the load). Whatever system Google is using should be able to check the batteries while power is on, so that you don't end up with batteries that have 20% of their original capacity when the power goes out.
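A sketch of that run time calibration arithmetic, assuming a test that discharges to 25% charge under a known load; the numbers are illustrative:

    def remaining_capacity_fraction(rated_runtime_min, measured_runtime_min,
                                    discharge_fraction=0.75):
        # rated_runtime_min:    full-discharge runtime the battery should give when new
        # measured_runtime_min: how long the test actually took to reach 25% charge
        expected_test_min = rated_runtime_min * discharge_fraction
        return measured_runtime_min / expected_test_min

    # Example: a battery rated for 20 minutes at this load reaches 25% charge
    # after only 9 minutes -> roughly 60% of its original capacity remains.
    print(round(remaining_capacity_fraction(20, 9), 2))   # 0.6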
Re: (Score:2)
Yes, but you HAVE A UPS (an APC Smart-UPS by the sound of it). Google does not. They have SLA batteries hard-wired into the PSUs of each individual server. Now, it would be possible for there to be special circuitry to do an online battery test
When the Power Goes Out At Google... (Score:3, Informative)
Re: (Score:2)
...a fairy dies.
I suspect that this will result in a large overpopulation of fairies. Since Google would be to blame for this, perhaps they should begin some sort of fairy mitigation program?
Re: (Score:1)
try employing the right people (Score:2, Interesting)
Has anyone from Ubisoft read this? (Score:1)
Lessons Like (Score:3, Funny)
Re: (Score:2)
And they'll call it gFusion?
Maybe... (Score:2)
They'll come out with it when Apple releases iFusion...
Back around 2005... (Score:2)
I was charged with laying out the design for data, telecom and electrical for the project. Also had engineering of our little NOC.
Nice setup - redundant power in the I.T. division, a nice big APC UPS for the entire room that had its own 480V power drop, dual HVAC units, a natural gas fired generator. It's nice to have the money to do this.
Since we were a state agency we had to use s
Re:Back around 2005... (Score:4, Insightful)
The company had spent the past year rearchitecting the entire IT infrastructure, as the complete core application suite for the business was, other than your standard peripheral utilities like Office et al, green screen based, using a proprietary language from the early 1980s that was barely still maintained and wasn't going anywhere fast.
It was my job to handle the systems infrastructure side of the deal, while another team handled software development and I was way ahead of them - the core business applications were still in the planning stages while the infrastructure to handle and host them was well advanced. The platform we chose was well designed, with onsite redundancy built into the base cost and easily scalable - dare I say it myself, it was a good job. The only thing I had no hand in on the hardware side was the actual building infrastructure, as we had moved to custom built offices about 5 years prior, and there was someone else on the team that handled telecoms and the building. But we had a UPS and a generator, so all seemed well in the world.
Alongside the new infrastructure came the new business continuity plan. Well, I say 'new' - I can't really say there was an 'old' BCP. Sure, we rented space at a major BC facilities provider, but there had never been any test, and there wasn't even any written documentation as to what to do.
Here is where I must admit my first failure - the BCP was not treated as an integral, tied-in-like-a-knot part of the infrastructure; it was a separate project running alongside. Sure, the new infrastructure was designed to take a local server failure through redundancy, or even allow ease of moving to an offsite location. That part of it was all in place. My failure was in not ensuring that the offsite location actually existed as the new infrastructure grew.
However, by the start of 2009, the basic infrastructure needs of the BCP were well known, costed and presented to the company board of directors. And there it sat. Every month I would ask them if it had been signed off, if I could spend the money. Every month I received a negative answer, it just hadn't been discussed at these busy directors meetings.
And that was my second failure. I had no sponsor in those meetings; there was basically no IT representation (the IT director had resigned after the modernisation was pushed through; he wanted no part in it, as he had not been taking the business forward himself). With no sponsor, no one wanted to raise the potential spending of a hundred thousand pounds themselves. And so it sat.
Then one day in June, we had a routine fan replacement on the UPS. The engineer was signed in, did the replacement under the watchful eye of a senior helpdesk technician, and flipped the UPS back from maintenance bypass to full protected mains. And that was when the first bang happened.
And all the lights went dark. All the whirring stopped. All the phones stopped ringing. All the people stopped talking.
It was blissfully quiet for a few precious seconds. And then it was painfully quiet for about another 5. And then all hell broke loose.
The core business applications did not fare well. The 30-year-old architecture essentially had no failsafe for database writes, and as the server had quit in the midst of several thousand writes, we knew we had just lost a significant amount of data.
It's worth taking several seconds out to explain how the core application language does its job. Firstly, there is no database server; it's all C-ISAM data files directly read from and written to by each individual application. Locks are handled by each application internally, with OS-level locking preventing concurrent writes to the same record in the data file. No database engine, no transaction logging, no roll backs, no error correction, nothing. There was nothing in the language to protect those poor l
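Purely for illustration (this is not how that language or C-ISAM actually worked), a toy contrast between updating records in place and journaling the intent first, which is what makes a mid-write power loss recoverable:

    import json, os

    def write_in_place(path, offset, record_bytes):
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(record_bytes)      # power loss mid-write = a torn record and
                                       # no way to know which writes were in flight

    def write_with_journal(path, journal_path, offset, record_bytes):
        # Record the intent durably first, then update the data file; on restart,
        # surviving journal entries can be re-applied to repair torn writes.
        entry = {"offset": offset, "data": record_bytes.hex()}
        with open(journal_path, "a") as j:
            j.write(json.dumps(entry) + "\n")
            j.flush()
            os.fsync(j.fileno())
        write_in_place(path, offset, record_bytes)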
Re: (Score:2)
Apparently it did work though. When we had that DNS fail we were able to see that the hot standby site came up without a hiccup.
But you make a good point: unless you have someone high up who's going to shepherd your project through, you'd better be prepared for some ugly times.
Luckily we had full buy-in on ours. Another thing happened though. I.T.
Post Mortem Missed the Problem (Score:1)
The Datacom response model (Score:2)
Repeat to yourself: "All is well, All is well, All is well" and everything will be exactly like you wish it to be.
Note: the originators of the response model are not responsible for anyone being taken away to a psychiatric facility because of a belief that the response model user is psychotic.
Re:and what about openess during the incident? (Score:4, Funny)
Re: (Score:2)
$Political_Pundit_I_Disagree_With, is that you!?
Fixed that for you
No, I think I got it right this time.
Re: (Score:2)
Re:Don't they have (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:Don't they have (Score:4, Interesting)
However, all of this is moot, since even if they had a flywheel setup as you're speculating, it still doesn't explain why 25% of the floor went down. If the equipment was installed, maintained and loaded properly, they should've been able to get to the generators with no problem.
are you really telling me that you believe you and ElectricTurtle are smarter than the combined brainpower set loose by Google for building and maintaining this facility?
No, I'm telling you that I manage a data center, and I know first hand how they work (or in this case, should work). I fail to see an adequate explanation of how this was unavoidable.
Re: (Score:2)
I was under the impression that Google's servers all had small individual batteries in each chassis to provide power during generator spin-up in lieu of full-on UPSs. Maybe some of them didn't last as long as they were supposed to? Or maybe the generator took longer to warm up than it should have?
Re: (Score:2)
Re: (Score:2)
Yep. I mean, as it's been stated in other comments, I think Google's way of hedging its bets is to have redundant data centers, so I think they correctly focused on the procedural issues.
However... as a current programmer and former IT guy, I'd like to know more about what caused the failures in the first place.
Re: (Score:3, Insightful)
That's because they are focussing on what went wrong. Power losses, including ones that take down the whole data center, are accepted risks and part of the reason they have redundant data centers and failover procedures.
The failure wasn't that they had a partial loss at a datacenter. The failure was that the impact of that loss wasn't mitigated properly by the systems that were supposed to be in place to do that.
Re: (Score:2)
Power losses, including ones that take down the whole data center, are accepted risks and part of the reason they have redundant data centers and failover procedures. The failure wasn't that they had a partial loss at a datacenter. The failure was that the impact of that loss wasn't mitigated properly by the systems that were supposed to be in place to do that.
I must respectfully disagree. Power losses that take down the whole data center are definitely NOT accepted risks. The entire reasoning for spending millions upon millions of dollars to have UPS systems, Static Switches, Automatic Throwover Switches, Diesel Generators with thousands of gallons of fuel, etc. isn't because you think downtime is acceptable; it's because downtime is not an option.
We almost agree though. I do agree that the failure was in improper mitigation of the risk as opposed to mit
Re: (Score:1, Troll)
Re: (Score:2)
You demonstrated that you don't know enough about modern data center design based on your 4 word comment. No further information was necessary.
Plenty of people who have worked in data centers wouldn't know this, so the fact that you may have worked in one is a moot point.
See the reply to the guy who also doesn't know this stuff that was trying to stick up for you. http://slashdot.org/comments.pl?sid=1575066&cid=31403320 [slashdot.org]
Re: (Score:2)
Re: (Score:2)
The argument is simply that going without adequate battery power to handle transfer switching is asinine, and you seem to think that's normal data center behavior. You would be the only one who thinks that would be proper redundancy, and all the data centers I'm in have battery-backed transformers to handle the load while they switch to alternate power.
The most expensive data center I'm in even goes so far as to have an hour of battery time to handle generator failures during a power outage.
ElectricTurtl
Re: (Score:2)
Re: (Score:2)
Actually no, Google doesn't use UPS systems if this is one of their designs that uses one small sealed lead acid battery per server.
Re: (Score:2)
Re: (Score:3, Insightful)
Re: (Score:2)
Depends on who's doing the shopping.
If you're looking for a serious hosting facility, then incident response should be one of the things you look at. If they haven't had an incident*, then you have no idea how they'll handle it when (not if!) one happens. They can hand you all the documentation in the world, but that can't speak to execution.
* that they've admitted to
Re: (Score:2)
So, rewatching season 2? The addiction is terrible. I recommend a dose of Flashforward ...