Cooling Challenges an Issue In Rackspace Outage
Posted by
Zonk
on Tue Nov 13, 2007 12:03 PM
from the getting-a-touch-warm-in-here dept.
miller60 writes "If your data center's cooling system fails, how long do you have before your servers overheat? The shrinking window for recovery from a grid power outage appears to have been an issue in Monday night's downtime for some customers of Rackspace, which has historically been among the most reliable hosting providers. The company's Dallas data center lost power when a traffic accident damaged a nearby power transformer. There were difficulties getting the chillers fully back online (it's not clear if this was equipment issues or subsequent power bumps) and temperatures rose in the data center, forcing Rackspace to take customer servers offline to protect the equipment. A recent study found that a data center running at 5 kilowatts per server cabinet may experience a thermal shutdown in as little as three minutes during a power outage. The short recovery window from cooling outages has been a hot topic in discussions of data center energy efficiency. One strategy being actively debated is raising the temperature set point in the data center, which trims power bills but may create a less forgiving environment in a cooling outage."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
This is number 3 (Score:5, Informative)
Other publications [valleywag.com] have noted it was number 3, too.
DT
Re:Why run data centres in hot states? (Score:5, Interesting)
There are several good reasons why the servers are located where they are, and not, say, in Alaska.
The main one is light speed through fiber, and a cable from Houston to Fairbanks would induce a best case of around 28 ms latency, each way. Multiply by several billion packets.
This is why hosting near the customer is considered a Good Thing, and why companies like Akamai have made it their business of transparently re-routing clients to the closest server.
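The 28 ms figure above can be sanity-checked with a rough sketch. The city coordinates and the fiber refractive index below are assumptions for illustration, not numbers from the post, and the result covers propagation delay only (no routing detours, no switching equipment).

```python
import math

# Approximate city coordinates in degrees (assumed, not from the post).
HOUSTON = (29.76, -95.37)
FAIRBANKS = (64.84, -147.72)

EARTH_RADIUS_KM = 6371.0
C_VACUUM_KM_PER_S = 299_792.458
FIBER_INDEX = 1.47  # assumed typical refractive index of silica fiber


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def one_way_latency_ms(distance_km, index=FIBER_INDEX):
    """Propagation delay in fiber; light travels at c / index."""
    return distance_km / (C_VACUUM_KM_PER_S / index) * 1000.0


distance = great_circle_km(HOUSTON, FAIRBANKS)
latency = one_way_latency_ms(distance)
print(f"{distance:.0f} km, about {latency:.1f} ms one way")
```

A straight great-circle run lands in the mid-20s of milliseconds, so a "best case of around 28 ms" over a real cable route looks plausible.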
Back to cooling. A few years ago, I worked for a telephone company, and the local data centre there had a 15 degree C ambient baseline temperature. We had to wear sweaters if working for any length of time in the server hall, but had a secure normal temperature room outside the server hall, with console switches and a couple of ttys for configuration.
The main reason why the temperature was kept so low was to be on the safe side -- even if a fan should burn out in one of the cabinets, opening the cabinet doors would provide adequate (albeit not good) cooling until it could be repaired, without (and this is the important part) taking anything down.
A secondary reason was that the backup power generators were, for security reasons, inside the server hall itself, and during a power outage these would add substantial heat to the equation.
Re: (Score:3, Informative)
Well that's just incompetent. For one thing, commercial electronics experience increased failure as you move away from an ambient 70 degrees F regardless of which direction you move. Running them at 59 degrees F (15 C) is just as likely to induce intermittent failures as running them at 80 degrees F.
For another, you're supposed to design your cooling system to accommodate all of the planned heat load in the environment. If your generators will be a
Re: (Score:3, Interesting)
I was considering asking why the GP poster was bothering with a sweater when working (as opposed to sleeping) in his server room at 15 centigrade, b
Re: (Score:3, Informative)
As for the power efficiency of pumping air from several hundred meters away compared to pumping it through the grille of an AC unit, well, there's a reason why skyscrapers these days have multiple central air facilities instead of just one: Economics.
I'd like to see you pump air for any l
Which only shows (Score:3, Informative)
Is this really news?
Re: (Score:2)
Re: (Score:3, Interesting)
Re: (Score:2)
I was thinking the same thing.
AC is out? Crank open the vents and turn on the fans.
Admittedly it wouldn't work so well in the summer, but spring/winter/fall could be nice.
Re: (Score:2)
In the winter, if you heat with electricity, you can basically run your computer for free, since its waste heat reduces the amount of heat needed to be generated by resistance heaters.
Re: (Score:3, Interesting)
Re: (Score:2)
Re:Which only shows (Score:4, Interesting)
Re: (Score:2)
Re: (Score:3, Informative)
Re:Which only shows (Score:5, Informative)
Re:Which only shows (Score:4, Interesting)
For example, Chicago's primary datacenter facility is at 350 E. Cermak (right next to McCormick Place), and the primary interconnect facility in that building is Equinix (which has the 5th and now 6th floors). A year or so ago there was a major outage there (that mucked up a good amount of the internet in the midwest) when a power substation caught on fire and the Chicago Fire Department had to shut off power to the entire neighborhood. So the backup system started like it should, with the huge battery rooms powering everything (including the chillers) for a bit while the engineers started up the generators. Only thing is, the circuitry that controls the generators shorted out, so while the generators themselves were working, the UPS was working, and the chillers were working, this one circuit board blew at the WRONG moment. And this isn't the only time this circuit has been used; they test the generators every few weeks.
Long story short, once the UPSes started running out of power the chillers started going, lights flickered, and for a VERY SHORT period of time the chillers went out before all of the servers did. Within a minute or two it got well over 100 degrees in that datacenter. Thank god the power cut out as quickly as it did.
So yes, Equinix in that case did everything by the book. They had everything set up as you would set it up. It was no big deal. But something went wrong at the worst time for it to go wrong and all hell broke loose.
It could be worse, your datacenter could be hit by a tornado [nyud.net]
Re: (Score:3, Interesting)
Re: (Score:3, Interesting)
Re: (Score:3, Interesting)
Oh and as far as the one leg collapsing thing, yes we were VERY pissed at everyone involved in that little problem, it turns out it was a
Re: (Score:3, Interesting)
Although our servers are on uninterrupted power (same as the Air Con)
I guarantee your HVAC systems are NOT on UPS power. If by some massive failure during construction and commissioning they were and it was missed, I'd recommend firing your entire engineering department and any development contractors involved with building and maintaining your facility. There is no reason to put HVAC systems (chillers, pumps, air handlers, CRACs) on UPS as they can all manage just fine with losing their power and restarting once power is restored (either from utility or generator).
Re: (Score:3, Informative)
Believe it or not, I've designed both, and while I certainly don't claim to be an expert on all the IT equipment, I've got a pretty good idea of the electrical systems that go into them.
:-)
My description of the emergency branches was intentionally vague because their full definitions comprise some dozens of pages in NFPA 99. I assumed most people wouldn't care about that level of detail
Anyway, my point was that
Re: (Score:3, Interesting)
Fast forwar
How to estimate the cooling needs? (Score:3, Interesting)
Re:How to estimate the cooling needs? (Score:5, Informative)
Re: (Score:3, Informative)
Re:How to estimate the cooling needs? (Score:5, Interesting)
Believe it or not, but in one of those "life coincidences", pi is a safe approximation. Take the number of watts your equipment, lighting, etc., use, multiply by pi, and that's the # of BTU/hr of cooling. Don't forget to include 100 watts per person for body heat.
It'll be 90 degrees F outside, and you'll be a cool 66 F.
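For comparison, the textbook conversion is 1 W = about 3.412 BTU/hr, so multiplying by pi lands roughly 8% under the exact figure. A minimal sketch, assuming a hypothetical 5,000 W room with two people in it (the function name and example load are illustrative, not from the post):

```python
import math

BTU_PER_HR_PER_WATT = 3.412  # standard conversion: 1 W is about 3.412 BTU/hr
BODY_HEAT_W = 100            # per-person allowance quoted in the comment above


def cooling_btu_per_hr(equipment_watts, people=0, factor=BTU_PER_HR_PER_WATT):
    """Heat load in BTU/hr for equipment plus occupants."""
    total_watts = equipment_watts + people * BODY_HEAT_W
    return total_watts * factor


load_w = 5000  # hypothetical equipment and lighting load
exact = cooling_btu_per_hr(load_w, people=2)
rule_of_thumb = cooling_btu_per_hr(load_w, people=2, factor=math.pi)
print(f"exact: {exact:.0f} BTU/hr, pi rule: {rule_of_thumb:.0f} BTU/hr")
```

Since the pi rule comes in slightly low, it is worth adding margin on top rather than treating it as a ceiling.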
Re: (Score:3, Funny)
And if that doesn't work, you can always tell your VP that you were taking your numbers from some guy named TrollTalk on /.
I'm sure he'll understand.
Re: (Score:3, Interesting)
Re: (Score:3, Interesting)
The Prof in a box experiment has a large issue that contributes to error. He is breathing through a tube; the heat exchange in your lungs is a convection exchange and has too large a magnitude to ignore. If you have doubts about how much heat flows out through breathing nex
Re: (Score:3)
Anyway, if you use an average of 2000 kcal, whether that goes into heating or moving around, a control volume around yourself will experience the same thing: 2000 kcal of waste heat generated over the course of a day. Everything turns into waste heat,
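The 2000 kcal/day figure can be turned into an average power with a quick sketch, assuming the usual 4184 J per kcal and that all of it ends up as heat over 24 hours:

```python
JOULES_PER_KCAL = 4184       # thermochemical kilocalorie
SECONDS_PER_DAY = 24 * 60 * 60


def average_body_heat_watts(kcal_per_day):
    """Average heat output if all dietary energy ends up as waste heat."""
    return kcal_per_day * JOULES_PER_KCAL / SECONDS_PER_DAY


watts = average_body_heat_watts(2000)
print(f"{watts:.1f} W")
```

That comes out just under 100 W, which lines up with the "100 watts per person" allowance mentioned earlier in the thread.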
Re: (Score:3, Insightful)
Physics (Score:4, Informative)
Also note - Don't EVER use the rated wattage of a power supply because that's what it SUPPLIES, not uses. Instead use the current draw multiplied by the voltage (US - 110 for single phase, 208 for dual phase in most commercial bldgs, 220 only in homes or where you know that's the case). This is the 'VA' [Volt-Amps] unit. Use this number for 'watts' in the conversion to refrigeration needs.
Just FYI - a watt is defined as "the power developed in a circuit by a current of one ampere flowing through a potential difference of one volt." See http://www.siliconvalleypower.com/info/?doc=glossary/ [siliconvalleypower.com], i.e. 1W = 1VA. The dirty little secret about power calculations is that there is another factor thrown in, typically about 0.65, called the 'power factor' that UPS and power supply manufacturers use to lower the overall wattage. That's why you always use VA (rather than the reported wattage), because in a pinch you can always measure both voltage and amperage (under load).
Basically do this - take all the amperage draws for all the devices in your rack/room/data center, multiply them by the applied voltage for that device (110 or 208) and add all the products together. Then convert that number to tons of refrigeration. This is your minimum required cooling for a lights out room. If you have people in the room, count 1100 BTU/hr for each person and add that to the requirements (after conversion to whatever unit you're working with). Some HVAC contractors want specifications in BTU/hr and others want it in tons. Don't forget lighting either if it's not a 'lights out' operation. A 40W fluorescent bulb means it's going to dissipate 40W (as in heat). You can use these numbers directly as they are a measure of the actual heat thrown, not of the power used to light the bulb.
Make sense?
Dennis Dumont
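The procedure described above can be sketched as follows. The device list is hypothetical, the 1100 BTU/hr per person is the figure from the comment, and the conversions assumed are 1 W = about 3.412 BTU/hr and 1 ton of refrigeration = 12,000 BTU/hr.

```python
BTU_PER_HR_PER_WATT = 3.412   # 1 W (or 1 VA, treated as watts) in BTU/hr
BTU_PER_HR_PER_TON = 12_000   # 1 ton of refrigeration
PERSON_BTU_PER_HR = 1100      # per-person figure from the comment above


def cooling_load(devices, people=0, lighting_watts=0):
    """Return (BTU/hr, tons) for a list of (amps, volts) device draws."""
    volt_amps = sum(amps * volts for amps, volts in devices)
    btu_per_hr = (volt_amps + lighting_watts) * BTU_PER_HR_PER_WATT
    btu_per_hr += people * PERSON_BTU_PER_HR
    return btu_per_hr, btu_per_hr / BTU_PER_HR_PER_TON


# Hypothetical rack: measured amps under load and the applied voltage.
rack = [(4.0, 110), (2.5, 208), (6.0, 208)]
btu, tons = cooling_load(rack, people=1, lighting_watts=4 * 40)
print(f"{btu:.0f} BTU/hr = {tons:.2f} tons")
```

Summing VA rather than nameplate watts errs on the high side, which is the conservative direction for sizing cooling.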
Re: (Score:3, Informative)
It's not "thrown in" by the manufacturers. The dirty little secret is simply that you are talking about AC circuits. 1W = 1VA in AC circuits only if the volts and the amps are in phase -- which they aren't.
Take a sine wave -- in AC, that's what your voltage looks like, always changing. If you're
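The phase argument can be checked numerically. This is a minimal sketch assuming purely sinusoidal voltage and current (real switching power supplies also distort the current waveform, which this ignores); the 120 V, 10 A, and 60 degree values are illustrative only.

```python
import math


def average_power(v_rms, i_rms, phase_deg, samples=10_000):
    """Average of v(t) * i(t) over one full cycle of a sinusoidal supply."""
    phi = math.radians(phase_deg)
    v_peak = v_rms * math.sqrt(2)
    i_peak = i_rms * math.sqrt(2)
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        # Current lags the voltage by phi radians.
        total += v_peak * math.sin(theta) * i_peak * math.sin(theta - phi)
    return total / samples


apparent_va = 120 * 10                   # volts times amps
in_phase = average_power(120, 10, 0)     # real watts with no phase shift
shifted = average_power(120, 10, 60)     # real watts with a 60 degree lag
print(apparent_va, round(in_phase, 1), round(shifted, 1))
```

With no phase shift the real power equals the VA figure; with a 60 degree lag it drops to VA times cos(60 degrees), i.e. half, even though the meter readings for volts and amps are unchanged.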
You could do like my previous Director of IT did.. (Score:2)
Man, I wish I was making that up.
And the answer is: Liquid Nitrogen (Score:2, Informative)
Re: (Score:3, Informative)
Except of course the power needed to create the LN2.
As above - how do you think they prevent the LN2 from evaporating? The LN2 is a buffer against loss of pow
Re: (Score:3, Informative)
Setting aside evaporation, be careful not to get it on anything. LOX can easily saturate anything remotely porous and oxidisable, effectively turning it into an unstable explosive until the LOX evaporates... at LOX or LN temperatures, that can even become an issue with oxygen condensing fro
New cooling strategy needed? (Score:5, Interesting)
The advantage of this is that even in the worst-case scenario where the chillers fail totally during mid-summer there is no runaway, closed-loop, self-reinforcing heat cycle: the data centre temperature will rise, but it would do so more slowly, and the maximum equilibrium temperature will be far lower (and dependent upon the external ambient temperature).
In fact, as part of the design for the cluster room in our new building I've specified such a system, though due to the maximum size of the ducting space available we can only use this for half the heat load.
Re:New cooling strategy needed? (Score:4, Informative)
Funny you mention this (Score:5, Interesting)
Short-cycling protection (Score:5, Interesting)
Most large refrigeration compressors have "short-cycling protection". The compressor motor is overloaded during startup, and needs time to cool. So there's a timer that limits the time between two compressor starts. 4 minutes is a typical delay for a large unit. If you don't have this delay, compressor motors burn out.
Some fancy short-cycling protection timers have backup power, so the "start to start" time is measured even through power failures. But that's rare. Here's a typical short-cycling timer. [ssac.com] For the ones that don't, like that one, a power failure restarts the timer, so you have to wait out the timer after a power glitch.
The timers with backup power, or even the old style ones with a motor and cam-operated switch, allow a quick restart after a power failure if the compressor was already running. Once. If there's a second power failure, the compressor has to wait out the time delay.
So it's important to ensure that a data center's chillers have time delay units that measure true start-to-start time, or you take a cooling outage of several minutes on any short power drop. And, after a power failure and transfer to emergency generators, don't go back to commercial power until enough time has elapsed for the short-cycling protection timers to time out. This last appears to be where Rackspace failed.
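The difference between the two timer styles can be illustrated with a toy model. This is a simplified sketch of the behaviour described above, not the logic of any particular controller; the function names and the example times are hypothetical, and the 240 s lockout is the "typical" 4 minutes from the comment.

```python
LOCKOUT_S = 240  # start-to-start delay, "4 minutes is a typical delay"


def restart_time(last_start_s, power_restored_s, timer_keeps_time,
                 lockout_s=LOCKOUT_S):
    """Earliest time (seconds) the compressor may start again."""
    if timer_keeps_time:
        # Backed-up timer: measures true start-to-start time.
        return max(power_restored_s, last_start_s + lockout_s)
    # No backup: the power failure restarts the timer from zero.
    return power_restored_s + lockout_s


def cooling_gap(power_lost_s, last_start_s, power_restored_s, timer_keeps_time):
    """Seconds without compressor cooling for one power event."""
    restart = restart_time(last_start_s, power_restored_s, timer_keeps_time)
    return restart - power_lost_s


# Compressor started at t=0; a 10 s power drop happens an hour later.
print(cooling_gap(3600, 0, 3610, timer_keeps_time=True))    # 10
print(cooling_gap(3600, 0, 3610, timer_keeps_time=False))   # 250

# Second drop 90 s after the quick restart at t=3610: the lockout applies.
print(cooling_gap(3700, 3610, 3705, timer_keeps_time=True)) # 150
```

The third case is the "Once." in the comment: after a quick restart, a second bump inside the lockout window forces even the better timer to wait.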
Dealing with sequential power failures is tough. That's what took down that big data center in SF a few months ago.
Highlights Serious Flaw - Neglecting Outside (Score:4, Insightful)
From the articles, it appears that the Rackspace datacenter doesn't have multiple power lines coming in and/or many come in via one feed point.
How else is it that a car crash quite some distance from the datacenter can cause such disruption? Does anyone even plan for such events? I get the feeling most planners don't, since I've seen first-hand many power failures occur in places where one would expect more redundancy, from dumb things like a vehicle hitting a utility pole, etc.
Ron
Re: (Score:3, Informative)
I live near a hospital which is located on the boundary between two distribution circuits, each fed from a different substation. That redundancy cost the hospital tens or hundreds of thousands of dollars. But the two substations are fed from the same transmission loop, which runs through the woods (lo
Maxwells data center (Score:3, Funny)
computers convert 100% electricity to heat (Score:3, Insightful)
5kw? ow. (Score:3, Insightful)
Almost no data center we spoke to would commit to cooling more than 4800 watts of power at an absolute maximum per rack, and those were facilities with hot/cool row setups to maximize airflow. But that meant they didn't want to drop more than 2x20 amp power drops, plus 2x20 for backup, if you agreed to maintain 50% utilization across all 4 drops. But since you'd really want to maintain 75% even in the case of failure, you'd only be using 3600 watts. (In the facility we ended up in, we have a total of 6 20 amp drops, and we only actually utilize ~4700 watts.)
Ultimately, though, the important thing is that cooling systems should be on generator/battery backup power. Otherwise, as this notes, your battery backup won't be useful.
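The arithmetic behind the 4800 W and 3600 W figures can be sketched as below, assuming 120 V circuits (the post gives amps and watts but not the voltage, so that value is an assumption chosen because it reproduces the quoted numbers).

```python
VOLTS = 120         # assumed circuit voltage, not stated in the post
BREAKER_AMPS = 20   # each drop is a 20 amp circuit


def usable_watts(drops, utilization, amps=BREAKER_AMPS, volts=VOLTS):
    """Power drawn when `drops` circuits each run at the given utilization."""
    return drops * amps * volts * utilization


all_four_at_half = usable_watts(4, 0.50)   # normal operation, 4 drops at 50%
two_left_at_75 = usable_watts(2, 0.75)     # after losing the backup pair
six_drop_share = 4700 / usable_watts(6, 1.0)
print(all_four_at_half, two_left_at_75, f"{six_drop_share:.0%}")
```

Under that assumption, the ~4700 W spread over 6 drops works out to roughly a third of the total circuit capacity.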
Damn dihydrogen monoxide (Score:3, Funny)
Re: (Score:3, Insightful)
A large data center should not have one big massive UPS anyway. It should all be divided out into various load sections, each with its own UPS+battery system. Once you do that, then you can have cooling on its own UPS without any risk of the cooling system impacting the UPS feeding the computers ... if you really want cooling on UPS (it can be done, but generally is not the best way). Surely you would have the cooling on its own three phase circuits.
Perhaps a better approach is a smart cooling system t
Re: (Score:3, Informative)
Having said that, you are exactly right on having both your UPS system(s) and your cooling system(s) diversified. I tend to get into this argument with people regarding what constitutes a "data center" and