Cooling Challenges an Issue In Rackspace Outage
miller60 writes "If your data center's cooling system fails, how long do you have before your servers overheat? The shrinking window for recovery from a grid power outage appears to have been an issue in Monday night's downtime for some customers of Rackspace, which has historically been among the most reliable hosting providers. The company's Dallas data center lost power when a traffic accident damaged a nearby power transformer. There were difficulties getting the chillers fully back online (it's not clear if this was equipment issues or subsequent power bumps) and temperatures rose in the data center, forcing Rackspace to take customer servers offline to protect the equipment. A recent study found that a data center running at 5 kilowatts per server cabinet may experience a thermal shutdown in as little as three minutes during a power outage. The short recovery window from cooling outages has been a hot topic in discussions of data center energy efficiency. One strategy being actively debated is raising the temperature set point in the data center, which trims power bills but may create a less forgiving environment in a cooling outage."
This is number 3 (Score:5, Informative)
Other publications [valleywag.com] have noted it was number 3, too.
DT
Which only shows (Score:3, Informative)
Is this really news?
Re:How to estimate the cooling needs? (Score:5, Informative)
Re:Which only shows (Score:5, Informative)
Re:How to estimate the cooling needs? (Score:3, Informative)
And the answer is: Liquid Nitrogen (Score:2, Informative)
Re:How to estimate the cooling needs? (Score:1, Informative)
20A x 110V = 2200 VA, which doesn't translate directly to Watts (as someone will surely correct me), but for cooling purposes it's not a bad rule of thumb to treat the VA as Watts, because you'll be including built-in overhead into which you will surely grow your server space. Then go from Watts to BTU/hour.
2200 Watts x 3.412 BTU/Hour Watt = 7506 BTU/hr
12000 BTU/hr = 1 ton. Do that calculation for all possible hosts in your space, round up. Then purchase an additional, but portable, cooler for the space. Use that cooler for emergencies, like chilling beer, and if the main chillers break, you'll have nice cold beer to drink while the HVAC guys fix the big units and you wait for your less-essential machines to come up.
Most people will do the calculation and find their data center cooling systems are woefully undersized, running at 100% whenever the outside air temperature is above 50F...
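For the curious, the back-of-the-envelope conversion above can be sketched in a few lines of Python. The 20A/110V circuit and the 3.412 and 12000 conversion factors are the ones from the comment; the VA-as-Watts shortcut is the same rule of thumb:

```python
# Rule-of-thumb circuit-to-cooling conversion: treat VA as Watts
# for headroom, convert to BTU/hr, then to tons of refrigeration.

def circuit_cooling_load(amps, volts):
    """Return (watts, btu_per_hr, tons) for one circuit."""
    va = amps * volts          # volt-amps, used directly as watts
    btu_hr = va * 3.412        # 1 W = 3.412 BTU/hr
    tons = btu_hr / 12000.0    # 12000 BTU/hr = 1 ton
    return va, btu_hr, tons

watts, btu, tons = circuit_cooling_load(20, 110)
print(watts, round(btu), round(tons, 2))  # 2200 7506 0.63
```

Do that for every circuit in the space and round up, exactly as described above.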
Physics (Score:4, Informative)
Also note - Don't EVER use the rated wattage of a power supply, because that's what it SUPPLIES, not what it uses. Instead use the current draw multiplied by the voltage (US: 110 for single phase, 208 for dual phase in most commercial buildings, 220 only in homes or where you know that's the case). This is the 'VA' [Volt-Amps] unit. Use this number for 'watts' in the conversion to refrigeration needs.
Just FYI - a watt is defined as 'the power developed in a circuit by a current of one ampere flowing through a potential difference of one volt' (see http://www.siliconvalleypower.com/info/?doc=glossary/ [siliconvalleypower.com]), i.e. 1W = 1VA. The dirty little secret about power calculations is that there is another factor thrown in, typically about 0.65, called the 'power factor', that UPS and power supply manufacturers use to lower the overall wattage. That's why you always use VA rather than the reported wattage: in a pinch you can always measure both voltage and amperage (under load).
Basically do this - take the amperage draws for all the devices in your rack/room/data center, multiply each by the applied voltage for that device (110 or 208), and add all the products together. Then convert that number to tons of refrigeration. This is your minimum required cooling for a lights-out room. If you have people in the room, count 1100 BTU/hr for each person and add that to the requirements (after conversion to whatever unit you're working with). Some HVAC contractors want specifications in BTU/hr and others want it in tons. Don't forget lighting either if it's not a 'lights out' operation. A 40W fluorescent bulb is going to dissipate 40W (as in heat). You can use these numbers directly, as they are a measure of the actual heat thrown, not of the power used to light the bulb.
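That whole-room tally can be sketched in Python too. The device amperages, head count, and lighting wattage below are made-up illustration values, not from the comment; only the procedure and conversion factors are:

```python
# Whole-room cooling tally: sum amps x volts per device as watts,
# add 1100 BTU/hr per person and lighting wattage directly,
# then convert the total to tons of refrigeration.

devices = [(16, 110), (16, 110), (12, 208)]  # (amps, volts), illustrative
people = 2
lighting_watts = 400                         # heat thrown equals rated watts

device_btu = sum(a * v * 3.412 for a, v in devices)
people_btu = people * 1100
lighting_btu = lighting_watts * 3.412

total_btu = device_btu + people_btu + lighting_btu
tons = total_btu / 12000.0
print(round(total_btu), round(tons, 2))  # 24091 2.01
```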
Make sense?
Dennis Dumont
Re:New cooling strategy needed? (Score:4, Informative)
Re:And the answer is: Liquid Nitrogen (Score:3, Informative)
Except of course the power needed to create the LN2.
As above - how do you think they prevent the LN2 from evaporating? The LN2 is a buffer against loss of power, but typically they have a pretty serious cryocooler to keep the LN2 there when they do have power.
Re:Physics (Score:3, Informative)
It's not "thrown in" by the manufacturers. The dirty little secret is simply that you are talking about AC circuits. 1W = 1VA in AC circuits only if the volts and the amps are in phase -- which they aren't.
Take a sine wave -- in AC, that's what your voltage looks like, always changing. If you're powering something purely resistive like an incandescent bulb, your amps follow the same sine wave and 1W=1VA. But inductive loads like power supplies introduce a lag in the current, so that the amps aren't in phase with the volts. As a result, you cannot naively multiply the RMS volts by the RMS amps to get the average wattage -- you have to take the integral of volts times amps through the curve. And for part of that curve, the voltage and the current flow in different directions, which represents negative power (that is, the inductive circuitry is pushing current back across the wire). As a result of this the overall power will always be less than the volt-amps.
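A quick numerical sketch of that integral, assuming a 120 Vrms / 10 Arms load and the roughly 0.65 power factor mentioned earlier in the thread (all values are illustrative):

```python
import math

# Average real power over one AC cycle, with the current lagging
# the voltage by phase angle phi. With a lag, the average of
# v(t)*i(t) comes out to Vrms * Irms * cos(phi), which is less
# than the naive Vrms * Irms product.

def avg_power(v_peak, i_peak, phi, n=100000):
    """Numerically average v(t)*i(t) over one full cycle."""
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        total += (v_peak * math.sin(t)) * (i_peak * math.sin(t - phi))
    return total / n

v_peak, i_peak = math.sqrt(2) * 120, math.sqrt(2) * 10  # 120 Vrms, 10 Arms
phi = math.acos(0.65)                                   # power factor 0.65

watts = avg_power(v_peak, i_peak, phi)
volt_amps = 120 * 10
print(round(watts), volt_amps)  # 780 W of real power vs 1200 VA
```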
Re:How to estimate the cooling needs? (Score:2, Informative)
Re:How to estimate the cooling needs? (Score:2, Informative)
Sheesh - what's the name of your company so I can sell short?
Re:Which only shows (Score:3, Informative)
Re:Why run data centres in hot states? (Score:3, Informative)
As for the power efficiency of pumping air from several hundred meters away compared to pumping it through the grille of an AC unit, well, there's a reason why skyscrapers these days have multiple central air facilities instead of just one: Economics.
I'd like to see you pump air for any long distance with your exercise bike
Re:Highlights Serious Flaw - Neglecting Outside (Score:3, Informative)
I live near a hospital which is located on the boundary between two distribution circuits, each fed from a different substation. That redundancy cost the hospital tens or hundreds of thousands of dollars. But the two substations are fed from the same transmission loop, which runs through the woods (lots of trees, on inaccessible rights-of-way), so the most probable fault will take both stations, circuits, and sources to the hospital off line.
The moral of the story: Don't depend on an outside organization (the local utility) for service when it's your neck on the line and not theirs.
Re:New cooling strategy needed? (Score:2, Informative)
1) until relatively recently, houses "breathed" quite well on their own due to loose construction. With tightening energy codes and the use of Tyvek and better windows, houses don't have a lot of air exchange through the boundaries, and problems ensue - "stuffiness", moisture, mold, "sick building". Residential construction hasn't thought this through yet - there are some builders who now refuse to use Tyvek due to ventilation (and liability) issues.
2) Controls become an order of magnitude more complicated. Most residential systems are "bang bang" systems - on or off based on one criterion. To introduce free cooling, you need outside air sensors, dampers, actuators, and a controller a lot more complex than a home t-stat. For most residential builders, that's a couple thousand in extra costs that can't be recouped in sale price - most owners just don't care, and when you are building 5000 of the same unit, "most owners" rule.
As for dehumidification, you have it backwards - dehumidification typically requires a COLDER coil than necessary for cooling alone, and then you reheat the air. It is horribly inefficient, but sometimes necessary - with a "tight" building, you have to get the moisture out somehow, and supercooling the air inside just isn't a good idea (other than making for lots of erect nipples, that is).
Finally, what makes sense for one situation may not for another - a data center uses orders of magnitude more cooling than a house or common office building. Moving the amount of air necessary to provide that cooling gets really hard - the energy a fan requires increases with the CUBE of the flow required. So to get twice the airflow you use 8x the power. It's the same with pumps, but because the heat capacity of water or glycol is so much greater than that of air, the effects are minimized.
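The cube law above is a one-liner to sketch (the helper function is hypothetical, not from any HVAC library):

```python
# Fan affinity law: fan power scales with the cube of airflow,
# so doubling the flow costs roughly 8x the power.

def fan_power(base_power, flow_ratio):
    """Fan power at a new flow, relative to a known operating point."""
    return base_power * flow_ratio ** 3

print(fan_power(1.0, 2.0))   # 8.0  -> double the flow, 8x the power
print(fan_power(10.0, 0.5))  # 1.25 -> halve the flow, 1/8th the power
```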
Re:Datacenter cooling should be on generator/ups (Score:3, Informative)
Having said that, you are exactly right on having both your UPS system(s) and your cooling system(s) diversified. I tend to get into this argument with people regarding what constitutes a "data center", and one of the most significant parts of that determination is redundancy. This means not just redundant utility power feeds, but redundant UPS systems/modules, redundant generators, redundant chillers/CRACs, redundant PDUs, etc.
For our cooling systems, we have 4 chillers (we only need 2) and 20 CRACs (we only need 10). Any problems with any system can be mitigated by rolling to the redundant system.
Re:And the answer is: Liquid Nitrogen (Score:3, Informative)
Setting aside evaporation, be careful not to get it on anything. LOX can easily saturate anything remotely porous and oxidisable, effectively turning it into an unstable explosive until the LOX evaporates... at LOX or LN2 temperatures, that can even become an issue with oxygen condensing from the air onto your equipment/insulation. Forget just avoiding the creation of sparks -- better be sure that the safety measures have been successful in eliminating all LOX-incompatible materials, and be careful not to bump anything too hard!
And of course, even a tiny fire or explosion can easily lead to a rapid boiloff. Sudden boiloff can be an issue simply because of drastically increased pressure and still-cold temperature. Liquefied gases like LOX, NOX, etc. expand a LOT when they boil (about a 600-800x increase in volume, simply transitioning from a liquid to a gas), even while remaining dangerously cold. Imagine being in a closed room with a punctured dewar. Assuming you've escaped being hit by the dewar, which has gone flying like a deflating balloon with reinforced-concrete-shattering force, you've potentially got ruptured eardrums and possibly internal injuries due to the abrupt pressure change, which has also jammed the door. You fall to the floor from the pain of burns on your lower body from the ultracold gas which has quickly filled the lower part of the room -- which then starts to burn your face and lungs out too as you start breathing it.
Hopefully the facility you're in has proper emergency ventilation measures, adequate room size, properly constructed doors, and protective equipment to avoid this scenario, but you still don't want to be in the room if it happens if you can help it... Cryogenic gases are seriously dangerous. Don't underestimate them or treat them lightly.
Re:New cooling strategy needed? (Score:2, Informative)
With a heat exchanger, you are bringing cool air in, and then HEATING IT UP with the waste heat from the exhaust air. Great for saving energy in a residence, when one wants to stay toasty warm - not so great in a data center or office building when there is still a cooling load in winter. So absolutely nothing you said has anything to do with free cooling. Heat exchangers are great for what they do, but free cooling isn't it.
"That's also untrue. The direct, free-flowing heat exchange between hot and cold coils allows dehumidifiers to be much more energy efficient, using typically around 1/3rd as much power for the same volume of air."
True - IF you are using a heat exchanger. But if one is not - let's say, in an office building on a cool spring morning - then you have a problem. You bring in nice 65F air, at 65-70% RH - it's wet. You don't heat it up through a HX, because you need the 65F air to maintain temp setpoint. But now you are dumping a lot of water into the space, and it doesn't *feel* cool. So, you run your cooling coil at, say, 50F discharge temp. That is below dewpoint, and it pulls moisture out of the air. But now you are dumping 50F air into the space, so the space temp gets driven down, and you get the nipple effect. So what do you do? REHEAT the air to 65F. Which, BTW, is exactly what home dehumidifiers do - the discharge air is reheated to a temp greater than the intake air, reflecting the energy added by the electricity. TANSTAAFL.
You can throw a HX in that equation, but it certainly isn't a dumb device - the control logic needs to know when to open the air dampers and close them, so as not to interfere with free cooling.
"As to coil temperature, obviously any temperature will work, to varying degrees of effectiveness. You'll need to provide some numbers to back up your claim. General-purpose dehumidifiers are usually just slightly modified AC units"
Bullshit. The coil temperature MUST be less than the dew point of the air, by the very definition of "dew point". Practically, it needs to be substantially less for the dehumidification to really work. Often, that temp is less than desired for discharge air temp. See above example.
Call me when you've bought a psychrometric chart and a ductulator. There are plenty of design decisions to be made when designing an HVAC system, unfortunately including appeasing owners who think they are design geniuses.
Re:Why run data centres in hot states? (Score:3, Informative)
Well that's just incompetent. For one thing, commercial electronics experience increased failure as you move away from an ambient 70 degrees F regardless of which direction you move. Running them at 59 degrees F (15 C) is just as likely to induce intermittent failures as running it at 80 degrees F.
For another, you're supposed to design your cooling system to accommodate all of the planned heat load in the environment. If your generators will be adding heat then the A/C needs to have sufficient capacity to take that heat back out.
And anyway, your generators shouldn't be adding heat. They should be walled off from the data center with exterior air exchange. Otherwise an error in the exhaust ducting risks killing your operators with CO poisoning.
Re:Which only shows (Score:3, Informative)
Believe it or not, I've designed both, and while I certainly don't claim to be an expert on all the IT equipment, I've got a pretty good idea of the electrical systems that go into them.
My description of the emergency branches was intentionally vague because their full definitions comprise some dozens of pages in NFPA 99. I assumed most people wouldn't care about that level of detail.
Anyway, my point was that while a typical data center has 3 types of power available (Normal, Emergency and UPS), a typical hospital usually has at least 5:
Normal
Emergency (Life Safety)
Emergency (Critical)
Emergency (Equipment)
Emergency (UPS)
These generally include separate panels, feeders, automatic transfer switches, etc., so I still stand by my claim that hospitals have the more complex electrical system. Also consider that hospitals now contain increasingly critical data center facilities. Of course I will concede that the UPS topology of a large data center is generally far more complex than a hospital's... but again, that's just one part of the puzzle.
I'll take that bridge. The reason your generators take 15 seconds to start is that they comprise a Level 2 system (as defined in NFPA 110), and not the Level 1 system that hospitals require. Level 1 includes a whole bunch of additional requirements (i.e., expense) that are simply not required where the outage will not potentially risk human life, i.e., data centers. Now I'm not sure about all the modifications that manufacturers must make to their gen sets to meet these requirements, but I can assure you that the 10-second start (which includes startup, sync, and bus connection) is required by code. Also, I've been there at the monthly test that hospitals are required to perform, and yep... they really are that quick.
Now again... it's not that your generators are bad - it's just that there's no reason for a company to spend the extra cash on that sort of system when a longer startup time will do; typically the UPS is sized for 15 minutes of runtime, and the HVAC equipment can go down for a few minutes without the room overheating.
Similarly, I'm not claiming that hospitals are more complex overall systems....just that their electrical distribution systems typically are.
-Chris