Server Room Smells Can Be an Early Warning
Barence writes "As embarrassing as it may seem, an eggy smell in a server room needn't mean broaching the delicate subject of hygiene with a colleague. It can actually be a signal that something is about to go wrong with your server setup, as this consultant discovered after days of assuming questionable personal habits were to blame. The culprit? An expiring UPS device, sending out its own unique warning signal."
Re:Ooooga Booooga oh S#!t (Score:5, Insightful)
And how exactly do you tell if the smell is gone or your nose just gave out?
Re:Oh my ... I thought it was just me ! (Score:4, Insightful)
If they're like some IT departments I've seen... (Score:5, Insightful)
If they're like some of the IT departments I've seen, they might be working by some rule from upper management that they need to justify their existence by writing internal invoices for everything they do. It tends to result in them doing nothing until you tell them to, so they can bill you for it. The UPS could have not only the error lights on, but a blinking "RED ALERT" sign and the accompanying acoustic blare, and verily be on fire and billowing smoke, and nobody would touch it until you fill in the proper form requesting them to put it out.
Because, yes, that's another thing I've noticed that a lot of departments love, IT included: inventing bureaucracy and paperwork to discourage and delay actually having anything to do. You may need to fill in a 5 page form and draw PowerPoint diagrams as to why you want the UPS doused and what the architectural implications of that are. And if you're unlucky a few meetings too, to convince some Mordac The Information Services Preventer why he should move his ass and turn that UPS off, and why his suggested workarounds (in which he'd not have to do anything) aren't quite solving the problem.
Re:This is interesting, can this happen? (Score:5, Insightful)
A triggered fire suppression system should trip the A/C interlock, shutting down ventilation and outside air (blowing air is stupid when FM-100 or whatever is used).
Normally, however, air may well be circulated in a fairly tight closed loop. You do not want to inject outside air without a lot of treatment; filtering and humidity are very large concerns. Drawing in extremely moist, hot air from outside and bringing it into your air supply may well be a lot more challenging than simply recycling the existing clean warm air that already has a roughly correct humidity, for example. And then what happens when it's winter and the outside air is cold and super-dry? You suddenly have a different HVAC challenge.
water cooling (Score:5, Insightful)
This is why I prefer to build my new server rooms with individually cooled racks - each rack having its own AC-circulation - as well as using centralized water cooling for its efficiency and reliability. Circulating all your cooling air around the server room is simply a bad idea. When you have 1 kilometer of rack space on a single building floor, one source of contaminant, be it chemical or metal particles, will get into all the enclosures in the hall and cost you everything. And BTW UPS maintenance is something that modern IT management, especially outsourced services, have forgotten. Any veteran admin knows you need to estimate the end-of-life of your electronics AND replace them BEFORE they fail - just like AC-filters - if you allow those to fail, they will have already done some damage! There's no "RAID" for burning electronics or blocked cooling air!
Re:Funny, I routinely smell my servers... (Score:3, Insightful)
Sense of touch can be valuable too. You can get sub-audible vibration readings by touching a case, and touch is more sensitive to small amounts of temperature change than other senses. Likewise it can be a really exciting way to check for failing/floating ground.
Re:Can be? (Score:3, Insightful)
To be absolutely precise, MTBF is calculated as units*hours/failures.
So if you test 1000 widgets for one year and get 12 failures, you get an MTBF of 730,000 hours, or roughly 83 years. But the widget could still very well have a 100% failure rate at 2 years.
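For anyone who wants to run the numbers themselves, here's a quick sketch of that units*hours/failures formula in Python. The function name and the 8760-hour year are my own assumptions for illustration, not anything from a vendor spec sheet; the inputs are the 1000 widgets, one year and 12 failures from the example.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days * 24 hours, no leap day


def mtbf_hours(units: int, hours_per_unit: float, failures: int) -> float:
    """MTBF as total unit-hours of testing divided by observed failures."""
    if failures <= 0:
        raise ValueError("need at least one observed failure")
    return units * hours_per_unit / failures


# Hypothetical test run: 1000 widgets, one year each, 12 failures.
mtbf = mtbf_hours(units=1000, hours_per_unit=HOURS_PER_YEAR, failures=12)
print(f"MTBF: {mtbf:.0f} hours, about {mtbf / HOURS_PER_YEAR:.0f} years")
print(f"Units that actually failed during the test: {12 / 1000:.1%}")
```

Note that the test only ever watched the widgets for one year, so the result says nothing about what happens in year two. Every single unit wearing out at the two-year mark would leave this number unchanged.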
MTBF is misleading as it uses hours as base units, but it is not a measurement of time! It is a somewhat arbitrary reliability index that is only useful for comparing like devices with similar expected lifespans.
My point being, you can't use it to compare hard drives to magnetic tapes.